Research Tools and Workflow

Setting up the right tools and workflow early saves countless hours during research. This guide covers essential tools for ML research: reference management, writing, experiment tracking, version control, and computational infrastructure.

Reference Management

Organize papers and citations systematically from day one.

Why Zotero:

  • Free and open source
  • Browser extension for one-click paper saving
  • Automatic citation generation (BibTeX, APA, etc.)
  • PDF annotation and note-taking
  • Cloud sync across devices
  • Integrates with Word and LaTeX

Installation:

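    # macOS
    brew install --cask zotero

    # Linux: Zotero is not in the default Debian/Ubuntu repositories.
    # Download the tarball from https://www.zotero.org/ or install via Flatpak:
    flatpak install flathub org.zotero.Zotero

    # Windows: Download from https://www.zotero.org/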

Setup Workflow:

  1. Install components:

    • Zotero desktop app
    • Browser connector extension (Chrome/Firefox)
    • Better BibTeX plugin (for LaTeX users)
  2. Create collections:

    • “Must Read” - High-priority papers
    • “Related Work” - Papers to cite
    • “Baselines” - Methods to compare against
    • “Background” - Foundational papers
    • Project-specific collections
  3. Workflow:

    • Click browser extension while on arXiv, Google Scholar, or journal site
    • Paper saved with metadata, PDF attached
    • Tag with keywords: “attention”, “multimodal”, “must-cite”
    • Add notes and annotations directly in PDF
    • Export to BibTeX for LaTeX papers

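Better BibTeX Configuration:

Better BibTeX is configured through Zotero's preferences and export dialog rather than a JSON file, so read the snippet below as a summary of the intended settings. Recent Better BibTeX releases express the same key pattern as auth.lower + year.

    {
      "citekeyFormat": "[auth:lower][year]",
      "autoExport": true,
      "exportPath": "~/research/papers.bib"
    }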

This generates citation keys like vaswani2017 for “Attention Is All You Need”.
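The author-plus-year pattern can be sketched in a few lines of Python. The function name make_citekey is hypothetical and only mirrors the pattern; Better BibTeX itself handles many more cases, such as duplicate keys and name particles.

```python
import re
import unicodedata


def make_citekey(first_author_surname: str, year: int) -> str:
    """Build an author+year key (hypothetical helper, not Better BibTeX's code)."""
    # Strip accents so that "Schölkopf" becomes "Scholkopf"
    ascii_name = (
        unicodedata.normalize("NFKD", first_author_surname)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep letters only and lowercase them
    letters = re.sub(r"[^A-Za-z]", "", ascii_name).lower()
    return f"{letters}{year}"


print(make_citekey("Vaswani", 2017))  # prints vaswani2017
```

Keeping keys predictable like this means you can type \cite{vaswani2017} from memory without opening Zotero.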

Alternative: Mendeley

  • Similar features to Zotero
  • Owned by Elsevier
  • Better integration with Word
  • Good mobile app

Paper Reading Workflow

Combine Zotero with the three-pass reading method:

  1. First pass: Save to Zotero, add to the “Must Read” collection
  2. Second pass: Annotate PDF, take notes in Zotero
  3. Third pass: Move to the “Related Work” or “Background” collection

LaTeX for Academic Writing

LaTeX is the standard for ML/AI papers and theses.

Why LaTeX?

  • Professional typesetting - Beautiful equations and formatting
  • Version control friendly - Plain text, works with Git
  • Reference management - Automatic bibliography with BibTeX
  • Conference templates - Required by most ML venues
  • Reproducibility - Same output on any system

Installation Options

Option 1: Overleaf (Recommended for Beginners)

  • Online LaTeX editor (no installation needed)
  • Real-time collaboration (like Google Docs)
  • Built-in templates for conferences
  • Automatic compilation
  • Version history and Git integration
  • Free tier sufficient for most users

Sign up at overleaf.com 

Option 2: Local Installation

    # macOS: full TeX distribution (~4GB)
    brew install --cask mactex

    # Linux: full TeX Live
    sudo apt-get install texlive-full

    # Windows: MiKTeX from https://miktex.org/

Recommended Local Editor: VS Code with LaTeX Workshop extension

    # Install VS Code (macOS)
    brew install --cask visual-studio-code

    # Install LaTeX Workshop extension
    code --install-extension James-Yu.latex-workshop

Basic Paper Template

Create paper.tex:

    \documentclass{article}

    % Essential packages
    \usepackage{amsmath, amssymb}  % Math symbols
    \usepackage{graphicx}          % Include figures
    \usepackage{hyperref}          % Hyperlinks
    \usepackage{booktabs}          % Professional tables
    \usepackage{algorithm}         % Algorithms
    \usepackage{algorithmic}

    % Document metadata
    \title{Your Paper Title}
    \author{Your Name \\ Your Institution}
    \date{\today}

    \begin{document}

    \maketitle

    \begin{abstract}
    Your abstract goes here.
    \end{abstract}

    \section{Introduction}
    Introduction content...

    \section{Related Work}
    Related work...

    \section{Method}
    Your method...

    \section{Experiments}
    Results and analysis...

    \section{Conclusion}
    Conclusions...

    \bibliographystyle{plain}
    \bibliography{references}  % references.bib file

    \end{document}

Compile:

    pdflatex paper.tex
    bibtex paper
    pdflatex paper.tex
    pdflatex paper.tex   # Run twice after bibtex to resolve references
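If you compile often, the four-step sequence can be wrapped in a small script. The sketch below assumes pdflatex and bibtex are on your PATH; the function names are hypothetical, and the -interaction=nonstopmode flag is an addition so that a LaTeX error does not leave the build waiting for keyboard input.

```python
import subprocess
from pathlib import Path


def latex_build_commands(tex_file: str) -> list[list[str]]:
    """Return the pdflatex, bibtex, pdflatex, pdflatex sequence for one document."""
    stem = Path(tex_file).stem  # bibtex takes the name without the .tex suffix
    latex = ["pdflatex", "-interaction=nonstopmode", tex_file]
    return [latex, ["bibtex", stem], latex, latex]


def build(tex_file: str) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in latex_build_commands(tex_file):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the plan instead of running it; call build("paper.tex") to compile
    for cmd in latex_build_commands("paper.tex"):
        print(" ".join(cmd))
```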

Conference Templates

Most conferences provide LaTeX templates:

NeurIPS:

    # Download from https://neurips.cc/Conferences/2024/PaperInformation/StyleFiles
    wget https://media.neurips.cc/Conferences/NeurIPS2024/Styles/neurips_2024.zip
    unzip neurips_2024.zip

ICML/ICLR: Similar templates available on conference websites

Use Overleaf templates:

  • New Project → Templates → Search for “NeurIPS” or “ICML”

Essential LaTeX Commands

Math equations:

    % Inline math
    The loss function is $\mathcal{L} = -\log p(y|x)$.

    % Display math
    \begin{equation}
    \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
    \end{equation}

Figures:

    \begin{figure}[t]
      \centering
      \includegraphics[width=0.8\linewidth]{architecture.pdf}
      \caption{Model architecture.}
      \label{fig:architecture}
    \end{figure}

    % Reference: See Figure~\ref{fig:architecture}

Tables:

    \begin{table}[t]
      \centering
      \caption{Results on benchmark datasets.}
      \label{tab:results}
      \begin{tabular}{lcc}
        \toprule
        Method & Accuracy & F1 Score \\
        \midrule
        Baseline & 85.3 & 83.1 \\
        Our Method & \textbf{92.7} & \textbf{91.4} \\
        \bottomrule
      \end{tabular}
    \end{table}
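Typing result tables by hand invites copy errors, so it can help to generate the tabular body from your results. The following is a minimal sketch: booktabs_table is a hypothetical helper, it assumes every metric is "higher is better", and it reuses the example numbers from the table above.

```python
def booktabs_table(header: list[str], rows: list[tuple]) -> str:
    """Render rows as a booktabs tabular, bolding the best value in each metric column."""
    n_metrics = len(header) - 1
    # Highest value per metric column (assumes higher is better)
    best = [max(row[i + 1] for row in rows) for i in range(n_metrics)]

    lines = [
        r"\begin{tabular}{l" + "c" * n_metrics + "}",
        r"\toprule",
        " & ".join(header) + r" \\",
        r"\midrule",
    ]
    for row in rows:
        cells = [row[0]]
        for i in range(n_metrics):
            text = f"{row[i + 1]:.1f}"
            cells.append(rf"\textbf{{{text}}}" if row[i + 1] == best[i] else text)
        lines.append(" & ".join(cells) + r" \\")
    lines += [r"\bottomrule", r"\end{tabular}"]
    return "\n".join(lines)


print(booktabs_table(
    ["Method", "Accuracy", "F1 Score"],
    [("Baseline", 85.3, 83.1), ("Our Method", 92.7, 91.4)],
))
```

Writing the output to a .tex file and pulling it in with \input keeps the paper in sync with the latest evaluation run.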

Citations:

    % In text
    Transformers \cite{vaswani2017} revolutionized NLP.

    % Multiple citations
    Recent work \cite{vaswani2017, devlin2018, brown2020} has shown...

Experiment Tracking

Track ML experiments systematically to avoid losing results.

Why Weights & Biases (W&B):

  • Automatic experiment logging
  • Hyperparameter tracking
  • Real-time metric visualization
  • Model versioning
  • Collaborative experiment tracking
  • Free for academic use

Installation:

pip install wandb

Basic Usage:

    import wandb

    # Initialize experiment; the config dict is stored with the run
    wandb.init(
        project="my-thesis",
        name="transformer-baseline",
        config={
            "learning_rate": 1e-4,
            "batch_size": 32,
            "epochs": 100,
        },
    )

    # Training loop (model, the data loaders, train_epoch, and validate are defined elsewhere)
    for epoch in range(wandb.config.epochs):
        train_loss = train_epoch(model, train_loader)
        val_loss = validate(model, val_loader)

        # Log metrics
        wandb.log({
            "train_loss": train_loss,
            "val_loss": val_loss,
            "epoch": epoch,
        })

    # Upload the saved checkpoint, then close the run
    wandb.save("model.pt")
    wandb.finish()

Features:

  • Automatic hyperparameter tracking
  • Compare across runs
  • Generate reports for papers
  • Track system metrics (GPU, CPU, memory)

Alternative: MLflow

pip install mlflow

Usage:

    import mlflow

    # The context manager ends the run even if an exception is raised
    with mlflow.start_run():
        mlflow.log_param("learning_rate", 1e-4)
        mlflow.log_metric("accuracy", 0.92)

Manual Experiment Tracking

If not using tracking tools, maintain a structured log:

Create experiments.md:

    # Experiment Log

    ## Experiment 1: Baseline Transformer
    **Date**: 2025-11-11
    **Goal**: Establish baseline performance
    **Config**:
    - Model: Transformer (6 layers, 512 dim)
    - LR: 1e-4
    - Batch size: 32
    - Dataset: IMDB
    **Results**:
    - Test Accuracy: 85.3%
    - Training time: 2.5 hours
    - GPU: A100
    **Notes**: Overfitting after epoch 50, try dropout

    ## Experiment 2: Add Dropout
    **Date**: 2025-11-12
    [...]
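A manual log is easier to keep up if adding an entry takes one function call. The sketch below appends a section in the same layout as the template above; append_experiment and its parameters are hypothetical names chosen for illustration.

```python
from datetime import date
from pathlib import Path


def append_experiment(log_path, number, title, goal, config, results, notes="", when=None):
    """Append one experiment section to the Markdown log, creating the file if needed."""
    when = when or date.today()
    lines = [
        f"## Experiment {number}: {title}",
        f"**Date**: {when.isoformat()}",
        f"**Goal**: {goal}",
        "**Config**:",
        *[f"- {key}: {value}" for key, value in config.items()],
        "**Results**:",
        *[f"- {key}: {value}" for key, value in results.items()],
        f"**Notes**: {notes}",
        "",
    ]
    path = Path(log_path)
    if not path.exists():
        # Start a new log with the top-level heading
        path.write_text("# Experiment Log\n\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Calling it at the end of a training script, with the config and final metrics as dictionaries, keeps the log current without relying on memory.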

Version Control with Git

Track code changes, collaborate, and never lose work.

Git Basics

Installation:

    # macOS
    brew install git

    # Linux
    sudo apt-get install git

Initialize repository:

    cd my-thesis
    git init
    git add .
    git commit -m "Initial commit"

Daily workflow:

    # See what changed
    git status
    git diff

    # Commit changes
    git add file1.py file2.py
    git commit -m "Add attention mechanism"

    # View history
    git log --oneline

GitHub for Collaboration

Create repository:

  1. Go to github.com, create new repository
  2. Link local repository:
      git remote add origin https://github.com/username/my-thesis.git
      git branch -M main   # Rename the default branch if git init created "master"
      git push -u origin main

Collaboration workflow:

    # Create feature branch
    git checkout -b add-bert-baseline

    # Make changes, commit
    git add bert.py
    git commit -m "Add BERT baseline"

    # Push and create pull request
    git push origin add-bert-baseline

.gitignore for ML Projects

Create .gitignore:

    # Python
    __pycache__/
    *.pyc
    .ipynb_checkpoints/

    # Data (too large for Git)
    data/
    *.csv
    *.hdf5

    # Models (use model versioning instead)
    models/
    checkpoints/
    *.pt
    *.pth
    *.h5

    # Logs
    logs/
    wandb/

    # Environment
    venv/
    .env

    # OS
    .DS_Store

Git Large File Storage (Git LFS)

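For tracking large model files in the repository. Git never stages ignored files, so first remove the matching patterns (for example *.pt) from .gitignore:

    # Install (macOS; on Debian/Ubuntu: sudo apt-get install git-lfs)
    brew install git-lfs
    git lfs install

    # Track large files
    git lfs track "*.pt"
    git add .gitattributes
    git commit -m "Track model files with LFS"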

Code Organization

Structure projects for reproducibility and collaboration.

Project Structure

    my-thesis/
    ├── README.md              # Project overview
    ├── requirements.txt       # Python dependencies
    ├── environment.yml        # Conda environment (if using)
    ├── .gitignore
    ├── data/
    │   ├── raw/               # Original data (never modify)
    │   ├── processed/         # Cleaned data
    │   └── README.md          # Data documentation
    ├── src/
    │   ├── __init__.py
    │   ├── data.py            # Data loading
    │   ├── models.py          # Model architectures
    │   ├── train.py           # Training loop
    │   ├── eval.py            # Evaluation
    │   └── utils.py           # Utilities
    ├── scripts/
    │   ├── preprocess.py      # Data preprocessing
    │   ├── train.sh           # Training script
    │   └── eval.sh            # Evaluation script
    ├── notebooks/
    │   ├── 01_eda.ipynb       # Exploratory analysis
    │   └── 02_viz.ipynb       # Visualization
    ├── configs/
    │   ├── baseline.yaml      # Baseline config
    │   └── best_model.yaml    # Best model config
    ├── tests/
    │   └── test_model.py      # Unit tests
    └── paper/
        ├── paper.tex          # LaTeX source
        ├── references.bib     # Bibliography
        └── figures/           # Paper figures

Configuration Files

Use YAML for configs:

configs/transformer.yaml:

    model:
      type: transformer
      num_layers: 6
      d_model: 512
      num_heads: 8
      dropout: 0.1

    training:
      batch_size: 32
      learning_rate: 1.0e-4   # PyYAML reads "1e-4" as a string; the decimal point makes it a float
      epochs: 100
      warmup_steps: 1000

    data:
      train_path: data/processed/train.csv
      val_path: data/processed/val.csv

Load in code:

    import yaml

    with open('configs/transformer.yaml') as f:
        config = yaml.safe_load(f)

    # Transformer is your own model class; its constructor must accept every key under "model"
    model = Transformer(**config['model'])
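Passing the whole "model" section straight into a constructor fails silently on typos and also forwards the "type" key, which most model classes do not expect. One possible approach is to validate the section first, sketched below. TransformerConfig and parse_model_section are hypothetical names, and the dictionary stands in for the result of yaml.safe_load.

```python
from dataclasses import dataclass


@dataclass
class TransformerConfig:
    """Typed view of the "model" section (fields mirror configs/transformer.yaml)."""
    num_layers: int
    d_model: int
    num_heads: int
    dropout: float


def parse_model_section(section: dict) -> tuple[str, TransformerConfig]:
    """Split off the "type" key and validate the remaining keys."""
    params = dict(section)               # copy so the caller's dict is untouched
    model_type = params.pop("type")
    cfg = TransformerConfig(**params)    # raises TypeError on unknown or missing keys
    if cfg.d_model % cfg.num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    return model_type, cfg


# Stand-in for config['model'] as loaded from the YAML file
section = {"type": "transformer", "num_layers": 6, "d_model": 512, "num_heads": 8, "dropout": 0.1}
model_type, cfg = parse_model_section(section)
print(model_type, cfg)
```

A misspelled key such as "num_layer" now fails at load time with a clear error instead of surfacing deep inside training.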

Reproducibility Checklist

  • Pin dependencies: pip freeze > requirements.txt
  • Set random seeds: torch.manual_seed(42)
  • Document data preprocessing: How was data cleaned?
  • Save model configs: YAML files for each experiment
  • Track hyperparameters: W&B or manual logs
  • Version control code: Git with meaningful commits
  • README with instructions: How to reproduce results
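Seeding only PyTorch leaves Python's and NumPy's generators unseeded, which matters for shuffling and augmentation code. A minimal sketch of a single seeding helper follows; set_seed is a hypothetical name, and NumPy and PyTorch are treated as optional so the function also runs where they are not installed. Seeding alone does not guarantee bit-identical GPU results.

```python
import os
import random


def set_seed(seed: int = 42) -> None:
    """Seed every random number generator that is available in this environment."""
    random.seed(seed)
    # Only affects Python subprocesses started after this point
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when no GPU is present
    except ImportError:
        pass  # PyTorch is not installed


set_seed(42)
```

Call it once at the top of every training and evaluation script, and log the seed with the rest of the hyperparameters.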

Computational Infrastructure

Local Development

Recommended Setup:

  • GPU: NVIDIA RTX 3090/4090 for development (24GB VRAM)
  • RAM: 32GB+ for data processing
  • Storage: 1TB+ SSD for datasets and models

Environment Setup:

    # Create conda environment
    conda create -n thesis python=3.10
    conda activate thesis

    # Install PyTorch with CUDA
    conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

    # Install other dependencies
    pip install transformers datasets wandb numpy pandas matplotlib
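After installing, it is worth confirming that PyTorch actually sees the GPU before starting a long run. A small sketch, with pick_device as a hypothetical helper name:

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Training will run on: {pick_device()}")
```

If this prints "cpu" on a machine with an NVIDIA card, the usual suspects are a CPU-only PyTorch build or a driver that is older than the installed CUDA runtime.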

Cloud Computing

Google Colab (Free/Pro):

  • Free: T4 GPU (16GB), time limits
  • Pro ($10/month): Better GPUs, longer runtime
  • Good for prototyping, not production training

Lambda Labs (Recommended for GPU rentals):

  • A100 (40GB): ~$1.10/hour
  • H100 (80GB): ~$2/hour
  • No complex setup, pay-as-you-go
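Hourly prices are easier to compare once they are turned into a budget for a whole set of experiments. The sketch below is simple arithmetic: the 2.5-hour run length is taken from the example experiment log, the run count is a made-up planning number, and the rate is the approximate A100 figure listed above, which will change over time.

```python
def estimate_cost(gpu_hours_per_run: float, num_runs: int, hourly_rate_usd: float) -> float:
    """Total rental cost in USD, rounded to cents."""
    return round(gpu_hours_per_run * num_runs * hourly_rate_usd, 2)


# Hypothetical plan: 20 runs of 2.5 hours each on an A100 at about $1.10/hour
total = estimate_cost(gpu_hours_per_run=2.5, num_runs=20, hourly_rate_usd=1.10)
print(f"Estimated cost: ${total:.2f}")
```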

AWS/GCP/Azure:

  • More complex, but scalable
  • Good for large-scale experiments
  • Consider credits for students (AWS Educate, GCP Education Grants)

Remote Development

SSH into server:

    # Connect to server
    ssh user@server.university.edu

    # Run training in background with tmux
    tmux new -s training
    python train.py
    # Detach: Ctrl+B, then D

    # Reattach later
    tmux attach -t training

VS Code Remote Development:

Install “Remote - SSH” extension, connect to server, edit code as if local.

Jupyter Notebooks

For exploration and visualization.

Installation:

    pip install jupyter
    jupyter notebook

Best Practices:

  1. Use for exploration only - Don’t put training loops in notebooks
  2. Export to scripts - Convert final code to .py files
  3. Clear outputs before committing - Avoid large Git diffs
  4. Name cells - Use markdown headers to organize
  5. Restart kernel regularly - Ensure reproducibility

Convert notebook to script:

jupyter nbconvert --to script notebook.ipynb
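Clearing outputs before committing (best practice 3 above) can also be scripted, since a notebook is a JSON file. The sketch below assumes the nbformat 4 layout, in which code cells carry "outputs" and "execution_count" fields; the function names are hypothetical.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Remove outputs and execution counts from code cells (nbformat 4 layout)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_file(path: str) -> None:
    """Rewrite a .ipynb file in place with outputs stripped."""
    nb_path = Path(path)
    notebook = json.loads(nb_path.read_text(encoding="utf-8"))
    cleaned = json.dumps(clear_outputs(notebook), indent=1)
    nb_path.write_text(cleaned + "\n", encoding="utf-8")
```

nbconvert offers the same operation from the command line with jupyter nbconvert --clear-output --inplace notebook.ipynb; the script form is convenient when you want to run it from a Git pre-commit hook.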

Paper Writing Workflow

Combine tools for efficient writing:

  1. Organize papers: Zotero collections
  2. Draft in LaTeX: Overleaf or local editor
  3. Manage references: Zotero → BibTeX export
  4. Generate figures: Matplotlib/Seaborn → PDF
  5. Track versions: Git for LaTeX source
  6. Collaborate: Overleaf sharing or Git branches

Example Figure Generation:

    import matplotlib.pyplot as plt
    import seaborn as sns

    plt.style.use('seaborn-v0_8-paper')
    sns.set_palette("colorblind")

    # epochs, train_loss, and val_loss are equal-length sequences recorded during training
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(epochs, train_loss, label='Train')
    ax.plot(epochs, val_loss, label='Validation')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss')
    ax.legend()
    plt.tight_layout()
    plt.savefig('paper/figures/training_curve.pdf', dpi=300, bbox_inches='tight')

Productivity Tips

Time Management

  • Pomodoro Technique: 25 min focused work, 5 min break
  • Deep Work Blocks: 2-4 hour uninterrupted periods for coding/writing
  • Meeting-Free Days: Dedicate specific days for deep research

Documentation

  • Document as you go: Don’t wait until the end
  • README for every project: Explain setup and usage
  • Code comments: Explain why, not what
  • Lab notebook: Daily progress log

Backup Strategy

  • 3-2-1 Rule:
    • 3 copies of data
    • 2 different storage types (local + cloud)
    • 1 off-site backup

Automated Backup:

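    # Sync code to cloud (assumes an rclone remote named "gdrive" is already configured)
    rclone sync ~/research/ gdrive:research/ --exclude "data/" --exclude "models/"

    # Schedule with cron (daily at 2am): add this line with crontab -e
    0 2 * * * rclone sync ~/research/ gdrive:research/ --exclude "data/" --exclude "models/"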
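A backup is only useful if it matches the source, so it is worth checking occasionally. The sketch below compares SHA-256 digests between a source directory and a backup copy. It assumes the backup is reachable as a local path (for example an external drive or a mounted remote), and the function names are hypothetical; for rclone remotes, the rclone check command performs a similar comparison.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source: str, backup: str) -> list[str]:
    """Return relative paths that are missing from the backup or differ in content."""
    src_root, dst_root = Path(source), Path(backup)
    problems = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every file in the source has an identical copy in the backup; files that exist only in the backup are not reported.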

Tool Summary

Purpose              | Recommended Tool | Alternative
Reference Management | Zotero           | Mendeley
Paper Writing        | Overleaf         | Local LaTeX
Experiment Tracking  | Weights & Biases | MLflow
Version Control      | Git + GitHub     | GitLab
Code Editor          | VS Code          | PyCharm
Notebooks            | JupyterLab       | Google Colab
Cloud GPU            | Lambda Labs      | Google Colab Pro
Collaboration        | Slack/Discord    | Email
Diagramming          | draw.io          | PowerPoint

Getting Started Checklist

Set up your research environment:

  • Install Zotero + browser extension
  • Create Overleaf account
  • Set up Git and GitHub
  • Create Weights & Biases account
  • Install Python/PyTorch environment
  • Structure project directory
  • Write initial README
  • Set up backup system
  • Join research community (Slack/Discord)
  • Schedule regular supervisor meetings

Summary

Key Takeaways:

  1. Set up tools early: Don’t wait until you need them
  2. Automate everything: Experiment tracking, backups, figure generation
  3. Version control religiously: Git for code, configs, and papers
  4. Document continuously: README files, code comments, lab notebook
  5. Organize systematically: Consistent project structure, naming conventions
  6. Backup redundantly: Multiple copies in different locations

Good tools and workflow multiply your research productivity. Invest time upfront to set them up properly, and you’ll save hundreds of hours over the course of your thesis or research career.