“I never start a project without skimming my own notes first.”
— Every engineer the night before a deadline
I’ve been writing down my AI/LLM notes over the last six months. What started as a few scrappy records about Transformer quirks has turned into 75 pages of tips spanning everything from model tweaks to prod inference hacks. I’m sharing it publicly so you can:
- Refresh the basics
- Jump into a new concept fast
- Prepare for an AI fundamentals interview without opening 50 tabs
Scroll down for a quick tour and tips on getting the most out of it.
1. Architectures: Where It All Begins
- Transformers, RoPE, and friends – A concise recap of attention basics and how to scale rotary position embeddings past the context length the model was trained on (toy sketch after this list).
- Beyond vanilla – Notes on Mixture‑of‑Experts routing, grouped‑query attention, and “switch” layers for bandwidth‑friendly scaling.
- Long‑context toolkit – FlashAttention‑2, context‑window grafting, and knob‑turning for retrieval‑augmented pipelines.
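To make the RoPE-scaling point concrete, here's a minimal sketch of rotary embeddings with linear position interpolation, assuming a toy `(seq, dim)` query tensor; the `scale` value is arbitrary, and real long-context deployments often prefer NTK-aware or YaRN-style frequency scaling instead:

```python
# Toy RoPE with linear position interpolation: squeezing positions by
# `scale` makes position 8192 "look like" 4096 to a model trained on 4096.
import torch

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Standard RoPE frequencies over pairs of channels.
    inv_freq = 1.0 / base ** (torch.arange(0, dim, 2).float() / dim)
    return torch.outer(positions.float() / scale, inv_freq)

def apply_rope(x, positions, scale=1.0):
    # x: (seq, dim) with dim even; rotate consecutive channel pairs.
    theta = rope_angles(positions, x.shape[-1], scale=scale)
    cos, sin = theta.cos(), theta.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(8192, 64)
q_scaled = apply_rope(q, torch.arange(8192), scale=2.0)  # 2x context stretch
print(q_scaled.shape)
```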
2. Training at Scale: Beyond Data Parallelism
The techniques below kick in once plain data parallelism tops out: think training runs on 512+ GPUs or context windows stretching past 64K tokens. A toy tensor-parallel sketch follows the table.
| Technique | TL;DR in the notes |
|---|---|
| Tensor Parallelism | Shard massive weight matrices column‑wise or row‑wise so each GPU hosts a slice; synchronise activations over high‑bandwidth links such as NVLink. |
| Pipeline Parallelism | Classic assembly line: partition layers into stages and hide latency with micro‑batches. |
| Sequence Parallelism | Give each worker a chunk of the token stream and overlap compute/communication to keep GPUs busy. |
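To see the tensor-parallel row in action, here's a single-process sketch that fakes two ranks on CPU; in a real run each shard sits on its own GPU and the final concat is an all-gather over NVLink:

```python
# Column-parallel sharding: splitting W by output columns and gathering
# the partial results reproduces the full matmul exactly.
import torch

torch.manual_seed(0)
d_model, d_ff, batch = 8, 16, 4

x = torch.randn(batch, d_model)  # activations, replicated on every rank
W = torch.randn(d_model, d_ff)   # full weight matrix (too big for one GPU in practice)

W0, W1 = W.chunk(2, dim=1)       # each "rank" holds half the output features

y0 = x @ W0                      # computed on rank 0
y1 = x @ W1                      # computed on rank 1
y = torch.cat([y0, y1], dim=1)   # the all-gather step

assert torch.allclose(y, x @ W, atol=1e-6)
print("sharded matmul matches the full matmul:", y.shape)
```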
3. Datasets: Building the Fuel
- Source Mixes – OpenWeb, Common Crawl, curated corpora, code, synthetic dialogues; pros/cons and licensing notes.
- Cleaning & Deduplication – Detect near‑duplicates with MinHash/SimHash (toy sketch after this list), strip profanity, and nuke boilerplate.
- Domain Balancing – Up‑weight niche domains (medical, legal) without starving general language coverage.
- Synthetic & Augmented Data – Self‑instruct, RAG‑generated Q&A, tool‑augmented reasoning traces.
- Evaluation Splits – Leak‑proof dev/test and benchmark alignment tips.
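For a feel of the deduplication bullet above, here's a toy MinHash in plain Python; the shingle size and permutation count are arbitrary, and production pipelines would add LSH banding or reach for a library like datasketch:

```python
# Toy MinHash near-duplicate check over word 5-gram shingles.
import hashlib

def shingles(text, n=5):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def minhash(shingle_set, num_perm=64):
    # "Permutation" i = hash salted with i; keep the minimum hash per salt.
    return [
        min(int.from_bytes(
                hashlib.blake2b(i.to_bytes(4, "little") + s.encode(),
                                digest_size=8).digest(), "big")
            for s in shingle_set)
        for i in range(num_perm)
    ]

def est_jaccard(sig_a, sig_b):
    # Fraction of matching minima estimates Jaccard similarity of the sets.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the river today"
print(est_jaccard(minhash(shingles(doc_a)), minhash(shingles(doc_b))))
```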
4. Data & Scaling Laws: Feeding the Beast
- Dataset Curation – Practical heuristics for filtering web text, multilingual balancing, and deduplication without blowing up compute.
- Token Mixtures – Why code, math, and synthetic Q‑A boost downstream zero‑shot tasks; quick recipes for ratio tuning.
- Chinchilla‑style Scaling Laws – The compute‑optimal sweet spot (≈20 tokens per parameter) and how to project loss at larger model sizes (sketch after this list).
- Dynamic Data Pacing – Curriculum vs self‑curriculum, temperature‑based sampling, and tricks like “token replay” for long‑tail skills.
- Tracking Data Quality – Per‑source perplexity dashboards and the “marginal utility of another billion tokens” checklist.
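For the Chinchilla bullet, a back-of-envelope sketch assuming the parametric fit `L(N, D) = E + A/N^alpha + B/D^beta` from Hoffmann et al. (2022); the constants below are the paper's published fit, so treat the outputs as rough projections:

```python
# Chinchilla parametric loss fit (Hoffmann et al., 2022).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def projected_loss(n_params, n_tokens):
    return E + A / n_params**alpha + B / n_tokens**beta

def compute_optimal_tokens(n_params, tokens_per_param=20):
    # The ~20 tokens/param rule of thumb from the same paper.
    return tokens_per_param * n_params

for n in (1e9, 7e9, 70e9):
    d = compute_optimal_tokens(n)
    print(f"{n/1e9:>5.0f}B params -> {d/1e9:,.0f}B tokens, "
          f"projected loss {projected_loss(n, d):.3f}")
```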
5. Inference Tricks & Optimisations
- Speculative Decoding – Pair a small draft model with a larger verifier to double effective tokens‑per‑second (control-flow sketch after this list).
- KV‑cache management – Bucketing, sliding windows, and other memory‑saver moves.
- Quantisation cheatsheet – What survives INT8 vs FP8, and a one‑pager on GPTQ hyper‑params.
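Here's the speculative-decoding control flow as a greedy toy, where `draft` and `verify` are hypothetical stand-ins for the two models; real implementations score the whole draft block in one verifier forward pass and use a probabilistic accept/reject rule rather than exact matching:

```python
# Greedy speculative decoding: propose k cheap tokens, keep the longest
# prefix the verifier agrees with, and take the verifier's token at the
# first mismatch. Best case: k tokens for roughly one big-model pass.

def speculative_decode(ctx, draft, verify, k=4, max_new=32):
    out = list(ctx)
    while len(out) - len(ctx) < max_new:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposal = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. Verifier checks each position (one batched pass in practice).
        accepted = 0
        for i in range(k):
            target = verify(out + proposal[:i])
            if proposal[i] == target:
                accepted += 1
            else:
                out.extend(proposal[:accepted])
                out.append(target)   # verifier's token replaces the miss
                break
        else:
            out.extend(proposal)     # all k accepted
    return out

# Toy demo: the draft agrees with the verifier most of the time.
verify = lambda ctx: len(ctx) % 7
draft = lambda ctx: len(ctx) % 7 if len(ctx) % 4 else 0
print(speculative_decode([0], draft, verify, k=4, max_new=12))
```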
6. Alignment, Reasoning & RL
- Reward Modeling 101 – From scalar preference labels to pairwise ranking loss; why good rewards beat sparse accuracy metrics.
- RLHF & Friends – PPO vs. simpler DPO/IPO methods; where RLAIF (AI feedback) slots in when human labels run dry (minimal DPO loss sketch after this list).
- Constitutional & Rule‑based Policies – Self‑critique loops, safety layers, and how to encode “don’t do that” without killing creativity.
- Reasoning Boosters – Chain‑of‑Thought, Self‑Consistency, Tree‑of‑Thought, ReAct, and graph‑based planners—what works when tokens are precious.
- Benchmarks & Eval – GSM8K, MATH, BBH, AGIEval, and reward‑hacking pitfalls (the “Wireheading Watchlist”).
- Practical Tips – Start with supervised fine‑tuning, scale label diversity, and monitor KL divergence to keep models “on policy”.
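Since DPO comes up repeatedly in the notes, here's a minimal sketch of its pairwise loss in PyTorch; the variable names and `beta` are illustrative rather than tied to any particular library:

```python
# DPO loss over summed log-probs of chosen/rejected responses under the
# policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    # Implicit reward of each response: beta * (log pi - log pi_ref).
    chosen_reward = beta * (policy_chosen_lp - ref_chosen_lp)
    rejected_reward = beta * (policy_rejected_lp - ref_rejected_lp)
    # Pairwise ranking loss: push the chosen reward above the rejected one.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy batch of 3 preference pairs (summed token log-probs).
lp = lambda *v: torch.tensor(v)
loss = dpo_loss(lp(-12.0, -9.5, -20.1), lp(-14.2, -9.9, -19.0),
                lp(-12.5, -9.7, -20.0), lp(-13.8, -10.1, -19.3))
print(float(loss))
```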
7. Safety & Governance
- Prompt‑level Filtering – Llama Guard, OpenAI policy templates, regex and tree‑sitter tricks for fast rule checks.
- System & Tooling – Safety layers that run after draft generation (re‑rankers) vs during (constrained decoding, refusal tokens).
- Red‑Teaming Playbooks – Manual adversarial prompts, automated mutation (JailbreakGym), and cross‑model ensemble attacks.
- Eval Suites – RealToxicityPrompts, HarmBench, HELM safety subset; measuring bias, toxicity, and jailbreak rate.
- Content Policies – How to encode “allow / safe‑complete / refuse” tiers and avoid loopholes (toy rule-check sketch after this list).
- Incident Response – Logging, canary prompts, and rollback plans when models misbehave in production.
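And a toy version of the tiered rule check, with made-up patterns; fast regex triage like this usually sits in front of a heavier classifier such as Llama Guard:

```python
# Toy prompt-level rule check mapping regex hits to policy tiers.
# The patterns and tiers are invented for illustration only.
import re

RULES = [
    (re.compile(r"\b(build|make)\b.*\bexplosive", re.I), "refuse"),
    (re.compile(r"\bself[- ]harm\b", re.I), "safe_complete"),  # answer with care + resources
]

def triage(prompt: str) -> str:
    for pattern, tier in RULES:
        if pattern.search(prompt):
            return tier
    return "allow"

print(triage("How do I make sourdough?"))           # allow
print(triage("steps to make an explosive device"))  # refuse
```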
8. Technical Report CliffsNotes
- Gemini – Multimodal routing and that infamous planner‑solver split.
- Llama 2/3 & Llama Guard – Safety alignment prompts and where RLHF still bites.
- DeepSeek‑R1 – Reasoning trained largely via reinforcement learning (GRPO), plus the distillation recipe for smaller models.
- Plus quick hits on Falcon, Mistral, Phi‑3 and other crowd favourites.
Final Words
Whether you’re shipping models to prod, writing your first Transformer from scratch, or prepping for an onsite, I hope this bundle saves you a few hours of searching, and maybe sparks your next idea.