Eleven topics from the same beginner-friendly deck as our LLM and Prompt engineering tracks—why full fine-tuning breaks at LLM scale, the LoRA family and QLoRA/DoRA, building datasets for instruction tuning, SFT vs reward-driven RFT (GRPO), and how OpenEnv + ART standardize RL-style training for agents.
Adapt pretrained weights on new data—classic motivation before LLM-scale limits.
Billions of parameters, GPU RAM, multi-tenant storage—why full fine-tuning doesn’t scale for LLMs.
LoRA family, LoRA-drop, QLoRA, DoRA—parameter-efficient paths when full updates are prohibitive.
ΔW, frozen W, and why naïvely matching sizes defeats the point—setup for the low-rank trick.
Factor ΔW ≈ AB with rank r, train only A and B, and interpret the result as an expressivity vs. rank trade-off grid.
LoRAWeights, alpha scaling, attaching adapters to fc layers, and training only A/B.
Synthetic instruction–response pairs with Distilabel: multi-LLM pipelines, judges, and seed data.
Static labeled pairs vs online rewards—and a decision tree for when each fits.
Unsloth + TRL: LoRA, math data, deterministic reward hooks, and GRPOTrainer in action.
Containerized Gymnasium-style services—reset, step, state—so agents and envs compose cleanly.
OpenPipe’s client/server loop for agent trajectories, rewards, and GRPO-style updates.
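
As a rough illustration of the low-rank topics above (ΔW ≈ AB, alpha scaling, training only A and B), here is a minimal pure-Python sketch. The function names, the layer sizes, the zero initialization of B, and the `alpha / r` scaling are illustrative assumptions chosen for this sketch, not code taken from the deck.

```python
import random

def lora_param_counts(d_out, d_in, r):
    """Return (full, lora): trainable parameters for a full ΔW vs. the factors A and B."""
    full = d_out * d_in          # ΔW has the same shape as W
    lora = r * (d_out + d_in)    # A is d_out x r, B is r x d_in
    return full, lora

def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha):
    """Compute W + (alpha / r) * A @ B, with W frozen and only A, B trainable."""
    r = len(B)                   # rank = number of rows in B
    scale = alpha / r            # assumed scaling convention
    delta = matmul(A, B)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Parameter savings for a hypothetical 4096 x 4096 fc layer at rank 8.
full, lora = lora_param_counts(4096, 4096, 8)

# Tiny example: with B initialized to zero, the adapter starts as a no-op.
random.seed(0)
d_out, d_in, r = 4, 6, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(r)] for _ in range(d_out)]
B = [[0.0] * d_in for _ in range(r)]
W_eff = lora_effective_weight(W, A, B, alpha=16)
```

For the 4096 x 4096 layer, the full update has 16,777,216 parameters while the rank-8 factors have 65,536, a 256-fold reduction. Raising r buys expressivity at the cost of more trainable parameters, which is the trade-off the rank grid describes.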