Journal
Daily notes
08-31-2025 ‣ Worked on quantization-aware training (QAT) and LoRA on GPT-2, exploring training stability and performance across bit-widths. Cyclic precision training notably improved lower-precision settings (likely by encouraging wider minima), with a 5-bit model outperforming the 8-bit baseline; see the sketch below.
Repo
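A minimal sketch of the cyclic-precision idea, not the actual training code in the repo: weights are fake-quantized with a straight-through estimator, and the bit-width follows a cosine cycle over training steps. The `fake_quantize`, `cyclic_bits`, and `QuantLinear` helpers, the 3–8 bit range, the cycle period, and the toy linear layer standing in for GPT-2 + LoRA are all illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Forward pass uses quantized weights; gradients flow straight through to w.
    return w + (w_q - w).detach()

def cyclic_bits(step: int, period: int = 200, low: int = 3, high: int = 8) -> int:
    """Cosine-cycle the bit-width between `low` and `high` every `period` steps."""
    phase = (step % period) / period
    frac = 0.5 * (1 - math.cos(2 * math.pi * phase))  # 0 -> 1 -> 0 over one period
    return int(round(low + (high - low) * frac))

class QuantLinear(nn.Module):
    """Linear layer whose weights are fake-quantized at a runtime-selected bit-width."""
    def __init__(self, in_f: int, out_f: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_f))
        self.bits = 8

    def forward(self, x):
        return nn.functional.linear(x, fake_quantize(self.weight, self.bits), self.bias)

# Toy training loop: the precision used for QAT is cycled every step.
model = QuantLinear(16, 4)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
for step in range(1000):
    model.bits = cyclic_bits(step)
    x = torch.randn(32, 16)
    loss = (model(x) - x[:, :4]).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```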
08-15-2025 ‣ Spent time revisiting transformer math; clarified how feed-forward layers are independent per-token while attention introduces cross-token dependencies, and how multi-head attention improves expressiveness by learning multiple representations of token similarities.
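A quick sanity check of that point; the dimensions and the use of `nn.MultiheadAttention` are chosen purely for illustration. The feed-forward block produces the same output for a token whether or not the other tokens are present, while the attention output for the same token changes, since each head mixes information across positions using its own query/key/value projections.

```python
import torch
import torch.nn as nn

d_model, n_heads, seq_len = 64, 4, 10
x = torch.randn(1, seq_len, d_model)

# Feed-forward block: the same 2-layer MLP is applied at every position,
# so the output at position i depends only on x[:, i].
ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
ffn_full = ffn(x)
ffn_single = ffn(x[:, :1])  # run position 0 on its own
print(torch.allclose(ffn_full[:, :1], ffn_single, atol=1e-6))   # True: per-token, no cross-token dependence

# Self-attention: each head projects tokens into its own subspace and computes its own
# similarity pattern, then mixes values across all positions.
mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
attn_full, _ = mha(x, x, x)
attn_single, _ = mha(x[:, :1], x[:, :1], x[:, :1])
print(torch.allclose(attn_full[:, :1], attn_single, atol=1e-6))  # False: position 0 depends on the other tokens
```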