- makemore by Andrej Karpathy
- minbpe by karpathy
- attention? attention! by Lilian Weng ![[Attention_Attention.pdf]]
- gpt-2 again by karpathy
- llama3 from scratch by naklecha
- llm training in simple, raw c/cuda by karpathy
- decoding strategies in large language models by mlabonne
- how to make llms go fast by vgel ![[How to make LLMs go fast.pdf]]
- a visual guide to quantization by maarten ![[A visual guide to quantization.pdf]]
- extending the RoPE by eleutherai ![[Extending the RoPE.pdf]]
- the novice's llm training guide by alpin ![[The Novice LLM Training Guide.pdf]]
- a survey on evaluation of large language models paper ![[2307.03109v9.pdf]]
- mixture of experts explained by huggingface ![[Mixture of Experts Explained.pdf]]
- vision transformer by aman-arora ![[Vision Transformer.pdf]]
- clip, siglip and paligemma by umar-jamil