GEPA is Better Than You Think
GEPA is one of the most popular automatic prompt optimization methods right now. But it has a subtle flaw in how it selects the final candidate. Here's what's wrong and how to fix it.
GEPA is one of the most popular automatic prompt optimization methods right now. But it has a subtle flaw in how it selects the final candidate. Here's what's wrong and how to fix it.
A deep dive into modern transformer design choices: normalization strategies, positional embeddings from sinusoidal to RoPE, activation functions, attention variants, and hyperparameter scaling.
How to analyze the computation and memory cost of LLMs, covering computation graphs, FLOPs estimation, and memory requirements during training.