Transformer Architectures & Modern Variants

Deep dive into transformer designs, from attention mechanisms to modern efficient variants.

12 items

The Illustrated Transformer

Visual guide to transformers

jalammar.github.io
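
The guide above builds intuition visually; here is a minimal NumPy sketch of the scaled dot-product attention it illustrates, single head and unbatched for clarity. The function name and shapes are illustrative, and the `causal` flag shows the GPT-style masking covered by the later entries.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) pairwise similarities
    if causal:
        # GPT-style masking: a token may only attend to itself and the past.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                       # 4 tokens, head dim 8
print(scaled_dot_product_attention(x, x, x, causal=True).shape)  # (4, 8)
```
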
The Illustrated BERT, ELMo, and co.

BERT intuition and design

jalammar.github.io
The Illustrated GPT-2

GPT-2 architecture walkthrough

jalammar.github.io
Efficient Transformers: A Survey

Overview of efficient variants

arxiv.org
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

IO-aware exact attention computed in tiles

arxiv.org
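
FlashAttention's core trick is computing exact attention block by block over the keys and values, carrying a running row-wise max and softmax denominator so the full n x n score matrix never touches slow memory. A toy NumPy sketch of that online-softmax accumulation; the real kernel also tiles the queries and fuses everything into a single GPU kernel, and all names here are illustrative.

```python
import numpy as np

def tiled_attention(Q, K, V, block=2):
    """Exact attention accumulated block by block over K/V with running
    row-wise max (m) and softmax denominator (l), so the full n x n
    score matrix is never materialized."""
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)           # running max of scores per query row
    l = np.zeros(n)                   # running softmax denominator per row
    for start in range(0, n, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = Q @ Kb.T / np.sqrt(d)     # (n, block) partial scores
        m_new = np.maximum(m, s.max(axis=1))
        scale = np.exp(m - m_new)     # rescale previously accumulated terms
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 8)) for _ in range(3))
# Matches naive softmax(Q K^T / sqrt(d)) V up to floating-point error.
out = tiled_attention(Q, K, V)
```
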
Long Range Arena: A Benchmark for Efficient Transformers

Benchmarking long-context models

arxiv.org
Sparse Transformers

Factorized sparse attention patterns for long sequences

arxiv.org
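
The paper replaces dense attention with fixed, factorized sparsity patterns. A sketch of one such pattern as a boolean mask, combining a local sliding window with a strided component; the paper's per-head fixed and strided variants differ in detail, so `window` and `stride` here are placeholders.

```python
import numpy as np

def sparse_causal_mask(n, window=4, stride=4):
    """Boolean attention mask (True = may attend) combining a local
    sliding window with a strided pattern, both restricted to causal
    positions, in the spirit of the paper's factorized attention."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i
    local = (i - j) < window                  # recent tokens
    strided = (j % stride) == (stride - 1)    # every stride-th "summary" token
    return causal & (local | strided)

mask = sparse_causal_mask(12)
# Each row attends to O(window + n/stride) positions instead of O(n).
print(mask.astype(int))
```
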
Switch Transformers: Scaling to Trillion Parameter Models

Sparse mixture of experts with top-1 routing

arxiv.org
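
Switch routing simplifies mixture-of-experts to top-1: a learned gate sends each token through exactly one expert FFN and scales the output by the gate probability. A minimal NumPy sketch under those assumptions; real implementations add expert capacity limits and a load-balancing auxiliary loss, and all names here are hypothetical.

```python
import numpy as np

def switch_layer(x, W_gate, experts):
    """Top-1 (Switch) routing: each token goes to one expert, chosen by
    a softmax gate, and the output is scaled by the gate probability."""
    logits = x @ W_gate                          # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    choice = probs.argmax(axis=-1)               # top-1 expert per token
    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        sel = choice == e
        if sel.any():
            out[sel] = expert(x[sel]) * probs[sel, e][:, None]
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda h, W=rng.normal(size=(d, d)) / np.sqrt(d): h @ W
           for _ in range(n_experts)]           # stand-ins for expert FFNs
x = rng.normal(size=(16, d))
y = switch_layer(x, rng.normal(size=(d, n_experts)), experts)
```
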
ALiBi: Train Short, Test Long

Length extrapolation via linear attention biases

arxiv.org
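
ALiBi drops positional embeddings entirely and instead subtracts a per-head penalty proportional to query-key distance from the attention scores, which is what lets models trained on short sequences evaluate on longer ones. A sketch of the bias matrix, assuming the paper's geometric slope schedule for power-of-two head counts.

```python
import numpy as np

def alibi_bias(n, num_heads):
    """Per-head linear distance penalties; add to Q K^T / sqrt(d) before
    the softmax. Slopes follow the paper's geometric schedule for head
    counts that are powers of two."""
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    dist = np.maximum(i - j, 0)               # how far in the past key j lies
    # Future positions get bias 0 here; a separate causal mask removes them.
    return -slopes[:, None, None] * dist      # (num_heads, n, n)

bias = alibi_bias(n=6, num_heads=8)
print(bias.shape)  # (8, 6, 6); earlier heads penalize distance more steeply
```
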
RoPE: Rotary Position Embedding

Rotary positional encoding for relative positions

arxiv.org
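
RoPE rotates each pair of query/key channels by an angle proportional to the token's position, so dot products between rotated queries and keys depend only on their relative offset. A sketch assuming the half-split channel pairing common in open-source implementations; the paper pairs adjacent channels instead, and either works as long as queries and keys agree.

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """Rotate channel pairs of (n, d) queries or keys by position-
    proportional angles; d must be even. Channels c and c + d/2 form
    a pair (half-split variant)."""
    n, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)        # one frequency per pair
    angles = np.arange(n)[:, None] * freqs[None, :]  # (n, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q, k = apply_rope(rng.normal(size=(6, 8))), apply_rope(rng.normal(size=(6, 8)))
# q[i] @ k[j] now depends on the offset i - j, not on i and j separately.
```
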
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Selective state-space model as an attention-free alternative to transformers

arxiv.org
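
Mamba swaps attention for a selective state-space recurrence whose step size and input/output projections are functions of the current input, keeping cost linear in sequence length. A deliberately tiny single-channel sketch with made-up parameter names; the real model uses per-channel states, learned projections, zero-order-hold discretization, and a hardware-aware parallel scan rather than this Python loop.

```python
import numpy as np

def selective_scan(u, A, w_B, w_C, w_dt):
    """Single-channel selective SSM recurrence:
        h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t * u_t,  y_t = C_t . h_t
    where dt_t, B_t, C_t depend on the input u_t (the 'selective' part).
    Sequential loop for clarity; Mamba uses a parallel associative scan."""
    h = np.zeros_like(A)
    ys = []
    for u_t in u:
        dt = np.log1p(np.exp(w_dt * u_t))   # softplus keeps the step positive
        h = np.exp(dt * A) * h + dt * (w_B * u_t) * u_t
        ys.append((w_C * u_t) @ h)
    return np.array(ys)

rng = np.random.default_rng(0)
d_state = 8
A = -np.exp(rng.normal(size=d_state))       # negative => stable decay
y = selective_scan(rng.normal(size=32), A,
                   rng.normal(size=d_state), rng.normal(size=d_state), 0.5)
print(y.shape)  # (32,)
```
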
Mixture of Experts Explained

Comprehensive MoE guide

huggingface.co
