Transformer Architectures & Modern Variants

Deep dive into transformer designs, from attention mechanisms to modern efficient variants.

12 items

The Illustrated Transformer

Visual guide to transformers

jalammar.github.io
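
The guide above builds intuition visually; here is a minimal NumPy sketch of the scaled dot-product attention it illustrates, single head and unbatched for clarity. The function name and shapes are illustrative, and the `causal` flag shows the GPT-style masking covered by the later entries.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) pairwise similarities
    if causal:
        # GPT-style masking: a token may only attend to itself and the past.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                       # 4 tokens, head dim 8
print(scaled_dot_product_attention(x, x, x, causal=True).shape)  # (4, 8)
```
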
The Illustrated BERT, ELMo, and co.

BERT intuition and design

jalammar.github.io
The Illustrated GPT-2

GPT-2 architecture walkthrough

jalammar.github.io
Efficient Transformers: A Survey

Overview of efficient variants

arxiv.org
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

IO-aware exact attention computed in tiles

arxiv.org
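
FlashAttention's core trick is computing exact attention block by block over the keys and values, carrying a running row-wise max and softmax denominator so the full n x n score matrix never touches slow memory. A toy NumPy sketch of that online-softmax accumulation; the real kernel also tiles the queries and fuses everything into a single GPU kernel, and all names here are illustrative.

```python
import numpy as np

def tiled_attention(Q, K, V, block=2):
    """Exact attention accumulated block by block over K/V with running
    row-wise max (m) and softmax denominator (l), so the full n x n
    score matrix is never materialized."""
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)           # running max of scores per query row
    l = np.zeros(n)                   # running softmax denominator per row
    for start in range(0, n, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = Q @ Kb.T / np.sqrt(d)     # (n, block) partial scores
        m_new = np.maximum(m, s.max(axis=1))
        scale = np.exp(m - m_new)     # rescale previously accumulated terms
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 8)) for _ in range(3))
# Matches naive softmax(Q K^T / sqrt(d)) V up to floating-point error.
out = tiled_attention(Q, K, V)
```
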
Long Range Arena: A Benchmark for Efficient Transformers

Benchmarking long-context models

arxiv.org
Sparse Transformers

Factorized sparse attention patterns for long sequences

arxiv.org
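
The paper replaces dense attention with fixed, factorized sparsity patterns. A sketch of one such pattern as a boolean mask, combining a local sliding window with a strided component; the paper's per-head fixed and strided variants differ in detail, so `window` and `stride` here are placeholders.

```python
import numpy as np

def sparse_causal_mask(n, window=4, stride=4):
    """Boolean attention mask (True = may attend) combining a local
    sliding window with a strided pattern, both restricted to causal
    positions, in the spirit of the paper's factorized attention."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i
    local = (i - j) < window                  # recent tokens
    strided = (j % stride) == (stride - 1)    # every stride-th "summary" token
    return causal & (local | strided)

mask = sparse_causal_mask(12)
# Each row attends to O(window + n/stride) positions instead of O(n).
print(mask.astype(int))
```
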
Switch Transformers: Scaling to Trillion Parameter Models

Sparse mixture of experts with top-1 routing

arxiv.org
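
Switch routing simplifies mixture-of-experts to top-1: a learned gate sends each token through exactly one expert FFN and scales the output by the gate probability. A minimal NumPy sketch under those assumptions; real implementations add expert capacity limits and a load-balancing auxiliary loss, and all names here are hypothetical.

```python
import numpy as np

def switch_layer(x, W_gate, experts):
    """Top-1 (Switch) routing: each token goes to one expert, chosen by
    a softmax gate, and the output is scaled by the gate probability."""
    logits = x @ W_gate                          # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    choice = probs.argmax(axis=-1)               # top-1 expert per token
    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        sel = choice == e
        if sel.any():
            out[sel] = expert(x[sel]) * probs[sel, e][:, None]
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda h, W=rng.normal(size=(d, d)) / np.sqrt(d): h @ W
           for _ in range(n_experts)]           # stand-ins for expert FFNs
x = rng.normal(size=(16, d))
y = switch_layer(x, rng.normal(size=(d, n_experts)), experts)
```
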
ALiBi: Train Short, Test Long

Length extrapolation via linear attention biases

arxiv.org
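
ALiBi drops positional embeddings entirely and instead subtracts a per-head penalty proportional to query-key distance from the attention scores, which is what lets models trained on short sequences evaluate on longer ones. A sketch of the bias matrix, assuming the paper's geometric slope schedule for power-of-two head counts.

```python
import numpy as np

def alibi_bias(n, num_heads):
    """Per-head linear distance penalties; add to Q K^T / sqrt(d) before
    the softmax. Slopes follow the paper's geometric schedule for head
    counts that are powers of two."""
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    dist = np.maximum(i - j, 0)               # how far in the past key j lies
    # Future positions get bias 0 here; a separate causal mask removes them.
    return -slopes[:, None, None] * dist      # (num_heads, n, n)

bias = alibi_bias(n=6, num_heads=8)
print(bias.shape)  # (8, 6, 6); earlier heads penalize distance more steeply
```
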
RoPE: Rotary Position Embedding

Rotary positional encoding for relative positions

arxiv.org
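
RoPE rotates each pair of query/key channels by an angle proportional to the token's position, so dot products between rotated queries and keys depend only on their relative offset. A sketch assuming the half-split channel pairing common in open-source implementations; the paper pairs adjacent channels instead, and either works as long as queries and keys agree.

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """Rotate channel pairs of (n, d) queries or keys by position-
    proportional angles; d must be even. Channels c and c + d/2 form
    a pair (half-split variant)."""
    n, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)        # one frequency per pair
    angles = np.arange(n)[:, None] * freqs[None, :]  # (n, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q, k = apply_rope(rng.normal(size=(6, 8))), apply_rope(rng.normal(size=(6, 8)))
# q[i] @ k[j] now depends on the offset i - j, not on i and j separately.
```
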
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Selective state-space model as an attention-free alternative to transformers

arxiv.org
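
Mamba swaps attention for a selective state-space recurrence whose step size and input/output projections are functions of the current input, keeping cost linear in sequence length. A deliberately tiny single-channel sketch with made-up parameter names; the real model uses per-channel states, learned projections, zero-order-hold discretization, and a hardware-aware parallel scan rather than this Python loop.

```python
import numpy as np

def selective_scan(u, A, w_B, w_C, w_dt):
    """Single-channel selective SSM recurrence:
        h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t * u_t,  y_t = C_t . h_t
    where dt_t, B_t, C_t depend on the input u_t (the 'selective' part).
    Sequential loop for clarity; Mamba uses a parallel associative scan."""
    h = np.zeros_like(A)
    ys = []
    for u_t in u:
        dt = np.log1p(np.exp(w_dt * u_t))   # softplus keeps the step positive
        h = np.exp(dt * A) * h + dt * (w_B * u_t) * u_t
        ys.append((w_C * u_t) @ h)
    return np.array(ys)

rng = np.random.default_rng(0)
d_state = 8
A = -np.exp(rng.normal(size=d_state))       # negative => stable decay
y = selective_scan(rng.normal(size=32), A,
                   rng.normal(size=d_state), rng.normal(size=d_state), 0.5)
print(y.shape)  # (32,)
```
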
Mixture of Experts Explained

Comprehensive MoE guide

huggingface.co
