Transformer Architecture Diagram

Rxw

Created 2026-05-31 12:33:38 | Updated 2026-06-05 13:14:43

Artificial Intelligence

graph TB
    subgraph input_stage [Input Stage]
        A[Input Text] --> B[Token Embedding]
        B --> C[Positional Encoding]
    end

    C --> D[Multi-Head Self-Attention]
    D --> E[Add + LayerNorm]
    E --> F["Feed-Forward Network (FFN)"]
    F --> G[Add + LayerNorm]

    G -->|repeat for N layers| D

    G --> H[Output Embedding]

    subgraph attention [Attention Computation]
        I[Query Q] --> K["Attention = softmax(Q K^T / sqrt(d_k))"]
        J[Key K] --> K
        K --> M[Output = Attention x V]
        L[Value V] --> M
    end

    D --> I
    D --> J
    D --> L

    style D fill:#ffd3b6
    style F fill:#dcedc1
    style A fill:#a8d8ea
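
The attention subgraph in the diagram can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not a reference implementation: the function names `softmax` and `scaled_dot_product_attention`, the array shapes, and the random inputs are all assumptions made for the example. It computes the weights as softmax(Q Kᵀ / sqrt(d_k)) and multiplies them by V.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum along the axis before exponentiating for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q and K have shape (seq_len, d_k); V has shape (seq_len, d_v)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row becomes a probability distribution over key positions.
    weights = softmax(scores, axis=-1)
    # Output is the weighted sum of value vectors.
    return weights @ V, weights

# Hypothetical inputs: 4 positions, d_k = 8, d_v = 16.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 16))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)   # (4, 16)
print(weights.shape)  # (4, 4)
```

Dividing by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax toward near one-hot rows.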

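The main path of the diagram (embedding plus positional encoding, then self-attention, Add + LayerNorm, FFN, Add + LayerNorm, repeated N times) can also be sketched in NumPy. Everything below is an assumption chosen for illustration: the sinusoidal positional encoding (the diagram does not say which encoding is used), the ReLU feed-forward network, the omission of learned LayerNorm scale and shift, the random weights, and every function and parameter name. Note that the loop arrow in the diagram is drawn as one block, whereas this sketch gives each layer its own parameters, as is usual in practice.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector (learned scale/shift omitted).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sinusoidal_positions(seq_len, d_model):
    # Assumed encoding: sine on even feature indices, cosine on odd ones (d_model even).
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def multi_head_self_attention(x, p, n_heads):
    seq_len, d_model = x.shape
    d_k = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_k)
        return t.reshape(seq_len, n_heads, d_k).transpose(1, 0, 2)

    q, k, v = (split(x @ p[name]) for name in ("Wq", "Wk", "Wv"))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # (n_heads, seq_len, seq_len)
    heads = softmax(scores) @ v                       # (n_heads, seq_len, d_k)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ p["Wo"]

def feed_forward(x, p):
    hidden = np.maximum(0.0, x @ p["W1"] + p["b1"])  # ReLU
    return hidden @ p["W2"] + p["b2"]

def encoder_layer(x, p, n_heads):
    # Residual connection followed by LayerNorm, as drawn in the diagram.
    x = layer_norm(x + multi_head_self_attention(x, p, n_heads))
    x = layer_norm(x + feed_forward(x, p))
    return x

def init_params(rng, d_model, d_ff, scale=0.1):
    # Random weights stand in for trained parameters.
    p = {name: rng.normal(scale=scale, size=(d_model, d_model))
         for name in ("Wq", "Wk", "Wv", "Wo")}
    p["W1"] = rng.normal(scale=scale, size=(d_model, d_ff))
    p["b1"] = np.zeros(d_ff)
    p["W2"] = rng.normal(scale=scale, size=(d_ff, d_model))
    p["b2"] = np.zeros(d_model)
    return p

rng = np.random.default_rng(0)
seq_len, d_model, d_ff, n_heads, n_layers = 5, 16, 32, 4, 2

# Stand-in for a token embedding lookup.
token_embeddings = rng.normal(size=(seq_len, d_model))
x = token_embeddings + sinusoidal_positions(seq_len, d_model)

layers = [init_params(rng, d_model, d_ff) for _ in range(n_layers)]
for p in layers:
    x = encoder_layer(x, p, n_heads)

print(x.shape)  # (5, 16)
```

The output keeps the input shape `(seq_len, d_model)`, which is what allows the same layer structure to be stacked N times.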
Link: https://rxw2023-github-io.pages.dev/2026/05/31/Transformer-架构原理图/
License: This work is licensed under CC BY-NC-SA 4.0.
