Transformer Architecture Diagram

Rxw
graph TB
    subgraph InputStage["Input Stage"]
        A[Input Text] --> B[Token Embedding]
        B --> C[Positional Encoding]
    end

    C --> D[Multi-Head Self-Attention]
    D --> E[Add + LayerNorm]
    E --> F[Feed-Forward Network FFN]
    F --> G[Add + LayerNorm]
    G -->|Repeat for N layers| D
    G --> H[Output Embedding]

    subgraph AttentionComputation["Attention Computation"]
        I[Query Q] --> K["Attention = softmax(QK^T / sqrt(d_k))"]
        J[Key K] --> K
        K --> M[Output = Attention x V]
        L[Value V] --> M
    end

    D --> I
    D --> J
    D --> L

    style D fill:#ffd3b6
    style F fill:#dcedc1
    style A fill:#a8d8ea
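The attention subgraph in the diagram can be made concrete with a small sketch. The code below is an illustrative, dependency-free version of single-head scaled dot-product attention in its standard form, softmax(QK^T / sqrt(d_k)) V. The function names `softmax` and `scaled_dot_product_attention` and the toy inputs are assumptions chosen for this example, not something the post defines, and a real model would use a tensor library, learned projections, and multiple heads.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention on plain Python lists (hypothetical helper).

    Q: list of query vectors, each of length d_k
    K: list of key vectors, each of length d_k
    V: list of value vectors, one per key
    Returns (outputs, weights): one output vector and one weight row per query.
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    d_v = len(V[0])
    outputs, weights = [], []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w_j * v[c] for w_j, v in zip(w, V)) for c in range(d_v)]
        )
    return outputs, weights


# Toy input: two tokens with 2-dimensional queries, keys, and values.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
outputs, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` sums to 1, so each output vector is a convex combination of the value vectors. In the full architecture this step runs once per head, and the result passes through the Add + LayerNorm and FFN blocks shown in the diagram.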
  • Title: Transformer Architecture Diagram
  • Author: Rxw
  • Created at : 2026-05-31 12:33:38
  • Updated at : 2026-05-31 12:36:43
  • Link: https://rxw2023-github-io.pages.dev/2026/05/31/Transformer-架构原理图/
  • License: This work is licensed under CC BY-NC-SA 4.0.