Transformer Architecture Diagram
graph TB
subgraph input_stage ["Input Stage"]
A[Input Text] --> B[Token Embedding]
B --> C[Positional Encoding]
end
C --> D["Multi-Head Self-Attention"]
D --> E[Add + LayerNorm]
E --> F["Feed-Forward Network (FFN)"]
F --> G[Add + LayerNorm]
G -->|repeat for N layers| D
G --> H[Output Embedding]
subgraph attention_calc ["Attention Computation"]
I[Query Q] --> K["Attention = softmax(QK^T / sqrt(d_k))"]
J[Key K] --> K
K --> M[Output = Attention x V]
L[Value V] --> M
end
D --> I
D --> J
D --> L
style D fill:#ffd3b6
style F fill:#dcedc1
style A fill:#a8d8ea
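The attention subgraph above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: it handles a single head with no masking, no batching, and no learned projection matrices, and the function names `softmax` and `scaled_dot_product_attention` are chosen here for illustration only. It computes the attention weights as softmax(QK^T / sqrt(d_k)) and then multiplies them by V, matching the two nodes in the diagram.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors (lists of floats).
    K and V must have the same number of rows.
    Returns (output, weights). Illustrative helper, not a library API.
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    output, weights = [], []
    for q in Q:
        # Scaled dot product of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output column at a time.
        output.append(
            [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        )
    return output, weights


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    out, w = scaled_dot_product_attention(Q, K, V)
    print("weights:", w)
    print("output:", out)
```

In a full Transformer layer, Q, K, and V would each come from a learned linear projection of the layer input, and several such heads would run in parallel before their outputs are concatenated. Those steps are omitted here to keep the sketch focused on the formula in the diagram.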
- Title: Transformer Architecture Diagram
- Author: Rxw
- Created at : 2026-05-31 12:33:38
- Updated at : 2026-05-31 12:36:43
- Link: https://rxw2023-github-io.pages.dev/2026/05/31/Transformer-架构原理图/
- License: This work is licensed under CC BY-NC-SA 4.0.