Complete architecture of the Transformer model, implemented in PyTorch, as proposed in the paper "Attention Is All You Need" (https://arxiv.org/pdf/1706.03762.pdf), the crux of sequence models like GPT.
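At the core of the architecture is scaled dot-product attention from the paper. As a minimal sketch (not this repository's exact code), it can be written in PyTorch as:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, per the paper."""
    d_k = q.size(-1)
    # Similarity scores between queries and keys, scaled by sqrt(d_k).
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v), weights

# Example tensors shaped (batch, heads, seq_len, d_k).
q = torch.randn(2, 4, 10, 16)
k = torch.randn(2, 4, 10, 16)
v = torch.randn(2, 4, 10, 16)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape)   # torch.Size([2, 4, 10, 16])
print(attn.shape)  # torch.Size([2, 4, 10, 10])
```

The multi-head variant runs this in parallel over several learned projections of Q, K, and V and concatenates the results.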