https://arxiv.org/abs/2007.14062

Big Bird: Transformers for Longer Sequences (Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed)

Sparse attention for long-range sequences. It takes the local and global attention of Longformer and adds random attention on top (reminiscent of a Watts-Strogatz random graph?). Honestly, I find it somewhat remarkable that a model can even learn to make proper use of long-range attention at all. #attention #transformer
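
As a rough illustration of the three patterns combined, here is a minimal sketch of a BigBird-style attention mask. The parameter names (`window`, `num_global`, `num_random`) are hypothetical, and the real BigBird operates on blocks of tokens rather than individual positions, which is omitted here.

```python
import numpy as np

def bigbird_mask(seq_len, window=3, num_global=2, num_random=3, seed=0):
    """Boolean mask combining local, global, and random sparse attention.

    Simplified token-level sketch; BigBird itself is block-sparse.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # 1. Sliding-window (local) attention: each token attends to its neighbors.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # 2. Global attention: a few tokens (e.g. [CLS]) attend to every
    #    position and are attended to by every position.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # 3. Random attention: each token attends to a few random positions.
    #    This keeps the expected path length between any two tokens short,
    #    which is the small-world / Watts-Strogatz intuition above.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True

    return mask

print(bigbird_mask(8).astype(int))
```

The random edges are what distinguish this from Longformer's mask: even with O(n) attended pairs per layer, information can propagate between distant tokens in a small number of hops.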