https://arxiv.org/abs/2007.14062

Big Bird: Transformers for Longer Sequences (Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed)

Sparse attention for long-range attention. It takes the same local and global attention used in Longformer and adds random attention on top. (A Watts-Strogatz random graph?) Honestly, I find it somewhat remarkable that a model can be trained to make good use of long-range attention at all. #attention #transformer
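
To make the three components concrete, here is a minimal token-level sketch of such a mask. It is not the paper's implementation, which as far as I recall works on blocks of tokens for efficiency. The function name `bigbird_mask` and the parameters `window`, `num_global`, `num_random`, and `seed` are hypothetical and chosen only for illustration. The first `num_global` tokens are assumed to be the global ones.

```python
import random


def bigbird_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Return a seq_len x seq_len list of bools; True means query i may attend to key j.

    Illustrative sketch only: token-level, with the first `num_global`
    positions treated as global tokens (an assumption, not the paper's code).
    """
    rng = random.Random(seed)
    half = window // 2
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            # Local: keys within `half` positions of the query.
            local = abs(i - j) <= half
            # Global: global tokens attend to everything and are attended to by everything.
            is_global = i < num_global or j < num_global
            mask[i][j] = local or is_global
        # Random: each query additionally attends to `num_random` randomly chosen keys.
        for j in rng.sample(range(seq_len), num_random):
            mask[i][j] = True
    return mask


mask = bigbird_mask(seq_len=16)
for row in mask:
    print("".join("#" if allowed else "." for allowed in row))
```

Each non-global row has at most `window + num_global + num_random` allowed keys, so the number of attended pairs grows linearly with sequence length rather than quadratically.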

200727 Big Bird.md