
update gpt-2 paper link
rasbt committed Sep 8, 2024
Commit 1e48c13 (1 parent: b94546a)
Showing 1 changed file with 2 additions and 2 deletions.
ch04/01_main-chapter-code/ch04.ipynb (2 additions, 2 deletions)
@@ -106,7 +106,7 @@
"source": [
"- In previous chapters, we used small embedding dimensions for token inputs and outputs for ease of illustration, ensuring they fit on a single page\n",
"- In this chapter, we consider embedding and model sizes akin to a small GPT-2 model\n",
"- We'll specifically code the architecture of the smallest GPT-2 model (124 million parameters), as outlined in Radford et al.'s [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) (note that the initial report lists it as 117M parameters, but this was later corrected in the model weight repository)\n",
"- We'll specifically code the architecture of the smallest GPT-2 model (124 million parameters), as outlined in Radford et al.'s [Language Models are Unsupervised Multitask Learners](https://www.semanticscholar.org/paper/Language-Models-are-Unsupervised-Multitask-Learners-Radford-Wu/9405cc0d6169988371b2755e573cc28650d14dfe) (note that the initial report lists it as 117M parameters, but this was later corrected in the model weight repository)\n",
"- Chapter 6 will show how to load pretrained weights into our implementation, which will be compatible with model sizes of 345, 762, and 1542 million parameters"
]
},
@@ -1271,7 +1271,7 @@
"id": "309a3be4-c20a-4657-b4e0-77c97510b47c",
"metadata": {},
"source": [
"- Exercise: you can try the following other configurations, which are referenced in the [GPT-2 paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), as well.\n",
"- Exercise: you can try the following other configurations, which are referenced in the [GPT-2 paper](https://www.semanticscholar.org/paper/Language-Models-are-Unsupervised-Multitask-Learners-Radford-Wu/9405cc0d6169988371b2755e573cc28650d14dfe), as well.\n",
"\n",
" - **GPT2-small** (the 124M configuration we already implemented):\n",
" - \"emb_dim\" = 768\n",
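As context for the exercise in the second hunk: a minimal sketch of how those configurations can be expressed, assuming the `emb_dim` / `n_layers` / `n_heads` naming from the exercise text. The remaining dictionary keys and the `approx_params` helper are illustrative, not the notebook's actual code; the parameter counts in the labels follow the paper's original figures (345M, 762M, 1542M) cited in the diff.

```python
# Minimal sketch (not the notebook's actual code): the GPT-2 configurations
# referenced in the exercise, plus a rough parameter count that reproduces
# the 124M figure mentioned in the first hunk above.
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # BPE vocabulary size
    "context_length": 1024,  # maximum sequence length
    "emb_dim": 768,          # embedding dimension
    "n_heads": 12,           # attention heads per block
    "n_layers": 12,          # transformer blocks
}

# Larger configurations, with parameter counts as listed in the GPT-2 paper:
model_configs = {
    "gpt2-medium (345M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (762M)":  {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1542M)":    {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

def approx_params(cfg):
    """Rough count for a GPT-2-style model with a tied output head."""
    d, n = cfg["emb_dim"], cfg["n_layers"]
    emb = cfg["vocab_size"] * d + cfg["context_length"] * d  # token + position
    attn = 4 * d * d + 4 * d             # QKV + output projection (w/ biases)
    mlp = 8 * d * d + 5 * d              # 4x-expansion feed-forward (w/ biases)
    norms = 2 * 2 * d                    # two LayerNorms per block
    return emb + n * (attn + mlp + norms) + 2 * d  # + final LayerNorm

print(f"{approx_params(GPT_CONFIG_124M):,}")  # 124,439,808 -> ~124M
```

With a tied output head this comes to 124,439,808 parameters, consistent with the 124M figure (rather than the initially reported 117M) noted in the first hunk.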

3 comments on commit 1e48c13

@d-kleine (Contributor) commented on 1e48c13 on Sep 9, 2024

@rasbt The www.semanticscholar.org page references the same broken PDF link for the paper.

What do you think about this link instead? https://scholar.google.com/citations?view_op=view_citation&hl=en&user=dOad5HoAAAAJ&citation_for_view=dOad5HoAAAAJ:YsMSGLbcyi4C
The PDF link at the top right of that page works.

@rasbt (Owner, Author) commented on 1e48c13 on Sep 9, 2024

I thought this was just a temporary issue, which is why I used the Semantic Scholar landing page. But it seems it may take longer to resolve. I agree that the Google Scholar reference would work better here since it has an alternative, working PDF link. Thanks!

@d-kleine (Contributor) commented on 1e48c13 on Sep 9, 2024

Thanks! Yeah, it might be temporary. I have raised an issue here: openai/gpt-2#352 (though I don't have high hopes that it will be fixed soon).

I am also super happy that the link checker CI works fairly well for both internal and external links 😄
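As a rough illustration of what such a link check does, here is a generic standalone sketch; it is not the repository's actual CI code, and the URL regex, headers, and timeout are arbitrary choices.

```python
# Generic sketch, not the repository's actual CI setup: scan files for
# http(s) URLs and report the status of each via a HEAD request.
import re
import sys
import urllib.request

URL_PATTERN = re.compile(r'https?://[^\s<>()\[\]"\']+')

def check_file(path):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for url in sorted(set(URL_PATTERN.findall(text))):
        req = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "Mozilla/5.0"}
        )
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                print(resp.status, url)
        except Exception as exc:
            print("BROKEN", url, "->", exc)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        check_file(path)
```

It can be run on any text-based file, e.g. `python link_check_sketch.py ch04/01_main-chapter-code/ch04.ipynb` (the script name is hypothetical).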
