tutorials on pitchfork data (ULMfit) #2
base: master
Conversation
Overall, a good start. Add intro sections to both scripts, take advantage of headers to break things up, change the training process to iteratively unfreeze weights, possibly check out fastprogress, use relative paths, and never put keys/secrets in source code again.
],
"source": [
"print(os.getcwd())\n",
"path='/media/jlealtru/data_files/github/Tutorials/TextAnalytics/pitchfork_data'"
Use relative paths. Either add a standalone script that downloads the data from its source, or do it in an intro section, but show how to download the source data directly so all your steps can be rebuilt.
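A minimal sketch of that suggestion. The `data/pitchfork_data` directory name and the Kaggle dataset slug are placeholders, not taken from the notebook:

```python
from pathlib import Path
import subprocess

# Build the data directory relative to the notebook instead of an
# absolute, machine-specific path like /media/jlealtru/...
path = Path("data") / "pitchfork_data"
path.mkdir(parents=True, exist_ok=True)

# Download the source data so every step can be rebuilt from scratch.
# The dataset slug below is a placeholder; substitute the real one and
# uncomment once Kaggle credentials are configured.
# subprocess.run(
#     ["kaggle", "datasets", "download", "-d", "<owner>/<dataset>",
#      "-p", str(path), "--unzip"],
#     check=True,
# )
```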
"learn_classifier.freeze_to(-2)\n",
"lr /= 2\n",
"learn_classifier.fit_one_cycle(1, slice(lr/(2.6**4),lr), moms=(0.8,0.7))\n",
"#learn_classifier.fit_one_cycle(2, slice(1e-4/2,1e-2/2), moms=(0.8,0.7))"
Look at fastprogress - I think folks would find it really interesting to be able to iteratively build a training graph as you progress.
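A rough sketch of how that could look with fastprogress, using dummy loss values in place of real training; the epoch/batch counts are arbitrary:

```python
from fastprogress.fastprogress import master_bar, progress_bar

# Nested progress bars: one master bar over epochs, a child bar per epoch.
epochs, batches = 3, 10
losses = []
mb = master_bar(range(epochs))
for epoch in mb:
    for step in progress_bar(range(batches), parent=mb):
        losses.append(1.0 / (len(losses) + 1))  # stand-in for a real batch loss
    mb.write(f"epoch {epoch}: last loss {losses[-1]:.4f}")
    # In a notebook, mb.update_graph([[range(len(losses)), losses]])
    # redraws the loss curve in place after every epoch.
```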
"metadata": {},
"outputs": [],
"source": [
"learn_classifier.unfreeze()\n",
The fastai folks recommend unfreezing layer groups sequentially: start with -1, then -2, then -3, then unfreeze everything. That will likely help out.
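A sketch of that schedule, assuming fastai v1's Learner API (`freeze_to`, `unfreeze`, `fit_one_cycle`) and the notebook's 2.6**4 discriminative learning-rate factor; the helper name and exact learning rates are illustrative:

```python
def gradual_unfreeze_and_train(learn, lr=2e-2, factor=2.6 ** 4):
    """Unfreeze a fastai text classifier one layer group at a time.

    Trains the head first (-1), then progressively deeper groups,
    halving the peak learning rate at each stage as fastai's ULMFiT
    examples do, before finally unfreezing the whole model.
    """
    for stage, group in enumerate([-1, -2, -3]):
        peak = lr / (2 ** stage)      # halve the peak lr at each stage
        learn.freeze_to(group)        # train only the last |group| layer groups
        learn.fit_one_cycle(1, slice(peak / factor, peak), moms=(0.8, 0.7))
    learn.unfreeze()                  # finally train all weights
    learn.fit_one_cycle(2, slice(lr / 10 / factor, lr / 10), moms=(0.8, 0.7))
```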
"cell_type": "markdown",
"metadata": {},
"source": [
"In this tutorial we are going to implement a transfer learning model for text version of the ULMfit. \n",
Format this markdown, add a TOC with hyperlinks, and list additional sources to review.
"# username \n",
"os.environ['KAGGLE_USERNAME'] = \"jlealtru\" \n",
"# key\n",
"os.environ['KAGGLE_KEY'] = \"6c3a4d6b4d8e7804780d6cb02879ac53\""
@jlealtru never post secrets/keys in source code. You have a couple of options. The easiest, although not the safest, is to create a separate file, import it, and reference only the variable name in the code. Alternatively, you can leverage something like Azure Key Vault (easy to use, super powerful - think of 1Password or LastPass, except at scale and programmatically).
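A sketch of the separate-file option; `kaggle_secrets.json` is a hypothetical filename, and the file must be added to .gitignore so it never lands in the repository:

```python
import json
import os
from pathlib import Path

def load_kaggle_credentials(path="kaggle_secrets.json"):
    """Load Kaggle credentials from an untracked file instead of source code.

    Expected file contents (kept out of version control):
        {"username": "...", "key": "..."}
    """
    creds = json.loads(Path(path).read_text())
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
```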
],
"source": [
"#learn.fit_one_cycle(10, 2e-3, moms=(0.8,0.7), wd=0.1)\n",
"learn_pitchfork.fit_one_cycle(12, 2e-3/3, moms=(0.8,0.7), wd= 0.1)"
Again - iteratively unfreeze layers and train, and track progress using something like fastprogress.
Adding tutorials on pitchfork data and some old code.