tutorials on pitchfork data (ULMfit) #2

Open
wants to merge 6 commits into base: master
Conversation

@jlealtru jlealtru commented Jul 5, 2019

Adding tutorials on pitchfork data and some old code.

@datawrestler (Owner) left a comment


Overall, good start. A few things:
- add intro sections to both scripts
- take advantage of headers to break things up
- change the training process to iteratively unfreeze weights
- possibly check out fastprogress
- use relative paths
- never put keys/secrets in source code again

print(os.getcwd())
path = '/media/jlealtru/data_files/github/Tutorials/TextAnalytics/pitchfork_data'
datawrestler (Owner):

Use relative paths. Either add a standalone script that fetches the data from its source, or do it in an intro section, but show how to download the source data directly so all of your steps can be reproduced.
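For example, a minimal sketch of the relative-path setup, assuming the reviews data sits in a `data` folder next to the notebook (the folder name and the Kaggle dataset slug are illustrative, not taken from the original notebook):

```python
from pathlib import Path

# Path relative to the working directory instead of a machine-specific one.
# The "data" folder name is an assumption, not from the original notebook.
path = Path.cwd() / "data"
path.mkdir(exist_ok=True)

# One-time download via the kaggle CLI (assumes ~/.kaggle/kaggle.json is
# configured; the dataset slug is illustrative):
#   kaggle datasets download -d nolanbconaway/pitchfork-data -p data --unzip
print(path)
```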

learn_classifier.freeze_to(-2)
lr /= 2
learn_classifier.fit_one_cycle(1, slice(lr/(2.6**4), lr), moms=(0.8, 0.7))
#learn_classifier.fit_one_cycle(2, slice(1e-4/2, 1e-2/2), moms=(0.8, 0.7))
datawrestler (Owner):

Look at fastprogress - I think folks would find it really interesting to be able to iteratively build a training graph as you progress.
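A sketch of what that could look like with fastprogress: accumulate per-epoch losses and hand them to the master bar's graph on each pass. The loss values below are synthetic placeholders, and the `master_bar` calls are commented out so the bookkeeping runs anywhere:

```python
# from fastprogress.fastprogress import master_bar, progress_bar
# mb = master_bar(range(n_epochs))

def record_epoch(history, epoch, train_loss, valid_loss):
    """Append one epoch's losses and return [x, y] series in the shape
    master_bar.update_graph expects."""
    history.append((epoch, train_loss, valid_loss))
    xs = [h[0] for h in history]
    return [[xs, [h[1] for h in history]],   # training-loss curve
            [xs, [h[2] for h in history]]]   # validation-loss curve

history = []
for epoch, (tl, vl) in enumerate([(0.9, 0.8), (0.6, 0.55), (0.4, 0.45)]):
    graphs = record_epoch(history, epoch, tl, vl)
    # mb.update_graph(graphs, x_bounds=[0, 3])  # redraws the graph in place
```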

learn_classifier.unfreeze()
datawrestler (Owner):

The fastai folks recommend unfreezing layer groups gradually, one at a time. Start with -1, then -2, then -3, then unfreeze all. That will likely help out.
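A sketch of that gradual-unfreezing loop, assuming fastai v1's `freeze_to`/`fit_one_cycle` API as used elsewhere in this notebook. The schedule helper is plain Python; the halving factor per stage is an assumption, and the Learner calls are left as comments:

```python
def unfreeze_schedule(base_lr, n_stages=3):
    """Return (freeze_to_arg, max_lr) pairs: unfreeze the last layer group
    first (-1), then -2, then -3, then everything (None), halving the
    learning rate at each stage."""
    stages, lr = [], base_lr
    for i in range(1, n_stages + 1):
        stages.append((-i, lr))
        lr /= 2
    stages.append((None, lr))  # None means unfreeze all layer groups
    return stages

for freeze_arg, lr in unfreeze_schedule(2e-2):
    # In the notebook this would drive the fastai Learner:
    #   learn_classifier.unfreeze() if freeze_arg is None
    #   else learn_classifier.freeze_to(freeze_arg)
    #   learn_classifier.fit_one_cycle(1, slice(lr/(2.6**4), lr), moms=(0.8, 0.7))
    pass
```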

In this tutorial we are going to implement ULMFiT, a transfer learning model for text.
datawrestler (Owner):

Format this markdown and add a TOC with hyperlinks, plus additional sources to review.

# username
os.environ['KAGGLE_USERNAME'] = "jlealtru"
# key
os.environ['KAGGLE_KEY'] = "6c3a4d6b4d8e7804780d6cb02879ac53"
datawrestler (Owner):

@jlealtru never post secrets/keys in source code. You have a couple of options. The easiest, although not the safest, is to put the credentials in a separate file, import that file, and reference only the variable name in the code. Alternatively, you can leverage something like Azure Key Vault (easy to use, super powerful; think of 1Password or LastPass, except at scale and programmatically).
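A minimal sketch of the separate-file approach: read credentials from environment variables, falling back to a git-ignored JSON file. The function name, file name, and JSON keys are illustrative (the kaggle CLI itself reads `~/.kaggle/kaggle.json`):

```python
import json
import os
from pathlib import Path

def load_kaggle_creds(secrets_file="kaggle_secrets.json"):
    """Return (username, key), preferring environment variables and falling
    back to a git-ignored JSON file. Names here are illustrative, not a
    Kaggle or fastai convention."""
    user = os.environ.get("KAGGLE_USERNAME")
    key = os.environ.get("KAGGLE_KEY")
    if user and key:
        return user, key
    creds = json.loads(Path(secrets_file).read_text())
    return creds["username"], creds["key"]
```

Add `kaggle_secrets.json` to `.gitignore` so it never reaches the repo.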

#learn.fit_one_cycle(10, 2e-3, moms=(0.8, 0.7), wd=0.1)
learn_pitchfork.fit_one_cycle(12, 2e-3/3, moms=(0.8, 0.7), wd=0.1)
datawrestler (Owner):

Again: iteratively unfreeze layers and train, tracking progress with something like fastprogress.
