SDC Workshop Winter 2018 - Introduction to Applied Machine Learning (ML) and Natural Language Processing (NLP)
12/12 - 2018, Beijing
Dr. Daniel S. Hain, [email protected] Dr. Roman Jurowetzki, [email protected]
Aalborg University, Denmark
In this repository, you will find all notebooks, presentations and materials from the workshop. We will also use it to link to some Kaggle kernels that you can explore for interactive exercises.
Due to time limitations, the workshop will feature interactive tutorials but no "self-run coding exercises".
Please register on kaggle.com to run the interactive exercises.
In this workshop, we will not teach you one particular trending method or approach but rather introduce you to Data Science as a field and its approach to working with data.
Of course, we can only do so much in 3 days, so we tried to find a good balance between broad overview and specific applications.
Hopefully, this will give you a good foundation, or at least a starting point, to learn more. Today, it is really easy to find excellent resources and become skilled at sophisticated analytical techniques. But you need to know what to look for and how all the different things out there relate to each other.
While for several reasons, mostly path dependency, the innovation studies (and general social science) community relies on expensive proprietary packages (e.g. SPSS, Stata, SAS or EViews), people who work with Big Data analytics use R and/or Python. We decided not to focus on just one language but to present both, so you can decide which one you find most approachable.
Below you will find links to the different things presented during the workshop. We will update this repository during and after the workshop.
- L1 Notebook static (html)
- L1 Notebook dynamic (executable on Kaggle)
- L1.5 Notebook static (executed Jupyter Notebook on NBViewer)
- L1.5 Notebook dynamic (executable on Kaggle)
- L2 Notebook static (html)
- L2 Notebook dynamic (executable on Kaggle)
- Wine Case study static (executed Jupyter Notebook on NBViewer)
- Wine Case study dynamic (executable on Kaggle)
VOSviewer: Easy-to-use software for bibliometrics.
CiteSpace: More complex bibliometric software, including geospatial features and mapping.
DataCamp: Online courses. The introductions to R, Python, GitHub, Excel and Sheets are free. Recommended courses:
- R basics: "Introduction to R" (free course)
- R unsupervised ML: "Unsupervised Learning in R" (chapter 1 free)
- R supervised ML: "Supervised Learning in R" (chapter 1 free)
- R Data visualization: "Data Visualization with ggplot2 (Part 1)" (chapter 1 free)
Dataquest: Similar to DataCamp, but Python-focused. Also offers more advanced courses on data engineering.
Open Data Science Masters Curriculum: Collection of free online resources on all kinds of Data Science topics.
Data and scripts from the ML A-Z course from Udemy: R and Python scripts from the course, including the course data. The course can be found on Udemy and is usually available for around 12 USD.
- Installing R on your machine
- Installing the RStudio IDE on your machine
- Installing Python on Windows
- Installing Python on Mac
- Network analysis and visualization software
- Stackoverflow: Programming help & advice forum
- Informative podcast about professional analytics
- R-Bloggers: R news and tutorials