From b1532df5b03adf11a2e30c2d5e4d21b88897a5aa Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Tue, 9 Jan 2024 12:15:49 -0800 Subject: [PATCH 01/20] Initial commit --- 03-intro-to-python.Rmd | 54 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/03-intro-to-python.Rmd b/03-intro-to-python.Rmd index 44d374e..06dc6ce 100644 --- a/03-intro-to-python.Rmd +++ b/03-intro-to-python.Rmd @@ -7,3 +7,57 @@ ```{r, fig.align='center', echo = FALSE, fig.alt= "Major point!! example image"} ottrpal::include_slide("https://docs.google.com/presentation/d/1k8uC1rqnGTSbKjBsWvKYgiUUxO1q_VhJCwZQHJNWozA/edit#slide=id.g29054a882fd_0_52") ``` + +The main difference between Python and R is that Python is a general-purpose programming language, while R is a statistical programming language. This means that Python is good at multiple things and can do most things, whereas R is very good at statistical analysis, but not as good at other things as Python. + +Python is supported by multiple libraries that support data science tasks: + +- NumPy for numerical computing with multidimensional arrays +- pandas for data manipulation and analysis with data frames +- Matplotlib for data visualization + +Python is particularly suited for large scale machine learning. + +When you want to document your code, you can use Jupyter Notebooks. Jupyter Notebooks are an open source web application for easily sharing documents that contain your live Python code, equations, visualizations and data science explanations. 
+ + + +## Python Syntax for R Users + +Source: https://rstudio.github.io/reticulate/articles/python_primer.html + +### Whitespace + + +### Lists + + +### Tuples + + +### Dictionaries + + +### Sets + +### Iteration with for loops + +### Comprehensions + +### Functions + +### Classes + + +### Importing modules + +### Integers and Floats + + +### Learning More + +https://docs.Python.org/3/ + +https://docs.Python.org/3/library/index.html + + From 34dae96575351d4237f046a3f795854b2b1cfa17 Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Tue, 9 Jan 2024 12:34:48 -0800 Subject: [PATCH 02/20] Start writing the Intro and Python syntax --- 03-intro-to-python.Rmd | 52 ++++++++++++++++++++++++++++++++++-------- 1 file changed, 42 insertions(+), 10 deletions(-) diff --git a/03-intro-to-python.Rmd b/03-intro-to-python.Rmd index 06dc6ce..d8a3a63 100644 --- a/03-intro-to-python.Rmd +++ b/03-intro-to-python.Rmd @@ -1,14 +1,7 @@ # Intro to Python - -## Learning Objectives - -```{r, fig.align='center', echo = FALSE, fig.alt= "Major point!! example image"} -ottrpal::include_slide("https://docs.google.com/presentation/d/1k8uC1rqnGTSbKjBsWvKYgiUUxO1q_VhJCwZQHJNWozA/edit#slide=id.g29054a882fd_0_52") -``` - -The main difference between Python and R is that Python is a general-purpose programming language, while R is a statistical programming language. This means that Python is good at multiple things and can do most things, whereas R is very good at statistical analysis, but not as good at other things as Python. +Python is a popular programming language that was created by Guido van Rossum and released in 1991. Python is supported by multiple libraries that support data science tasks: @@ -16,20 +9,59 @@ Python is supported by multiple libraries that support data science tasks: - pandas for data manipulation and analysis with data frames - Matplotlib for data visualization -Python is particularly suited for large scale machine learning. 
+The main difference between Python and R is that Python is a general-purpose programming language, while R is a statistical programming language. This means that Python is good at multiple things and can do most things, whereas R is very good at statistical analysis, but not as good at other things as Python. When you want to document your code, you can use Jupyter Notebooks. Jupyter Notebooks are an open source web application for easily sharing documents that contain your live Python code, equations, visualizations and data science explanations. +Python is particularly suited for large scale machine learning. + +## Learning Objectives + +```{r, fig.align='center', echo = FALSE, fig.alt= "Major point!! example image"} +ottrpal::include_slide("https://docs.google.com/presentation/d/1k8uC1rqnGTSbKjBsWvKYgiUUxO1q_VhJCwZQHJNWozA/edit#slide=id.g29054a882fd_0_52") +``` ## Python Syntax for R Users Source: https://rstudio.github.io/reticulate/articles/python_primer.html +Most important difference in syntax is 0-based indexing for Python and 1-based indexing for R. This means that in R, indexing starts with 1 and in Python, indexing starts with 0. Coming from R, this means you have to subtract your "R indexes" by 1 to get the correct index in Python. + +Other major differences in Python: + ### Whitespace +Important in Python. In R, expressions are grouped into a code block with `{}`. In Python, expressions are grouped by indentation level. 
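As a further sketch of how indentation alone defines blocks, the hypothetical function below (`describe_numbers` is an illustrative name, not part of the course material) nests an `if` inside a `for` inside a `def`. Each deeper indentation level opens a new block, and returning to a shallower level closes it.

```python
def describe_numbers(values):
    """Label each number as positive or non-positive (illustrative helper)."""
    labels = []
    for value in values:                  # level 1: body of the function
        if value > 0:                     # level 2: body of the for loop
            labels.append("positive")     # level 3: body of the if branch
        else:
            labels.append("non-positive")
    return labels                         # back at level 1: the loop has ended


print(describe_numbers([1, -2, 0]))
```

Moving the `return` line one level deeper would place it inside the loop and end the function after the first item, which is why consistent indentation matters in Python.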
+ +For example, in R, an if statement looks like: + +```{r} +x <- 1 + +if (x > 0) { + print("x is positive") +} else { + print("x is negative") +} +``` + +In Python, the equivalent if statement looks like: + +```{python} +x = 1 + +if x > 0: + print("x is positive") +else: + print("x is negative") +``` + + +### Data Structures + -### Lists +#### Lists ### Tuples From 916e1ae3095b9da0b40b737a4198a7daaf25e83e Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Mon, 12 Feb 2024 13:16:07 -0800 Subject: [PATCH 03/20] Initial draft --- .../figure-html/unnamed-chunk-2-1.png | Bin .../figure-html/unnamed-chunk-3-1.png | Bin 03-intro-to-python.Rmd | 225 +++++++++++++++++- 3 files changed, 214 insertions(+), 11 deletions(-) rename {_bookdown_files/02-chapter_of_course_files => 02-chapter_of_course_files}/figure-html/unnamed-chunk-2-1.png (100%) rename {_bookdown_files/02-chapter_of_course_files => 02-chapter_of_course_files}/figure-html/unnamed-chunk-3-1.png (100%) diff --git a/_bookdown_files/02-chapter_of_course_files/figure-html/unnamed-chunk-2-1.png b/02-chapter_of_course_files/figure-html/unnamed-chunk-2-1.png similarity index 100% rename from _bookdown_files/02-chapter_of_course_files/figure-html/unnamed-chunk-2-1.png rename to 02-chapter_of_course_files/figure-html/unnamed-chunk-2-1.png diff --git a/_bookdown_files/02-chapter_of_course_files/figure-html/unnamed-chunk-3-1.png b/02-chapter_of_course_files/figure-html/unnamed-chunk-3-1.png similarity index 100% rename from _bookdown_files/02-chapter_of_course_files/figure-html/unnamed-chunk-3-1.png rename to 02-chapter_of_course_files/figure-html/unnamed-chunk-3-1.png diff --git a/03-intro-to-python.Rmd b/03-intro-to-python.Rmd index d8a3a63..0f01613 100644 --- a/03-intro-to-python.Rmd +++ b/03-intro-to-python.Rmd @@ -9,11 +9,13 @@ Python is supported by multiple libraries that support data science tasks: - pandas for data manipulation and analysis with data frames - Matplotlib 
for data visualization +## Main Differences between R and Python + The main difference between Python and R is that Python is a general-purpose programming language, while R is a statistical programming language. This means that Python is good at multiple things and can do most things, whereas R is very good at statistical analysis, but not as good at other things as Python. When you want to document your code, you can use Jupyter Notebooks. Jupyter Notebooks are an open source web application for easily sharing documents that contain your live Python code, equations, visualizations and data science explanations. -Python is particularly suited for large scale machine learning. +Python is particularly suited for large scale machine learning and deep learning with libraries such as TensorFlow, PyTorch, and scikit-learn. ## Learning Objectives @@ -24,8 +26,6 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1k8uC1rqnGTSbKjBs ## Python Syntax for R Users -Source: https://rstudio.github.io/reticulate/articles/python_primer.html - Most important difference in syntax is 0-based indexing for Python and 1-based indexing for R. This means that in R, indexing starts with 1 and in Python, indexing starts with 0. Coming from R, this means you have to subtract your "R indexes" by 1 to get the correct index in Python. Other major differences in Python: @@ -60,36 +60,239 @@ else: ### Data Structures +There are 4 different data storage formats, or data structures, in Python: lists, tuples, dictionaries, and sets #### Lists +Python lists are created using brackets `[]`. You can add elements to the list through the `append()` method. + +```{python} +x = [1, 2, 3] +x.append(4) # add 4 to the end of list + +print("x is", x) +#> x is [1, 2, 3, 4] +``` + + +You can index into lists with integers using brackets `[]`, but note that indexing is 0-based. 
+ +```{python} +x = [1, 2, 3] + +x[0] +#> 1 +x[1] +#> 2 +x[2] +#> 3 +``` + -### Tuples +Negative numbers count from the end of the list. +```{python} +x = [1, 2, 3] + +x[-1] +#> 3 +x[-2] +#> 2 +x[-3] +#> 1 +``` + +You can slice ranges of lists using the : inside brackets. Note that the slice syntax is not inclusive of the end of the slice range. + +```{python} +x = [1, 2, 3, 4, 5, 6] +x[0:2] # get items at index positions 0, 1 +#> [1, 2] +x[1:] # get items from index position 1 to the end +#> [2, 3, 4, 5, 6] +x[:-2] # get items from beginning up to the 2nd to last. +#> [1, 2, 3, 4] +x[:] # get all the items +#> [1, 2, 3, 4, 5, 6] +``` -### Dictionaries +#### Tuples -### Sets +Tuples behave like lists, but are constructued using `()`, instead of `[]`. + +```{python} +x = (1, 2) # tuple of length 2 +type(x) +#> +len(x) +#> 2 +x +#> (1, 2) + +x = (1,) # tuple of length 1 +type(x) +#> +len(x) +#> 1 +x +#> (1,) + +x = 1, 2 # also a tuple +type(x) +#> +len(x) +#> 2 + +x = 1, # beware a single trailing comma! This is a tuple! +type(x) +#> +len(x) +#> 1 +``` + +#### Dictionaries + +Dictionaries are data structures where you can retrieve items by name. They can be created using syntax like {key: value}. + +```{python} +d = {"key1": 1, + "key2": 2} + +d["key1"] +#> 1 +d["key3"] = 3 +d +#> {'key1': 1, 'key2': 2, 'key3': 3} +``` + +#### Sets + +Sets are used to track unique items, and can be constructed using `{val1, val2}`. + +```{python} +s = {1, 2, 3} + +type(s) +#> +s +#> {1, 2, 3} +``` ### Iteration with for loops + +The `for` statement in Python is similar to the `for` loop in R. It can be used to iterate over any kind of data structure. -### Comprehensions +```{python} +for x in [1, 2, 3]: + print(x) +#> 1 +#> 2 +#> 3 +``` ### Functions -### Classes +Python functions are defined with the def statement. The syntax for specifying function arguments and default values is very similar to R. 
+ +```{python} +def my_function(name = "World"): + print("Hello", name) + +my_function() +#> Hello World +my_function("Friend") +#> Hello Friend +``` + +The equivalent R code would be + +```{r} +my_function <- function(name = "World") { + cat("Hello", name, "\n") +} + +my_function() +#> Hello World +my_function("Friend") +#> Hello Friend +``` + + +### Classes and Object Oriented Programming (OOP) + +In R, the most widely used unit of composition for code is functions, and in Python, it is classes. Classes are how you organize and find methods in Python. This approach to code composition is called object oriented programming (OOP). Let's dive in the details of OOP. + +An object is any entity that you want to store and process data about. Each object is an instance of a class in the computer's memory. A class is a template for creating objects. Creating an object from a class is called instantiation. It has properties and methods (functions for the class). + +For example, we could have a class called Person. The properties of this class are what describe this Person class: + +- first_name +- last_name +- gender +- date_of_birth +- occupatiaon + + +The methods of this class are the functions for this Person class: + +- walk() +- run() +- sleep() +- eat() + + +Here is a simple Person class for demonstration purposes. + +```{python} +class Person: + pass # `pass` means do nothing. + +Person +#> +type(Person) +#> + +instance = Person() +instance +#> <__main__.Person object at 0x102ba75e0> +type(instance) +#> +``` + +Like the `def` statement, the `class` statement is used to create a Python class. First note the strong naming convention, classes are typically CamelCase, and functions are typically snake_case. After defining Person, you can interact with it, and see that it has type 'type'. Calling `instance = Person()` creates a new object instance of the class, which has type `Person` (ignore the __main__. prefix for now). 
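To connect the property and method lists above with code, here is a minimal sketch of a `Person` class that stores a few of those properties and implements one of the methods. The `__init__` signature, the `full_name()` helper, and the string returned by `walk()` are assumptions chosen for illustration only.

```python
class Person:
    def __init__(self, first_name, last_name, occupation=None):
        # Properties (attributes) are stored on the instance through `self`.
        self.first_name = first_name
        self.last_name = last_name
        self.occupation = occupation

    def full_name(self):
        # Hypothetical helper method that combines two properties.
        return f"{self.first_name} {self.last_name}"

    def walk(self):
        # Methods are functions defined inside the class body.
        return f"{self.full_name()} is walking"


ada = Person("Ada", "Lovelace", occupation="mathematician")  # instantiation
print(ada.full_name())
print(ada.walk())
print(type(ada).__name__)
```

Each call such as `Person("Ada", "Lovelace")` creates a separate instance with its own property values, while the methods are shared by all instances of the class.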
### Importing modules -### Integers and Floats +In R, authors can bundle their code into R packages, and R users can access objects from R packages via `library()` or `::`. In Python, authors bundle code into modules, and users access modules using `import`. + +```{python} +import numpy +``` +Once loaded, you can access symbols from the module using `.`, which is equivalent to `::` in R. + +```{python} +numpy.abs(-1) +``` + +There is special syntax for conveniently bounding a module to a symbol upon importing. + +```{python} +import numpy # import +import numpy as np # import and bind to a custom symbol `np` + +from numpy import abs # import only `numpy.abs` +from numpy import abs as abs2 # import only `numpy.abs`, bind it to `abs2` +``` ### Learning More -https://docs.Python.org/3/ +IF you want to learn more, browse the official documentation for Python: https://docs.Python.org/3/ -https://docs.Python.org/3/library/index.html +### References +- https://rstudio.github.io/reticulate/articles/python_primer.html +- https://www.youtube.com/watch?v=m_MQYyJpIjg From 94b7c64a0e3c7b6441d1a8f14d5834e19a6bfff9 Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Mon, 12 Feb 2024 13:25:22 -0800 Subject: [PATCH 04/20] Use RStudio Projects --- .gitignore | 1 + python.Rproj | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+) create mode 100644 python.Rproj diff --git a/.gitignore b/.gitignore index dc638be..094bf0c 100644 --- a/.gitignore +++ b/.gitignore @@ -8,3 +8,4 @@ spell_check_results.tsv .RData .httr-oauth docker/git_token.txt +.Rproj.user diff --git a/python.Rproj b/python.Rproj new file mode 100644 index 0000000..aecd28b --- /dev/null +++ b/python.Rproj @@ -0,0 +1,18 @@ +Version: 1.0 + +RestoreWorkspace: Default +SaveWorkspace: Default +AlwaysSaveHistory: Default + +EnableCodeIndexing: Yes +UseSpacesForTab: Yes +NumSpacesForTab: 2 +Encoding: UTF-8 + +RnwWeave: Sweave +LaTeX: pdfLaTeX + +AutoAppendNewline: Yes 
+StripTrailingWhitespace: Yes + +BuildType: Website From 801f01f5e84b02dccaf24b24f426f1b8b9c65883 Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Tue, 13 Feb 2024 10:52:14 -0800 Subject: [PATCH 05/20] Update 03-intro-to-python.Rmd Co-authored-by: Till Hoffmann --- 03-intro-to-python.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/03-intro-to-python.Rmd b/03-intro-to-python.Rmd index 0f01613..8f44cd7 100644 --- a/03-intro-to-python.Rmd +++ b/03-intro-to-python.Rmd @@ -193,7 +193,7 @@ for x in [1, 2, 3]: ### Functions -Python functions are defined with the def statement. The syntax for specifying function arguments and default values is very similar to R. +Python functions are defined with the `def` statement. The syntax for specifying function arguments and default values is very similar to R. ```{python} def my_function(name = "World"): From 169aceff2f8313a3ec005ec905196d95fda28d50 Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Tue, 13 Feb 2024 10:52:37 -0800 Subject: [PATCH 06/20] Update 03-intro-to-python.Rmd Co-authored-by: Till Hoffmann --- 03-intro-to-python.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/03-intro-to-python.Rmd b/03-intro-to-python.Rmd index 8f44cd7..dcd1ef2 100644 --- a/03-intro-to-python.Rmd +++ b/03-intro-to-python.Rmd @@ -289,7 +289,7 @@ from numpy import abs as abs2 # import only `numpy.abs`, bind it to `abs2` ### Learning More -IF you want to learn more, browse the official documentation for Python: https://docs.Python.org/3/ +If you want to learn more, browse the [official documentation for Python](https://docs.Python.org/3/). 
### References From 33b1580b7972ea0511dc6a0531c7a234c41596b6 Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Tue, 13 Feb 2024 10:56:53 -0800 Subject: [PATCH 07/20] Incorporate Till's feedback --- 03-intro-to-python.Rmd | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/03-intro-to-python.Rmd b/03-intro-to-python.Rmd index 0f01613..3801d58 100644 --- a/03-intro-to-python.Rmd +++ b/03-intro-to-python.Rmd @@ -5,17 +5,17 @@ Python is a popular programming language that was created by Guido van Rossum an Python is supported by multiple libraries that support data science tasks: -- NumPy for numerical computing with multidimensional arrays -- pandas for data manipulation and analysis with data frames -- Matplotlib for data visualization +- [NumPy](https://numpy.org/) for numerical computing with multidimensional arrays. +- [pandas](https://pandas.pydata.org/) for data manipulation and analysis with data frames. +- [Matplotlib](https://matplotlib.org/) for data visualization. ## Main Differences between R and Python The main difference between Python and R is that Python is a general-purpose programming language, while R is a statistical programming language. This means that Python is good at multiple things and can do most things, whereas R is very good at statistical analysis, but not as good at other things as Python. -When you want to document your code, you can use Jupyter Notebooks. Jupyter Notebooks are an open source web application for easily sharing documents that contain your live Python code, equations, visualizations and data science explanations. +You can use Jupyter Notebooks to generate reports and share them with others. Jupyter Notebooks are an open source web application for easily sharing documents that contain your live Python code, equations, visualizations and data science explanations. 
-Python is particularly suited for large scale machine learning and deep learning with libraries such as TensorFlow, PyTorch, and scikit-learn. +Python is particularly suited for large scale machine learning and deep learning with libraries such as [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), and [scikit-learn](https://scikit-learn.org/stable/). ## Learning Objectives From 460155f1f5f6b157adb67481c259a0fb7ab2e7c9 Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Thu, 15 Feb 2024 09:55:09 -0800 Subject: [PATCH 08/20] Apply suggestions from code review Co-authored-by: Candace Savonen --- 03-intro-to-python.Rmd | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/03-intro-to-python.Rmd b/03-intro-to-python.Rmd index dcd1ef2..92d946c 100644 --- a/03-intro-to-python.Rmd +++ b/03-intro-to-python.Rmd @@ -13,6 +13,8 @@ Python is supported by multiple libraries that support data science tasks: The main difference between Python and R is that Python is a general-purpose programming language, while R is a statistical programming language. This means that Python is good at multiple things and can do most things, whereas R is very good at statistical analysis, but not as good at other things as Python. +Python is also a "object oriented programming" language (often abbreviated OOP). Programming languages range from being *very* object oriented to not object oriented at all. R, for example is not object oriented much at all, it is not really built for making specific data classes, we typically just use the ones that come with it like data.frames, vectors, etc. But as an OOP, Python makes it easier to build and enforce your own types of objects. We'll dive more into OOP concepts in a later chapter. + When you want to document your code, you can use Jupyter Notebooks. 
Jupyter Notebooks are an open source web application for easily sharing documents that contain your live Python code, equations, visualizations and data science explanations. Python is particularly suited for large scale machine learning and deep learning with libraries such as TensorFlow, PyTorch, and scikit-learn. @@ -26,7 +28,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1k8uC1rqnGTSbKjBs ## Python Syntax for R Users -Most important difference in syntax is 0-based indexing for Python and 1-based indexing for R. This means that in R, indexing starts with 1 and in Python, indexing starts with 0. Coming from R, this means you have to subtract your "R indexes" by 1 to get the correct index in Python. +An important difference in syntax is 0-based indexing for Python and 1-based indexing for R. This means that in R, indexing starts with 1 and in Python, indexing starts with 0. Coming from R, this means you have to subtract your "R indexes" by 1 to get the correct index in Python. Other major differences in Python: @@ -119,7 +121,7 @@ x[:] # get all the items #### Tuples -Tuples behave like lists, but are constructued using `()`, instead of `[]`. +Tuples behave like lists, but are constructed using `()`, instead of `[]`. ```{python} x = (1, 2) # tuple of length 2 @@ -153,7 +155,7 @@ len(x) #### Dictionaries -Dictionaries are data structures where you can retrieve items by name. They can be created using syntax like {key: value}. +Dictionaries are data structures where you can retrieve items by name. They can be created using syntax like `{key: value}`. 
```{python} d = {"key1": 1, From 8f298738a8d6a183e0f3b27c677a2a5b661da0e9 Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Mon, 15 Apr 2024 11:51:09 -0700 Subject: [PATCH 09/20] Oops --- docs/03-intro-to-python.md | 660 +++++++++++++++++++++++++++---------- 1 file changed, 485 insertions(+), 175 deletions(-) diff --git a/docs/03-intro-to-python.md b/docs/03-intro-to-python.md index b51c647..a05618c 100644 --- a/docs/03-intro-to-python.md +++ b/docs/03-intro-to-python.md @@ -1,256 +1,566 @@ -# A new chapter +# Intro to Python -*If you haven't yet read the getting started Wiki pages; [start there](https://github.com/jhudsl/OTTR_Template/wiki/Getting-started) +Python is a popular programming language that was created by Guido van Rossum and released in 1991. -Every chapter needs to start out with this chunk of code: +Python is supported by multiple libraries that support data science tasks: +- [NumPy](https://numpy.org/) for numerical computing with multidimensional arrays. +- [pandas](https://pandas.pydata.org/) for data manipulation and analysis with data frames. +- [Matplotlib](https://matplotlib.org/) for data visualization. + +## Main Differences between R and Python + +The main difference between Python and R is that Python is a general-purpose programming language, while R is a statistical programming language. This means that Python is good at multiple things and can do most things, whereas R is very good at statistical analysis, but not as good at other things as Python. + +You can use Jupyter Notebooks to generate reports and share them with others. Jupyter Notebooks are an open source web application for easily sharing documents that contain your live Python code, equations, visualizations and data science explanations. 
+ +Python is particularly suited for large scale machine learning and deep learning with libraries such as [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), and [scikit-learn](https://scikit-learn.org/stable/). + +# TODO: When to use R vs Python ## Learning Objectives -*Every chapter also needs Learning objectives that will look like this: +Major point!! example image + +## Python Syntax for R Users -This chapter will cover: +Most important difference in syntax is 0-based indexing for Python and 1-based indexing for R. This means that in R, indexing starts with 1 and in Python, indexing starts with 0. Coming from R, this means you have to subtract your "R indexes" by 1 to get the correct index in Python. -- {You can use https://tips.uark.edu/using-blooms-taxonomy/ to define some learning objectives here} -- {Another learning objective} +Other major differences in Python: -## Libraries +### Whitespace -For this chapter, we'll need the following packages attached: +Important in Python. In R, expressions are grouped into a code block with `{}`. In Python, expressions are grouped by indentation level. -*Remember to add [any additional packages you need to your course's own docker image](https://github.com/jhudsl/OTTR_Template/wiki/Using-Docker#starting-a-new-docker-image). +For example, in R, an if statement looks like: ```r -library(magrittr) +x <- 1 + +if (x > 0) { + print("x is positive") +} else { + print("x is negative") +} ``` -# Topic of Section +``` +## [1] "x is positive" +``` -You can write all your text in sections like this! +In Python, the equivalent if statement looks like: -## Subtopic -Here's a subheading and some text in this subsection! 
+```python +x = 1 -### Code examples +if x > 0: + print("x is positive") +else: + print("x is negative") +``` -You can demonstrate code like this: +``` +## x is positive +``` -```r -output_dir <- file.path("resources", "code_output") -if (!dir.exists(output_dir)) { - dir.create(output_dir) -} +### Data Structures + +There are 4 different data storage formats, or data structures, in Python: lists, tuples, dictionaries, and sets + +#### Lists + +Python lists are created using brackets `[]`. You can add elements to the list through the `append()` method. + + +```python +x = [1, 2, 3] +x.append(4) # add 4 to the end of list + +print("x is", x) ``` -And make plots too: +``` +## x is [1, 2, 3, 4] +``` +```python +#> x is [1, 2, 3, 4] +``` -```r -hist_plot <- hist(iris$Sepal.Length) + +You can index into lists with integers using brackets `[]`, but note that indexing is 0-based. + + +```python +x = [1, 2, 3] + +x[0] ``` - +``` +## 1 +``` -You can also save these plots to file: +```python +#> 1 +x[1] +``` +``` +## 2 +``` -```r -png(file.path(output_dir, "test_plot.png")) -hist_plot -``` - -``` -## $breaks -## [1] 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 -## -## $counts -## [1] 5 27 27 30 31 18 6 6 -## -## $density -## [1] 0.06666667 0.36000000 0.36000000 0.40000000 0.41333333 0.24000000 0.08000000 -## [8] 0.08000000 -## -## $mids -## [1] 4.25 4.75 5.25 5.75 6.25 6.75 7.25 7.75 -## -## $xname -## [1] "iris$Sepal.Length" -## -## $equidist -## [1] TRUE -## -## attr(,"class") -## [1] "histogram" +```python +#> 2 +x[2] ``` -```r -dev.off() +``` +## 3 +``` + +```python +#> 3 +``` + + +Negative numbers count from the end of the list. + + +```python +x = [1, 2, 3] + +x[-1] +``` + +``` +## 3 +``` + +```python +#> 3 +x[-2] +``` + +``` +## 2 +``` + +```python +#> 2 +x[-3] +``` + +``` +## 1 +``` + +```python +#> 1 +``` + +You can slice ranges of lists using the : inside brackets. Note that the slice syntax is not inclusive of the end of the slice range. 
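Because the end of a slice is excluded, `x[a:b]` contains `b - a` items, and splitting a list at any position with `x[:k]` and `x[k:]` gives two pieces that join back into the original. A small sketch, with variable names chosen only for illustration:

```python
x = [1, 2, 3, 4, 5, 6]

first_two = x[0:2]            # positions 0 and 1; position 2 is excluded
assert len(first_two) == 2 - 0

k = 4
head, tail = x[:k], x[k:]     # the item at position k belongs to `tail` only
assert head + tail == x

# Coming from R: the R slice x[2:4] (1-based, end inclusive) corresponds to
# the Python slice x[1:4] (0-based, end exclusive).
r_style_2_to_4 = x[1:4]

print(first_two, head, tail, r_style_2_to_4)
```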
+ + +```python +x = [1, 2, 3, 4, 5, 6] +x[0:2] # get items at index positions 0, 1 ``` ``` -## png -## 2 +## [1, 2] ``` -### Image example +```python +#> [1, 2] +x[1:] # get items from index position 1 to the end +``` -How to include a Google slide. It's simplest to use the `ottrpal` package: +``` +## [2, 3, 4, 5, 6] +``` -Major point!! example image +```python +#> [2, 3, 4, 5, 6] +x[:-2] # get items from beginning up to the 2nd to last. +``` -But if you have the slide or some other image locally downloaded you can also use html like this: +``` +## [1, 2, 3, 4] +``` -Major point!! example image +```python +#> [1, 2, 3, 4] +x[:] # get all the items +``` -### Video examples +``` +## [1, 2, 3, 4, 5, 6] +``` -To show videos in your course, you can use markdown syntax like this: +```python +#> [1, 2, 3, 4, 5, 6] +``` -[A video we want to show](https://www.youtube.com/embed/VOCYL-FNbr0) -Alternatively, you can use `knitr::include_url()` like this: -Note that we are using `echo=FALSE` in the code chunk because we don't want the code part of this to show up. -If you are unfamiliar with [how R Markdown code chunks work, read this](https://rmarkdown.rstudio.com/lesson-3.html). +#### Tuples - +Tuples behave like lists, but are constructued using `()`, instead of `[]`. -OR this works: - +```python +x = (1, 2) # tuple of length 2 +type(x) +``` -### Links to files +``` +## +``` -This works: +```python +#> +len(x) +``` - +``` +## 2 +``` -Or this: +```python +#> 2 +x +``` -[This works](https://www.bgsu.edu/content/dam/BGSU/center-for-faculty-excellence/docs/TLGuides/TLGuide-Learning-Objectives.pdf). +``` +## (1, 2) +``` -Or this: +```python +#> (1, 2) - +x = (1,) # tuple of length 1 +type(x) +``` -### Links to websites +``` +## +``` -Examples of including a website link. 
+```python +#> +len(x) +``` -This works: +``` +## 1 +``` - +```python +#> 1 +x +``` -OR this: +``` +## (1,) +``` -![Another link](https://yihui.org) +```python +#> (1,) -OR this: +x = 1, 2 # also a tuple +type(x) +``` - +``` +## +``` -### Citation examples +```python +#> +len(x) +``` -We can put citations at the end of a sentence like this [@rmarkdown2021]. -Or multiple citations [@rmarkdown2021, @Xie2018]. +``` +## 2 +``` -but they need a ; separator [@rmarkdown2021; @Xie2018]. +```python +#> 2 -In text, we can put citations like this @rmarkdown2021. +x = 1, # beware a single trailing comma! This is a tuple! +type(x) +``` -### FYI boxes +``` +## +``` -::: {.fyi} -Please click on the subsection headers in the left hand -navigation bar (e.g., 2.1, 4.3) a second time to expand the -table of contents and enable the `scroll_highlight` feature -([see more](introduction.html#scroll-highlight)). -::: +```python +#> +len(x) +``` -### Dropdown summaries +``` +## 1 +``` -
You can hide additional information in a dropdown menu -Here's more words that are hidden. -
+```python +#> 1 +``` + +#### Dictionaries + +Dictionaries are data structures where you can retrieve items by name. They can be created using syntax like {key: value}. + + +```python +d = {"key1": 1, + "key2": 2} + +d["key1"] +``` -## Print out session info +``` +## 1 +``` -You should print out session info when you have code for [reproducibility purposes](https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/managing-package-versions.html). +```python +#> 1 +d["key3"] = 3 +d +``` + +``` +## {'key1': 1, 'key2': 2, 'key3': 3} +``` + +```python +#> {'key1': 1, 'key2': 2, 'key3': 3} +``` + +#### Sets + +Sets are used to track unique items, and can be constructed using `{val1, val2}`. + + +```python +s = {1, 2, 3} + +type(s) +``` + +``` +## +``` + +```python +#> +s +``` + +``` +## {1, 2, 3} +``` + +```python +#> {1, 2, 3} +``` + +### Iteration with for loops + +The `for` statement in Python is similar to the `for` loop in R. It can be used to iterate over any kind of data structure. + + +```python +for x in [1, 2, 3]: + print(x) +``` + +``` +## 1 +## 2 +## 3 +``` + +```python +#> 1 +#> 2 +#> 3 +``` + +### Functions + +Python functions are defined with the `def` statement. The syntax for specifying function arguments and default values is very similar to R. 
+ + +```python +def my_function(name = "World"): + print("Hello", name) + +my_function() +``` + +``` +## Hello World +``` + +```python +#> Hello World +my_function("Friend") +``` + +``` +## Hello Friend +``` + +```python +#> Hello Friend +``` + +The equivalent R code would be ```r -devtools::session_info() -``` - -``` -## ─ Session info ─────────────────────────────────────────────────────────────── -## setting value -## version R version 4.0.2 (2020-06-22) -## os Ubuntu 20.04.3 LTS -## system x86_64, linux-gnu -## ui X11 -## language (EN) -## collate en_US.UTF-8 -## ctype en_US.UTF-8 -## tz Etc/UTC -## date 2023-12-11 -## -## ─ Packages ─────────────────────────────────────────────────────────────────── -## package * version date lib source -## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.3) -## bookdown 0.24 2022-02-15 [1] Github (rstudio/bookdown@88bc4ea) -## callr 3.4.4 2020-09-07 [1] RSPM (R 4.0.2) -## cli 2.0.2 2020-02-28 [1] RSPM (R 4.0.0) -## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) -## curl 4.3 2019-12-02 [1] RSPM (R 4.0.3) -## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) -## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) -## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) -## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) -## evaluate 0.14 2019-05-28 [1] RSPM (R 4.0.3) -## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) -## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) -## glue 1.6.1 2022-01-22 [1] CRAN (R 4.0.2) -## highr 0.8 2019-03-20 [1] RSPM (R 4.0.3) -## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) -## htmltools 0.5.0 2020-06-16 [1] RSPM (R 4.0.1) -## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) -## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) -## knitr 1.33 2022-02-15 [1] Github (yihui/knitr@a1052d1) -## lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2) -## magrittr * 2.0.2 2022-01-26 [1] CRAN (R 4.0.2) -## memoise 1.1.0 2017-04-21 [1] RSPM (R 4.0.0) -## ottrpal 0.1.2 2022-02-15 [1] Github (jhudsl/ottrpal@1018848) -## pillar 1.4.6 2020-07-10 [1] RSPM (R 4.0.2) -## pkgbuild 1.1.0 
2020-07-13 [1] RSPM (R 4.0.2) -## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) -## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) -## png 0.1-7 2013-12-03 [1] CRAN (R 4.0.2) -## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) -## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) -## ps 1.3.4 2020-08-11 [1] RSPM (R 4.0.2) -## purrr 0.3.4 2020-04-17 [1] RSPM (R 4.0.3) -## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) -## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) -## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) -## rlang 0.4.10 2022-02-15 [1] Github (r-lib/rlang@f0c9be5) -## rmarkdown 2.10 2022-02-15 [1] Github (rstudio/rmarkdown@02d3c25) -## rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.2) -## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) -## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) -## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) -## testthat 3.0.1 2022-02-15 [1] Github (R-lib/testthat@e99155a) -## tibble 3.0.3 2020-07-10 [1] RSPM (R 4.0.2) -## usethis 2.1.5.9000 2022-02-15 [1] Github (r-lib/usethis@57b109a) -## vctrs 0.3.4 2020-08-29 [1] RSPM (R 4.0.2) -## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) -## xfun 0.26 2022-02-15 [1] Github (yihui/xfun@74c2a66) -## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) -## -## [1] /usr/local/lib/R/site-library -## [2] /usr/local/lib/R/library +my_function <- function(name = "World") { + cat("Hello", name, "\n") +} + +my_function() ``` + +``` +## Hello World +``` + +```r +#> Hello World +my_function("Friend") +``` + +``` +## Hello Friend +``` + +```r +#> Hello Friend +``` + + +### Classes and Object Oriented Programming (OOP) + +In R, the most widely used unit of composition for code is functions, and in Python, it is classes. Classes are how you organize and find methods in Python. This approach to code composition is called object oriented programming (OOP). Let's dive in the details of OOP. + +An object is any entity that you want to store and process data about. Each object is an instance of a class in the computer's memory. 
A class is a template for creating objects. Creating an object from a class is called instantiation. A class has properties and methods (functions for the class). + +For example, we could have a class called Person. The properties of this class are what describe this Person class: + +- `first_name` +- `last_name` +- `gender` +- `date_of_birth` +- `occupation` + + +The methods of this class are the functions for this Person class: + +- `walk()` +- `run()` +- `sleep()` +- `eat()` + + +Here is a simple Person class for demonstration purposes. + + +```python +class Person: + pass # `pass` means do nothing. + +Person +``` + +``` +## <class '__main__.Person'> +``` + +```python +#> <class '__main__.Person'> +type(Person) +``` + +``` +## <class 'type'> +``` + +```python +#> <class 'type'> + +instance = Person() +instance +``` + +``` +## <__main__.Person object at 0x10e0f5ed0> +``` + +```python +#> <__main__.Person object at 0x102ba75e0> +type(instance) +``` + +``` +## <class '__main__.Person'> +``` + +```python +#> <class '__main__.Person'> +``` + +Just as the `def` statement defines a function, the `class` statement defines a Python class. First, note the strong naming convention: classes are typically CamelCase, and functions are typically snake_case. After defining Person, you can interact with it, and see that it has type 'type'. Calling `instance = Person()` creates a new object instance of the class, which has type `Person` (ignore the `__main__.` prefix for now). + + +### Importing modules + +In R, authors can bundle their code into R packages, and R users can access objects from R packages via `library()` or `::`. In Python, authors bundle code into modules, and users access modules using `import`. + + +```python +import numpy +``` + +Once loaded, you can access symbols from the module using `.`, which is equivalent to `::` in R. + + +```python +numpy.abs(-1) +``` + +``` +## 1 +``` + +There is special syntax for conveniently binding a module to a symbol upon importing.
+ + +```python +import numpy # import +import numpy as np # import and bind to a custom symbol `np` + +from numpy import abs # import only `numpy.abs` +from numpy import abs as abs2 # import only `numpy.abs`, bind it to `abs2` +``` + +### Learning More + +If you want to learn more, browse the [official documentation for Python](https://docs.Python.org/3/). + +### References + +- https://rstudio.github.io/reticulate/articles/python_primer.html +- https://www.youtube.com/watch?v=m_MQYyJpIjg + From 61e8d005c639ba46c5a3f4b07c922471e42d3256 Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Mon, 15 Apr 2024 11:58:47 -0700 Subject: [PATCH 10/20] Add the md file --- 03-intro-to-python.md | 257 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 257 insertions(+) create mode 100644 03-intro-to-python.md diff --git a/03-intro-to-python.md b/03-intro-to-python.md new file mode 100644 index 0000000..583279d --- /dev/null +++ b/03-intro-to-python.md @@ -0,0 +1,257 @@ + +# Intro to Python + +Python is a popular programming language that was created by Guido van Rossum and released in 1991. + +Python is supported by multiple libraries that support data science tasks: + +- [NumPy](https://numpy.org/) for numerical computing with multidimensional arrays. +- [pandas](https://pandas.pydata.org/) for data manipulation and analysis with data frames. +- [Matplotlib](https://matplotlib.org/) for data visualization. 
+ +## Main Differences between R and Python + +| Feature | Python | R | +|---------------------------|--------------------------------------------------------------------------------|----------------------------------------------------------------------------------| +| Purpose | General-purpose programming language | Statistical programming language | +| Suitability | Good at multiple things, including machine learning and deep learning | Very good at statistical analysis but less versatile for other tasks | +| Key Libraries | [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [scikit-learn](https://scikit-learn.org/stable/) | Primarily statistical and visualization libraries | +| Tool for Sharing | Jupyter Notebooks: Open source web application for sharing documents with live Python code, equations, visualizations, and explanations | Jupyter Notebooks also support R | + + + +## Learning Objectives + +```markdown +![alternative if the image is broken](https://docs.google.com/presentation/d/1k8uC1rqnGTSbKjBsWvKYgiUUxO1q_VhJCwZQHJNWozA/export/png?pageid=g29054a882fd_0_52) +``` + + +## Python Syntax for R Users + +An important difference in syntax is that Python uses 0-based indexing while R uses 1-based indexing. This means that in R, indexing starts with 1 and in Python, indexing starts with 0. Coming from R, you have to subtract 1 from an R index to get the corresponding Python index. + +Other major differences in Python: + +### Whitespace + +Whitespace is significant in Python. In R, expressions are grouped into a code block with `{}`. In Python, expressions are grouped by indentation level.
+ +For example, in R, an if statement looks like: + +```{r} +x <- 1 + +if (x > 0) { + print("x is positive") +} else { + print("x is negative") +} +``` + +In Python, the equivalent if statement looks like: + +```python +x = 1 + +if x > 0: + print("x is positive") +else: + print("x is negative") +``` + + +### Data Structures + +Python has four built-in collection types, or data structures: lists, tuples, dictionaries, and sets. + +#### Lists + +Python lists are created using brackets `[]`. You can add elements to the list through the `append()` method. + +```python +x = [1, 2, 3] +x.append(4) # add 4 to the end of list + +print("x is", x) +#> x is [1, 2, 3, 4] +``` + + +You can index into lists with integers using brackets `[]`, but note that indexing is 0-based. + +```python +x = [1, 2, 3] + +x[0] +#> 1 +x[1] +#> 2 +x[2] +#> 3 +``` + + +Negative numbers count from the end of the list. + +```python +x = [1, 2, 3] + +x[-1] +#> 3 +x[-2] +#> 2 +x[-3] +#> 1 +``` + +You can slice ranges of lists using `:` inside brackets. Note that the slice syntax is not inclusive of the end of the slice range. + +```python +x = [1, 2, 3, 4, 5, 6] +x[0:2] # get items at index positions 0, 1 +#> [1, 2] +x[1:] # get items from index position 1 to the end +#> [2, 3, 4, 5, 6] +x[:-2] # get items from beginning up to, but not including, the 2nd to last. +#> [1, 2, 3, 4] +x[:] # get all the items +#> [1, 2, 3, 4, 5, 6] +``` + + +#### Tuples + +Tuples behave like lists, but are constructed using `()` instead of `[]`, and unlike lists they cannot be modified after creation. + +```python +x = (1, 2) # tuple of length 2 +type(x) +#> <class 'tuple'> +len(x) +#> 2 +x +#> (1, 2) + +x = (1,) # tuple of length 1 +type(x) +#> <class 'tuple'> +len(x) +#> 1 +x +#> (1,) + +x = 1, 2 # also a tuple +type(x) +#> <class 'tuple'> +len(x) +#> 2 + +x = 1, # beware a single trailing comma! This is a tuple! +type(x) +#> <class 'tuple'> +len(x) +#> 1 +``` + +#### Dictionaries + +Dictionaries are data structures where you can retrieve items by name. They can be created using syntax like `{key: value}`.
+ +```python +d = {"key1": 1, + "key2": 2} + +d["key1"] +#> 1 +d["key3"] = 3 +d +#> {'key1': 1, 'key2': 2, 'key3': 3} +``` + +#### Sets + +Sets are used to track unique items, and can be constructed using `{val1, val2}`. + +```python +s = {1, 2, 3} + +type(s) +#> <class 'set'> +s +#> {1, 2, 3} +``` + +### Iteration with for loops + +The `for` statement in Python is similar to the `for` loop in R. It can be used to iterate over any iterable, such as a list, tuple, dictionary, or set. + +```python +for x in [1, 2, 3]: + print(x) +#> 1 +#> 2 +#> 3 +``` + +### Functions + +Python functions are defined with the `def` statement. The syntax for specifying function arguments and default values is very similar to R. + +```python +def my_function(name = "World"): + print("Hello", name) + +my_function() +#> Hello World +my_function("Friend") +#> Hello Friend +``` + +The equivalent R code would be: + +```{r} +my_function <- function(name = "World") { + cat("Hello", name, "\n") +} + +my_function() +#> Hello World +my_function("Friend") +#> Hello Friend +``` + + +### Importing modules + +In R, authors can bundle their code into R packages, and R users can access objects from R packages via `library()` or `::`. In Python, authors bundle code into modules, and users access modules using `import`. + +```python +import numpy +``` + +Once loaded, you can access symbols from the module using `.`, which is equivalent to `::` in R. + +```python +numpy.abs(-1) +``` + +There is special syntax for conveniently binding a module to a symbol upon importing. + +```python +import numpy # import +import numpy as np # import and bind to a custom symbol `np` + +from numpy import abs # import only `numpy.abs` +from numpy import abs as abs2 # import only `numpy.abs`, bind it to `abs2` +``` + +### Learning More + +If you want to learn more, browse the [official documentation for Python](https://docs.Python.org/3/).
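
Before moving on, a short recap may help. The sketch below combines several pieces from this chapter in one place: a function defined with `def`, a `for` loop over a dictionary, 0-based and negative indexing, slicing, and a set for unique values. It is only an illustration; the function name `summarize_scores` and the sample data are made up for this example and are not part of any library.

```python
def summarize_scores(scores):
    """Hypothetical helper: summarize a dict that maps names to lists of scores."""
    summary = {}
    # Iterating over .items() yields (key, value) pairs.
    for name, values in scores.items():
        summary[name] = {
            "first": values[0],        # 0-based indexing: first element
            "last": values[-1],        # negative index counts from the end
            "first_two": values[:2],   # slice excludes the end position
            "unique": set(values),     # a set keeps only unique items
        }
    return summary


# Made-up sample data for illustration.
scores = {"ada": [3, 1, 3, 2], "grace": [5, 4]}
result = summarize_scores(scores)

print(result["ada"]["first"])
print(result["ada"]["first_two"])
print(result["grace"]["last"])
```

Coming from R, the lines to watch are `values[0]` and `values[:2]`: the first selects what R would call element 1, and the second stops before index position 2.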
+ +### References + +- https://rstudio.github.io/reticulate/articles/python_primer.html +- https://www.youtube.com/watch?v=m_MQYyJpIjg + From 894532fae1ebcf14c2966bc64ab7a03bcd5982e5 Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Mon, 15 Apr 2024 11:59:42 -0700 Subject: [PATCH 11/20] Delete 03-intro-to-python.Rmd --- 03-intro-to-python.Rmd | 300 ----------------------------------------- 1 file changed, 300 deletions(-) delete mode 100644 03-intro-to-python.Rmd diff --git a/03-intro-to-python.Rmd b/03-intro-to-python.Rmd deleted file mode 100644 index a0035ef..0000000 --- a/03-intro-to-python.Rmd +++ /dev/null @@ -1,300 +0,0 @@ - -# Intro to Python - -Python is a popular programming language that was created by Guido van Rossum and released in 1991. - -Python is supported by multiple libraries that support data science tasks: - -- [NumPy](https://numpy.org/) for numerical computing with multidimensional arrays. -- [pandas](https://pandas.pydata.org/) for data manipulation and analysis with data frames. -- [Matplotlib](https://matplotlib.org/) for data visualization. - -## Main Differences between R and Python - -The main difference between Python and R is that Python is a general-purpose programming language, while R is a statistical programming language. This means that Python is good at multiple things and can do most things, whereas R is very good at statistical analysis, but not as good at other things as Python. - -Python is also a "object oriented programming" language (often abbreviated OOP). Programming languages range from being *very* object oriented to not object oriented at all. R, for example is not object oriented much at all, it is not really built for making specific data classes, we typically just use the ones that come with it like data.frames, vectors, etc. But as an OOP, Python makes it easier to build and enforce your own types of objects. We'll dive more into OOP concepts in a later chapter. 
- -You can use Jupyter Notebooks to generate reports and share them with others. Jupyter Notebooks are an open source web application for easily sharing documents that contain your live Python code, equations, visualizations and data science explanations. - -Python is particularly suited for large scale machine learning and deep learning with libraries such as [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), and [scikit-learn](https://scikit-learn.org/stable/). - - -## Learning Objectives - -```{r, fig.align='center', echo = FALSE, fig.alt= "Major point!! example image"} -ottrpal::include_slide("https://docs.google.com/presentation/d/1k8uC1rqnGTSbKjBsWvKYgiUUxO1q_VhJCwZQHJNWozA/edit#slide=id.g29054a882fd_0_52") -``` - -## Python Syntax for R Users - -An important difference in syntax is 0-based indexing for Python and 1-based indexing for R. This means that in R, indexing starts with 1 and in Python, indexing starts with 0. Coming from R, this means you have to subtract your "R indexes" by 1 to get the correct index in Python. - -Other major differences in Python: - -### Whitespace - -Important in Python. In R, expressions are grouped into a code block with `{}`. In Python, expressions are grouped by indentation level. - -For example, in R, an if statement looks like: - -```{r} -x <- 1 - -if (x > 0) { - print("x is positive") -} else { - print("x is negative") -} -``` - -In Python, the equivalent if statement looks like: - -```{python} -x = 1 - -if x > 0: - print("x is positive") -else: - print("x is negative") -``` - - -### Data Structures - -There are 4 different data storage formats, or data structures, in Python: lists, tuples, dictionaries, and sets - -#### Lists - -Python lists are created using brackets `[]`. You can add elements to the list through the `append()` method. 
- -```{python} -x = [1, 2, 3] -x.append(4) # add 4 to the end of list - -print("x is", x) -#> x is [1, 2, 3, 4] -``` - - -You can index into lists with integers using brackets `[]`, but note that indexing is 0-based. - -```{python} -x = [1, 2, 3] - -x[0] -#> 1 -x[1] -#> 2 -x[2] -#> 3 -``` - - -Negative numbers count from the end of the list. - -```{python} -x = [1, 2, 3] - -x[-1] -#> 3 -x[-2] -#> 2 -x[-3] -#> 1 -``` - -You can slice ranges of lists using the : inside brackets. Note that the slice syntax is not inclusive of the end of the slice range. - -```{python} -x = [1, 2, 3, 4, 5, 6] -x[0:2] # get items at index positions 0, 1 -#> [1, 2] -x[1:] # get items from index position 1 to the end -#> [2, 3, 4, 5, 6] -x[:-2] # get items from beginning up to the 2nd to last. -#> [1, 2, 3, 4] -x[:] # get all the items -#> [1, 2, 3, 4, 5, 6] -``` - - -#### Tuples - -Tuples behave like lists, but are constructed using `()`, instead of `[]`. - -```{python} -x = (1, 2) # tuple of length 2 -type(x) -#> -len(x) -#> 2 -x -#> (1, 2) - -x = (1,) # tuple of length 1 -type(x) -#> -len(x) -#> 1 -x -#> (1,) - -x = 1, 2 # also a tuple -type(x) -#> -len(x) -#> 2 - -x = 1, # beware a single trailing comma! This is a tuple! -type(x) -#> -len(x) -#> 1 -``` - -#### Dictionaries - -Dictionaries are data structures where you can retrieve items by name. They can be created using syntax like `{key: value}`. - -```{python} -d = {"key1": 1, - "key2": 2} - -d["key1"] -#> 1 -d["key3"] = 3 -d -#> {'key1': 1, 'key2': 2, 'key3': 3} -``` - -#### Sets - -Sets are used to track unique items, and can be constructed using `{val1, val2}`. - -```{python} -s = {1, 2, 3} - -type(s) -#> -s -#> {1, 2, 3} -``` - -### Iteration with for loops - -The `for` statement in Python is similar to the `for` loop in R. It can be used to iterate over any kind of data structure. - -```{python} -for x in [1, 2, 3]: - print(x) -#> 1 -#> 2 -#> 3 -``` - -### Functions - -Python functions are defined with the `def` statement. 
The syntax for specifying function arguments and default values is very similar to R. - -```{python} -def my_function(name = "World"): - print("Hello", name) - -my_function() -#> Hello World -my_function("Friend") -#> Hello Friend -``` - -The equivalent R code would be - -```{r} -my_function <- function(name = "World") { - cat("Hello", name, "\n") -} - -my_function() -#> Hello World -my_function("Friend") -#> Hello Friend -``` - - -### Classes and Object Oriented Programming (OOP) - -In R, the most widely used unit of composition for code is functions, and in Python, it is classes. Classes are how you organize and find methods in Python. This approach to code composition is called object oriented programming (OOP). Let's dive in the details of OOP. - -An object is any entity that you want to store and process data about. Each object is an instance of a class in the computer's memory. A class is a template for creating objects. Creating an object from a class is called instantiation. It has properties and methods (functions for the class). - -For example, we could have a class called Person. The properties of this class are what describe this Person class: - -- first_name -- last_name -- gender -- date_of_birth -- occupatiaon - - -The methods of this class are the functions for this Person class: - -- walk() -- run() -- sleep() -- eat() - - -Here is a simple Person class for demonstration purposes. - -```{python} -class Person: - pass # `pass` means do nothing. - -Person -#> -type(Person) -#> - -instance = Person() -instance -#> <__main__.Person object at 0x102ba75e0> -type(instance) -#> -``` - -Like the `def` statement, the `class` statement is used to create a Python class. First note the strong naming convention, classes are typically CamelCase, and functions are typically snake_case. After defining Person, you can interact with it, and see that it has type 'type'. 
Calling `instance = Person()` creates a new object instance of the class, which has type `Person` (ignore the __main__. prefix for now). - - -### Importing modules - -In R, authors can bundle their code into R packages, and R users can access objects from R packages via `library()` or `::`. In Python, authors bundle code into modules, and users access modules using `import`. - -```{python} -import numpy -``` - -Once loaded, you can access symbols from the module using `.`, which is equivalent to `::` in R. - -```{python} -numpy.abs(-1) -``` - -There is special syntax for conveniently bounding a module to a symbol upon importing. - -```{python} -import numpy # import -import numpy as np # import and bind to a custom symbol `np` - -from numpy import abs # import only `numpy.abs` -from numpy import abs as abs2 # import only `numpy.abs`, bind it to `abs2` -``` - -### Learning More - -If you want to learn more, browse the [official documentation for Python](https://docs.Python.org/3/). - -### References - -- https://rstudio.github.io/reticulate/articles/python_primer.html -- https://www.youtube.com/watch?v=m_MQYyJpIjg - From 1527ea3743bd8ed1f1682312627431302fabfc3e Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Mon, 15 Apr 2024 12:00:13 -0700 Subject: [PATCH 12/20] Update .gitignore --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index 094bf0c..79e0271 100644 --- a/.gitignore +++ b/.gitignore @@ -9,3 +9,4 @@ spell_check_results.tsv .httr-oauth docker/git_token.txt .Rproj.user +python.Rproj From 2644c9d6978317d8aabcc0c8e476014db430427b Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Wed, 17 Apr 2024 14:02:39 -0700 Subject: [PATCH 13/20] Gitignore `.ipynb_checkpoints/` --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index 3d2a827..9d2cace 100644 --- a/.gitignore +++ b/.gitignore @@ -15,3 +15,4 @@ 
docker/git_token.txt python.Rproj venv .vscode +.ipynb_checkpoints/* From cc4dd2cad035c1c265b50fe9a4c481d08631d4be Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Wed, 17 Apr 2024 14:03:46 -0700 Subject: [PATCH 14/20] Delete `python.Rproj` --- python.Rproj | 18 ------------------ 1 file changed, 18 deletions(-) delete mode 100644 python.Rproj diff --git a/python.Rproj b/python.Rproj deleted file mode 100644 index aecd28b..0000000 --- a/python.Rproj +++ /dev/null @@ -1,18 +0,0 @@ -Version: 1.0 - -RestoreWorkspace: Default -SaveWorkspace: Default -AlwaysSaveHistory: Default - -EnableCodeIndexing: Yes -UseSpacesForTab: Yes -NumSpacesForTab: 2 -Encoding: UTF-8 - -RnwWeave: Sweave -LaTeX: pdfLaTeX - -AutoAppendNewline: Yes -StripTrailingWhitespace: Yes - -BuildType: Website From 6921d06e87a5ee9d309d8107bbcbe9495c2660a3 Mon Sep 17 00:00:00 2001 From: Howard Baek <50791792+howardbaek@users.noreply.github.com> Date: Wed, 17 Apr 2024 14:04:41 -0700 Subject: [PATCH 15/20] Delete .ipynb_checkpoints/03-intro-to-python-checkpoint.md --- .../03-intro-to-python-checkpoint.md | 302 ------------------ 1 file changed, 302 deletions(-) delete mode 100644 .ipynb_checkpoints/03-intro-to-python-checkpoint.md diff --git a/.ipynb_checkpoints/03-intro-to-python-checkpoint.md b/.ipynb_checkpoints/03-intro-to-python-checkpoint.md deleted file mode 100644 index 911069b..0000000 --- a/.ipynb_checkpoints/03-intro-to-python-checkpoint.md +++ /dev/null @@ -1,302 +0,0 @@ - -# Intro to Python - -Python is a popular programming language that was created by Guido van Rossum and released in 1991. - -Python is supported by multiple libraries that support data science tasks: - -- [NumPy](https://numpy.org/) for numerical computing with multidimensional arrays. -- [pandas](https://pandas.pydata.org/) for data manipulation and analysis with data frames. -- [Matplotlib](https://matplotlib.org/) for data visualization. 
- -## Main Differences between R and Python - -The main difference between Python and R is that Python is a general-purpose programming language, while R is a statistical programming language. This means that Python is good at multiple things and can do most things, whereas R is very good at statistical analysis, but not as good at other things as Python. - -You can use Jupyter Notebooks to generate reports and share them with others. Jupyter Notebooks are an open source web application for easily sharing documents that contain your live Python code, equations, visualizations and data science explanations. - -Python is particularly suited for large scale machine learning and deep learning with libraries such as [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), and [scikit-learn](https://scikit-learn.org/stable/). - -# TODO: When to use R vs Python - -When you want to do statistics, R. If you have big datasets you want to manipulate, you might want to go to Python. The community in Python isn't always stats-focused. For ex, Genomics community is big in R. - - -## Learning Objectives - -```{r, fig.align='center', echo = FALSE, fig.alt= "Major point!! example image"} -ottrpal::include_slide("https://docs.google.com/presentation/d/1k8uC1rqnGTSbKjBsWvKYgiUUxO1q_VhJCwZQHJNWozA/edit#slide=id.g29054a882fd_0_52") -``` - -## Python Syntax for R Users - -Most important difference in syntax is 0-based indexing for Python and 1-based indexing for R. This means that in R, indexing starts with 1 and in Python, indexing starts with 0. Coming from R, this means you have to subtract your "R indexes" by 1 to get the correct index in Python. - -Other major differences in Python: - -### Whitespace - -Important in Python. In R, expressions are grouped into a code block with `{}`. In Python, expressions are grouped by indentation level. 
- -For example, in R, an if statement looks like: - -```{r} -x <- 1 - -if (x > 0) { - print("x is positive") -} else { - print("x is negative") -} -``` - -In Python, the equivalent if statement looks like: - -```{python} -x = 1 - -if x > 0: - print("x is positive") -else: - print("x is negative") -``` - - -### Data Structures - -There are 4 different data storage formats, or data structures, in Python: lists, tuples, dictionaries, and sets - -#### Lists - -Python lists are created using brackets `[]`. You can add elements to the list through the `append()` method. - -```{python} -x = [1, 2, 3] -x.append(4) # add 4 to the end of list - -print("x is", x) -#> x is [1, 2, 3, 4] -``` - - -You can index into lists with integers using brackets `[]`, but note that indexing is 0-based. - -```{python} -x = [1, 2, 3] - -x[0] -#> 1 -x[1] -#> 2 -x[2] -#> 3 -``` - - -Negative numbers count from the end of the list. - -```{python} -x = [1, 2, 3] - -x[-1] -#> 3 -x[-2] -#> 2 -x[-3] -#> 1 -``` - -You can slice ranges of lists using the : inside brackets. Note that the slice syntax is not inclusive of the end of the slice range. - -```{python} -x = [1, 2, 3, 4, 5, 6] -x[0:2] # get items at index positions 0, 1 -#> [1, 2] -x[1:] # get items from index position 1 to the end -#> [2, 3, 4, 5, 6] -x[:-2] # get items from beginning up to the 2nd to last. -#> [1, 2, 3, 4] -x[:] # get all the items -#> [1, 2, 3, 4, 5, 6] -``` - - -#### Tuples - -Tuples behave like lists, but are constructued using `()`, instead of `[]`. - -```{python} -x = (1, 2) # tuple of length 2 -type(x) -#> -len(x) -#> 2 -x -#> (1, 2) - -x = (1,) # tuple of length 1 -type(x) -#> -len(x) -#> 1 -x -#> (1,) - -x = 1, 2 # also a tuple -type(x) -#> -len(x) -#> 2 - -x = 1, # beware a single trailing comma! This is a tuple! -type(x) -#> -len(x) -#> 1 -``` - -#### Dictionaries - -Dictionaries are data structures where you can retrieve items by name. They can be created using syntax like {key: value}. 
- -```{python} -d = {"key1": 1, - "key2": 2} - -d["key1"] -#> 1 -d["key3"] = 3 -d -#> {'key1': 1, 'key2': 2, 'key3': 3} -``` - -#### Sets - -Sets are used to track unique items, and can be constructed using `{val1, val2}`. - -```{python} -s = {1, 2, 3} - -type(s) -#> -s -#> {1, 2, 3} -``` - -### Iteration with for loops - -The `for` statement in Python is similar to the `for` loop in R. It can be used to iterate over any kind of data structure. - -```{python} -for x in [1, 2, 3]: - print(x) -#> 1 -#> 2 -#> 3 -``` - -### Functions - -Python functions are defined with the `def` statement. The syntax for specifying function arguments and default values is very similar to R. - -```{python} -def my_function(name = "World"): - print("Hello", name) - -my_function() -#> Hello World -my_function("Friend") -#> Hello Friend -``` - -The equivalent R code would be - -```{r} -my_function <- function(name = "World") { - cat("Hello", name, "\n") -} - -my_function() -#> Hello World -my_function("Friend") -#> Hello Friend -``` - - -### Classes and Object Oriented Programming (OOP) - -In R, the most widely used unit of composition for code is functions, and in Python, it is classes. Classes are how you organize and find methods in Python. This approach to code composition is called object oriented programming (OOP). Let's dive in the details of OOP. - -An object is any entity that you want to store and process data about. Each object is an instance of a class in the computer's memory. A class is a template for creating objects. Creating an object from a class is called instantiation. It has properties and methods (functions for the class). - -For example, we could have a class called Person. 
The properties of this class are what describe this Person class: - -- `first_name` -- `last_name` -- `gender` -- `date_of_birth` -- `occupation` - - -The methods of this class are the functions for this Person class: - -- `walk()` -- `run()` -- `sleep()` -- `eat()` - - -Here is a simple Person class for demonstration purposes. - -```{python} -class Person: - pass # `pass` means do nothing. - -Person -#> -type(Person) -#> - -instance = Person() -instance -#> <__main__.Person object at 0x102ba75e0> -type(instance) -#> -``` - -Like the `def` statement, the `class` statement is used to create a Python class. First note the strong naming convention, classes are typically CamelCase, and functions are typically snake_case. After defining Person, you can interact with it, and see that it has type 'type'. Calling `instance = Person()` creates a new object instance of the class, which has type `Person` (ignore the __main__. prefix for now). - - -### Importing modules - -In R, authors can bundle their code into R packages, and R users can access objects from R packages via `library()` or `::`. In Python, authors bundle code into modules, and users access modules using `import`. - -```{python} -import numpy -``` - -Once loaded, you can access symbols from the module using `.`, which is equivalent to `::` in R. - -```{python} -numpy.abs(-1) -``` - -There is special syntax for conveniently bounding a module to a symbol upon importing. - -```{python} -import numpy # import -import numpy as np # import and bind to a custom symbol `np` - -from numpy import abs # import only `numpy.abs` -from numpy import abs as abs2 # import only `numpy.abs`, bind it to `abs2` -``` - -### Learning More - -If you want to learn more, browse the [official documentation for Python](https://docs.Python.org/3/). 
- -### References - -- https://rstudio.github.io/reticulate/articles/python_primer.html -- https://www.youtube.com/watch?v=m_MQYyJpIjg - From ae1e93dc40ab751a65b99f5b5ba264e4e34d6d4d Mon Sep 17 00:00:00 2001 From: Candace Savonen Date: Tue, 9 Jul 2024 13:44:52 -0400 Subject: [PATCH 16/20] Rmd -> md in _bookdown.yml --- _bookdown.yml | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/_bookdown.yml b/_bookdown.yml index d3a825d..174cbfa 100644 --- a/_bookdown.yml +++ b/_bookdown.yml @@ -4,19 +4,19 @@ repo: https://github.com/datatrail-jhu/python rmd_files: - index.Rmd - 00-demo.md -- 01-intro.Rmd -- 02-set-up.Rmd -- 03-intro-to-python.Rmd -- 04-getting-data.Rmd +- 01-intro.md +- 02-set-up.md +- 03-intro-to-python.md +- 04-getting-data.md - 05-cleaning-data_01-intro.md - 05-cleaning-data_02-reshaping-data.md - 05-cleaning-data_03-tidying-data.md -- 06-plotting-data.Rmd -- 07-getting-stats.Rmd -- 08-scripting-python.Rmd -- 09-more-python.Rmd -- About.Rmd -- References.Rmd +- 06-plotting-data.md +- 07-getting-stats.md +- 08-scripting-python.md +- 09-more-python.md +- About.md +- References.md new_session: yes bibliography: - book.bib From 0fb441423b313bed7dcda9146c471bad4dbb3a48 Mon Sep 17 00:00:00 2001 From: Candace Savonen Date: Tue, 9 Jul 2024 13:57:51 -0400 Subject: [PATCH 17/20] Change to mds --- 01-intro.Rmd => 01-intro.md | 0 02-set-up.Rmd => 02-set-up.md | 0 04-getting-data.Rmd => 04-getting-data.md | 0 06-plotting-data.Rmd => 06-plotting-data.md | 0 07-getting-stats.Rmd => 07-getting-stats.md | 0 08-scripting-python.Rmd => 08-scripting-python.md | 0 09-more-python.Rmd => 09-more-python.md | 0 About.Rmd => About.md | 0 References.Rmd => References.md | 0 index.Rmd => index.md | 0 10 files changed, 0 insertions(+), 0 deletions(-) rename 01-intro.Rmd => 01-intro.md (100%) rename 02-set-up.Rmd => 02-set-up.md (100%) rename 04-getting-data.Rmd => 04-getting-data.md (100%) rename 06-plotting-data.Rmd => 06-plotting-data.md (100%) 
rename 07-getting-stats.Rmd => 07-getting-stats.md (100%) rename 08-scripting-python.Rmd => 08-scripting-python.md (100%) rename 09-more-python.Rmd => 09-more-python.md (100%) rename About.Rmd => About.md (100%) rename References.Rmd => References.md (100%) rename index.Rmd => index.md (100%) diff --git a/01-intro.Rmd b/01-intro.md similarity index 100% rename from 01-intro.Rmd rename to 01-intro.md diff --git a/02-set-up.Rmd b/02-set-up.md similarity index 100% rename from 02-set-up.Rmd rename to 02-set-up.md diff --git a/04-getting-data.Rmd b/04-getting-data.md similarity index 100% rename from 04-getting-data.Rmd rename to 04-getting-data.md diff --git a/06-plotting-data.Rmd b/06-plotting-data.md similarity index 100% rename from 06-plotting-data.Rmd rename to 06-plotting-data.md diff --git a/07-getting-stats.Rmd b/07-getting-stats.md similarity index 100% rename from 07-getting-stats.Rmd rename to 07-getting-stats.md diff --git a/08-scripting-python.Rmd b/08-scripting-python.md similarity index 100% rename from 08-scripting-python.Rmd rename to 08-scripting-python.md diff --git a/09-more-python.Rmd b/09-more-python.md similarity index 100% rename from 09-more-python.Rmd rename to 09-more-python.md diff --git a/About.Rmd b/About.md similarity index 100% rename from About.Rmd rename to About.md diff --git a/References.Rmd b/References.md similarity index 100% rename from References.Rmd rename to References.md diff --git a/index.Rmd b/index.md similarity index 100% rename from index.Rmd rename to index.md From 9c4791e66ccf824c9bc11b92ae3370f2e5b65fd6 Mon Sep 17 00:00:00 2001 From: Candace Savonen Date: Tue, 9 Jul 2024 14:03:19 -0400 Subject: [PATCH 18/20] index.Rmd --- index.md => index.Rmd | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename index.md => index.Rmd (100%) diff --git a/index.md b/index.Rmd similarity index 100% rename from index.md rename to index.Rmd From 12bc8f4f7fc1a09ca64b099da1fae37b9ef01173 Mon Sep 17 00:00:00 2001 From: Candace Savonen 
Date: Tue, 9 Jul 2024 14:09:13 -0400 Subject: [PATCH 19/20] Add to dictionary --- resources/dictionary.txt | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/resources/dictionary.txt b/resources/dictionary.txt index cfae3b5..585cd95 100644 --- a/resources/dictionary.txt +++ b/resources/dictionary.txt @@ -61,6 +61,7 @@ lc Leanpub Macaca Markua +Matplotlib mentorship Mohammed Monotremata @@ -70,6 +71,7 @@ Nayef NCI NHGRI nt +NumPy Nyctibeus omni OTTR @@ -86,10 +88,12 @@ Potto pre Proboscidea reproducibility +Rossum RMarkdown Rodentia Savonen Scandentia +scikit sexualized socio Soricomorpha From 0767ee52265ab6921afc07df11bcaebc77b2a43d Mon Sep 17 00:00:00 2001 From: Candace Savonen Date: Tue, 9 Jul 2024 14:10:21 -0400 Subject: [PATCH 20/20] Add exclude files --- resources/exclude_files.txt | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 resources/exclude_files.txt diff --git a/resources/exclude_files.txt b/resources/exclude_files.txt new file mode 100644 index 0000000..5525a40 --- /dev/null +++ b/resources/exclude_files.txt @@ -0,0 +1,8 @@ +About.Rmd +docs/* +style-sets/* +manuscript/* +CONTRIBUTING.md +LICENSE.md +code_of_conduct.md +README.md