Merge pull request #10 from datatrail-jhu/howardbaek/r-vs-python
Intro to Python Chapter
cansavvy authored Jul 9, 2024
2 parents 45c11ad + 0767ee5 commit e3e4cd2
Showing 18 changed files with 840 additions and 19 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -11,8 +11,11 @@ spell_check_results.tsv
.RData
.httr-oauth
docker/git_token.txt
.Rproj.user
python.Rproj
venv
.vscode
.ipynb_checkpoints/*
playground
__pycache__
*.pyc
File renamed without changes.
9 changes: 0 additions & 9 deletions 03-intro-to-python.Rmd

This file was deleted.

257 changes: 257 additions & 0 deletions 03-intro-to-python.md
@@ -0,0 +1,257 @@

# Intro to Python

Python is a popular programming language that was created by Guido van Rossum and released in 1991.

Python has many libraries that support data science tasks, including:

- [NumPy](https://numpy.org/) for numerical computing with multidimensional arrays.
- [pandas](https://pandas.pydata.org/) for data manipulation and analysis with data frames.
- [Matplotlib](https://matplotlib.org/) for data visualization.

## Main Differences between R and Python

| Feature | Python | R |
|---------------------------|--------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| Purpose | General-purpose programming language | Statistical programming language |
| Suitability | Good at multiple things, including machine learning and deep learning | Very good at statistical analysis but less versatile for other tasks |
| Key Libraries | [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [scikit-learn](https://scikit-learn.org/stable/) | Primarily statistical and visualization libraries |
| Tool for Sharing | Jupyter Notebooks: Open source web application for sharing documents with live Python code, equations, visualizations, and explanations | Same as Python, as Jupyter Notebooks support both Python and R |



## Learning Objectives

```markdown
![alternative if the image is broken](https://docs.google.com/presentation/d/1k8uC1rqnGTSbKjBsWvKYgiUUxO1q_VhJCwZQHJNWozA/export/png?pageid=g29054a882fd_0_52)
```


## Python Syntax for R Users

An important difference in syntax is that Python uses 0-based indexing and R uses 1-based indexing. In R, the first element is at index 1; in Python, the first element is at index 0. Coming from R, this means you have to subtract 1 from your "R indexes" to get the correct index in Python.
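
As a minimal sketch of that conversion, the example below looks up the same elements by their R position and their Python index. The list `letters` and the helper `r_to_python_index` are hypothetical names used only for illustration.

```python
# Hypothetical helper: convert a 1-based (R-style) position
# to a 0-based Python index.
def r_to_python_index(r_index):
    return r_index - 1

letters = ["a", "b", "c"]

# In R, letters[1] is the first element; in Python it lives at index 0.
first = letters[r_to_python_index(1)]
third = letters[r_to_python_index(3)]

print(first)
#> a
print(third)
#> c
```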

Other major differences in Python:

### Whitespace

Whitespace is significant in Python. In R, expressions are grouped into a code block with `{}`. In Python, expressions are grouped by indentation level.

For example, in R, an if statement looks like:

```{r}
x <- 1
if (x > 0) {
  print("x is positive")
} else {
  print("x is negative")
}
```

In Python, the equivalent if statement looks like:

```python
x = 1

if x > 0:
    print("x is positive")
else:
    print("x is negative")
```
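
Indentation also determines how blocks nest. The sketch below (the variable name `sign` is chosen only for illustration) places one `if` statement inside an `else` branch. Each extra level of indentation opens a new block, and returning to the outer level closes it.

```python
x = 0

if x > 0:
    sign = "positive"
else:
    # This nested block is indented one level further than the `else:` line.
    if x < 0:
        sign = "negative"
    else:
        sign = "zero"

# Back at the outermost level: this line runs whichever branch was taken.
print("x is", sign)
#> x is zero
```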


### Data Structures

Python has four main built-in data structures for storing collections of values: lists, tuples, dictionaries, and sets.

#### Lists

Python lists are created using brackets `[]`. You can add elements to the list through the `append()` method.

```python
x = [1, 2, 3]
x.append(4) # add 4 to the end of list

print("x is", x)
#> x is [1, 2, 3, 4]
```


You can index into lists with integers using brackets `[]`, but note that indexing is 0-based.

```python
x = [1, 2, 3]

x[0]
#> 1
x[1]
#> 2
x[2]
#> 3
```


Negative numbers count from the end of the list.

```python
x = [1, 2, 3]

x[-1]
#> 3
x[-2]
#> 2
x[-3]
#> 1
```

You can slice ranges of lists using `:` inside brackets. Note that a slice includes its start index but excludes its end index.

```python
x = [1, 2, 3, 4, 5, 6]
x[0:2] # get items at index positions 0, 1
#> [1, 2]
x[1:] # get items from index position 1 to the end
#> [2, 3, 4, 5, 6]
x[:-2] # get items from the beginning up to, but not including, the 2nd to last
#> [1, 2, 3, 4]
x[:] # get all the items
#> [1, 2, 3, 4, 5, 6]
```
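
One consequence of the exclusive end index, sketched below with illustrative variable names (`start`, `stop`, `middle`), is that a slice `x[start:stop]` contains exactly `stop - start` items, and splitting a list at the same index produces two pieces that do not overlap.

```python
x = [1, 2, 3, 4, 5, 6]

start, stop = 1, 4
middle = x[start:stop]  # items at index positions 1, 2, 3 (not 4)

print(middle)
#> [2, 3, 4]
print(len(middle) == stop - start)
#> True

# Splitting at the same index gives two pieces with no overlap.
print(x[:3] + x[3:] == x)
#> True
```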


#### Tuples

Tuples behave like lists, but they are immutable (they cannot be changed after creation) and are constructed using `()` instead of `[]`.

```python
x = (1, 2) # tuple of length 2
type(x)
#> <class 'tuple'>
len(x)
#> 2
x
#> (1, 2)

x = (1,) # tuple of length 1
type(x)
#> <class 'tuple'>
len(x)
#> 1
x
#> (1,)

x = 1, 2 # also a tuple
type(x)
#> <class 'tuple'>
len(x)
#> 2

x = 1, # beware a single trailing comma! This is a tuple!
type(x)
#> <class 'tuple'>
len(x)
#> 1
```
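
Two things you will often do with tuples are unpacking them into separate names and relying on the fact that they cannot be modified in place. The sketch below shows both. The names `point`, `px`, `py`, and `modified` are illustrative, and `try`/`except` (not covered in this chapter) is used only to catch the error Python raises on assignment.

```python
point = (3, 4)

# Unpack the tuple into two names in a single assignment.
px, py = point
print(px, py)
#> 3 4

# Unlike a list, a tuple does not support item assignment.
try:
    point[0] = 10
    modified = True
except TypeError:
    modified = False

print(modified)
#> False
```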

#### Dictionaries

Dictionaries are data structures where you can retrieve items by name. They can be created using syntax like `{key: value}`.

```python
d = {"key1": 1,
     "key2": 2}

d["key1"]
#> 1
d["key3"] = 3
d
#> {'key1': 1, 'key2': 2, 'key3': 3}
```
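
A few other common dictionary operations, shown here as a short sketch: testing whether a key exists with `in`, reading a key with a fallback value via `get()`, and listing key-value pairs with `items()`. The key name `"missing"` is just an example of a key that is not present.

```python
d = {"key1": 1, "key2": 2}

# Check for a key before using it.
print("key1" in d)
#> True
print("missing" in d)
#> False

# get() returns a fallback instead of raising KeyError for an absent key.
print(d.get("missing", 0))
#> 0

# items() yields (key, value) pairs in insertion order.
pairs = list(d.items())
print(pairs)
#> [('key1', 1), ('key2', 2)]
```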

#### Sets

Sets are used to track unique items, and can be constructed using `{val1, val2}`.

```python
s = {1, 2, 3}

type(s)
#> <class 'set'>
s
#> {1, 2, 3}
```
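
To illustrate what "tracking unique items" looks like in practice, the sketch below removes duplicates from a list and combines two sets. The variable names are illustrative, and `sorted()` is used when printing because sets have no guaranteed display order.

```python
values = [1, 2, 2, 3, 3, 3]

unique = set(values)  # duplicates are dropped
print(unique == {1, 2, 3})
#> True
print(len(unique))
#> 3

a = {1, 2, 3}
b = {3, 4}
print(sorted(a | b))  # union: items in either set
#> [1, 2, 3, 4]
print(sorted(a & b))  # intersection: items in both sets
#> [3]
```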

### Iteration with for loops

The `for` statement in Python is similar to the `for` loop in R. It can iterate over any iterable object, such as a list, tuple, dictionary, or set.

```python
for x in [1, 2, 3]:
    print(x)
#> 1
#> 2
#> 3
```
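
The same loop syntax works with other iterables. As a brief sketch with illustrative variable names: looping over a dictionary yields its keys, the built-in `range()` yields a sequence of integers starting at 0, and the built-in `enumerate()` pairs each item with its 0-based position.

```python
# Iterating over a dictionary yields its keys.
d = {"a": 1, "b": 2}
for key in d:
    print(key, d[key])
#> a 1
#> b 2

# range(3) yields 0, 1, 2 (the end value is excluded, as with slices).
total = 0
for i in range(3):
    total += i
print(total)
#> 3

# enumerate() pairs each item with its 0-based position.
for i, item in enumerate(["x", "y"]):
    print(i, item)
#> 0 x
#> 1 y
```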

### Functions

Python functions are defined with the `def` statement. The syntax for specifying function arguments and default values is very similar to R.

```python
def my_function(name="World"):
    print("Hello", name)

my_function()
#> Hello World
my_function("Friend")
#> Hello Friend
```

The equivalent R code would be:

```{r}
my_function <- function(name = "World") {
  cat("Hello", name, "\n")
}
my_function()
#> Hello World
my_function("Friend")
#> Hello Friend
```
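
One difference worth noting for R users: an R function returns the value of its last expression, while a Python function returns `None` unless it uses an explicit `return` statement. The sketch below uses a hypothetical function `greet` with a second, made-up `punctuation` argument to show `return` and calling with a named argument.

```python
def greet(name="World", punctuation="!"):
    # Build and return the greeting instead of printing it.
    return "Hello " + name + punctuation

print(greet())
#> Hello World!
print(greet("Friend"))
#> Hello Friend!

# Arguments can be passed by name, as in R.
print(greet(punctuation="?"))
#> Hello World?
```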


### Importing modules

In R, authors can bundle their code into R packages, and R users can access objects from R packages via `library()` or `::`. In Python, authors bundle code into modules, and users access modules using `import`.

```python
import numpy
```

Once loaded, you can access symbols from the module using `.`, which is equivalent to `::` in R.

```python
numpy.abs(-1)
```

There is special syntax for conveniently binding a module to a symbol upon importing.

```python
import numpy # import
import numpy as np # import and bind to a custom symbol `np`

from numpy import abs # import only `numpy.abs`
from numpy import abs as abs2 # import only `numpy.abs`, bind it to `abs2`
```
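
The same four import forms can be tried without installing anything by using a module from the standard library. The sketch below swaps in `math` for `numpy` purely so it runs on a fresh Python installation. The aliases `m` and `root` are arbitrary names chosen for illustration.

```python
import math                    # import the module
import math as m               # import and bind to a custom symbol `m`

from math import sqrt          # import only `math.sqrt`
from math import sqrt as root  # import only `math.sqrt`, bind it to `root`

# All of these refer to the same function.
print(math.sqrt(9.0))
#> 3.0
print(m.sqrt(9.0))
#> 3.0
print(sqrt(9.0))
#> 3.0
print(root is math.sqrt)
#> True
```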

### Learning More

If you want to learn more, browse the [official documentation for Python](https://docs.python.org/3/).

### References

- https://rstudio.github.io/reticulate/articles/python_primer.html
- https://www.youtube.com/watch?v=m_MQYyJpIjg

20 changes: 10 additions & 10 deletions _bookdown.yml
@@ -4,19 +4,19 @@ repo: https://github.com/datatrail-jhu/python
rmd_files:
- index.Rmd
- 00-demo.md
- 01-intro.Rmd
- 02-set-up.Rmd
- 03-intro-to-python.Rmd
- 04-getting-data.Rmd
- 01-intro.md
- 02-set-up.md
- 03-intro-to-python.md
- 04-getting-data.md
- 05-cleaning-data_01-intro.md
- 05-cleaning-data_02-reshaping-data.md
- 05-cleaning-data_03-tidying-data.md
- 06-plotting-data.Rmd
- 07-getting-stats.Rmd
- 08-scripting-python.Rmd
- 09-more-python.Rmd
- About.Rmd
- References.Rmd
- 06-plotting-data.md
- 07-getting-stats.md
- 08-scripting-python.md
- 09-more-python.md
- About.md
- References.md
new_session: yes
bibliography:
- book.bib