yellowcab is a Python package that provides a broad range of functions for working with records of taxi rides in New York City. Specifically, it is designed for the taxi trip dataset published on nyc.gov. The project focuses on the year 2020 and on Brooklyn, New York's most populous borough. Still, much of the functionality is easily reusable for other boroughs and years.
Here are just a few of the things that yellowcab does well:
- Filtering the data in various ways to create a clean dataset, e.g. removing outliers or zero values
- Adding various columns, e.g. regarding time, location or weather
- Creating plots to visualize distributions and other statistics
- Creating plots to further analyse the Covid-19 pandemic of 2020 and its heavy influence on mobility
- Creating heatmaps, e.g. to visualize traffic in specific service zones
- Offering functions for feature engineering, e.g. correlation analyses
- Offering trained ML models to predict, e.g., the distance of a trip, the fare amount, the type of payment and the capacity needed in the near future
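As a rough illustration of the first two points, the following sketch uses plain pandas rather than yellowcab's own functions (see the Wiki for those). The column names follow the public TLC yellow taxi schema; the helper names `clean_trips` and `add_time_columns`, the toy records and the 100-mile distance threshold are assumptions made for this example only.

```python
import pandas as pd

# Toy records; column names follow the public TLC yellow taxi schema.
trips = pd.DataFrame(
    {
        "tpep_pickup_datetime": pd.to_datetime(
            [
                "2020-01-06 08:15:00",
                "2020-01-06 17:40:00",
                "2020-03-21 23:05:00",
                "2020-03-22 02:30:00",
            ]
        ),
        "trip_distance": [2.4, 0.0, 310.0, 5.1],
        "fare_amount": [11.5, 3.0, 52.0, 18.0],
    }
)


def clean_trips(df, max_distance=100.0):
    """Drop rows with zero values or implausibly long distances.

    The threshold is an assumption for this sketch, not a yellowcab default.
    """
    mask = (
        (df["trip_distance"] > 0)
        & (df["trip_distance"] <= max_distance)
        & (df["fare_amount"] > 0)
    )
    return df.loc[mask].copy()


def add_time_columns(df):
    """Add hour, weekday (Monday = 0) and month of the pickup time."""
    out = df.copy()
    pickup = out["tpep_pickup_datetime"]
    out["pickup_hour"] = pickup.dt.hour
    out["pickup_weekday"] = pickup.dt.dayofweek
    out["pickup_month"] = pickup.dt.month
    return out


cleaned = add_time_columns(clean_trips(trips))
print(cleaned)
```

The zero-distance ride and the 310-mile ride are dropped, and the two remaining rides gain three time columns that can feed later plots or models.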
The documentation can be found in the accompanying GitHub Wiki.
Ensure that you have Python 3.6 or higher installed. Furthermore, ensure that you have the latest released version of a package installer, e.g. pip.
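One possible way to confirm the interpreter requirement from within Python is sketched below. The helper name `python_is_supported` is hypothetical and not part of yellowcab; the minimum version is the one stated above.

```python
import sys

MIN_VERSION = (3, 6)  # minimum Python version stated in this README


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report on the interpreter that runs this script.
status = "supported" if python_is_supported() else "too old"
print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```

The installed pip version can be checked separately with `python -m pip --version`.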
The source code is currently hosted on GitHub at: https://github.com/lesar64/pds_brooklyn
Clone the repository and navigate to the folder \pds_brooklyn. Now execute:
# Execute
pip install -e .
# or
pip install yellowcab -e .
Sometimes, errors occur while automatically installing dependencies. The following tips might help:
- Create a new environment by `conda create --name myenv python=3.8`, replacing `myenv` with a new name. You can check your Python version with `python -V`.
- Activate the environment by `conda activate myenv`, replacing `myenv` with the name of your environment.
- Ensure that the latest version of pip is installed, e.g. by executing `conda install pip`.
- This Stack Overflow article can help with problems installing contextily. Download the .whl files for the first 7 dependencies (cf. chapter Dependencies). The files can for example be obtained here. Ensure that you obtain the files for the right Python version. Move the files to the folder \pds_brooklyn. Navigate to the folder in your Anaconda prompt and install the dependencies with pip in exactly this order (starting with GDAL, ending with rasterio), each by `pip install filename`, replacing `filename` with the name of the file. Afterwards, execute `pip install contextily` and then `pip install .` again (in the folder \pds_brooklyn).
- Make sure to trust a notebook before executing it (setting in the top right corner).
- Sometimes, package installation errors can be bypassed by instead executing `conda install -c conda-forge package` to install a specific package.
- For Windows, pipwin seems to be able to solve issues.
  - In particular, there are 3 known problems with:
    - Fiona (GDAL API bug)
    - rasterio (GDAL API bug)
    - cartopy (cython version bug)
  - Run the following lines of code in your Anaconda prompt if you have problems installing geopandas:
pip install wheel
pip install pipwin
pipwin install numpy
pipwin install pandas
pipwin install shapely
pipwin install gdal
pipwin install fiona
pipwin install pyproj
pipwin install six
pipwin install rtree
pipwin install geopandas
pipwin install rasterio
pipwin install cython
pipwin install cartopy
- NumPy
- pandas
- Shapely
- GDAL
- Fiona
- proj
- PyProj
- six
- rtree
- geopandas
- cython
- Cartopy
- rasterio
- contextily
- datetime
- seaborn
- folium
- matplotlib
- openpyxl
- pyarrow
- setuptools
- scipy
- click
- sklearn
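To see which of the packages listed above are still missing from the active environment, a small check along the following lines may help. It is a sketch, not part of yellowcab: the helper name `missing_packages` is hypothetical, the mapping holds the usual import names of the packages, and proj (a C library) and datetime (part of the standard library) are left out.

```python
from importlib.util import find_spec

# Usual import names of the listed packages (display name -> module name).
IMPORT_NAMES = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "Shapely": "shapely",
    "GDAL": "osgeo",
    "Fiona": "fiona",
    "PyProj": "pyproj",
    "six": "six",
    "rtree": "rtree",
    "geopandas": "geopandas",
    "cython": "Cython",
    "Cartopy": "cartopy",
    "rasterio": "rasterio",
    "contextily": "contextily",
    "seaborn": "seaborn",
    "folium": "folium",
    "matplotlib": "matplotlib",
    "openpyxl": "openpyxl",
    "pyarrow": "pyarrow",
    "setuptools": "setuptools",
    "scipy": "scipy",
    "click": "click",
    "sklearn": "sklearn",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the sorted display names whose module cannot be located."""
    return sorted(
        name for name, module in import_names.items() if find_spec(module) is None
    )


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

`find_spec` only locates a module without importing it, so the check stays fast and does not trigger the GDAL-related import errors mentioned in the tips.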
Work on yellowcab started at the Faculty of Management, Economics, and Social Sciences at the University of Cologne. A team of five graduate students worked under the supervision of Philipp Kienscherf within the scope of the module "Programming Data Science".
A report of the project with more detailed descriptions and its most interesting findings can be found here and was also uploaded to Ilias by Steffen Weißhaar.