Dockerfile change #11

Open
wilberh opened this issue Sep 27, 2023 · 3 comments

wilberh commented Sep 27, 2023

I had to make the two local changes listed below to the Dockerfile to get it working. Only the first build took a long time, because it downloaded the jupyter/pyspark-notebook base image(s) and all the spaCy packages. I could be wrong about this, but the process appeared to use at least 40GB of my local drive (including my attempts to find the correct tag for the base image) to produce a 12.9GB Docker image.

Dockerfile changes:

  • pinned the base image to a specific Python 3.8 tag (python-3.8)
  • added an ENTRYPOINT that starts "jupyter-lab"

I also created a docker-compose file so that a single CLI command [ docker compose up -d --build ] builds and (re)deploys/runs the image.

# Based on the Dockerfiles from the Jupyter Development Team which 
# are Copyright (c) Jupyter Development Team and distributed under 
# the terms of the Modified BSD License.
ARG OWNER=jupyter
ARG BASE_CONTAINER=$OWNER/pyspark-notebook:python-3.8
FROM $BASE_CONTAINER

LABEL maintainer="Paul Deitel <[email protected]>"

# Fix: https://github.com/hadolint/hadolint/wiki/DL4006
# Fix: https://github.com/koalaman/shellcheck/wiki/SC3014
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

RUN mamba install --yes \
    'dnspython' \
    'folium' \
    'geopy' \
    'imageio' \
    'nltk'  \
    'pymongo' \
    'scikit-learn' \
    'spacy' \
    'tweepy' 
     
RUN pip install --upgrade \
    'tensorflow' \
    'openai' \
    'beautifulsoup4' \
    'deepl' \
    'mastodon.py' \
    'better_profanity'  \
    'tweet-preprocessor' \
    'ibm-watson' \
    'pubnub' \
    'textblob' \
    'wordcloud' \
    'dweepy' \
    'sounddevice'
    

# download data required by textblob and spacy
RUN python -m textblob.download_corpora && \
    python -m spacy download en_core_web_sm && \
    python -m spacy download en_core_web_md && \
    python -m spacy download en_core_web_lg 

# clean up
RUN mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

ENTRYPOINT ["start.sh", "jupyter-lab"]

Docker compose file:

version: "3"

services:
  deitelpydsft:
    container_name: deitelpydsft
    user: root
    volumes:
      - .:/home/jovyan/work
    build: .
    restart: always
    # env_file: .env
    ports:
      - "8888:8888"
      - "4040:4040"

oppiet30 commented May 25, 2024

wilberh: the top-level version key in your docker-compose.yml file is no longer needed. It has been deprecated.
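
As a rough illustration of that suggestion, the sketch below writes out the compose file from the first comment with the `version` line dropped and then checks that no such key remains. The file name `docker-compose.noversion.yml` is a hypothetical choice made here only to avoid overwriting an existing `docker-compose.yml`. The commented-out `env_file` line from the original is left out. Everything else is copied from the file above.

```shell
# Write the compose file from this thread without the top-level `version` key.
# The output file name is hypothetical (chosen to avoid clobbering an existing file).
cat > docker-compose.noversion.yml <<'EOF'
services:
  deitelpydsft:
    container_name: deitelpydsft
    user: root
    volumes:
      - .:/home/jovyan/work
    build: .
    restart: always
    ports:
      - "8888:8888"
      - "4040:4040"
EOF

# Confirm that no top-level `version` key remains in the generated file.
if grep -q '^version:' docker-compose.noversion.yml; then
    echo "version key still present"
else
    echo "no version key"
fi
```

Assuming the file sits next to the Dockerfile, it could then be used with `docker compose -f docker-compose.noversion.yml up -d --build`.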

@oppiet30

I get this error.

C:\Users\Administrator\Desktop\Python\PythonDataScienceFullThrottle>docker build -t deitelpydsft
ERROR: "docker buildx build" requires exactly 1 argument.
See 'docker buildx build --help'.

Usage: docker buildx build [OPTIONS] PATH | URL | -

Start a build

C:\Users\Administrator\Desktop\Python\PythonDataScienceFullThrottle>


wilberh commented May 29, 2024

oppiet30: add a period at the end (the period is the build context path, i.e. the current directory) ===>> docker build -t deitelpydsft .
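
To spell out why the period matters: the usage line in the error shows that `docker build` expects exactly one positional argument (`PATH | URL | -`). In the failing command, `deitelpydsft` is consumed as the value of `-t`, so no path is left. The sketch below uses a hypothetical helper, `build_command` (not part of Docker), that only prints the command line and falls back to `.` when no context is given.

```shell
# Hypothetical helper: print a `docker build` command line that always
# includes a build context. It does not run Docker itself.
build_command() {
    tag="$1"
    context="${2:-.}"   # PATH argument; "." means the current directory
    echo "docker build -t ${tag} ${context}"
}

# Command line for the image name used in this thread.
build_command deitelpydsft

# To actually build, run the printed command from the folder that
# contains the Dockerfile:
#   docker build -t deitelpydsft .
```

The design point is simply that the tag and the context are two separate arguments. Omitting the second one is what triggers the "requires exactly 1 argument" message.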
