Slim versions of TFX Docker images #6921

Open
axeltidemann opened this issue Oct 1, 2024 · 8 comments

@axeltidemann
Contributor

axeltidemann commented Oct 1, 2024

Could we have Docker images that are slimmer? Some examples of TFX Docker image sizes (compressed, even):

TFX 1.0: 5.67GB
TFX 1.5: 6.65GB
TFX 1.10: 8.53GB
TFX 1.15: 11.4GB

At the very least, an explanation of why the image sizes keep growing would be great.

Or is the recommended approach to build your own Docker image on top of a slim Python or Ubuntu base image?

@axeltidemann
Contributor Author

If that TFX image is based on the latest (1.16dev) image, then that is quite a saving, almost half. Interesting. Did you find it hard to build, @pritamdodeja?

And: justbeambieber is for sure the funniest name I've seen for a Docker image, ever!

@stefandominicus-takealot

One reason (of many) that these large images are problematic is that GCP Dataflow jobs take forever to spin up new workers: anywhere from 15 to 30 minutes, in my experience!

I initially thought this might be due to lengthy dependency installation on worker startup (as described here), but I've confirmed that my dependencies are pre-installed in my custom Docker image (based on tensorflow/tfx:1.15.1), and that I am setting the --sdk_container_image={PIPELINE_IMAGE_URI} Beam argument correctly.
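For readers unfamiliar with the flag, here is a minimal sketch of how a custom worker image might be passed to Beam as pipeline arguments. Only `--sdk_container_image` comes from the comment above; the helper name `build_beam_args` and the project, region, bucket, and image URI values are hypothetical placeholders.

```python
def build_beam_args(project, region, temp_location, image_uri):
    """Return Beam pipeline arguments that point Dataflow workers at a custom image."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # The flag from the comment above: workers pull this image at startup.
        f"--sdk_container_image={image_uri}",
    ]


# Hypothetical values, for illustration only.
PIPELINE_IMAGE_URI = "europe-west1-docker.pkg.dev/my-project/my-repo/my-tfx:latest"

beam_args = build_beam_args(
    project="my-project",
    region="europe-west1",
    temp_location="gs://my-bucket/tmp",
    image_uri=PIPELINE_IMAGE_URI,
)

for arg in beam_args:
    print(arg)
```

In a TFX pipeline, a list like this would typically be handed over through the `beam_pipeline_args` parameter of the pipeline definition.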

Dataflow system logs confirm the long duration of the image pull:

Pulled image "<redacted>" with image id "sha256:<redacted>", repo tag "<redacted>", repo digest "<redacted>@sha256:<redacted>", size "12284987784" in 21m22.272733249s"

^ This happens for every worker that Dataflow spins up, which makes these jobs very slow to scale.
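To put the log line above in perspective, the following sketch parses the size and duration from it and derives the effective pull throughput. The function name `parse_pull` is made up for this example, and the regular expression only handles durations written with hours, minutes, and seconds, as in the line quoted here.

```python
import re

# The log line quoted above, copied verbatim.
LOG_LINE = (
    'Pulled image "<redacted>" with image id "sha256:<redacted>", '
    'repo tag "<redacted>", repo digest "<redacted>@sha256:<redacted>", '
    'size "12284987784" in 21m22.272733249s"'
)


def parse_pull(line):
    """Extract (size_bytes, seconds) from a 'Pulled image' log line."""
    match = re.search(r'size "(\d+)" in (?:(\d+)h)?(?:(\d+)m)?([\d.]+)s', line)
    if match is None:
        raise ValueError("not a recognised 'Pulled image' line")
    size, hours, minutes, seconds = match.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)
    return int(size), total


size_bytes, seconds = parse_pull(LOG_LINE)
mb_per_s = size_bytes / seconds / 1e6
print(f"{size_bytes / 1e9:.2f} GB in {seconds:.0f} s = {mb_per_s:.2f} MB/s")
```

For this log line the result is roughly 9.6 MB/s, so the 21-minute wait is consistent with the sheer size of the image rather than with any extra work done at startup.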

Anything that can be done to reduce the size of this 12.3GB image would be super helpful!

@janasangeetha
Contributor

Dear users,
Thank you for your interest. We want to let you know that we are not currently planning to pick up this work. We truly appreciate your contributions to help us improve, and your input is valuable!
Thank you!

Contributor

This issue has been marked stale because it has had no activity in the last 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Dec 15, 2024
@kholofelo-phaahlamohlaka

Hello! 👋

We've created a Docker image that significantly reduces the size compared to standard TFX Docker images.
Image Size:

  • Compressed: 1.2 GB
  • Decompressed: 3.7 GB

It has been tested successfully on a Vertex AI pipeline.

Here is the Dockerfile:

FROM python:3.10

# see https://pypi.org/project/tfx/ for package compatibility
RUN pip install --upgrade --no-cache-dir pip \
    && pip install --upgrade --no-cache-dir apache-beam[gcp]==2.56.0 \
    && pip install --upgrade --no-cache-dir tfx[kfp]==1.15.1

# Copy files from the official Apache Beam SDK image
# see https://cloud.google.com/dataflow/docs/guides/build-container-image for more information
COPY --from=apache/beam_python3.10_sdk:2.56.0 /opt/apache/beam /opt/apache/beam

# Set the entrypoint to Apache Beam SDK launcher
ENTRYPOINT ["/opt/apache/beam/boot"]
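The Dockerfile above pins `apache-beam` to 2.56.0 and copies the boot files from the `apache/beam_python3.10_sdk:2.56.0` image, on top of a `python:3.10` base. Presumably these versions need to stay in step when any of them is bumped. The sketch below is one way to check that; the function name `check_versions` is hypothetical, and the embedded Dockerfile text is an abridged copy of the one above with the comments dropped.

```python
import re

# Abridged copy of the Dockerfile from the comment above.
DOCKERFILE = """\
FROM python:3.10
RUN pip install --upgrade --no-cache-dir pip \\
    && pip install --upgrade --no-cache-dir apache-beam[gcp]==2.56.0 \\
    && pip install --upgrade --no-cache-dir tfx[kfp]==1.15.1
COPY --from=apache/beam_python3.10_sdk:2.56.0 /opt/apache/beam /opt/apache/beam
ENTRYPOINT ["/opt/apache/beam/boot"]
"""


def check_versions(text):
    """Compare the Python and Beam versions used by pip and by the SDK image."""
    base_python = re.search(r"^FROM python:(\d+\.\d+)", text, re.MULTILINE)
    pip_beam = re.search(r"apache-beam(?:\[[^\]]*\])?==([\w.]+)", text)
    sdk_image = re.search(r"apache/beam_python(\d+\.\d+)_sdk:([\w.]+)", text)
    if not (base_python and pip_beam and sdk_image):
        raise ValueError("could not find all three version pins")
    return {
        # Base image Python version vs. Python version in the SDK image name.
        "python_matches": base_python.group(1) == sdk_image.group(1),
        # pip-installed Beam version vs. SDK image tag.
        "beam_matches": pip_beam.group(1) == sdk_image.group(2),
    }


result = check_versions(DOCKERFILE)
print(result)
```

A check like this could run in CI before the image is built, so that a version bump in one place does not silently drift from the other.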

@axeltidemann
Contributor Author

Excellent work, @KholofeloPhahlamohlaka-TAL! (Full disclosure: we work together, but I think he deserves praise on the world wide web as well!)

@janasangeetha
Contributor

Hi @KholofeloPhahlamohlaka-TAL,
We appreciate your efforts in building a slimmer version of the TFX Docker image. I will check with the team internally on the next steps and provide an update here.
Thank you!

@adriangay

We would be very interested in anyone's experience building TFX images with NVIDIA GPU support. In our experience, this can easily double the image size, and it's not easy to get the TF, TFX, CUDA, etc. versions to align and be 'found' by TF.

6 participants
@axeltidemann @adriangay @stefandominicus-takealot @kholofelo-phaahlamohlaka @janasangeetha and others