# Development

This document describes how to set up a development environment for developing, building, and testing the SageMaker Spark Container image.

## Development Environment Setup

You'll need python, pytest, docker, and docker-compose installed on your machine and available on your `$PATH`.

This repository uses GNU make to run the build targets specified in the `Makefile`. Consult the `Makefile` for the full list of build targets.
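If you want a quick look at the targets without opening the file, a grep such as the one below lists the declared target names; this is only a convenience, and the `Makefile` itself remains the source of truth.

```sh
# Rough list of targets declared in the Makefile (may include a few non-target lines).
grep -E '^[A-Za-z0-9_.-]+:' Makefile | cut -d: -f1 | sort -u
```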

## Pulling Down the Code

1. If you do not already have one, create a GitHub account by following the prompts at Join GitHub.
2. Create a fork of this repository on GitHub. You should end up with a fork at `https://github.com/<username>/sagemaker-spark-container`.
   1. Follow the instructions at Fork a Repo to fork a GitHub repository.
3. Clone your fork of the repository: `git clone https://github.com/<username>/sagemaker-spark-container`, where `<username>` is your GitHub username.
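If you plan to pull in upstream changes later, you can also add the original repository as a second remote after cloning; the `upstream` remote name below is just a convention, not something the build relies on.

```sh
git clone https://github.com/<username>/sagemaker-spark-container
cd sagemaker-spark-container
# Optional: track the upstream repository so you can fetch future changes.
git remote add upstream https://github.com/aws/sagemaker-spark-container.git
```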

## Setting Up The Development Environment

1. To set up your Python environment, we recommend creating and activating a virtual environment using `venv`, which is part of the standard library in Python 3.3+:

   ```sh
   python3 -m venv .venv
   source .venv/bin/activate
   ```

   You may want to activate the Python environment in your `.bashrc` or `.zshrc`.
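If you do want that, a single line in your shell startup file is enough; the path below assumes the virtual environment lives at `.venv` inside your clone, so adjust it to wherever you cloned the repository.

```sh
# In ~/.bashrc or ~/.zshrc (path is an assumption based on the setup above):
source "$HOME/sagemaker-spark-container/.venv/bin/activate"
```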

2. Then install pytest into the virtual environment:

   ```sh
   python -m pip install pytest pytest-parallel
   ```

3. Ensure docker is installed (see Get Docker | Docker Documentation). docker is used to build and test the Spark container locally.
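A quick sanity check that both docker and docker-compose are installed and on your `$PATH` (the exact versions are not critical, as long as both commands succeed):

```sh
docker --version
docker-compose --version
```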

4. Ensure you have access to an AWS account, i.e. set up your environment so that awscli can access your account via either an IAM user or an IAM role. We recommend using an IAM role. For the purposes of testing in your personal account, the following managed policies should suffice:

   - AmazonSageMakerFullAccess
   - AmazonS3FullAccess
   - AmazonEC2ContainerRegistryFullAccess
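To confirm your credentials are wired up correctly, a call like the one below should print the account ID you expect; this is just a sanity check, not a step the build requires.

```sh
aws sts get-caller-identity --query Account --output text
```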

5. Create an ECR repository with the name `sagemaker-spark` in the `us-west-2` region.
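If you prefer the CLI to the console, the repository can be created with awscli; the repository name and region below match the environment variables set in the next step.

```sh
aws ecr create-repository --repository-name sagemaker-spark --region us-west-2
```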

6. Set up the environment variables required for the container build:

   ```sh
   export AWS_ACCOUNT_ID=<YOUR_ACCOUNT_ID>
   export REGION=us-west-2
   export SPARK_REPOSITORY=sagemaker-spark
   export VERSION=latest
   export SAGEMAKER_ROLE=<YOUR_SAGEMAKER_ROLE>
   ```
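The full image URI used later for tagging and pushing is derived from these variables, so echoing it is a quick way to catch typos before building; this is only a sanity check.

```sh
echo "$AWS_ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$SPARK_REPOSITORY:$VERSION"
```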

## Building Scala Test Dependencies

Compiling the Scala test JARs requires SBT to be installed on your system.

Mac users can install it with Homebrew: `brew install sbt`

For more information, see https://www.scala-sbt.org/1.x/docs/Setup.html

To compile the Scala test JAR, run: `make build-test-scala`

## Building Java Test Dependencies

Compiling the Java test JARs requires Maven to be installed on your system.

Mac users can install it with Homebrew: `brew install maven`

For more information, see https://maven.apache.org/install.html

To compile the Java test JAR, run: `make build-test-java`

## Building Your Image

1. To build the container image, run the following command:

   ```sh
   make build
   ```

   Upon successful build, you will see two tags applied to the image. For example:

   ```
   Successfully tagged sagemaker-spark:2.4-cpu-py37-v0.1
   Successfully tagged sagemaker-spark:latest
   ```
2. To verify that the image is available in your local docker repository, run `docker images`. You should see an image with two tags. For example:

   ```
   $ docker images
   REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
   sagemaker-spark     2.4-cpu-py37-v0.1   a748a6e042d2        5 minutes ago       3.06GB
   sagemaker-spark     latest              a748a6e042d2        5 minutes ago       3.06GB
   ```

## Running Local Tests

To run the local tests (unit tests and local container tests using docker-compose), run the following command:

```sh
make test-local
```
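If you want to iterate on a single test without going through make, pytest can be pointed at a test path directly; the `test/unit` directory below is an assumption, so check the repository layout and the `test-local` target in the `Makefile` for the exact paths and options it uses.

```sh
# Run only the unit tests (directory path is an assumption; see the Makefile).
python -m pytest test/unit -v
```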

## Running SageMaker Tests

To run tests against Amazon SageMaker using your newly built container image, first publish the image to your ECR repository.

1. Bootstrap docker credentials for your repository:

   ```sh
   aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com
   ```

2. Tag the latest Spark image:

   ```sh
   docker tag $SPARK_REPOSITORY:latest $AWS_ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com/$SPARK_REPOSITORY:$VERSION
   ```

3. Push the latest Spark image to your ECR repository:

   ```sh
   docker push $AWS_ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com/$SPARK_REPOSITORY:$VERSION
   ```

4. Run the SageMaker tests:

   ```sh
   make test-sagemaker
   ```