Please report any issues at https://github.com/determined-ai/determined/issues.
Determined can be developed and run on both Linux and macOS (Linux is strongly recommended for production deployments). Determined has been tested with Ubuntu 16.04 LTS, Ubuntu 18.04 LTS, Arch Linux, CentOS 7, and macOS. Ubuntu is recommended; on AWS, a good AMI to use is a recent version of "Deep Learning Base AMI (Ubuntu)".
Start by cloning the Determined repo:
git clone git@github.com:determined-ai/determined.git
To install OS-level dependencies, run the appropriate one of the scripts below from within your clone of the repository.
scripts/setup-env-ubuntu.sh
scripts/setup-env-centos.sh
scripts/setup-env-arch.sh
scripts/setup-env-macos.sh
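Choosing the right script can be automated. The following is a minimal sketch, not part of the repository: the function names `pick_setup_script` and `detect_os_id` and the label `macos` are made up for illustration, and it assumes the distribution reports its usual `ID` value in `/etc/os-release`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an OS identifier to one of the setup scripts
# listed above. "ubuntu", "centos", and "arch" are the ID values found in
# /etc/os-release; "macos" is a label chosen for this sketch.
pick_setup_script() {
  case "$1" in
    ubuntu) echo "scripts/setup-env-ubuntu.sh" ;;
    centos) echo "scripts/setup-env-centos.sh" ;;
    arch)   echo "scripts/setup-env-arch.sh" ;;
    macos)  echo "scripts/setup-env-macos.sh" ;;
    *)      echo "unsupported OS: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: best-effort detection of the current OS identifier.
detect_os_id() {
  if [ "$(uname -s)" = "Darwin" ]; then
    echo "macos"
  elif [ -r /etc/os-release ]; then
    # Source the file in a subshell so its variables do not leak out.
    ( . /etc/os-release && echo "$ID" )
  else
    echo "unknown"
  fi
}

# Usage, from the root of your clone:
#   script="$(pick_setup_script "$(detect_os_id)")" && "$script"
```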
python3.6 -m venv ~/.virtualenvs/determined
. ~/.virtualenvs/determined/bin/activate
make all
In the future, ensure that you activate the virtualenv (by running the activate command above) whenever you want to interact with Determined. Tools such as virtualenvwrapper or direnv may help streamline the process.
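If you prefer not to install an extra tool, a small shell function can do the same job. This is a sketch under assumptions: `DET_VENV`, `det_venv_active`, and `det_activate` are names invented here, and the path matches the `python3.6 -m venv` command above.

```shell
# Location of the virtualenv created above; override DET_VENV if yours differs.
DET_VENV="${DET_VENV:-$HOME/.virtualenvs/determined}"

# Succeeds only if the Determined virtualenv is the currently active one.
# The activate script exports VIRTUAL_ENV as the path of the environment.
det_venv_active() {
  [ "${VIRTUAL_ENV:-}" = "$DET_VENV" ]
}

# Activate the virtualenv unless it is already active.
det_activate() {
  if det_venv_active; then
    return 0
  fi
  if [ -r "$DET_VENV/bin/activate" ]; then
    . "$DET_VENV/bin/activate"
  else
    echo "no virtualenv found at $DET_VENV; create it first" >&2
    return 1
  fi
}
```

Adding these definitions to your shell startup file lets you run `det_activate` from any directory before working with Determined.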
det-deploy is a tool that we provide to automate the process of deploying Determined in Docker containers. See the documentation for more details.
# Set up a local Docker Compose cluster. This will automatically tear down an
# existing cluster if there is one.
det-deploy local fixture-up
# Watch stdout/stderr.
det-deploy local logs
# Edit code.
...
# Update Docker images and restart the cluster.
make build-docker
det-deploy local fixture-up
# Tear down the cluster.
det-deploy local fixture-down
Running the parts of a Determined cluster individually can help speed up iteration during development. A minimal cluster consists of four services: a PostgreSQL database, a Hasura server, a Determined master, and a Determined agent.
# Create a separate Docker network for Determined.
docker network create determined
# Start PostgreSQL.
docker run --rm --network determined --name determined-db \
-p 127.0.0.1:5432:5432 \
-e POSTGRES_DB=determined \
-e POSTGRES_PASSWORD=my-postgres-password \
postgres:10
# Start Hasura.
docker run --rm --network determined --name determined-graphql \
-p 127.0.0.1:8081:8080 \
-e HASURA_GRAPHQL_DATABASE_URL=postgres://postgres:my-postgres-password@determined-db:5432/determined \
-e HASURA_GRAPHQL_ADMIN_SECRET=my-hasura-secret \
-e HASURA_GRAPHQL_ENABLE_CONSOLE=true \
-e HASURA_GRAPHQL_ENABLE_TELEMETRY=false \
-e HASURA_GRAPHQL_CONSOLE_ASSETS_DIR=/srv/console-assets \
hasura/graphql-engine:v1.1.0
# Start the master.
make -C master install-native
determined-master \
--db-host localhost --db-name determined --db-port 5432 --db-user postgres --db-password my-postgres-password \
--hasura-address localhost:8081 --hasura-secret=my-hasura-secret \
--root build/share/determined/master
# Start the agent.
make -C agent install-native
determined-agent run --master-host localhost --master-port 8080
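The PostgreSQL credentials appear in three of the commands above: the `POSTGRES_*` variables, the Hasura database URL, and the master's `--db-*` flags. Keeping them in one place helps avoid mismatches. The sketch below assumes nothing beyond the values already shown; the variable names and the `hasura_database_url` function are hypothetical.

```shell
# Shared settings, copied from the commands above.
DB_HOST="determined-db"   # container name on the "determined" Docker network
DB_PORT=5432
DB_NAME="determined"
DB_USER="postgres"
DB_PASSWORD="my-postgres-password"

# Build the value passed to Hasura as HASURA_GRAPHQL_DATABASE_URL.
# Note: a password containing characters such as "@" or "/" would need to be
# URL-encoded first; this sketch does not do that.
hasura_database_url() {
  printf 'postgres://%s:%s@%s:%s/%s\n' \
    "$DB_USER" "$DB_PASSWORD" "$DB_HOST" "$DB_PORT" "$DB_NAME"
}

# Usage in the Hasura command:
#   -e HASURA_GRAPHQL_DATABASE_URL="$(hasura_database_url)"
```

The master runs on the host rather than inside the Docker network, so it still connects through `localhost` and the published port, as in the command above.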
After following either set of instructions above, the WebUI will be available at http://localhost:8080. You can also use our command-line tool, det, to interact with Determined. For example, det slot list should print out a line for each GPU on your machine, if you have any, or a line for your CPU, if not. For more information, see the reference documentation.
The examples/official/mnist_pytorch directory contains code to train a convnet on MNIST using PyTorch. To train a model, run
det experiment create <config> examples/official/mnist_pytorch/
where <config> can be one of:
examples/official/mnist_pytorch/const.yaml to train a single model with fixed hyperparameters
examples/official/mnist_pytorch/adaptive.yaml to train multiple models using an adaptive hyperparameter search algorithm
Determined also supports several other hyperparameter search methods.
After starting a model, you can check on its progress using the WebUI or the CLI command det experiment list.
Run make check.
To add a commit message template and a commit-time hook that help you follow our commit message guidelines, you can also run scripts/configure-repo.sh (this only needs to be done once).
Run make test.
For cloud integration tests, AWS and GCP credentials must be configured.
# Run local integration tests except for cloud-related tests.
make test-integrations
# Run cloud integration tests.
make test-cloud-integrations
By default, the master process is exposed on port 8081 of the host machine. To change the master port, run
make test-integrations INTEGRATIONS_HOST_PORT=<PORT>
If you want to run the integration tests on GPUs, change the default Docker container runtime to nvidia.
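The default runtime is normally set in the Docker daemon configuration, usually `/etc/docker/daemon.json`. The sketch below prints a fragment of the kind described in the nvidia-docker2 documentation; the function name is hypothetical, and the runtime path is an assumption that may differ on your system. Merge the output with any existing settings rather than overwriting the file.

```shell
# Print a daemon.json fragment that makes "nvidia" the default Docker runtime.
# The path below is where nvidia-docker2 typically installs the runtime;
# treat it as an assumption and verify it on your machine.
print_nvidia_daemon_json() {
  cat <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}

# Usage:
#   print_nvidia_daemon_json        # review the output, merge it into
#                                   # /etc/docker/daemon.json
#   sudo systemctl restart docker   # restart the daemon to apply the change
```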
This project uses pip-compile for pinning dependencies' versions.
To add a dependency, edit setup.py or an appropriate .in file and then run make pin-deps. The pip-compile tool will then generate the appropriate pinned dependencies in requirements.txt files throughout the repo.
To update all dependencies, run make upgrade-deps.
See Releases for cutting new releases.
To connect directly to the Determined metadata database, run this command from the Determined master host:
docker run -it --rm \
--network determined \
-e PGPASSWORD=my-postgres-password \
postgres:10 psql -h determined-db -U postgres -d determined
go tool pprof http://master-ip:port # for CPU samples
go tool pprof http://master-ip:port/debug/pprof/heap # for heap samples
go tool pprof -http :8081 ~/pprof/sample-file
To use Determined with GPUs, the Nvidia CUDA drivers (>= 384.81) and nvidia-docker2 must be installed.
To verify that your system can run containers that use GPUs, try:
docker run --runtime=nvidia --rm nvidia/cuda:10.0-cudnn7-runtime-ubuntu16.04 nvidia-smi
If this command displays one or more GPUs, the Determined agent should automatically detect the system's GPUs and make them available for running experiments.
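The driver requirement above can also be checked from a script. This is a rough sketch, not something shipped with Determined: `version_ge` and `check_driver` are hypothetical names, and the comparison relies on the `-V` (version sort) option of GNU `sort`. A plain string comparison would be wrong here, since it would rank 384.130 below 384.81.

```shell
# Minimum driver version stated above.
MIN_DRIVER="384.81"

# Succeeds if version $1 is greater than or equal to version $2.
# sort -V orders the two versions; if $2 comes first, then $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read the installed driver version from nvidia-smi and compare it.
check_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
    return 1
  fi
  # One line per GPU; the driver version is the same for all, so take the first.
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if version_ge "$driver" "$MIN_DRIVER"; then
    echo "driver $driver satisfies >= $MIN_DRIVER"
  else
    echo "driver $driver is older than $MIN_DRIVER" >&2
    return 1
  fi
}
```

Running `check_driver` on the agent host before starting the agent gives an early warning if the driver is too old; it does not replace the `docker run` test above, which also verifies the container runtime.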