Here are some notes on the project.
To bring the necessary Python dependencies into the image, we need to ADD (or COPY) the Pipfile. The Dockerfile is placed at the root of the project because:
"The <src> path must be inside the context of the build; you cannot ADD ../something /something, because the first step of a docker build is to send the context directory (and subdirectories) to the docker daemon." (Dockerfile ADD documentation)
In essence, the Pipfile must be inside the build context, so the Dockerfile needs to sit in the same directory as the Pipfile or in a directory above it.
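The build-context rule can be illustrated with a small path check. This is only a sketch of the constraint, not how Docker implements it; the function name `is_inside_context` and the directory `/srv/project` are hypothetical and chosen for illustration.

```python
from pathlib import Path


def is_inside_context(context_dir: str, src: str) -> bool:
    """Return True if src, taken relative to context_dir, stays inside it."""
    context = Path(context_dir).resolve()
    candidate = (context / src).resolve()
    # A source is usable only if it is the context itself or a descendant of it.
    return candidate == context or context in candidate.parents


# Pipfile next to the Dockerfile: inside the context.
print(is_inside_context("/srv/project", "Pipfile"))     # True
# Pipfile one level above the context: rejected, like ADD ../something.
print(is_inside_context("/srv/project", "../Pipfile"))  # False
```

With the Dockerfile at the project root, the Pipfile falls in the first case, which is why ADD (or COPY) can reach it.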
As of March 2nd, 2023, DocumentDB is compatible with MongoDB 4.0. Using MongoDB 4.0 in the development environment is advantageous for several reasons:
- Since these versions are compatible, we can use MongoDB to mimic DocumentDB for local development.
- There's no official Docker image for DocumentDB.
- The AWS Glue Lib Docker image comes with a Spark version that does not support the latest mongodb-driver-sync versions.
The following traceback shows the resulting failure:
Traceback (most recent call last):
...
py4j.protocol.Py4JJavaError: An error occurred while calling o90.save.
: java.lang.NoClassDefFoundError: com/mongodb/internal/connection/InternalConnectionPoolSettings
...
The Spark version in the AWS Glue Lib Docker image relies on mongodb-driver-sync-3.10.2.jar. The mongodb-driver-sync-4.7.2.jar that was provided introduces breaking changes, one of which is visible in this stack trace: the class com/mongodb/internal/connection/InternalConnectionPoolSettings cannot be found at runtime.
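A simple guard can catch this kind of mismatch before a Spark job runs. The sketch below assumes that a driver jar is safe only when its major version matches the one the image expects (3, taken from the jar name above). That rule is a heuristic, and the helper names are hypothetical.

```python
import re

# Major version of the driver that the Glue image's Spark build relies on,
# taken from mongodb-driver-sync-3.10.2.jar (an assumption of this sketch).
EXPECTED_MAJOR = 3


def driver_major_version(jar_name: str) -> int:
    """Extract the major version from a name like mongodb-driver-sync-3.10.2.jar."""
    match = re.fullmatch(r"mongodb-driver-sync-(\d+)\.(\d+)\.(\d+)\.jar", jar_name)
    if match is None:
        raise ValueError(f"unrecognized driver jar name: {jar_name}")
    return int(match.group(1))


def is_compatible(jar_name: str, expected_major: int = EXPECTED_MAJOR) -> bool:
    """Treat a jar as compatible only when its major version matches."""
    return driver_major_version(jar_name) == expected_major


print(is_compatible("mongodb-driver-sync-3.10.2.jar"))  # True
print(is_compatible("mongodb-driver-sync-4.7.2.jar"))   # False
```

Comparing major versions only is deliberately coarse; it flags the 3.x to 4.x jump described here without claiming anything about compatibility within a major line.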