This repo contains a Makefile, base Terraform folders and Jinja2 files that fit the standard pattern. It is a base for creating new Terraform repos: rename the template files and add the githooks submodule to make the repo ready for use.

Running `aviator` will create the required pipeline on the AWS-Concourse instance, in order to pass a mandatory 'CI ran' status check. This will likely require you to log in to Concourse, if you haven't already.
After cloning this repo, please generate the `terraform.tf` and `terraform.tfvars` files:

```
make bootstrap
```
In addition, you may want to do the following:
- Create non-default Terraform workspaces as and if required: `make terraform-workspace-new workspace=<workspace_name>`, e.g. `make terraform-workspace-new workspace=qa`
- Configure the Concourse CI pipeline:
  - Add/remove jobs in `./ci/jobs` as required
  - Create the CI pipeline: `aviator`
The data egress task is responsible for receiving messages from an SQS queue, retrieving a configuration DynamoDB item for the message and then sending files to a destination location (another S3 bucket or to disk).
- Data is uploaded to the source S3 bucket
- A `pipeline_success.flag` file is uploaded to the same file path as the data
- A new SQS item is added on each new `pipeline_success.flag` file upload, with the path to the file as the data source
- The egress service picks up jobs from the SQS queue
- The egress service queries DynamoDB to determine what action needs to be taken with the data (set in `data-egress.tf`)
- If `transfer_type` is SFT, the data is copied to a local directory and picked up by the SFT service
  - Prod environment: the data is sent to the corresponding data warehouse location
  - Non-prod environment: the data is sent to the `stub-hdfs-***` bucket
- If `transfer_type` is S3, the data is sent to the corresponding S3 location
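As an illustration of how the start of this flow could be wired up, the sketch below assumes the source bucket notifies the egress queue directly whenever a `pipeline_success.flag` object is written. The bucket, queue and resource names are assumptions, not this repo's actual resources.

```hcl
# Illustrative sketch: queue a message whenever a pipeline_success.flag object
# is created in the source bucket. Resource names are assumptions.
resource "aws_s3_bucket_notification" "source_bucket_flags" {
  bucket = aws_s3_bucket.source.id

  queue {
    queue_arn     = aws_sqs_queue.data_egress.arn
    events        = ["s3:ObjectCreated:*"]
    filter_suffix = "pipeline_success.flag"
  }
}
```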
The configuration DynamoDB items have the following fields:

| Field | Description |
|---|---|
| `source_prefix` | Partition key. The S3 path to retrieve files from |
| `pipeline_name` | Sort key. The pipeline which sent the files |
| `decrypt` | Whether the files need to be decrypted |
| `destination_bucket` | The S3 bucket to send files to. Blank for SFT |
| `destination_prefix` | The folder path to save files to |
| `recipient_name` | Name of the team receiving the files |
| `source_bucket` | S3 bucket location of the files to send |
| `transfer_type` | How to send the files: S3 or SFT |
If source data needs to be sent via both S3 and SFT, append the transfer type to `pipeline_name`, e.g. `pipeline_name#sft`.

Ensure the source prefix is in `data-egress_iam.tf`.
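For reference, a configuration item in `data-egress.tf` might look something like the sketch below. This is illustrative only: the table resource name and all attribute values are assumptions; only the field names come from the table above.

```hcl
# Hypothetical example of a data egress configuration item. The table resource
# name (aws_dynamodb_table.data_egress) and all values are illustrative.
resource "aws_dynamodb_table_item" "example_egress_config" {
  table_name = aws_dynamodb_table.data_egress.name
  hash_key   = "source_prefix"
  range_key  = "pipeline_name"

  item = <<ITEM
{
  "source_prefix":      {"S": "example/source/prefix/"},
  "pipeline_name":      {"S": "example_pipeline#sft"},
  "decrypt":            {"BOOL": true},
  "transfer_type":      {"S": "SFT"},
  "source_bucket":      {"S": "example-source-bucket"},
  "destination_bucket": {"S": ""},
  "destination_prefix": {"S": "example/destination/prefix/"},
  "recipient_name":     {"S": "Example Team"}
}
ITEM
}
```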
The SFT Agent reads files written to disk by the data egress service and sends these to configured destinations via HTTPS.
It is deployed as a sidecar to the data egress service, and a volume mounted at `/data-egress` is shared between the containers. Which files are read, and where they are sent, is determined by config.
In non-production environments files are sent to stub NiFi, a container running NiFi deployed by snapshot sender. It listens for files on port 8091 on path `/DA`, and the files it receives are saved to the S3 bucket `stub-hdfs-****`.
SFT sends files to an SDX F5 VIP, which receives the files and forwards them on. Authentication is established by TLS; the certificates required by SFT are defined here.

These are created within the sft-agent entrypoint. The agent config is updated with the keystore/truststore passwords and paths. Importantly, the private key password and keystore password have to be the same, because SFT runs on a Tomcat version with this 'requirement'.
Due to the nature of data being transferred there is a requirement to have a specific type of encryption on our EBS volumes where we store some of the DWX data.
It is required that our EBS volumes are encrypted with an external CMK generated by the Security Operations team.
For this implementation we have created an external KMS key. We manually download the wrappers and tokens for each key and send the wrappers to the relevant people in the Security Operations team, who use that wrapper to wrap a key generated via their HSM.
The wrapped key is then manually uploaded alongside the token, via breakglass, to each environment.
All the manual steps are done in the AWS console in the KMS section.
Now that the key is uploaded, we can use this external KMS key to encrypt the EBS volumes.
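A minimal sketch of how this can look in Terraform, assuming an `aws_kms_external_key` whose key material is imported manually as described above (the resource names, volume size and availability zone are illustrative assumptions):

```hcl
# External CMK: Terraform creates the key with no key material; the wrapped
# key material and import token are uploaded manually in the AWS console.
resource "aws_kms_external_key" "ebs_cmk" {
  description             = "External CMK for encrypting DWX EBS volumes"
  deletion_window_in_days = 30
  # key_material_base64 deliberately omitted: material is imported out-of-band.
}

# EBS volume encrypted with the external CMK.
resource "aws_ebs_volume" "dwx_data" {
  availability_zone = "eu-west-2a" # assumption: example AZ
  size              = 100          # assumption: example size (GiB)
  encrypted         = true
  kms_key_id        = aws_kms_external_key.ebs_cmk.arn
}
```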
Some more information is available in our common wiki