diff --git a/CHANGELOG.md b/CHANGELOG.md index 887e185..a94ab3a 100755 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,12 +5,29 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [1.2.0] - 2021-05-04 + +### Added + +- Two stack deployment options that provision machine learning (ML) pipelines either in a single AWS account, or across multiple AWS accounts for development, staging/test, and production environments. +- Ability to provide an optional AWS Key Management Service (KMS) key to encrypt captured data from the real-time Amazon SageMaker endpoint, output of batch transform and data baseline jobs, output of model monitor, and Amazon Elastic Compute Cloud (EC2) instance's volume used by Amazon SageMaker to run the solution's pipelines. +- New pipeline to build and register Docker images for custom ML algorithms. +- Ability to use an existing Amazon Elastic Container Registry (Amazon ECR) repository, or create a new one, to store Docker images for custom ML algorithms. +- Ability to provide different input/output Amazon Simple Storage Service (Amazon S3) buckets per pipeline deployment. + +### Updated + +- The creation of Amazon SageMaker resources using AWS CloudFormation. +- The request body of the solution's API calls to provision pipelines. +- AWS SDK to use the solution's identifier to track requests made by the solution to AWS services. +- AWS Cloud Development Kit (AWS CDK) and AWS Solutions Constructs to version 1.96.0. + ## [1.1.1] - 2021-03-19 ### Updated - AWS ECR image scan on push property's name from `scanOnPush` to `ScanOnPush` for image scanning based on the recently updated property name in AWS CloudFormation. -- AWS ECR repository's name in the IAM policy's resource name from `*` to `*-*` to accommodate recent repository name being prefixed with AWS CloudFormation stack name. +- AWS ECR repository's name in the IAM policy's resource name from `*` to `**` to accommodate recent repository name being prefixed with AWS CloudFormation stack name. ## [1.1.0] - 2021-01-26 diff --git a/README.md b/README.md index 90a1867..4d5c1a2 100755 --- a/README.md +++ b/README.md @@ -2,38 +2,48 @@ The machine learning (ML) lifecycle is an iterative and repetitive process that involves changing models over time and learning from new data. As ML applications gain popularity, -organizations are building new and better applications for a wide range of use cases -including optimized email campaigns, forecasting tools, recommendation engines, self-driving -vehicles, virtual personal assistants, and more. While operational and pipelining -processes vary greatly across projects and organizations, the processes contain -commonalities across use cases. - -The AWS MLOps Framework solution helps you streamline and enforce architecture best -practices for machine learning (ML) model productionization. This solution is an extendable -framework that provides a standard interface for managing ML pipelines for AWS ML -services and third-party services. The solution’s template allows customers to upload their -trained models, configure the orchestration of the pipeline, trigger the start of the deployment -process, move models through different stages of deployment, and monitor the successes -and failures of the operations. 
- -You can use batch and real-time data inferences to configure the pipeline for your business -context. You can also provision multiple model monitor pipelines to periodically monitor the quality of deployed Amazon SageMaker's ML models. This solution increases your team’s agility and efficiency by allowing them to -repeat successful processes at scale. +organizations are building new and better applications for a wide range of use cases including +optimized email campaigns, forecasting tools, recommendation engines, self-driving vehicles, +virtual personal assistants, and more. While operational and pipelining processes vary greatly +across projects and organizations, the processes contain commonalities across use cases. + +The solution helps you streamline and enforce architecture best practices by providing an extendable +framework for managing ML pipelines for Amazon Machine Learning (Amazon ML) services and third-party +services. The solution’s template allows you to upload trained models, configure the orchestration of +the pipeline, initiate the start of the deployment process, move models through different stages of +deployment, and monitor the successes and failures of the operations. The solution also provides a +pipeline for building and registering Docker images for custom algorithms that can be used for model +deployment on an [Amazon SageMaker](https://aws.amazon.com/sagemaker/) endpoint. + +You can use batch and real-time data inferences to configure the pipeline for your business context. +You can also provision multiple Model Monitor pipelines to periodically monitor the quality of deployed +Amazon SageMaker ML models. This solution increases your team’s agility and efficiency by allowing them +to repeat successful processes at scale. + +#### Benefits + +- **Leverage a pre-configured machine learning pipeline:** Use the solution's reference architecture to initiate a pre-configured pipeline through an API call or a Git repository. +- **Automatically deploy a trained model and inference endpoint:** Use the solution's framework to automate the model monitor pipeline or the Amazon SageMaker BYOM pipeline. Deliver an inference endpoint with model drift detection packaged as a serverless microservice. --- ## Architecture -The AWS CloudFormation template deploys a Pipeline Provisioning framework that -provisions a machine learning pipeline (Bring Your Own Model for SageMaker). The -template includes the AWS Lambda functions and AWS Identity and Access Management -(IAM) roles necessary to set up your account, and it creates an Amazon Simple Storage -Service (Amazon S3) bucket that contains the CloudFormation templates that set up the -pipelines.The template also creates an Amazon API Gateway instance, an additional -Lambda function, and an AWS CodePipeline instance. -The provisioned pipeline includes four stages: source, build, deploy, and share. +This solution is built with two primary components: 1) the orchestrator component, created by deploying the solution’s AWS CloudFormation template, and 2) the AWS CodePipeline instance deployed from either calling the solution’s API Gateway, or by committing a configuration file into an AWS CodeCommit repository. The solution’s pipelines are implemented as AWS CloudFormation templates, which allows you to extend the solution and add custom pipelines. 
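For illustration, a provisioning request sent to the solution's API (or committed as the pipeline configuration file) might look like the sketch below. The key names mirror those consumed by the pipeline orchestration Lambda introduced later in this changeset; all values are placeholders, and other pipeline types (batch, model monitor, image builder) take additional keys.

```python
import json

# Hypothetical provisioning request for a real-time BYOM pipeline (placeholder values).
# Key names follow the pipeline_orchestration code added in this changeset; built-in
# algorithm pipelines may also require model framework details, and batch, model monitor,
# and image builder pipelines take their own additional keys.
provision_request = {
    "pipeline_type": "byom_realtime_builtin",
    "model_name": "my-model",
    "model_artifact_location": "path/to/model.tar.gz",
    "inference_instance": "ml.m5.large",
    "data_capture_location": "my-inference-bucket/datacapture",
    # optional customer-managed KMS key used to encrypt captured data and job outputs
    "kms_key_arn": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
}

print(json.dumps(provision_request, indent=4))
```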
+ +To support multiple use cases and business needs, the solution provides two AWS CloudFormation templates: **option 1** for single account deployment, and **option 2** for multi-account deployment. + +### Template option 1: Single account deployment + +The solution’s single account architecture allows you to provision ML pipelines in a single AWS account. + +![architecture-option-1](source/architecture-option-1.png) -![architecture](source/architecture.png) +### Template option 2: Multi-account deployment + +The solution uses [AWS Organizations](https://aws.amazon.com/organizations/) and [AWS CloudFormation StackSets](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/what-is-cfnstacksets.html) to allow you to provision or update ML pipelines across AWS accounts. Using an administrator account (also referred to as the orchestrator account) allows you to deploy ML pipelines implemented as AWS CloudFormation templates into selected target accounts (for example, development, staging, and production accounts). + +![architecture-option-2](source/architecture-option-2.png) --- @@ -117,7 +127,7 @@ aws s3 cp ./dist/ s3://my-bucket-name-/$SOLUTION_NAME/$VERSION/ --re ## Known Issues -### Pipeline may fail in custom model container build due to Docker Hub rate limits +### Image Builder Pipeline may fail due to Docker Hub rate limits When building custom model container that pulls public docker images from Docker Hub in short time period, you may occasionally face throttling errors with an error message such as: ` toomanyrequests You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit` @@ -126,6 +136,16 @@ This is due to Docker Inc. [limiting the rate at which images are pulled under D For more information regarding this issue and short-term and long-term fixes, refer to this AWS blog post: [Advice for customers dealing with Docker Hub rate limits, and a Coming Soon announcement](https://aws.amazon.com/blogs/containers/advice-for-customers-dealing-with-docker-hub-rate-limits-and-a-coming-soon-announcement/) +### Model Monitor Blueprint may fail in multi-account deployment option + +When using the blueprint for Model Monitor pipeline in multi-account deployment option, the deployment of the stack in the staging ("DeployStaging") account may fail with an error message: + +``` +Resource handler returned message: "Error occurred during operation 'CREATE'." (RequestToken:, HandlerErrorCode: GeneralServiceException) +``` + +Workaround: there is no known workaround for this issue for the multi-account Model Monitor blueprint. + --- Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
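The multi-account option relies on per-stage parameter values and a different template parameter format than the single-account option. The sketch below mirrors the `get_stage_param` and `format_template_parameters` helpers added later in this changeset (in `source/lambdas/pipeline_orchestration/lambda_helpers.py`); the stage values shown are placeholders.

```python
# Minimal sketch mirroring two helpers added later in this diff: API parameters may carry
# per-stage values for multi-account deployments, and template parameters are formatted
# either for the CodePipeline StackSet action (multi-account) or for the CloudFormation
# action (single-account).

def get_stage_param(event, api_key, stage):
    # a value may be a scalar, or a dict keyed by deployment stage ("dev"/"staging"/"prod")
    value = event.get(api_key, "")
    if isinstance(value, dict) and stage in value:
        value = value[stage]
    return value


def format_template_parameters(key_value_list, is_multi_account):
    if is_multi_account == "True":
        # the multi-account codepipeline's StackSet action expects a list of key/value objects
        return [{"ParameterKey": key, "ParameterValue": value} for key, value in key_value_list]
    # the single-account codepipeline's CloudFormation action expects a template configuration dict
    return {"Parameters": {key: value for key, value in key_value_list}}


# Example: per-stage inference instance types (placeholder values)
event = {"inference_instance": {"dev": "ml.t3.medium", "staging": "ml.m5.large", "prod": "ml.m5.xlarge"}}
params = [("INFERENCEINSTANCE", get_stage_param(event, "inference_instance", "prod"))]
print(format_template_parameters(params, "True"))   # multi-account format
print(format_template_parameters(params, "False"))  # single-account format
```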
diff --git a/deployment/build-s3-dist.sh b/deployment/build-s3-dist.sh index cd0627e..c27c14f 100755 --- a/deployment/build-s3-dist.sh +++ b/deployment/build-s3-dist.sh @@ -28,7 +28,7 @@ set -e # Important: CDK global version number -cdk_version=1.83.0 +cdk_version=1.96.0 # Check to see if the required parameters have been provided: if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then @@ -86,6 +86,10 @@ pip install -r ./lambdas/solution_helper/requirements.txt -t ./lambdas/solution_ echo "pip install -r ./lib/blueprints/byom/lambdas/sagemaker_layer/requirements.txt -t ./lib/blueprints/byom/lambdas/sagemaker_layer/python/" pip install -r ./lib/blueprints/byom/lambdas/sagemaker_layer/requirements.txt -t ./lib/blueprints/byom/lambdas/sagemaker_layer/python/ +# setup crhelper for invoke lambda custom resource +echo "pip install -r ./lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/requirements.txt -t ./lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/" +pip install -r ./lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/requirements.txt -t ./lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/ + echo "------------------------------------------------------------------------------" echo "[Init] Install dependencies for the cdk-solution-helper" echo "------------------------------------------------------------------------------" @@ -106,32 +110,39 @@ echo "npm install -g aws-cdk@$cdk_version" npm install -g aws-cdk@$cdk_version #Run 'cdk synth for BYOM blueprints -echo "cdk synth BYOMRealtimeBuiltinStack > lib/blueprints/byom/byom_realtime_builtin_container.yaml" -cdk synth BYOMRealtimeBuiltinStack > lib/blueprints/byom/byom_realtime_builtin_container.yaml -echo "cdk synth BYOMRealtimeBuildStack > lib/blueprints/byom/byom_realtime_build_container.yaml" -cdk synth BYOMRealtimeBuildStack > lib/blueprints/byom/byom_realtime_build_container.yaml -echo "cdk synth BYOMBatchBuiltinStack > lib/blueprints/byom/byom_batch_builtin_container.yaml" -cdk synth BYOMBatchBuiltinStack > lib/blueprints/byom/byom_batch_builtin_container.yaml -echo "cdk synth BYOMBatchBuildStack > lib/blueprints/byom/byom_batch_build_container.yaml" -cdk synth BYOMBatchBuildStack > lib/blueprints/byom/byom_batch_build_container.yaml -echo "cdk synth ModelMonitorStack > lib/blueprints/byom/model_monitor.yaml" -cdk synth ModelMonitorStack > lib/blueprints/byom/model_monitor.yaml +echo "cdk synth ModelMonitorStack > lib/blueprints/byom/byom_model_monitor.yaml" +cdk synth ModelMonitorStack > lib/blueprints/byom/byom_model_monitor.yaml +echo "cdk synth SingleAccountCodePipelineStack > lib/blueprints/byom/single_account_codepipeline.yaml" +cdk synth SingleAccountCodePipelineStack > lib/blueprints/byom/single_account_codepipeline.yaml +echo "cdk synth MultiAccountCodePipelineStack > lib/blueprints/byom/multi_account_codepipeline.yaml" +cdk synth MultiAccountCodePipelineStack > lib/blueprints/byom/multi_account_codepipeline.yaml +echo "cdk synth BYOMRealtimePipelineStack > lib/blueprints/byom/byom_realtime_inference_pipeline.yaml" +cdk synth BYOMRealtimePipelineStack > lib/blueprints/byom/byom_realtime_inference_pipeline.yaml +echo "cdk synth BYOMCustomAlgorithmImageBuilderStack > lib/blueprints/byom/byom_custom_algorithm_image_builder.yaml" +cdk synth BYOMCustomAlgorithmImageBuilderStack > lib/blueprints/byom/byom_custom_algorithm_image_builder.yaml +echo "cdk synth BYOMBatchStack > lib/blueprints/byom/byom_batch_pipeline.yaml" +cdk synth BYOMBatchStack > lib/blueprints/byom/byom_batch_pipeline.yaml + # Replace 
%%VERSION%% in other templates replace="s/%%VERSION%%/$3/g" -echo "sed -i -e $replace lib/blueprints/byom/byom_realtime_builtin_container.yaml" -sed -i -e $replace lib/blueprints/byom/byom_realtime_builtin_container.yaml -echo "sed -i -e $replace lib/blueprints/byom/byom_realtime_build_container.yaml" -sed -i -e $replace lib/blueprints/byom/byom_realtime_build_container.yaml -echo "sed -i -e $replace lib/blueprints/byom/byom_batch_builtin_container.yaml" -sed -i -e $replace lib/blueprints/byom/byom_batch_builtin_container.yaml -echo "sed -i -e $replace lib/blueprints/byom/byom_batch_build_container.yaml" -sed -i -e $replace lib/blueprints/byom/byom_batch_build_container.yaml -echo "sed -i -e $replace lib/blueprints/byom/model_monitor.yaml" -sed -i -e $replace lib/blueprints/byom/model_monitor.yaml - -# Run 'cdk synth' for main template to generate raw solution outputs -echo "cdk synth aws-mlops-framework --output=$staging_dist_dir" -cdk synth aws-mlops-framework --output=$staging_dist_dir +echo "sed -i -e $replace lib/blueprints/byom/byom_model_monitor.yaml" +sed -i -e $replace lib/blueprints/byom/byom_model_monitor.yaml +echo "sed -i -e $replace lib/blueprints/byom/byom_realtime_inference_pipeline.yaml" +sed -i -e $replace lib/blueprints/byom/byom_realtime_inference_pipeline.yaml +echo "sed -i -e $replace lib/blueprints/byom/single_account_codepipeline.yaml" +sed -i -e $replace lib/blueprints/byom/single_account_codepipeline.yaml +echo "sed -i -e $replace lib/blueprints/byom/multi_account_codepipeline.yaml" +sed -i -e $replace lib/blueprints/byom/multi_account_codepipeline.yaml +echo "sed -i -e $replace lib/blueprints/byom/byom_custom_algorithm_image_builder.yaml" +sed -i -e $replace lib/blueprints/byom/byom_custom_algorithm_image_builder.yaml +echo "sed -i -e $replace lib/blueprints/byom/byom_batch_pipeline.yaml" +sed -i -e $replace lib/blueprints/byom/byom_batch_pipeline.yaml + +# Run 'cdk synth' for main templates to generate raw solution outputs +echo "cdk synth aws-mlops-single-account-framework --output=$staging_dist_dir" +cdk synth aws-mlops-single-account-framework --output=$staging_dist_dir +echo "cdk synth aws-mlops-multi-account-framework --output=$staging_dist_dir" +cdk synth aws-mlops-multi-account-framework --output=$staging_dist_dir # Remove unnecessary output files echo "cd $staging_dist_dir" @@ -171,14 +182,20 @@ cd $template_dist_dir echo "Updating code source bucket in template with $1" replace="s/%%BUCKET_NAME%%/$1/g" -echo "sed -i -e $replace $template_dist_dir/aws-mlops-framework.template" -sed -i -e $replace $template_dist_dir/aws-mlops-framework.template +echo "sed -i -e $replace $template_dist_dir/aws-mlops-single-account-framework.template" +sed -i -e $replace $template_dist_dir/aws-mlops-single-account-framework.template +echo "sed -i -e $replace $template_dist_dir/aws-mlops-multi-account-framework.template" +sed -i -e $replace $template_dist_dir/aws-mlops-multi-account-framework.template replace="s/%%SOLUTION_NAME%%/$2/g" -echo "sed -i -e $replace $template_dist_dir/aws-mlops-framework.template" -sed -i -e $replace $template_dist_dir/aws-mlops-framework.template +echo "sed -i -e $replace $template_dist_dir/aws-mlops-single-account-framework" +sed -i -e $replace $template_dist_dir/aws-mlops-single-account-framework.template +echo "sed -i -e $replace $template_dist_dir/aws-mlops-multi-account-framework.template" +sed -i -e $replace $template_dist_dir/aws-mlops-multi-account-framework.template replace="s/%%VERSION%%/$3/g" -echo "sed -i -e $replace 
$template_dist_dir/aws-mlops-framework.template" -sed -i -e $replace $template_dist_dir/aws-mlops-framework.template +echo "sed -i -e $replace $template_dist_dir/aws-mlops-single-account-framework.template" +sed -i -e $replace $template_dist_dir/aws-mlops-single-account-framework.template +echo "sed -i -e $replace $template_dist_dir/aws-mlops-multi-account-framework.template" +sed -i -e $replace $template_dist_dir/aws-mlops-multi-account-framework.template echo "------------------------------------------------------------------------------" diff --git a/source/app.py b/source/app.py index e3ac890..33807a1 100644 --- a/source/app.py +++ b/source/app.py @@ -1,6 +1,6 @@ #!/usr/bin/env python3 # ##################################################################################################################### -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. A copy of the License is located at # @@ -13,53 +13,80 @@ # ##################################################################################################################### from aws_cdk import core from lib.aws_mlops_stack import MLOpsStack -from lib.blueprints.byom.byom_batch_build_container import BYOMBatchBuildStack -from lib.blueprints.byom.byom_batch_builtin_container import BYOMBatchBuiltinStack -from lib.blueprints.byom.byom_realtime_build_container import BYOMRealtimeBuildStack -from lib.blueprints.byom.byom_realtime_builtin_container import BYOMRealtimeBuiltinStack from lib.blueprints.byom.model_monitor import ModelMonitorStack +from lib.blueprints.byom.realtime_inference_pipeline import BYOMRealtimePipelineStack +from lib.blueprints.byom.byom_batch_pipeline import BYOMBatchStack +from lib.blueprints.byom.single_account_codepipeline import SingleAccountCodePipelineStack +from lib.blueprints.byom.multi_account_codepipeline import MultiAccountCodePipelineStack +from lib.blueprints.byom.byom_custom_algorithm_image_builder import BYOMCustomAlgorithmImageBuilderStack +from lib.aws_sdk_config_aspect import AwsSDKConfigAspect solution_id = "SO0136" app = core.App() -MLOpsStack(app, "aws-mlops-framework", description=f"({solution_id}) - AWS MLOps Framework. Version %%VERSION%%") -BYOMBatchBuildStack( - app, - "BYOMBatchBuildStack", - description=( - f"({solution_id}byom-bc) - Bring Your Own Model pipeline with Batch Transform and a custom " - f"model build in AWS MLOps Framework. Version %%VERSION%%" - ), +mlops_stack_single = MLOpsStack( + app, "aws-mlops-single-account-framework", description=f"({solution_id}) - AWS MLOps Framework. Version %%VERSION%%" ) -BYOMBatchBuiltinStack( + +# add AWS_SDK_USER_AGENT env variable to Lambda functions +core.Aspects.of(mlops_stack_single).add(AwsSDKConfigAspect(app, "SDKUserAgentSingle", solution_id)) + +mlops_stack_multi = MLOpsStack( app, - "BYOMBatchBuiltinStack", - description=( - f"({solution_id}byom-bb) - Bring Your Own Model pipeline with Batch Transform and a Built-in " - f"Sagemaker model in AWS MLOps Framework. Version %%VERSION%%" - ), + "aws-mlops-multi-account-framework", + multi_account=True, + description=f"({solution_id}) - AWS MLOps Framework. 
Version %%VERSION%%",
 )
-BYOMRealtimeBuildStack(
+
+core.Aspects.of(mlops_stack_multi).add(AwsSDKConfigAspect(app, "SDKUserAgentMulti", solution_id))
+
+BYOMCustomAlgorithmImageBuilderStack(
     app,
-    "BYOMRealtimeBuildStack",
+    "BYOMCustomAlgorithmImageBuilderStack",
     description=(
-        f"({solution_id}byom-rc) - Bring Your Own Model pipeline with Realtime inference and a custom "
-        f"model build in AWS MLOps Framework. Version %%VERSION%%"
+        f"({solution_id}byom-caib) - Bring Your Own Model pipeline to build custom algorithm docker images "
+        f"in AWS MLOps Framework. Version %%VERSION%%"
     ),
 )
-BYOMRealtimeBuiltinStack(
+
+batch_stack = BYOMBatchStack(
     app,
-    "BYOMRealtimeBuiltinStack",
+    "BYOMBatchStack",
     description=(
-        f"({solution_id}byom-rb) - Bring Your Own Model pipeline with Realtime inference and a Built-in "
-        f"Sagemaker model in AWS MLOps Framework. Version %%VERSION%%"
+        f"({solution_id}byom-bt) - BYOM Batch Transform pipeline " f"in AWS MLOps Framework. Version %%VERSION%%"
     ),
 )
-ModelMonitorStack(
+core.Aspects.of(batch_stack).add(AwsSDKConfigAspect(app, "SDKUserAgentBatch", solution_id))
+
+model_monitor_stack = ModelMonitorStack(
     app,
     "ModelMonitorStack",
     description=(f"({solution_id}byom-mm) - Model Monitor pipeline. Version %%VERSION%%"),
 )
+core.Aspects.of(model_monitor_stack).add(AwsSDKConfigAspect(app, "SDKUserAgentMonitor", solution_id))
+
+
+realtime_stack = BYOMRealtimePipelineStack(
+    app,
+    "BYOMRealtimePipelineStack",
+    description=(f"({solution_id}byom-rip) - BYOM Realtime Inference Pipeline. Version %%VERSION%%"),
+)
+
+core.Aspects.of(realtime_stack).add(AwsSDKConfigAspect(app, "SDKUserAgentRealtime", solution_id))
+
+SingleAccountCodePipelineStack(
+    app,
+    "SingleAccountCodePipelineStack",
+    description=(f"({solution_id}byom-sac) - Single-account codepipeline. Version %%VERSION%%"),
+)
+
+MultiAccountCodePipelineStack(
+    app,
+    "MultiAccountCodePipelineStack",
+    description=(f"({solution_id}byom-mac) - Multi-account codepipeline. 
Version %%VERSION%%"), +) + + app.synth() diff --git a/source/architecture-option-1.png b/source/architecture-option-1.png new file mode 100644 index 0000000..7c86dbd Binary files /dev/null and b/source/architecture-option-1.png differ diff --git a/source/architecture-option-2.png b/source/architecture-option-2.png new file mode 100644 index 0000000..1535d8f Binary files /dev/null and b/source/architecture-option-2.png differ diff --git a/source/architecture.png b/source/architecture.png deleted file mode 100644 index aea976d..0000000 Binary files a/source/architecture.png and /dev/null differ diff --git a/source/lib/blueprints/byom/lambdas/configure_inference_lambda/.coveragerc b/source/lambdas/custom_resource/.coveragerc similarity index 100% rename from source/lib/blueprints/byom/lambdas/configure_inference_lambda/.coveragerc rename to source/lambdas/custom_resource/.coveragerc diff --git a/source/lambdas/custom_resource/.gitignore b/source/lambdas/custom_resource/.gitignore deleted file mode 100755 index e10919c..0000000 --- a/source/lambdas/custom_resource/.gitignore +++ /dev/null @@ -1,5 +0,0 @@ -# exclude python 3rd party modules -*.dist-info/ -crhelper/ -## crhelper tests directory -tests/ diff --git a/source/lambdas/custom_resource/index.py b/source/lambdas/custom_resource/index.py index 1222ef0..f0dcd42 100644 --- a/source/lambdas/custom_resource/index.py +++ b/source/lambdas/custom_resource/index.py @@ -1,5 +1,5 @@ # ##################################################################################################################### -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. A copy of the License is located at # @@ -13,6 +13,7 @@ import os import sys import shutil +import tempfile import logging import traceback import urllib.request @@ -26,41 +27,54 @@ helper = CfnResource(json_logging=True, log_level="INFO") -def on_event(event, context): - helper(event, context) +def copy_assets_to_s3(s3_client): + # get the source and destination locations + source_url = os.environ.get("source_bucket") + "/blueprints.zip" + bucket = os.environ.get("destination_bucket") + base_dir = "blueprints" + # create a tmpdir for the zip file to downlaod + zip_tmpdir = tempfile.mkdtemp() + zip_file_path = os.path.join(zip_tmpdir, f"{base_dir}.zip") -@helper.create -def custom_resource(event, _): + # download blueprints.zip + urllib.request.urlretrieve(source_url, zip_file_path) - try: - # this line is downloading blueprints.zip file from a public bucket. 
- # if you would like to change this so that it downloads from a bucket in your account - # change this following line to use s3_client.download_fileobj('BUCKET_NAME', 'OBJECT_NAME', file) - # and give s3 read permission to this lambda function - source_url = os.environ.get("source_bucket") + "/blueprints.zip" - urllib.request.urlretrieve(source_url, "/tmp/blueprints.zip") - shutil.unpack_archive("/tmp/blueprints.zip", "/tmp/blueprints/", "zip") + # unpack the zip file in another tmp directory + unpack_tmpdir = tempfile.mkdtemp() + shutil.unpack_archive(zip_file_path, unpack_tmpdir, "zip") + + # construct the path to the unpacked file + local_directory = os.path.join(unpack_tmpdir, base_dir) + + # enumerate local files recursively + for root, dirs, files in os.walk(local_directory): + + for filename in files: - local_directory = "/tmp/blueprints" - bucket = os.environ.get("destination_bucket") - destination = "" + # construct the full local path + local_path = os.path.join(root, filename) - # enumerate local files recursively - for root, dirs, files in os.walk(local_directory): + # construct the full s3 path + relative_path = os.path.relpath(local_path, local_directory) + s3_path = os.path.join(base_dir, relative_path) + logger.info(f"Uploading {s3_path}...") + s3_client.upload_file(local_path, bucket, s3_path) - for filename in files: + return "CopyAssets-" + bucket - # construct the full local path - local_path = os.path.join(root, filename) - # construct the full s3 path - relative_path = os.path.relpath(local_path, local_directory) - s3_path = os.path.join(destination, relative_path) - logger.info("Uploading %s..." % s3_path) - s3_client.upload_file(local_path, bucket, s3_path) +def on_event(event, context): + helper(event, context) + + +@helper.create +def custom_resource(event, _): + + try: + resource_id = copy_assets_to_s3(s3_client) + return resource_id - return "CopyAssets-" + bucket except Exception as e: exc_type, exc_value, exc_tb = sys.exc_info() logger.error(traceback.format_exception(exc_type, exc_value, exc_tb)) diff --git a/source/lib/blueprints/byom/lambdas/configure_inference_lambda/requirements-test.txt b/source/lambdas/custom_resource/requirements-test.txt similarity index 100% rename from source/lib/blueprints/byom/lambdas/configure_inference_lambda/requirements-test.txt rename to source/lambdas/custom_resource/requirements-test.txt diff --git a/source/lambdas/custom_resource/requirements.txt b/source/lambdas/custom_resource/requirements.txt index 76fcf16..fa256a1 100644 --- a/source/lambdas/custom_resource/requirements.txt +++ b/source/lambdas/custom_resource/requirements.txt @@ -1 +1 @@ -crhelper==2.0.6 \ No newline at end of file +crhelper==2.0.10 \ No newline at end of file diff --git a/source/lib/blueprints/byom/lambdas/configure_inference_lambda/setup.py b/source/lambdas/custom_resource/setup.py similarity index 88% rename from source/lib/blueprints/byom/lambdas/configure_inference_lambda/setup.py rename to source/lambdas/custom_resource/setup.py index f88ae84..bb4e1e5 100644 --- a/source/lib/blueprints/byom/lambdas/configure_inference_lambda/setup.py +++ b/source/lambdas/custom_resource/setup.py @@ -1,5 +1,5 @@ ####################################################################################################################### -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). 
You may not use this file except in compliance # # with the License. A copy of the License is located at # @@ -12,4 +12,4 @@ # ##################################################################################################################### from setuptools import setup, find_packages -setup(name="configure_inference_lambda", packages=find_packages()) \ No newline at end of file +setup(name="custom_resource", packages=find_packages()) \ No newline at end of file diff --git a/source/lambdas/custom_resource/tests/test_custom_resource.py b/source/lambdas/custom_resource/tests/test_custom_resource.py new file mode 100644 index 0000000..0bd160d --- /dev/null +++ b/source/lambdas/custom_resource/tests/test_custom_resource.py @@ -0,0 +1,84 @@ +# ##################################################################################################################### +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +import os +import boto3 +import tempfile +import pytest +from unittest.mock import patch +from moto import mock_s3 + +from index import copy_assets_to_s3, on_event, custom_resource, no_op + + +@pytest.fixture(autouse=True) +def mock_env_variables(): + os.environ["source_bucket"] = "solutions-bucket" + os.environ["destination_bucket"] = "blueprints-bucket" + os.environ["TESTFILE"] = "blueprints.zip" + + +@pytest.fixture +def event(): + return {"bucket": os.environ["source_bucket"]} + + +@pytest.fixture +def mocked_response(): + return f"CopyAssets-{os.environ['destination_bucket']}" + + +@mock_s3 +@patch("index.os.walk") +@patch("index.shutil.unpack_archive") +@patch("index.urllib.request.urlretrieve") +def test_copy_assets_to_s3(mocked_urllib, mocked_shutil, mocked_walk, mocked_response): + s3_client = boto3.client("s3", region_name="us-east-1") + testfile = tempfile.NamedTemporaryFile() + s3_client.create_bucket(Bucket="solutions-bucket") + s3_client.create_bucket(Bucket="blueprints-bucket") + s3_client.upload_file(testfile.name, os.environ["source_bucket"], os.environ["TESTFILE"]) + local_file = tempfile.NamedTemporaryFile() + mocked_urllib.side_effect = s3_client.download_file( + os.environ["source_bucket"], os.environ["TESTFILE"], local_file.name + ) + tmp = tempfile.mkdtemp() + mocked_walk.return_value = [ + (tmp, (local_file.name,), (local_file.name,)), + ] + + assert copy_assets_to_s3(s3_client) == mocked_response + + +@patch("index.custom_resource") +def test_no_op(mocked_custom, event): + response = no_op(event, {}) + assert response is None + mocked_custom.assert_not_called() + + +@patch("index.helper") +def test_on_event(mocked_helper, event): + on_event(event, {}) + mocked_helper.assert_called_with(event, {}) + + +@patch("index.copy_assets_to_s3") +def test_custom_resource(mocked_copy, event, mocked_response): + # assert expected response + mocked_copy.return_value = mocked_response + respone = custom_resource(event, {}) + assert 
respone == mocked_response + # assert for error + mocked_copy.side_effect = Exception("mocked error") + with pytest.raises(Exception): + custom_resource(event, {}) diff --git a/source/lambdas/pipeline_orchestration/.coveragerc b/source/lambdas/pipeline_orchestration/.coveragerc index caadc2f..a3b5310 100644 --- a/source/lambdas/pipeline_orchestration/.coveragerc +++ b/source/lambdas/pipeline_orchestration/.coveragerc @@ -6,6 +6,5 @@ omit = cdk.out/* conftest.py test_*.py - *helper.py source = . \ No newline at end of file diff --git a/source/lambdas/pipeline_orchestration/index.py b/source/lambdas/pipeline_orchestration/index.py index 10016b6..c757871 100644 --- a/source/lambdas/pipeline_orchestration/index.py +++ b/source/lambdas/pipeline_orchestration/index.py @@ -1,5 +1,5 @@ # ##################################################################################################################### -# Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. A copy of the License is located at # @@ -11,19 +11,30 @@ # and limitations under the License. # # ##################################################################################################################### import json -import uuid from json import JSONEncoder import os import datetime -import boto3 from shared.wrappers import BadRequest, api_exception_handler from shared.logger import get_logger - -cloudformation_client = boto3.client("cloudformation") -codepipeline_client = boto3.client("codepipeline") +from shared.helper import get_client +from lambda_helpers import ( + validate, + template_url, + get_stack_name, + get_codepipeline_params, + get_image_builder_params, + format_template_parameters, + create_template_zip_file, +) + +cloudformation_client = get_client("cloudformation") +codepipeline_client = get_client("codepipeline") +s3_client = get_client("s3") logger = get_logger(__name__) +content_type = "plain/text" + # subclass JSONEncoder to be able to convert pipeline status to json class DateTimeEncoder(JSONEncoder): @@ -51,7 +62,7 @@ def handler(event, context): ) -def provision_pipeline(event, client=cloudformation_client): +def provision_pipeline(event, client=cloudformation_client, s3_client=s3_client): """ provision_pipeline takes the lambda event object and creates a cloudformation stack @@ -66,26 +77,58 @@ def provision_pipeline(event, client=cloudformation_client): # validate required attributes based on the pipeline's type validated_event = validate(event) # extract byom attributes - pipeline_type = validated_event.get("pipeline_type", "") - custom_container = validated_event.get("custom_model_container", "") - inference_type = validated_event.get("inference_type", "") - pipeline_template_url = template_url(inference_type, custom_container, pipeline_type) - # construct common temaplate paramaters - provisioned_pipeline_stack_name, template_parameters = get_template_parameters(validated_event) - # create a pipeline stack using user parameters and specified blueprint - stack_response = client.create_stack( - StackName=provisioned_pipeline_stack_name, - TemplateURL=pipeline_template_url, - Parameters=template_parameters, - Capabilities=["CAPABILITY_IAM"], - OnFailure="DO_NOTHING", - RoleARN=os.environ["CFN_ROLE_ARN"], - Tags=[ - {"Key": "stack_name", "Value": 
provisioned_pipeline_stack_name}, - ], - ) + pipeline_type = validated_event.get("pipeline_type", "").strip().lower() + is_multi_account = os.environ["IS_MULTI_ACCOUNT"] + provisioned_pipeline_template_url = template_url(pipeline_type) + + # construct stack name to provision + provisioned_pipeline_stack_name = get_stack_name(validated_event) + + # if the pipeline to provision is byom_image_builder + if pipeline_type == "byom_image_builder": + image_builder_params = get_image_builder_params(validated_event) + # format the params (the format is the same for multi-accouunt parameters) + formatted_image_builder_params = format_template_parameters(image_builder_params, "True") + # create the codepipeline + stack_response = create_codepipeline_stack( + provisioned_pipeline_stack_name, template_url("byom_image_builder"), formatted_image_builder_params, client + ) + + else: + # create a pipeline stack using user parameters and specified blueprint + codepipeline_stack_name = f"{provisioned_pipeline_stack_name}-codepipeline" + pipeline_template_url = ( + template_url("multi_account_codepipeline") + if is_multi_account == "True" + else template_url("single_account_codepipeline") + ) + + template_zip_name = f"mlops_provisioned_pipelines/{provisioned_pipeline_stack_name}/template.zip" + template_file_name = provisioned_pipeline_template_url.split("/")[-1] + # get the codepipeline parameters + codepipeline_params = get_codepipeline_params( + is_multi_account, provisioned_pipeline_stack_name, template_zip_name, template_file_name + ) + # format the params (the format is the same for multi-accouunt parameters) + formatted_codepipeline_params = format_template_parameters(codepipeline_params, "True") + # create the codepipeline + stack_response = create_codepipeline_stack( + codepipeline_stack_name, pipeline_template_url, formatted_codepipeline_params, client + ) + + # upload template.zip (contains pipeline template and parameters files) + create_template_zip_file( + validated_event, + os.environ["BLUEPRINT_BUCKET"], + os.environ["ASSETS_BUCKET"], + provisioned_pipeline_template_url, + template_zip_name, + is_multi_account, + s3_client, + ) + logger.info("New pipelin stack created") - logger.debug(stack_response) + logger.info(stack_response) response = { "statusCode": 200, "isBase64Encoded": False, @@ -95,11 +138,72 @@ def provision_pipeline(event, client=cloudformation_client): "pipeline_id": stack_response["StackId"], } ), - "headers": {"Content-Type": "plain/text"}, + "headers": {"Content-Type": content_type}, } return response +def update_stack(codepipeline_stack_name, pipeline_template_url, template_parameters, client): + try: + update_response = client.update_stack( + StackName=codepipeline_stack_name, + TemplateURL=pipeline_template_url, + Parameters=template_parameters, + Capabilities=["CAPABILITY_IAM"], + RoleARN=os.environ["CFN_ROLE_ARN"], + Tags=[ + {"Key": "stack_name", "Value": codepipeline_stack_name}, + ], + ) + + logger.info(update_response) + + return {"StackId": f"Pipeline {codepipeline_stack_name} is being updated."} + + except Exception as e: + logger.info(f"Error during stack update {codepipeline_stack_name}: {str(e)}") + if "No updates are to be performed" in str(e): + return { + "StackId": f"Pipeline {codepipeline_stack_name} is already provisioned. No updates are to be performed." 
+ } + else: + raise e + + +def create_codepipeline_stack( + codepipeline_stack_name, pipeline_template_url, template_parameters, client=cloudformation_client +): + try: + stack_response = client.create_stack( + StackName=codepipeline_stack_name, + TemplateURL=pipeline_template_url, + Parameters=template_parameters, + Capabilities=["CAPABILITY_IAM"], + OnFailure="DO_NOTHING", + RoleARN=os.environ["CFN_ROLE_ARN"], + Tags=[ + {"Key": "stack_name", "Value": codepipeline_stack_name}, + ], + ) + + logger.info(stack_response) + return stack_response + + except Exception as e: + logger.error(f"Error in create_update_cf_stackset lambda functions: {str(e)}") + if "already exists" in str(e): + logger.info(f"AWS Codepipeline {codepipeline_stack_name} already exists. Skipping codepipeline create") + # if the pipeline to update is BYOMPipelineImageBuilder + if codepipeline_stack_name.endswith("byompipelineimagebuilder"): + return update_stack(codepipeline_stack_name, pipeline_template_url, template_parameters, client) + + return { + "StackId": f"Pipeline {codepipeline_stack_name} is already provisioned. Updating template parameters." + } + else: + raise e + + def pipeline_status(event, cfn_client=cloudformation_client, cp_client=codepipeline_client): """ pipeline_status takes the lambda event object and returns the status of codepipeline project that's @@ -126,7 +230,7 @@ def pipeline_status(event, cfn_client=cloudformation_client, cp_client=codepipel "statusCode": 200, "isBase64Encoded": False, "body": "pipeline cloudformation stack has not provisioned the pipeline yet.", - "headers": {"Content-Type": "plain/text"}, + "headers": {"Content-Type": content_type}, } else: # object from codepipeline @@ -136,362 +240,5 @@ def pipeline_status(event, cfn_client=cloudformation_client, cp_client=codepipel "statusCode": 200, "isBase64Encoded": False, "body": json.dumps(pipeline_status, indent=4, cls=DateTimeEncoder), - "headers": {"Content-Type": "plain/text"}, + "headers": {"Content-Type": content_type}, } - - -def template_url(inference_type, custom_container, pipeline_type): - """ - template_url is a helper function that determines the cloudformation stack's file name based on - inputs - - :inference_type: type of inference from lambda event input. Possible values: 'batch' or 'realtime' - :custom_container: whether a custom container build is needed in the pipeline or no. - Possible values: 'True' or 'False' - - :return: returns a link to the appropriate coudformation template files which can be one of these values: - byom_realtime_build_container.yaml - byom_realtime_builtin_container.yaml - byom_batch_build_container.yaml - byom_batch_builtin_container.yaml - """ - if pipeline_type == "model_monitor": - url = "https://" + os.environ["BLUEPRINT_BUCKET_URL"] + "/blueprints/byom/model_monitor.yaml" - return url - else: - url = "https://" + os.environ["BLUEPRINT_BUCKET_URL"] + "/blueprints/" + pipeline_type + "/" + pipeline_type - if inference_type.lower() == "realtime": - url = url + "_realtime" - elif inference_type.lower() == "batch": - url = url + "_batch" - else: - raise BadRequest("Bad request format. Inference type must be 'realtime' or 'batch'") - - if len(custom_container) > 0 and custom_container.endswith(".zip"): - url = url + "_build_container.yaml" - elif len(custom_container) == 0: - url = url + "_builtin_container.yaml" - else: - raise BadRequest( - "Bad request. Custom container should point to a path to .zip file containing custom model assets." 
- ) - return url - - -def get_template_parameters(event): - pipeline_type = event.get("pipeline_type", "") - model_framework = event.get("model_framework", "") - model_framework_version = event.get("model_framework_version", "") - model_name = event.get("model_name", "").lower().strip() - model_artifact_location = event.get("model_artifact_location", "") - inference_instance = event.get("inference_instance", "") - custom_container = event.get("custom_model_container", "") - batch_inference_data = event.get("batch_inference_data", "") - pipeline_stack_name = os.environ["PIPELINE_STACK_NAME"] - endpoint_name = event.get("endpoint_name", "") - template_parameters = [ - { - "ParameterKey": "NOTIFICATIONEMAIL", - "ParameterValue": os.environ["NOTIFICATION_EMAIL"], - "UsePreviousValue": True, - }, - { - "ParameterKey": "BLUEPRINTBUCKET", - "ParameterValue": os.environ["BLUEPRINT_BUCKET"], - "UsePreviousValue": True, - }, - { - "ParameterKey": "ASSETSBUCKET", - "ParameterValue": os.environ["ASSETS_BUCKET"], - "UsePreviousValue": True, - }, - ] - if pipeline_type == "byom": - provisioned_pipeline_stack_name = f"{pipeline_stack_name}-{model_name}" - # construct common parameters across byom builtin/custom and realtime/batch - template_parameters.extend( - [ - { - "ParameterKey": "MODELNAME", - "ParameterValue": model_name, - "UsePreviousValue": True, - }, - { - "ParameterKey": "MODELARTIFACTLOCATION", - "ParameterValue": model_artifact_location, - "UsePreviousValue": True, - }, - { - "ParameterKey": "INFERENCEINSTANCE", - "ParameterValue": inference_instance, - "UsePreviousValue": True, - }, - ] - ) - if ( - event.get("inference_type", "").lower().strip() == "realtime" - and event.get("model_framework", "").strip() != "" - ): - # update stack name - provisioned_pipeline_stack_name = f"{provisioned_pipeline_stack_name}-BYOMPipelineReatimeBuiltIn" - # add builtin/realtime parameters - template_parameters.extend( - [ - { - "ParameterKey": "MODELFRAMEWORK", - "ParameterValue": model_framework, - "UsePreviousValue": True, - }, - { - "ParameterKey": "MODELFRAMEWORKVERSION", - "ParameterValue": model_framework_version, - "UsePreviousValue": True, - }, - ] - ) - elif ( - event.get("inference_type", "").lower().strip() == "batch" - and event.get("model_framework", "").strip() != "" - ): - # update stack name - provisioned_pipeline_stack_name = f"{provisioned_pipeline_stack_name}-BYOMPipelineBatchBuiltIn" - # add builtin/batch parameters - template_parameters.extend( - [ - { - "ParameterKey": "MODELFRAMEWORK", - "ParameterValue": model_framework, - "UsePreviousValue": True, - }, - { - "ParameterKey": "MODELFRAMEWORKVERSION", - "ParameterValue": model_framework_version, - "UsePreviousValue": True, - }, - { - "ParameterKey": "BATCHINFERENCEDATA", - "ParameterValue": batch_inference_data, - "UsePreviousValue": True, - }, - ] - ) - elif ( - event.get("inference_type", "").lower().strip() == "realtime" - and event.get("model_framework", "").strip() == "" - ): - # update stack name - provisioned_pipeline_stack_name = f"{provisioned_pipeline_stack_name}-BYOMPipelineRealtimeBuild" - # add custom/realtime parameters - template_parameters.extend( - [ - { - "ParameterKey": "CUSTOMCONTAINER", - "ParameterValue": custom_container, - "UsePreviousValue": True, - }, - ] - ) - elif ( - event.get("inference_type", "").lower().strip() == "batch" - and event.get("model_framework", "").strip() == "" - ): - # update stack name - provisioned_pipeline_stack_name = f"{provisioned_pipeline_stack_name}-BYOMPipelineBatchBuild" - # add 
custom/batch parameters - template_parameters.extend( - [ - { - "ParameterKey": "CUSTOMCONTAINER", - "ParameterValue": custom_container, - "UsePreviousValue": True, - }, - { - "ParameterKey": "BATCHINFERENCEDATA", - "ParameterValue": batch_inference_data, - "UsePreviousValue": True, - }, - ] - ) - else: - raise BadRequest( - "Bad request format. Pipeline type not supported. Check documentation for API & config formats." - ) - - elif pipeline_type == "model_monitor": - provisioned_pipeline_stack_name = f"{pipeline_stack_name}-{endpoint_name}-model-monitor" - # get the optional monitoring type - monitoring_type = event.get("monitoring_type", "dataquality").lower().strip() - # create uniques names for data baseline and monitoring schedule. The names need to be unique because - # Old jobs are not deleted, and there is a high possibility that the client create a job with the same name - # which will throw an error. - baseline_job_name = f"{endpoint_name}-baseline-job-{str(uuid.uuid4())[:8]}" - monitoring_schedule_name = f"{endpoint_name}-monitor-{monitoring_type}-{str(uuid.uuid4())[:8]}" - # add model monitor parameters - template_parameters.extend( - [ - { - "ParameterKey": "BASELINEJOBOUTPUTLOCATION", - "ParameterValue": event.get("baseline_job_output_location"), - "UsePreviousValue": True, - }, - { - "ParameterKey": "ENDPOINTNAME", - "ParameterValue": endpoint_name, - "UsePreviousValue": True, - }, - { - "ParameterKey": "BASELINEJOBNAME", - "ParameterValue": baseline_job_name, - "UsePreviousValue": True, - }, - { - "ParameterKey": "MONITORINGSCHEDULENAME", - "ParameterValue": monitoring_schedule_name, - "UsePreviousValue": True, - }, - { - "ParameterKey": "MONITORINGOUTPUTLOCATION", - "ParameterValue": event.get("monitoring_output_location"), - "UsePreviousValue": True, - }, - { - "ParameterKey": "SCHEDULEEXPRESSION", - "ParameterValue": event.get("schedule_expression"), - "UsePreviousValue": True, - }, - { - "ParameterKey": "TRAININGDATA", - "ParameterValue": event.get("training_data"), - "UsePreviousValue": True, - }, - { - "ParameterKey": "INSTANCETYPE", - "ParameterValue": event.get("instance_type"), - "UsePreviousValue": True, - }, - { - "ParameterKey": "INSTANCEVOLUMESIZE", - "ParameterValue": event.get("instance_volume_size"), - "UsePreviousValue": True, - }, - { - "ParameterKey": "MONITORINGTYPE", - "ParameterValue": event.get("monitoring_type", "dataquality"), - "UsePreviousValue": True, - }, - { - "ParameterKey": "MAXRUNTIMESIZE", - "ParameterValue": event.get("max_runtime_seconds", "-1"), - "UsePreviousValue": True, - }, - ] - ) - else: - raise BadRequest( - "Bad request format. Pipeline type not supported. Check documentation for API & config formats." 
- ) - - return (provisioned_pipeline_stack_name.lower(), template_parameters) - - -def get_required_keys(event): - required_keys = [] - if event.get("pipeline_type", "").lower() == "byom": - # common keys - common_keys = [ - "pipeline_type", - "model_name", - "model_artifact_location", - "inference_instance", - "inference_type", - ] - - if ( - event.get("inference_type", "").lower().strip() == "realtime" - and event.get("model_framework", "").strip() != "" - ): - required_keys = common_keys + [ - "model_framework", - "model_framework_version", - ] - elif ( - event.get("inference_type", "").lower().strip() == "batch" - and event.get("model_framework", "").strip() != "" - ): - required_keys = common_keys + [ - "model_framework", - "model_framework_version", - "batch_inference_data", - ] - elif ( - event.get("inference_type", "").lower().strip() == "realtime" - and event.get("model_framework", "").strip() == "" - ): - required_keys = common_keys + [ - "custom_model_container", - ] - elif ( - event.get("inference_type", "").lower().strip() == "batch" - and event.get("model_framework", "").strip() == "" - ): - required_keys = common_keys + [ - "custom_model_container", - "batch_inference_data", - ] - else: - raise BadRequest("Bad request. missing keys for byom") - elif event.get("pipeline_type", "").lower().strip() == "model_monitor": - required_keys = [ - "pipeline_type", - "endpoint_name", - "baseline_job_output_location", - "monitoring_output_location", - "schedule_expression", - "training_data", - "instance_type", - "instance_volume_size", - ] - - if event.get("monitoring_type", "").lower().strip() in ["modelquality", "modelbias", "modelexplainability"]: - required_keys = required_keys + [ - "features_attribute", - "inference_attribute", - "probability_attribute", - "probability_threshold_attribute", - ] - # monitoring_type is optional, but if the client provided a value not in the allowed values, raise an exception - elif event.get("monitoring_type", "").lower().strip() not in [ - "", - "dataquality", - "modelquality", - "modelbias", - "modelexplainability", - ]: - raise BadRequest( - "Bad request. MonitoringType supported are 'DataQuality'|'ModelQuality'|'ModelBias'|'ModelExplainability'" - ) - else: - raise BadRequest( - "Bad request format. Pipeline type not supported. Check documentation for API & config formats" - ) - - return required_keys - - -def validate(event): - """ - validate is a helper function that checks if all required input parameters are present in the handler's event object - - :event: Lambda function's event object - - :return: returns the event back if it passes the validation othewise it raises a bad request exception - :raises: BadRequest Exception - """ - # get the required keys to validate the event - required_keys = get_required_keys(event) - for key in required_keys: - if key not in event: - logger.error(f"Request event did not have parameter: {key}") - raise BadRequest(f"Bad request. API body does not have the necessary parameter: {key}") - - return event diff --git a/source/lambdas/pipeline_orchestration/lambda_helpers.py b/source/lambdas/pipeline_orchestration/lambda_helpers.py new file mode 100644 index 0000000..2a02fca --- /dev/null +++ b/source/lambdas/pipeline_orchestration/lambda_helpers.py @@ -0,0 +1,412 @@ +# ##################################################################################################################### +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
# +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +import os +import json +import sagemaker +import shutil +import tempfile +import uuid +from shared.wrappers import BadRequest +from shared.helper import get_built_in_model_monitor_container_uri +from shared.logger import get_logger + + +logger = get_logger(__name__) + + +def template_url(pipeline_type): + """ + template_url is a helper function that determines the cloudformation stack's file name based on + inputs + + :pipeline_type: type of pipeline. Supported values: + "byom_realtime_builtin"|"byom_realtime_custom"|"byom_batch_builtin"|"byom_batch_custom"| + "byom_model_monitor"|"byom_image_builder"|"single_account_codepipeline"| + "multi_account_codepipeline" + + :return: returns a link to the appropriate coudformation template files which can be one of these values: + byom_realtime_inference_pipeline.yaml + byom_batch_pipeline.yaml + byom_model_monitor.yaml + byom_custom_algorithm_image_builder.yaml + single_account_codepipeline.yaml + multi_account_codepipeline.yaml + """ + url = "https://" + os.environ["BLUEPRINT_BUCKET_URL"] + "/blueprints/byom" + realtime_inference_template = "blueprints/byom/byom_realtime_inference_pipeline.yaml" + batch_inference_template = "blueprints/byom/byom_batch_pipeline.yaml" + + templates_map = { + "byom_realtime_builtin": realtime_inference_template, + "byom_realtime_custom": realtime_inference_template, + "byom_batch_builtin": batch_inference_template, + "byom_batch_custom": batch_inference_template, + "byom_model_monitor": "blueprints/byom/byom_model_monitor.yaml", + "byom_image_builder": f"{url}/byom_custom_algorithm_image_builder.yaml", + "single_account_codepipeline": f"{url}/single_account_codepipeline.yaml", + "multi_account_codepipeline": f"{url}/multi_account_codepipeline.yaml", + } + + if pipeline_type in list(templates_map.keys()): + return templates_map[pipeline_type] + + else: + raise BadRequest(f"Bad request. 
Pipeline type: {pipeline_type} is not supported.") + + +def get_stage_param(event, api_key, stage): + api_key_value = event.get(api_key, "") + if isinstance(api_key_value, dict) and stage in list(api_key_value.keys()): + api_key_value = api_key_value[stage] + + return api_key_value + + +def get_stack_name(event): + pipeline_type = event.get("pipeline_type") + pipeline_stack_name = os.environ["PIPELINE_STACK_NAME"] + model_name = event.get("model_name", "").lower().strip() + if pipeline_type in [ + "byom_realtime_builtin", + "byom_realtime_custom", + "byom_batch_builtin", + "byom_batch_custom", + ]: + + postfix = { + "byom_realtime_builtin": "BYOMPipelineRealtimeBuiltIn", + "byom_realtime_custom": "BYOMPipelineRealtimeCustom", + "byom_batch_builtin": "BYOMPipelineBatchBuiltIn", + "byom_batch_custom": "BYOMPipelineBatchCustom", + } + # name of stack + provisioned_pipeline_stack_name = f"{pipeline_stack_name}-{model_name}-{postfix[pipeline_type]}" + + elif pipeline_type == "byom_model_monitor": + provisioned_pipeline_stack_name = f"{pipeline_stack_name}-{model_name}-BYOMModelMonitor" + + elif pipeline_type == "byom_image_builder": + image_tag = event.get("image_tag") + provisioned_pipeline_stack_name = f"{pipeline_stack_name}-{image_tag}-BYOMPipelineImageBuilder" + + return provisioned_pipeline_stack_name.lower() + + +def get_template_parameters(event, is_multi_account, stage=None): + pipeline_type = event.get("pipeline_type") + region = os.environ["REGION"] + + kms_key_arn = get_stage_param(event, "kms_key_arn", stage) + common_params = [ + ("ASSETSBUCKET", os.environ["ASSETS_BUCKET"]), + ("KMSKEYARN", kms_key_arn), + ("BLUEPRINTBUCKET", os.environ["BLUEPRINT_BUCKET"]), + ] + if pipeline_type in [ + "byom_realtime_builtin", + "byom_realtime_custom", + "byom_batch_builtin", + "byom_batch_custom", + ]: + + common_params.extend(get_common_realtime_batch_params(event, region, stage)) + + # add realtime specfic parameters + if pipeline_type in ["byom_realtime_builtin", "byom_realtime_custom"]: + common_params.extend(get_realtime_specific_params(event, stage)) + # else add batch params + else: + common_params.extend(get_bacth_specific_params(event, stage)) + + return common_params + + elif pipeline_type == "byom_model_monitor": + common_params.extend(get_model_monitor_params(event, region, stage)) + return common_params + + elif pipeline_type == "byom_image_builder": + return get_image_builder_params(event) + + else: + raise BadRequest("Bad request format. 
Please provide a supported pipeline")
+
+
+def get_codepipeline_params(is_multi_account, stack_name, template_zip_name, template_file_name):
+
+    single_account_params = [
+        ("NOTIFICATIONEMAIL", os.environ["NOTIFICATION_EMAIL"]),
+        ("TEMPLATEZIPNAME", template_zip_name),
+        ("TEMPLATEFILENAME", template_file_name),
+        ("ASSETSBUCKET", os.environ["ASSETS_BUCKET"]),
+        ("STACKNAME", stack_name),
+    ]
+    if is_multi_account == "False":
+        single_account_params.extend([("TEMPLATEPARAMSNAME", "template_params.json")])
+        return single_account_params
+
+    else:
+        single_account_params.extend(
+            [
+                ("DEVPARAMSNAME", "dev_template_params.json"),
+                ("STAGINGPARAMSNAME", "staging_template_params.json"),
+                ("PRODPARAMSNAME", "prod_template_params.json"),
+                ("DEVACCOUNTID", os.environ["DEV_ACCOUNT_ID"]),
+                ("DEVORGID", os.environ["DEV_ORG_ID"]),
+                ("STAGINGACCOUNTID", os.environ["STAGING_ACCOUNT_ID"]),
+                ("STAGINGORGID", os.environ["STAGING_ORG_ID"]),
+                ("PRODACCOUNTID", os.environ["PROD_ACCOUNT_ID"]),
+                ("PRODORGID", os.environ["PROD_ORG_ID"]),
+                ("BLUEPRINTBUCKET", os.environ["BLUEPRINT_BUCKET"]),
+            ]
+        )
+
+        return single_account_params
+
+
+def get_common_realtime_batch_params(event, region, stage):
+    inference_instance = get_stage_param(event, "inference_instance", stage)
+    return [
+        ("MODELNAME", event.get("model_name")),
+        ("MODELARTIFACTLOCATION", event.get("model_artifact_location")),
+        ("INFERENCEINSTANCE", inference_instance),
+        ("CUSTOMALGORITHMSECRREPOARN", os.environ["ECR_REPO_ARN"]),
+        ("IMAGEURI", get_image_uri(event.get("pipeline_type"), event, region)),
+    ]
+
+
+def clean_param(param):
+    if param.endswith("/"):
+        return param[:-1]
+    else:
+        return param
+
+
+def get_realtime_specific_params(event, stage):
+    data_capture_location = clean_param(get_stage_param(event, "data_capture_location", stage))
+    return [("DATACAPTURELOCATION", data_capture_location)]
+
+
+def get_bacth_specific_params(event, stage):
+    batch_inference_data = get_stage_param(event, "batch_inference_data", stage)
+    batch_job_output_location = clean_param(get_stage_param(event, "batch_job_output_location", stage))
+    return [
+        ("BATCHINPUTBUCKET", batch_inference_data.split("/")[0]),
+        ("BATCHINFERENCEDATA", batch_inference_data),
+        ("BATCHOUTPUTLOCATION", batch_job_output_location),
+    ]
+
+
+def get_model_monitor_params(event, region, stage):
+    endpoint_name = get_stage_param(event, "endpoint_name", stage).lower().strip()
+    monitoring_type = event.get("monitoring_type", "dataquality")
+
+    # generate job names
+    baseline_job_name = f"{endpoint_name}-baseline-job-{str(uuid.uuid4())[:4]}"
+    monitoring_schedule_name = f"{endpoint_name}-monitor-{str(uuid.uuid4())[:4]}"
+
+    baseline_job_output_location = clean_param(get_stage_param(event, "baseline_job_output_location", stage))
+    data_capture_location = clean_param(get_stage_param(event, "data_capture_location", stage))
+    instance_type = get_stage_param(event, "instance_type", stage)
+    instance_volume_size = get_stage_param(event, "instance_volume_size", stage)
+    max_runtime_seconds = get_stage_param(event, "max_runtime_seconds", stage)
+    monitoring_output_location = clean_param(get_stage_param(event, "monitoring_output_location", stage))
+    schedule_expression = get_stage_param(event, "schedule_expression", stage)
+
+    return [
+        ("BASELINEJOBNAME", baseline_job_name),
+        ("BASELINEOUTPUTBUCKET", baseline_job_output_location.split("/")[0]),
+        ("BASELINEJOBOUTPUTLOCATION", baseline_job_output_location),
+        ("DATACAPTUREBUCKET", data_capture_location.split("/")[0]),
+        ("DATACAPTURELOCATION", data_capture_location),
+        ("ENDPOINTNAME", endpoint_name),
+        ("IMAGEURI", get_built_in_model_monitor_container_uri(region)),
+        ("INSTANCETYPE", instance_type),
+        ("INSTANCEVOLUMESIZE", instance_volume_size),
+        ("MAXRUNTIMESECONDS", max_runtime_seconds),
+        ("MONITORINGOUTPUTLOCATION", monitoring_output_location),
+        ("MONITORINGSCHEDULENAME", monitoring_schedule_name),
+        ("MONITORINGTYPE", monitoring_type),
+        ("SCHEDULEEXPRESSION", schedule_expression),
+        ("TRAININGDATA", event.get("training_data")),
+    ]
+
+
+def get_image_builder_params(event):
+    return [
+        ("NOTIFICATIONEMAIL", os.environ["NOTIFICATION_EMAIL"]),
+        ("ASSETSBUCKET", os.environ["ASSETS_BUCKET"]),
+        ("CUSTOMCONTAINER", event.get("custom_algorithm_docker")),
+        ("ECRREPONAME", event.get("ecr_repo_name")),
+        ("IMAGETAG", event.get("image_tag")),
+    ]
+
+
+def format_template_parameters(key_value_list, is_multi_account):
+    if is_multi_account == "True":
+        # for the multi-account option, the StackSet action, used by the multi-account codepipeline,
+        # requires this parameter format
+        return [{"ParameterKey": param[0], "ParameterValue": param[1]} for param in key_value_list]
+    else:
+        # for the single-account option, the CloudFormation action, used by the single-account codepipeline,
+        # requires this parameter format
+        return {"Parameters": {param[0]: param[1] for param in key_value_list}}
+
+
+def write_params_to_json(params, file_path):
+    with open(file_path, "w") as fp:
+        json.dump(params, fp, indent=4)
+
+
+def upload_file_to_s3(local_file_path, s3_bucket_name, s3_file_key, s3_client):
+    s3_client.upload_file(local_file_path, s3_bucket_name, s3_file_key)
+
+
+def download_file_from_s3(s3_bucket_name, file_key, local_file_path, s3_client):
+    s3_client.download_file(s3_bucket_name, file_key, local_file_path)
+
+
+def create_template_zip_file(
+    event, blueprint_bucket, assets_bucket, template_url, template_zip_name, is_multi_account, s3_client
+):
+    zip_output_filename = "template"
+
+    # create a temp directory to download the template into
+    local_directory = tempfile.mkdtemp()
+    local_file_path = os.path.join(local_directory, template_url.split("/")[-1])
+
+    # download the template from the blueprints bucket
+    download_file_from_s3(blueprint_bucket, template_url, local_file_path, s3_client)
+
+    # create a temp directory for the zipped CloudFormation template and stage parameters
+    zip_local_directory = tempfile.mkdtemp()
+    zip_file_path = os.path.join(zip_local_directory, zip_output_filename)
+
+    # write the params to json file(s)
+    if is_multi_account == "True":
+        for stage in ["dev", "staging", "prod"]:
+            # format the template params
+            stage_params_list = get_template_parameters(event, is_multi_account, stage)
+            params_formatted = format_template_parameters(stage_params_list, is_multi_account)
+            write_params_to_json(params_formatted, f"{local_directory}/{stage}_template_params.json")
+    else:
+        stage_params_list = get_template_parameters(event, is_multi_account)
+        params_formatted = format_template_parameters(stage_params_list, is_multi_account)
+        write_params_to_json(params_formatted, f"{local_directory}/template_params.json")
+
+    # make the zip file
+    shutil.make_archive(
+        zip_file_path,
+        "zip",
+        local_directory,
+    )
+
+    # upload the zip file to the assets bucket
+    upload_file_to_s3(
+        f"{zip_file_path}.zip",
+        assets_bucket,
+        f"{template_zip_name}",
+        s3_client,
+    )
+
+
+def 
get_image_uri(pipeline_type, event, region): + if pipeline_type in ["byom_realtime_custom", "byom_batch_custom"]: + return event.get("custom_image_uri") + elif pipeline_type in ["byom_realtime_builtin", "byom_batch_builtin"]: + return sagemaker.image_uris.retrieve( + framework=event.get("model_framework"), region=region, version=event.get("model_framework_version") + ) + else: + raise Exception("Unsupported pipeline by get_image_uri function") + + +def get_required_keys(pipeline_type): + # Realtime/batch pipelines + if pipeline_type in [ + "byom_realtime_builtin", + "byom_realtime_custom", + "byom_batch_builtin", + "byom_batch_custom", + ]: + common_keys = [ + "pipeline_type", + "model_name", + "model_artifact_location", + "inference_instance", + ] + builtin_model_keys = [ + "model_framework", + "model_framework_version", + ] + custom_model_keys = ["custom_image_uri"] + realtime_specific_keys = ["data_capture_location"] + batch_specific_keys = ["batch_inference_data", "batch_job_output_location"] + + keys_map = { + "byom_realtime_builtin": common_keys + builtin_model_keys + realtime_specific_keys, + "byom_realtime_custom": common_keys + custom_model_keys + realtime_specific_keys, + "byom_batch_builtin": common_keys + builtin_model_keys + batch_specific_keys, + "byom_batch_custom": common_keys + custom_model_keys + batch_specific_keys, + } + + return keys_map[pipeline_type] + + # Model Monitor pipeline + elif pipeline_type == "byom_model_monitor": + return [ + "pipeline_type", + "model_name", + "endpoint_name", + "training_data", + "baseline_job_output_location", + "data_capture_location", + "monitoring_output_location", + "schedule_expression", + "instance_type", + "instance_volume_size", + ] + # Image Builder pipeline + elif pipeline_type == "byom_image_builder": + return [ + "pipeline_type", + "custom_algorithm_docker", + "ecr_repo_name", + "image_tag", + ] + + else: + raise BadRequest( + "Bad request format. Pipeline type not supported. Check documentation for API & config formats" + ) + + +def validate(event): + """ + validate is a helper function that checks if all required input parameters are present in the handler's event object + + :event: Lambda function's event object + + :return: returns the event back if it passes the validation othewise it raises a bad request exception + :raises: BadRequest Exception + """ + # get the required keys to validate the event + required_keys = get_required_keys(event.get("pipeline_type", "")) + for key in required_keys: + if key not in event: + logger.error(f"Request event did not have parameter: {key}") + raise BadRequest(f"Bad request. API body does not have the necessary parameter: {key}") + + return event diff --git a/source/lambdas/pipeline_orchestration/shared/helper.py b/source/lambdas/pipeline_orchestration/shared/helper.py index 06c905d..04f70ce 100644 --- a/source/lambdas/pipeline_orchestration/shared/helper.py +++ b/source/lambdas/pipeline_orchestration/shared/helper.py @@ -11,18 +11,27 @@ # and limitations under the License. 
# # ##################################################################################################################### import boto3 +import json +import os +from botocore.config import Config from shared.logger import get_logger - logger = get_logger(__name__) _helpers_service_clients = dict() -def get_client(service_name): +# Set Boto3 configuration to track the solution's usage +CLIENT_CONFIG = Config( + retries={"max_attempts": 3, "mode": "standard"}, + **json.loads(os.environ.get("AWS_SDK_USER_AGENT", '{"user_agent_extra": null}')), +) + + +def get_client(service_name, config=CLIENT_CONFIG): global _helpers_service_clients if service_name not in _helpers_service_clients: logger.debug(f"Initializing global boto3 client for {service_name}") - _helpers_service_clients[service_name] = boto3.client(service_name) + _helpers_service_clients[service_name] = boto3.client(service_name, config=config) return _helpers_service_clients[service_name] @@ -35,8 +44,6 @@ def reset_client(): # For the latest images per region, see https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-built-container.html # These are SageMaker service account numbers for the built-in SageMaker containers. def get_built_in_model_monitor_container_uri(region): - container_uri_format = "{0}.dkr.ecr.{1}.amazonaws.com/sagemaker-model-monitor-analyzer" - regions_to_accounts = { "us-east-1": "156813124566", "us-east-2": "777275614652", @@ -63,5 +70,8 @@ def get_built_in_model_monitor_container_uri(region): "us-gov-west-1": "362178532790", } - container_uri = container_uri_format.format(regions_to_accounts[region], region) - return container_uri \ No newline at end of file + container_uri = ( + f"{regions_to_accounts[region]}.dkr.ecr.{region}.amazonaws.com/sagemaker-model-monitor-analyzer:latest" + ) + + return container_uri diff --git a/source/lambdas/pipeline_orchestration/shared/logger.py b/source/lambdas/pipeline_orchestration/shared/logger.py index b3a7cc1..72f1863 100644 --- a/source/lambdas/pipeline_orchestration/shared/logger.py +++ b/source/lambdas/pipeline_orchestration/shared/logger.py @@ -45,6 +45,6 @@ def get_logger(name): logging.getLogger("botocore").setLevel(logging.WARNING) logging.getLogger("urllib3").setLevel(logging.WARNING) else: - logging.basicConfig(level=get_level()) + logging.basicConfig(level=get_level()) # NOSONAR (python:S4792) logger = logging.getLogger(name) return logger diff --git a/source/lambdas/pipeline_orchestration/shared/wrappers.py b/source/lambdas/pipeline_orchestration/shared/wrappers.py index 08b4243..2540a27 100644 --- a/source/lambdas/pipeline_orchestration/shared/wrappers.py +++ b/source/lambdas/pipeline_orchestration/shared/wrappers.py @@ -25,26 +25,6 @@ class BadRequest(Exception): pass -def code_pipeline_exception_handler(f): - @wraps(f) - def wrapper(event, context): - try: - return f(event, context) - except Exception as e: - codepipeline = get_client("codepipeline") - exc_type, exc_value, exc_tb = sys.exc_info() - logger.error(traceback.format_exception(exc_type, exc_value, exc_tb)) - codepipeline.put_job_failure_result( - jobId=event["CodePipeline.job"]["id"], - failureDetails={ - "message": f"Job failed. {str(e)}. 
Check the logs for more info.", - "type": "JobFailed", - }, - ) - - return wrapper - - def api_exception_handler(f): @wraps(f) def wrapper(event, context): diff --git a/source/lambdas/pipeline_orchestration/tests/fixtures/orchestrator_fixtures.py b/source/lambdas/pipeline_orchestration/tests/fixtures/orchestrator_fixtures.py index 09925b9..2315f20 100644 --- a/source/lambdas/pipeline_orchestration/tests/fixtures/orchestrator_fixtures.py +++ b/source/lambdas/pipeline_orchestration/tests/fixtures/orchestrator_fixtures.py @@ -1,5 +1,5 @@ ####################################################################################################################### -# Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. A copy of the License is located at # @@ -13,6 +13,7 @@ import os import pytest import uuid +import json @pytest.fixture(autouse=True) @@ -21,39 +22,221 @@ def mock_env_variables(): os.environ["NOTIFICATION_EMAIL"] = "test@example.com" os.environ["ASSETS_BUCKET"] = "testassetsbucket" os.environ["BLUEPRINT_BUCKET"] = "testbucket" - os.environ["PIPELINE_STACK_NAME"] = "teststack" + os.environ["PIPELINE_STACK_NAME"] = "mlops-pipeline" os.environ["CFN_ROLE_ARN"] = "arn:aws:role:region:account:action" + os.environ["IS_MULTI_ACCOUNT"] = "False" + os.environ["REGION"] = "us-east-1" + os.environ["ECR_REPO_ARN"] = "test-ecr-repo" + os.environ["DEV_ACCOUNT_ID"] = "dev_account_id" + os.environ["STAGING_ACCOUNT_ID"] = "staging_account_id" + os.environ["PROD_ACCOUNT_ID"] = "prod_account_id" + os.environ["DEV_ORG_ID"] = "dev_org_id" + os.environ["STAGING_ORG_ID"] = "staging_org_id" + os.environ["PROD_ORG_ID"] = "prod_org_id" + os.environ["MODELARTIFACTLOCATION"] = "model.tar.gz" + os.environ["INSTANCETYPE"] = "ml.m5.large" + os.environ["INFERENCEDATA"] = "inference/data.csv" + os.environ["BATCHOUTPUT"] = "bucket/output" + os.environ["DATACAPTURE"] = "bucket/datacapture" + os.environ["TRAININGDATA"] = "model_monitor/training-dataset-with-header.csv" + os.environ["BASELINEOUTPUT"] = "testbucket/model_monitor/baseline_output2" + os.environ["SCHEDULEEXP"] = "cron(0 * ? 
* * *)" + os.environ["CUSTOMIMAGE"] = "custom/custom_image.zip" + os.environ["TESTFILE"] = "testfile.zip" @pytest.fixture def api_byom_event(): - def _api_byom_event(inference_type, model_framework): + def _api_byom_event(pipeline_type, is_multi=False): event = { - "pipeline_type": "byom", + "pipeline_type": pipeline_type, "model_name": "testmodel", - "model_artifact_location": "model.tar.gz", - "inference_instance": "ml.m5.large", + "model_artifact_location": os.environ["MODELARTIFACTLOCATION"], } - - event["inference_type"] = inference_type - if inference_type == "batch" and model_framework != "": - event["batch_inference_data"] = "inference/data.csv" - event["model_framework"] = "xgboost" - event["model_framework_version"] = "0.90-1" - elif inference_type == "realtime" and model_framework != "": + if is_multi: + event["inference_instance"] = { + "dev": os.environ["INSTANCETYPE"], + "staging": os.environ["INSTANCETYPE"], + "prod": os.environ["INSTANCETYPE"], + } + else: + event["inference_instance"] = os.environ["INSTANCETYPE"] + if pipeline_type in ["byom_batch_builtin", "byom_batch_custom"]: + event["batch_inference_data"] = os.environ["INFERENCEDATA"] + if is_multi: + event["batch_job_output_location"] = { + "dev": "bucket/dev_output", + "staging": "bucket/staging_output", + "prod": "bucket/prod_output", + } + else: + event["batch_job_output_location"] = os.environ["BATCHOUTPUT"] + if pipeline_type in ["byom_realtime_builtin", "byom_realtime_custom"]: + if is_multi: + event["data_capture_location"] = { + "dev": "bucket/dev_datacapture", + "staging": "bucket/staging_datacapture", + "prod": "bucket/prod_datacapture", + } + else: + event["data_capture_location"] = os.environ["DATACAPTURE"] + + if pipeline_type in ["byom_realtime_builtin", "byom_batch_builtin"]: event["model_framework"] = "xgboost" event["model_framework_version"] = "0.90-1" + elif pipeline_type in ["byom_realtime_custom", "byom_batch_custom"]: + event["custom_image_uri"] = "custom-image-uri" - elif inference_type == "batch" and model_framework == "": - event["custom_model_container"] = "my_custom_image.zip" - event["batch_inference_data"] = "inference/data.csv" - elif inference_type == "realtime" and model_framework == "": - event["custom_model_container"] = "my_custom_image.zip" return event return _api_byom_event +@pytest.fixture +def api_monitor_event(): + return { + "pipeline_type": "byom_model_monitor", + "model_name": "testmodel", + "endpoint_name": "test_endpoint", + "training_data": os.environ["TRAININGDATA"], + "baseline_job_output_location": os.environ["BASELINEOUTPUT"], + "monitoring_output_location": "testbucket/model_monitor/monitor_output", + "data_capture_location": "testbucket/xgboost/datacapture", + "schedule_expression": os.environ["SCHEDULEEXP"], + "instance_type": os.environ["INSTANCETYPE"], + "instance_volume_size": "20", + "max_runtime_seconds": "3600", + } + + +@pytest.fixture +def api_image_builder_event(): + return { + "pipeline_type": "byom_image_builder", + "custom_algorithm_docker": os.environ["CUSTOMIMAGE"], + "ecr_repo_name": "mlops-ecrrep", + "image_tag": "tree", + } + + +@pytest.fixture +def expected_params_realtime_custom(): + return [ + ("ASSETSBUCKET", "testassetsbucket"), + ("KMSKEYARN", ""), + ("BLUEPRINTBUCKET", "testbucket"), + ("MODELNAME", "testmodel"), + ("MODELARTIFACTLOCATION", os.environ["MODELARTIFACTLOCATION"]), + ("INFERENCEINSTANCE", os.environ["INSTANCETYPE"]), + ("CUSTOMALGORITHMSECRREPOARN", "test-ecr-repo"), + ("IMAGEURI", "custom-image-uri"), + ("DATACAPTURELOCATION", 
os.environ["DATACAPTURE"]), + ] + + +@pytest.fixture +def expected_model_monitor_params(): + return [ + ("BASELINEJOBNAME", "test_endpoint-baseline-job-ec3a"), + ("BASELINEOUTPUTBUCKET", "testbucket"), + ("BASELINEJOBOUTPUTLOCATION", os.environ["BASELINEOUTPUT"]), + ("DATACAPTUREBUCKET", "testbucket"), + ("DATACAPTURELOCATION", os.environ["BASELINEOUTPUT"]), + ("ENDPOINTNAME", "test_endpoint"), + ("IMAGEURI", "156813124566.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer:latest"), + ("INSTANCETYPE", os.environ["INSTANCETYPE"]), + ("INSTANCEVOLUMESIZE", "20"), + ("MAXRUNTIMESECONDS", "3600"), + ("MONITORINGOUTPUTLOCATION", "testbucket/model_monitor/monitor_output"), + ("MONITORINGSCHEDULENAME", "test_endpoint-monitor-2a87"), + ("MONITORINGTYPE", "dataquality"), + ("SCHEDULEEXPRESSION", os.environ["SCHEDULEEXP"]), + ("TRAININGDATA", os.environ["TRAININGDATA"]), + ] + + +@pytest.fixture +def expected_common_realtime_batch_params(): + return [ + ("MODELNAME", "testmodel"), + ("MODELARTIFACTLOCATION", os.environ["MODELARTIFACTLOCATION"]), + ("INFERENCEINSTANCE", os.environ["INSTANCETYPE"]), + ("CUSTOMALGORITHMSECRREPOARN", "test-ecr-repo"), + ("IMAGEURI", "custom-image-uri"), + ] + + +@pytest.fixture +def expected_image_builder_params(): + return [ + ("NOTIFICATIONEMAIL", os.environ["NOTIFICATION_EMAIL"]), + ("ASSETSBUCKET", "testassetsbucket"), + ("CUSTOMCONTAINER", os.environ["CUSTOMIMAGE"]), + ("ECRREPONAME", "mlops-ecrrep"), + ("IMAGETAG", "tree"), + ] + + +@pytest.fixture +def expected_realtime_specific_params(): + return [("DATACAPTURELOCATION", os.environ["DATACAPTURE"])] + + +@pytest.fixture +def expect_single_account_params_format(): + return { + "Parameters": { + "NOTIFICATIONEMAIL": os.environ["NOTIFICATION_EMAIL"], + "ASSETSBUCKET": "testassetsbucket", + "CUSTOMCONTAINER": os.environ["CUSTOMIMAGE"], + "ECRREPONAME": "mlops-ecrrep", + "IMAGETAG": "tree", + } + } + + +@pytest.fixture +def stack_name(): + return "teststack-testmodel-byompipelineimagebuilder" + + +@pytest.fixture +def expected_multi_account_params_format(): + return [ + {"ParameterKey": "NOTIFICATIONEMAIL", "ParameterValue": os.environ["NOTIFICATION_EMAIL"]}, + {"ParameterKey": "ASSETSBUCKET", "ParameterValue": "testassetsbucket"}, + {"ParameterKey": "CUSTOMCONTAINER", "ParameterValue": os.environ["CUSTOMIMAGE"]}, + {"ParameterKey": "ECRREPONAME", "ParameterValue": "mlops-ecrrep"}, + {"ParameterKey": "IMAGETAG", "ParameterValue": "tree"}, + ] + + +@pytest.fixture +def expected_batch_specific_params(): + return [ + ("BATCHINPUTBUCKET", "inference"), + ("BATCHINFERENCEDATA", os.environ["INFERENCEDATA"]), + ("BATCHOUTPUTLOCATION", os.environ["BATCHOUTPUT"]), + ] + + +@pytest.fixture +def expected_batch_params(): + return [ + ("ASSETSBUCKET", "testassetsbucket"), + ("KMSKEYARN", ""), + ("BLUEPRINTBUCKET", "testbucket"), + ("MODELNAME", "testmodel"), + ("MODELARTIFACTLOCATION", os.environ["MODELARTIFACTLOCATION"]), + ("INFERENCEINSTANCE", os.environ["INSTANCETYPE"]), + ("CUSTOMALGORITHMSECRREPOARN", "test-ecr-repo"), + ("IMAGEURI", "custom-image-uri"), + ("BATCHINPUTBUCKET", "inference"), + ("BATCHINFERENCEDATA", os.environ["INFERENCEDATA"]), + ("BATCHOUTPUTLOCATION", os.environ["BATCHOUTPUT"]), + ] + + @pytest.fixture def required_api_byom_realtime_builtin(): return [ @@ -61,7 +244,7 @@ def required_api_byom_realtime_builtin(): "model_name", "model_artifact_location", "inference_instance", - "inference_type", + "data_capture_location", "model_framework", "model_framework_version", ] @@ -74,7 +257,7 @@ def 
required_api_byom_batch_builtin(): "model_name", "model_artifact_location", "inference_instance", - "inference_type", + "batch_job_output_location", "model_framework", "model_framework_version", "batch_inference_data", @@ -88,8 +271,8 @@ def required_api_byom_realtime_custom(): "model_name", "model_artifact_location", "inference_instance", - "inference_type", - "custom_model_container", + "data_capture_location", + "custom_image_uri", ] @@ -97,12 +280,22 @@ def required_api_byom_realtime_custom(): def required_api_byom_batch_custom(): return [ "pipeline_type", + "custom_image_uri", "model_name", "model_artifact_location", "inference_instance", - "inference_type", - "custom_model_container", "batch_inference_data", + "batch_job_output_location", + ] + + +@pytest.fixture +def required_api_image_builder(): + return [ + "pipeline_type", + "custom_algorithm_docker", + "ecr_repo_name", + "image_tag", ] @@ -110,13 +303,15 @@ def required_api_byom_batch_custom(): def api_model_monitor_event(): def _api_model_monitor_event(monitoring_type=""): monitor_event = { - "pipeline_type": "model_monitor", + "pipeline_type": "byom_model_monitor", + "model_name": "mymodel2", "endpoint_name": "xgb-churn-prediction-endpoint", - "training_data": "model_monitor/training-dataset-with-header.csv", - "baseline_job_output_location": "baseline_job_output", - "monitoring_output_location": "monitoring_output", - "schedule_expression": "cron(0 * ? * * *)", - "instance_type": "ml.m5.large", + "training_data": os.environ["TRAININGDATA"], + "baseline_job_output_location": "bucket/baseline_job_output", + "data_capture_location": os.environ["DATACAPTURE"], + "monitoring_output_location": "bucket/monitoring_output", + "schedule_expression": os.environ["SCHEDULEEXP"], + "instance_type": os.environ["INSTANCETYPE"], "instance_volume_size": "20", } if monitoring_type.lower() != "" and monitoring_type.lower() in [ @@ -135,12 +330,14 @@ def required_api_keys_model_monitor(): def _required_api_keys_model_monitor(default=True): default_keys = [ "pipeline_type", + "model_name", "endpoint_name", "baseline_job_output_location", "monitoring_output_location", "schedule_expression", "training_data", "instance_type", + "data_capture_location", "instance_volume_size", ] if default: @@ -380,14 +577,19 @@ def _get_parameters_keys(parameters): @pytest.fixture def cf_client_params(api_byom_event, template_parameters_realtime_builtin): - template_parameters = template_parameters_realtime_builtin(api_byom_event("realtime", "xgboost")) + template_parameters = template_parameters_realtime_builtin(api_byom_event("byom_realtime_builtin")) cf_params = { "Capabilities": ["CAPABILITY_IAM"], "OnFailure": "DO_NOTHING", "Parameters": template_parameters, "RoleARN": "arn:aws:role:region:account:action", - "StackName": "teststack-testmodel-byompipelinereatimebuiltin", - "Tags": [{"Key": "stack_name", "Value": "teststack-testmodel-byompipelinereatimebuiltin"}], + "StackName": "teststack-testmodel-BYOMPipelineReatimeBuiltIn", + "Tags": [{"Key": "stack_name", "Value": "teststack-testmodel-BYOMPipelineReatimeBuiltIn"}], "TemplateURL": "https://testurl/blueprints/byom/byom_realtime_builtin_container.yaml", } return cf_params + + +@pytest.fixture +def expcted_update_response(stack_name): + return {"StackId": f"Pipeline {stack_name} is already provisioned. 
No updates are to be performed."} \ No newline at end of file diff --git a/source/lambdas/pipeline_orchestration/tests/test_helper.py b/source/lambdas/pipeline_orchestration/tests/test_helper.py new file mode 100644 index 0000000..6056725 --- /dev/null +++ b/source/lambdas/pipeline_orchestration/tests/test_helper.py @@ -0,0 +1,37 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +import pytest +from shared.helper import get_built_in_model_monitor_container_uri, get_client, reset_client + + +_helpers_service_clients = dict() + + +def test_get_built_in_model_monitor_container_uri(): + assert ( + get_built_in_model_monitor_container_uri("us-east-1") + == "156813124566.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer:latest" + ) + + +@pytest.mark.parametrize("service,enpoint_url", [("s3", "https://s3"), ("cloudformation", "https://cloudformation")]) +def test_get_client(service, enpoint_url): + client = get_client(service) + assert enpoint_url in client.meta.endpoint_url + + +@pytest.mark.parametrize("service", ["s3", "cloudformation"]) +def test_reset_client(service): + get_client(service) + reset_client() + assert _helpers_service_clients == dict() \ No newline at end of file diff --git a/source/lambdas/pipeline_orchestration/tests/test_logger.py b/source/lambdas/pipeline_orchestration/tests/test_logger.py new file mode 100644 index 0000000..30c2ccc --- /dev/null +++ b/source/lambdas/pipeline_orchestration/tests/test_logger.py @@ -0,0 +1,39 @@ +# ##################################################################################################################### +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. 
# +# ##################################################################################################################### +import logging +import os +import pytest +from shared.logger import get_level, get_logger + + +@pytest.fixture(scope="function", autouse=True) +def rest_logger(): + if "LOG_LEVEL" in os.environ: + os.environ.pop("LOG_LEVEL") + + +@pytest.mark.parametrize("log_level", ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]) +def test_get_level(log_level): + os.environ["LOG_LEVEL"] = log_level + assert get_level() == log_level + + +def test_default_level(): + os.environ["LOG_LEVEL"] = "no_supported" + assert get_level() == "WARNING" + + +def test_get_level_locally(): + logging.getLogger().handlers = [] + logger = get_logger(__name__) + assert logger.level == 0 diff --git a/source/lambdas/pipeline_orchestration/tests/test_pipeline_orchestration.py b/source/lambdas/pipeline_orchestration/tests/test_pipeline_orchestration.py index 763c341..1305f47 100644 --- a/source/lambdas/pipeline_orchestration/tests/test_pipeline_orchestration.py +++ b/source/lambdas/pipeline_orchestration/tests/test_pipeline_orchestration.py @@ -1,5 +1,5 @@ # ##################################################################################################################### -# Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. A copy of the License is located at # @@ -13,25 +13,60 @@ import os import json import datetime +import boto3 +import tempfile import pytest from unittest.mock import patch from unittest import TestCase import botocore.session from botocore.stub import Stubber +from moto import mock_s3 +from pipeline_orchestration.lambda_helpers import ( + clean_param, + get_stack_name, + get_common_realtime_batch_params, + get_bacth_specific_params, + get_model_monitor_params, + get_image_builder_params, + format_template_parameters, + get_codepipeline_params, + upload_file_to_s3, + download_file_from_s3, + get_image_uri, + template_url, + get_stage_param, + create_template_zip_file, + get_realtime_specific_params, + get_template_parameters, + get_required_keys, + validate, +) from pipeline_orchestration.index import ( handler, provision_pipeline, + create_codepipeline_stack, + update_stack, pipeline_status, DateTimeEncoder, - get_template_parameters, - get_required_keys, - template_url, - validate, ) from shared.wrappers import BadRequest from tests.fixtures.orchestrator_fixtures import ( mock_env_variables, api_byom_event, + expected_params_realtime_custom, + expected_common_realtime_batch_params, + expected_realtime_specific_params, + expected_batch_specific_params, + stack_name, + api_monitor_event, + expcted_update_response, + expected_model_monitor_params, + required_api_image_builder, + expected_batch_params, + api_image_builder_event, + expected_image_builder_params, + expect_single_account_params_format, + expected_multi_account_params_format, required_api_byom_realtime_builtin, required_api_byom_batch_builtin, required_api_byom_realtime_custom, @@ -43,13 +78,15 @@ template_parameters_batch_builtin, template_parameters_realtime_custom, template_parameters_batch_custom, - generate_names, template_parameters_model_monitor, get_parameters_keys, cf_client_params, ) +content_type = "plain/text" + + def test_handler(): with 
patch("pipeline_orchestration.index.provision_pipeline") as mock_provision_pipeline: event = { @@ -77,7 +114,22 @@ def test_handler(): + "Check documentation for API & config formats." } ), - "headers": {"Content-Type": "plain/text"}, + "headers": {"Content-Type": content_type}, + } + + event = { + "httpMethod": "POST", + "path": "/doesnotexist", + "body": json.dumps({"test": "test"}), + } + response = handler(event, {}) + assert response == { + "statusCode": 400, + "isBase64Encoded": False, + "body": json.dumps( + {"message": "Unacceptable event path. Path must be /provisionpipeline or /pipelinestatus"} + ), + "headers": {"Content-Type": content_type}, } with patch("pipeline_orchestration.index.pipeline_status") as mock_pipeline_status: @@ -90,28 +142,163 @@ def test_handler(): mock_pipeline_status.assert_called_with(json.loads(event["body"])) -def test_provision_pipeline(cf_client_params, api_byom_event): +def test_clean_param(): + test_path = "path/to/prefix" + TestCase().assertEqual(clean_param(f"{test_path}/"), test_path) + TestCase().assertEqual(clean_param(test_path), test_path) + + +def test_template_url(): + url = "https://" + os.environ["BLUEPRINT_BUCKET_URL"] + "/blueprints/byom" + TestCase().assertEqual(template_url("byom_batch_custom"), "blueprints/byom/byom_batch_pipeline.yaml") + TestCase().assertEqual(template_url("single_account_codepipeline"), f"{url}/single_account_codepipeline.yaml") + with pytest.raises(Exception): + template_url("byom_not_supported") - client = botocore.session.get_session().create_client("cloudformation") - cp_client = botocore.session.get_session().create_client("codepipeline") +def test_provision_pipeline(api_image_builder_event, api_byom_event): + client = botocore.session.get_session().create_client("cloudformation") stubber = Stubber(client) - cp_stubber = Stubber(cp_client) - cfn_response = {"StackId": "1234"} - expected_params = cf_client_params expected_response = { "statusCode": 200, "isBase64Encoded": False, "body": json.dumps({"message": "success: stack creation started", "pipeline_id": "1234"}), - "headers": {"Content-Type": "plain/text"}, + "headers": {"Content-Type": content_type}, } - event = api_byom_event("realtime", "xgboost") - cfn_response = {"StackId": "1234"} - stubber.add_response("create_stack", cfn_response, expected_params) + # The stubber will be called twice + stubber.add_response("create_stack", {"StackId": "1234"}) + stubber.add_response("create_stack", {"StackId": "1234"}) with stubber: - with cp_stubber: + with mock_s3(): + event = api_image_builder_event response = provision_pipeline(event, client) assert response == expected_response + event = api_byom_event("byom_realtime_builtin") + s3_client = boto3.client("s3", region_name="us-east-1") + testfile = tempfile.NamedTemporaryFile() + s3_client.create_bucket(Bucket="testbucket") + upload_file_to_s3( + testfile.name, "testbucket", "blueprints/byom/byom_realtime_inference_pipeline.yaml", s3_client + ) + s3_client.create_bucket(Bucket="testassetsbucket") + response = provision_pipeline(event, client, s3_client) + assert response == expected_response + + +@mock_s3 +def test_upload_file_to_s3(): + s3_clinet = boto3.client("s3", region_name="us-east-1") + testfile = tempfile.NamedTemporaryFile() + s3_clinet.create_bucket(Bucket="assetsbucket") + upload_file_to_s3(testfile.name, "assetsbucket", os.environ["TESTFILE"], s3_clinet) + + +@mock_s3 +def test_download_file_from_s3(): + s3_clinet = boto3.client("s3", region_name="us-east-1") + testfile = 
tempfile.NamedTemporaryFile() + s3_clinet.create_bucket(Bucket="assetsbucket") + upload_file_to_s3(testfile.name, "assetsbucket", os.environ["TESTFILE"], s3_clinet) + download_file_from_s3("assetsbucket", os.environ["TESTFILE"], testfile.name, s3_clinet) + + +def test_create_codepipeline_stack(cf_client_params, stack_name, expcted_update_response): + cf_client = botocore.session.get_session().create_client("cloudformation") + not_image_satck = "teststack-testmodel-BYOMPipelineReatimeBuiltIn" + stubber = Stubber(cf_client) + expected_params = cf_client_params + cfn_response = {"StackId": "1234"} + + stubber.add_response("create_stack", cfn_response, expected_params) + with stubber: + response = create_codepipeline_stack( + not_image_satck, + expected_params["TemplateURL"], + expected_params["Parameters"], + cf_client, + ) + assert response["StackId"] == cfn_response["StackId"] + + stubber.add_client_error("create_stack", expected_params=expected_params) + + with stubber: + with pytest.raises(Exception): + create_codepipeline_stack( + not_image_satck, + expected_params["TemplateURL"], + expected_params["Parameters"], + cf_client, + ) + stubber.add_client_error("create_stack", service_message="already exists") + expected_response = {"StackId": f"Pipeline {not_image_satck} is already provisioned. Updating template parameters."} + with stubber: + response = create_codepipeline_stack( + not_image_satck, + expected_params["TemplateURL"], + expected_params["Parameters"], + cf_client, + ) + + assert response == expected_response + + # Test if the stack is image builder + stubber.add_client_error("create_stack", service_message="already exists") + stubber.add_client_error("update_stack", service_message="No updates are to be performed") + expected_response = expcted_update_response + with stubber: + response = create_codepipeline_stack( + stack_name, + expected_params["TemplateURL"], + expected_params["Parameters"], + cf_client, + ) + + assert response == expected_response + + +def test_update_stack(cf_client_params, stack_name, expcted_update_response): + cf_client = botocore.session.get_session().create_client("cloudformation") + + expected_params = cf_client_params + stubber = Stubber(cf_client) + expected_params["StackName"] = stack_name + expected_params["Tags"] = [{"Key": "stack_name", "Value": stack_name}] + del expected_params["OnFailure"] + cfn_response = {"StackId": f"Pipeline {stack_name} is being updated."} + + stubber.add_response("update_stack", cfn_response, expected_params) + + with stubber: + response = update_stack( + stack_name, + expected_params["TemplateURL"], + expected_params["Parameters"], + cf_client, + ) + assert response == cfn_response + + # Test for no update error + stubber.add_client_error("update_stack", service_message="No updates are to be performed") + expected_response = expcted_update_response + with stubber: + response = update_stack( + stack_name, + expected_params["TemplateURL"], + expected_params["Parameters"], + cf_client, + ) + assert response == expected_response + + # Test for other exceptions + stubber.add_client_error("update_stack", service_message="Some Exception") + with stubber: + with pytest.raises(Exception): + update_stack( + stack_name, + expected_params["TemplateURL"], + expected_params["Parameters"], + cf_client, + ) def test_pipeline_status(): @@ -189,8 +376,9 @@ def test_pipeline_status(): "statusCode": 200, "isBase64Encoded": False, "body": json.dumps(cp_response, indent=4, cls=DateTimeEncoder), - "headers": {"Content-Type": "plain/text"}, 
+ "headers": {"Content-Type": content_type}, } + cfn_stubber.add_response("list_stack_resources", cfn_response, cfn_expected_params) cp_stubber.add_response("get_pipeline_state", cp_response, cp_expected_params) @@ -199,39 +387,91 @@ def test_pipeline_status(): response = pipeline_status(event, cfn_client=cfn_client, cp_client=cp_client) assert response == expected_response + # test codepipeline has not been created yet + no_cp_cfn_response = { + "StackResourceSummaries": [ + { + "ResourceType": "AWS::CodeBuild::Project", + "PhysicalResourceId": "testId", + "LogicalResourceId": "test", + "ResourceStatus": "test", + "LastUpdatedTimestamp": datetime.datetime(2000, 1, 1, 1, 1), + } + ] + } + + expected_response_no_cp = { + "statusCode": 200, + "isBase64Encoded": False, + "body": "pipeline cloudformation stack has not provisioned the pipeline yet.", + "headers": {"Content-Type": content_type}, + } + cfn_stubber.add_response("list_stack_resources", no_cp_cfn_response, cfn_expected_params) + + with cfn_stubber: + with cp_stubber: + response = pipeline_status(event, cfn_client=cfn_client, cp_client=cp_client) + assert response == expected_response_no_cp + + +def test_get_stack_name(api_byom_event, api_monitor_event, api_image_builder_event): + # realtime builtin pipeline + realtime_builtin = api_byom_event("byom_realtime_builtin") + assert ( + get_stack_name(realtime_builtin) + == f"mlops-pipeline-{realtime_builtin['model_name']}-byompipelinerealtimebuiltin" + ) + # batch builtin pipeline + batch_builtin = api_byom_event("byom_batch_builtin") + assert get_stack_name(batch_builtin) == f"mlops-pipeline-{batch_builtin['model_name']}-byompipelinebatchbuiltin" + + # model monitor pipeline + assert get_stack_name(api_monitor_event) == f"mlops-pipeline-{api_monitor_event['model_name']}-byommodelmonitor" + + # image builder pipeline + assert ( + get_stack_name(api_image_builder_event) + == f"mlops-pipeline-{api_image_builder_event['image_tag']}-byompipelineimagebuilder" + ) + def test_get_required_keys( - api_byom_event, - api_model_monitor_event, + api_byom_event, # NOSONAR:S107 this test function is designed to take many fixtures + api_monitor_event, required_api_byom_realtime_builtin, required_api_byom_batch_builtin, required_api_byom_realtime_custom, required_api_byom_batch_custom, required_api_keys_model_monitor, + required_api_image_builder, ): # Required keys in byom, realtime, builtin - returned_keys = get_required_keys(api_byom_event("realtime", "xgboost")) + returned_keys = get_required_keys("byom_realtime_builtin") expected_keys = required_api_byom_realtime_builtin TestCase().assertCountEqual(expected_keys, returned_keys) # Required keys in byom, batch, builtin - returned_keys = get_required_keys(api_byom_event("batch", "xgboost")) + returned_keys = get_required_keys("byom_batch_builtin") expected_keys = required_api_byom_batch_builtin TestCase().assertCountEqual(expected_keys, returned_keys) # Required keys in byom, realtime, custom - returned_keys = get_required_keys(api_byom_event("realtime", "")) + returned_keys = get_required_keys("byom_realtime_custom") expected_keys = required_api_byom_realtime_custom TestCase().assertCountEqual(expected_keys, returned_keys) # Required keys in byom, batch, custom - returned_keys = get_required_keys(api_byom_event("batch", "")) + returned_keys = get_required_keys("byom_batch_custom") expected_keys = required_api_byom_batch_custom TestCase().assertCountEqual(expected_keys, returned_keys) # Required keys in model_monitor, default (no monitoring_type 
provided) - returned_keys = get_required_keys(api_model_monitor_event()) + returned_keys = get_required_keys("byom_model_monitor") expected_keys = required_api_keys_model_monitor() TestCase().assertCountEqual(expected_keys, returned_keys) # Required keys in model_monitor, with monitoring_type provided - returned_keys = get_required_keys(api_model_monitor_event("modelquality")) - expected_keys = required_api_keys_model_monitor(False) + returned_keys = get_required_keys("byom_model_monitor") + expected_keys = required_api_keys_model_monitor(True) + TestCase().assertCountEqual(expected_keys, returned_keys) + # Required keys in image builder + returned_keys = get_required_keys("byom_image_builder") + expected_keys = required_api_image_builder TestCase().assertCountEqual(expected_keys, returned_keys) # assert for exceptions with pytest.raises(BadRequest) as exceinfo: @@ -240,89 +480,160 @@ def test_get_required_keys( str(exceinfo.value) == "Bad request format. Pipeline type not supported. Check documentation for API & config formats" ) - with pytest.raises(BadRequest) as exceinfo: - get_required_keys({"pipeline_type": "model_monitor", "monitoring_type": "not_supported"}) - assert ( - str(exceinfo.value) - == "Bad request. MonitoringType supported are 'DataQuality'|'ModelQuality'|'ModelBias'|'ModelExplainability'" + + +def test_get_stage_param(api_byom_event): + single_event = api_byom_event("byom_realtime_custom", False) + TestCase().assertEqual(get_stage_param(single_event, "data_capture_location", None), "bucket/datacapture") + multi_event = api_byom_event("byom_realtime_custom", True) + TestCase().assertEqual(get_stage_param(multi_event, "data_capture_location", "dev"), "bucket/dev_datacapture") + + +def test_get_template_parameters( + api_byom_event, + api_image_builder_event, + expected_params_realtime_custom, + expected_image_builder_params, + expected_batch_params, +): + single_event = api_byom_event("byom_realtime_custom", False) + TestCase().assertEqual(get_template_parameters(single_event, False), expected_params_realtime_custom) + TestCase().assertEqual(get_template_parameters(api_image_builder_event, False), expected_image_builder_params) + TestCase().assertEqual( + get_template_parameters(api_byom_event("byom_batch_custom", False), False), + expected_batch_params, ) - with pytest.raises(BadRequest) as exceinfo: - get_required_keys({"pipeline_type": "byom"}) - assert str(exceinfo.value) == "Bad request. 
missing keys for byom" + # test for exception + with pytest.raises(BadRequest): + get_template_parameters({"pipeline_type": "unsupported"}, False) -def test_template_url(): - # model monitor CF template - assert ( - template_url("", "", "model_monitor") - == "https://" + os.environ["BLUEPRINT_BUCKET_URL"] + "/blueprints/byom/model_monitor.yaml" + +def test_get_common_realtime_batch_params(api_byom_event, expected_common_realtime_batch_params): + realtime_event = api_byom_event("byom_realtime_custom", False) + batch_event = api_byom_event("byom_batch_custom", False) + realtime_event.update(batch_event) + TestCase().assertEqual( + get_common_realtime_batch_params(realtime_event, False, None), expected_common_realtime_batch_params ) - # realtime/builtin CF template - assert ( - template_url("realtime", "", "byom") - == "https://" + os.environ["BLUEPRINT_BUCKET_URL"] + "/blueprints/byom/byom_realtime_builtin_container.yaml" + + +def test_get_realtime_specific_params(api_byom_event, expected_realtime_specific_params): + realtime_event = api_byom_event("byom_realtime_builtin", False) + TestCase().assertEqual(get_realtime_specific_params(realtime_event, None), expected_realtime_specific_params) + + +def test_get_bacth_specific_params(api_byom_event, expected_batch_specific_params): + batch_event = api_byom_event("byom_batch_custom", False) + TestCase().assertEqual(get_bacth_specific_params(batch_event, None), expected_batch_specific_params) + + +def test_get_model_monitor_params(api_monitor_event, expected_model_monitor_params): + TestCase().assertEqual( + len(get_model_monitor_params(api_monitor_event, "us-east-1", None)), len(expected_model_monitor_params) ) - # batch/builtin CF template - assert ( - template_url("batch", "", "byom") - == "https://" + os.environ["BLUEPRINT_BUCKET_URL"] + "/blueprints/byom/byom_batch_builtin_container.yaml" + + +def test_get_image_builder_params(api_image_builder_event, expected_image_builder_params): + TestCase().assertEqual(get_image_builder_params(api_image_builder_event), expected_image_builder_params) + + +def test_format_template_parameters( + expected_image_builder_params, expected_multi_account_params_format, expect_single_account_params_format +): + TestCase().assertEqual( + format_template_parameters(expected_image_builder_params, "True"), expected_multi_account_params_format ) - # realtime/custom CF template - assert ( - template_url("realtime", "my_custom_image.zip", "byom") - == "https://" + os.environ["BLUEPRINT_BUCKET_URL"] + "/blueprints/byom/byom_realtime_build_container.yaml" + TestCase().assertEqual( + format_template_parameters(expected_image_builder_params, "False"), expect_single_account_params_format ) - # batch/custom CF template - assert ( - template_url("batch", "my_custom_image.zip", "byom") - == "https://" + os.environ["BLUEPRINT_BUCKET_URL"] + "/blueprints/byom/byom_batch_build_container.yaml" + + +@patch("lambda_helpers.sagemaker.image_uris.retrieve") +def test_get_image_uri(mocked_sm, api_byom_event): + custom_event = api_byom_event("byom_realtime_custom", False) + TestCase().assertEqual(get_image_uri("byom_realtime_custom", custom_event, "us-east-1"), "custom-image-uri") + mocked_sm.return_value = "test-imge-uri" + builtin_event = api_byom_event("byom_realtime_builtin", False) + TestCase().assertEqual(get_image_uri("byom_realtime_builtin", builtin_event, "us-east-1"), "test-imge-uri") + mocked_sm.assert_called_with( + framework=builtin_event.get("model_framework"), + region="us-east-1", + 
version=builtin_event.get("model_framework_version"), ) - # assert for exceptions - with pytest.raises(BadRequest) as exceinfo: - template_url("notsupported", "my_custom_image.zip", "byom") - assert str(exceinfo.value) == "Bad request format. Inference type must be 'realtime' or 'batch'" -def test_get_template_parameters( - template_parameters_realtime_builtin, - template_parameters_batch_builtin, - template_parameters_realtime_custom, - template_parameters_batch_custom, - template_parameters_model_monitor, - api_byom_event, - api_model_monitor_event, - get_parameters_keys, +@patch("boto3.client") +@patch("builtins.open") +@patch("lambda_helpers.shutil.make_archive") +@patch("lambda_helpers.write_params_to_json") +@patch("lambda_helpers.format_template_parameters") +@patch("lambda_helpers.get_template_parameters") +@patch("index.os.makedirs") +@patch("index.os.path.exists") +def test_create_template_zip_file( + mocked_path, # NOSONAR:S107 this test function is designed to take many fixtures + mocked_mkdir, + mocked_get_template, + mocked_format, + mocked_wrire, + mocked_shutil, + mocked_open, + mocked_client, + api_monitor_event, ): - # assert template parameters: realtime/builtin - _, returned_parameters = get_template_parameters(api_byom_event("realtime", "xgboost")) - expected_parameters = template_parameters_realtime_builtin(api_byom_event("realtime", "xgboost")) - TestCase().assertCountEqual(get_parameters_keys(expected_parameters), get_parameters_keys(returned_parameters)) - # assert template parameters: batch/builtin - _, returned_parameters = get_template_parameters(api_byom_event("batch", "xgboost")) - expected_parameters = template_parameters_batch_builtin(api_byom_event("batch", "xgboost")) - TestCase().assertCountEqual(get_parameters_keys(expected_parameters), get_parameters_keys(returned_parameters)) - # assert template parameters: realtime/custom - _, returned_parameters = get_template_parameters(api_byom_event("realtime", "")) - expected_parameters = template_parameters_realtime_custom(api_byom_event("realtime", "")) - TestCase().assertCountEqual(get_parameters_keys(expected_parameters), get_parameters_keys(returned_parameters)) - # assert template parameters: batch/custom - _, returned_parameters = get_template_parameters(api_byom_event("batch", "")) - expected_parameters = template_parameters_batch_custom(api_byom_event("batch", "")) - TestCase().assertCountEqual(get_parameters_keys(expected_parameters), get_parameters_keys(returned_parameters)) - # assert template parameters: model monitor - _, returned_parameters = get_template_parameters(api_model_monitor_event()) - expected_parameters = template_parameters_model_monitor(api_model_monitor_event()) - TestCase().assertCountEqual(get_parameters_keys(expected_parameters), get_parameters_keys(returned_parameters)) + mocked_path.return_value = False + s3_clinet = boto3.client("s3", region_name="us-east-1") + # multi account + create_template_zip_file( + api_monitor_event, "blueprint", "assets_bucket", "byom/template.yaml", "zipfile", "True", s3_clinet + ) + # single account + create_template_zip_file( + api_monitor_event, "blueprint", "assets_bucket", "byom/template.yaml", "zipfile", "False", s3_clinet + ) + + +def test_get_codepipeline_params(): + common_params = [ + ("NOTIFICATIONEMAIL", "test@example.com"), + ("TEMPLATEZIPNAME", "template_zip_name"), + ("TEMPLATEFILENAME", "template_file_name"), + ("ASSETSBUCKET", "testassetsbucket"), + ("STACKNAME", "stack_name"), + ] + # multi account codepipeline + 
TestCase().assertEqual( + get_codepipeline_params("True", "stack_name", "template_zip_name", "template_file_name"), + common_params + + [ + ("DEVPARAMSNAME", "dev_template_params.json"), + ("STAGINGPARAMSNAME", "staging_template_params.json"), + ("PRODPARAMSNAME", "prod_template_params.json"), + ("DEVACCOUNTID", "dev_account_id"), + ("DEVORGID", "dev_org_id"), + ("STAGINGACCOUNTID", "staging_account_id"), + ("STAGINGORGID", "staging_org_id"), + ("PRODACCOUNTID", "prod_account_id"), + ("PRODORGID", "prod_org_id"), + ("BLUEPRINTBUCKET", "testbucket"), + ], + ) + # single account codepipeline + TestCase().assertEqual( + get_codepipeline_params("False", "stack_name", "template_zip_name", "template_file_name"), + common_params + [("TEMPLATEPARAMSNAME", "template_params.json")], + ) def test_validate(api_byom_event): # event with required keys - valid_event = api_byom_event("batch", "xgboost") + valid_event = api_byom_event("byom_realtime_builtin") TestCase().assertDictEqual(validate(valid_event), valid_event) # event with missing required keys - bad_event = api_byom_event("batch", "xgboost") + bad_event = api_byom_event("byom_batch_builtin") # remove required key del bad_event["model_artifact_location"] with pytest.raises(BadRequest) as execinfo: validate(bad_event) - assert str(execinfo.value) == "Bad request. API body does not have the necessary parameter: model_artifact_location" \ No newline at end of file + assert str(execinfo.value) == "Bad request. API body does not have the necessary parameter: model_artifact_location" diff --git a/source/lambdas/solution_helper/test_lambda_function.py b/source/lambdas/solution_helper/test_lambda_function.py index fcf8fc7..e6e3838 100644 --- a/source/lambdas/solution_helper/test_lambda_function.py +++ b/source/lambdas/solution_helper/test_lambda_function.py @@ -13,6 +13,8 @@ import unittest, requests from unittest import mock +import pytest +from lambda_function import handler def mocked_requests_post(*args, **kwargs): @@ -70,6 +72,16 @@ def test_send_metrics_successful(self, mock_post): {"Foo": "Bar", "RequestType": "Create", "gitSelected": "Test", "bucketSelected": "test-bucket"}, ) + # no values provided + event["ResourceProperties"].update({"gitSelected": ""}) + event["ResourceProperties"].update({"bucketSelected": ""}) + custom_resource(event, None) + actual_payload = mock_post.call_args.kwargs["json"] + self.assertEqual( + actual_payload["Data"], + {"Foo": "Bar", "RequestType": "Create", "gitSelected": "", "bucketSelected": ""}, + ) + @mock.patch("requests.post") def test_send_metrics_connection_error(self, mock_post): mock_post.side_effect = requests.exceptions.ConnectionError() @@ -116,3 +128,8 @@ def test_sanitize_data(self): actual_response = _sanitize_data(resource_properties) self.assertCountEqual(expected_response, actual_response) + + @mock.patch("lambda_function.helper") + def test_helper(self, mocked_helper): + handler({}, {}) + mocked_helper.assert_called() diff --git a/source/lib/aws_mlops_stack.py b/source/lib/aws_mlops_stack.py index b678cc0..746eac7 100644 --- a/source/lib/aws_mlops_stack.py +++ b/source/lib/aws_mlops_stack.py @@ -14,6 +14,7 @@ from aws_cdk import ( aws_iam as iam, aws_s3 as s3, + aws_ecr as ecr, aws_lambda as lambda_, aws_codepipeline as codepipeline, aws_codepipeline_actions as codepipeline_actions, @@ -27,72 +28,65 @@ from lib.blueprints.byom.pipeline_definitions.helpers import ( suppress_s3_access_policy, apply_secure_bucket_policy, + suppress_lambda_policies, ) +from 
lib.blueprints.byom.pipeline_definitions.templates_parameters import ( + create_notification_email_parameter, + create_git_address_parameter, + create_existing_bucket_parameter, + create_existing_ecr_repo_parameter, + create_account_id_parameter, + create_org_id_parameter, + create_git_address_provided_condition, + create_existing_bucket_provided_condition, + create_existing_ecr_provided_condition, + create_new_bucket_condition, + create_new_ecr_repo_condition, +) +from lib.blueprints.byom.pipeline_definitions.deploy_actions import sagemaker_layer class MLOpsStack(core.Stack): - def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: + def __init__(self, scope: core.Construct, id: str, *, multi_account=False, **kwargs) -> None: super().__init__(scope, id, **kwargs) - # Get stack parameters: email and repo address - notification_email = core.CfnParameter( - self, - "Email Address", - type="String", - description="Specify an email to receive notifications about pipeline outcomes.", - allowed_pattern="^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - min_length=5, - max_length=320, - constraint_description="Please enter an email address with correct format (example@exmaple.com)", - ) - git_address = core.CfnParameter( - self, - "CodeCommit Repo Address", - type="String", - description="AWS CodeCommit repository clone URL to connect to the framework.", - allowed_pattern=( - "^(((https:\/\/|ssh:\/\/)(git\-codecommit)\.[a-zA-Z0-9_.+-]+(amazonaws\.com\/)[a-zA-Z0-9-.]" - "+(\/)[a-zA-Z0-9-.]+(\/)[a-zA-Z0-9-.]+$)|^$)" - ), - min_length=0, - max_length=320, - constraint_description=( - "CodeCommit address must follow the pattern: ssh or " - "https://git-codecommit.REGION.amazonaws.com/version/repos/REPONAME" - ), - ) - + # Get stack parameters: + notification_email = create_notification_email_parameter(self) + git_address = create_git_address_parameter(self) # Get the optional S3 assets bucket to use - existing_bucket = core.CfnParameter( - self, - "ExistingS3Bucket", - type="String", - description="Name of existing S3 bucket to be used for ML assests. S3 Bucket must be in the same region as the deployed stack, and has versioning enabled. 
If not provided, a new S3 bucket will be created.", - allowed_pattern="((?=^.{3,63}$)(?!^(\d+\.)+\d+$)(^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$)|^$)", - min_length=0, - max_length=63, - ) + existing_bucket = create_existing_bucket_parameter(self) + # Get the optional S3 assets bucket to use + existing_ecr_repo = create_existing_ecr_repo_parameter(self) + # create only if multi_account template + if multi_account: + # create development parameters + account_type = "development" + dev_account_id = create_account_id_parameter(self, "DEV_ACCOUNT_ID", account_type) + dev_org_id = create_org_id_parameter(self, "DEV_ORG_ID", account_type) + # create staging parameters + account_type = "staging" + staging_account_id = create_account_id_parameter(self, "STAGING_ACCOUNT_ID", account_type) + staging_org_id = create_org_id_parameter(self, "STAGING_ORG_ID", account_type) + # create production parameters + account_type = "production" + prod_account_id = create_account_id_parameter(self, "PROD_ACCOUNT_ID", account_type) + prod_org_id = create_org_id_parameter(self, "PROD_ORG_ID", account_type) # Conditions - git_address_provided = core.CfnCondition( - self, - "GitAddressProvided", - expression=core.Fn.condition_not(core.Fn.condition_equals(git_address, "")), - ) + git_address_provided = create_git_address_provided_condition(self, git_address) # client provided an existing S3 bucket name, to be used for assets - existing_bucket_provided = core.CfnCondition( - self, - "S3BucketProvided", - expression=core.Fn.condition_not(core.Fn.condition_equals(existing_bucket.value_as_string.strip(), "")), - ) + existing_bucket_provided = create_existing_bucket_provided_condition(self, existing_bucket) + + # client provided an existing Amazon ECR name + existing_ecr_provided = create_existing_ecr_provided_condition(self, existing_ecr_repo) # S3 bucket needs to be created for assets - create_new_bucket = core.CfnCondition( - self, - "CreateS3Bucket", - expression=core.Fn.condition_equals(existing_bucket.value_as_string.strip(), ""), - ) + create_new_bucket = create_new_bucket_condition(self, existing_bucket) + + # Amazon ECR repo needs too be created for custom Algorithms + create_new_ecr_repo = create_new_ecr_repo_condition(self, existing_ecr_repo) + # Constants pipeline_stack_name = "mlops-pipeline" @@ -110,7 +104,8 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: # This is a logging bucket. access_logs_bucket.node.default_child.cfn_options.metadata = suppress_s3_access_policy() - # Import user provide S3 bucket, if any. s3.Bucket.from_bucket_arn is used instead of s3.Bucket.from_bucket_name to allow cross account bucket. + # Import user provide S3 bucket, if any. s3.Bucket.from_bucket_arn is used instead of + # s3.Bucket.from_bucket_name to allow cross account bucket. 
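The inline `CfnParameter`/`CfnCondition` definitions deleted in this hunk are now supplied by helper functions imported from `templates_parameters.py`, whose contents are not part of this diff. The snippet below is only a sketch of how such helpers could wrap the removed inline logic (names follow the imports above; the `allowed_pattern` validation is omitted for brevity), not the module's actual code:

```python
# Sketch only: illustrates the kind of helpers templates_parameters.py presumably
# provides. The bodies mirror the inline definitions removed from aws_mlops_stack.py.
from aws_cdk import core


def create_existing_bucket_parameter(scope: core.Construct) -> core.CfnParameter:
    return core.CfnParameter(
        scope,
        "ExistingS3Bucket",
        type="String",
        description=(
            "Name of existing S3 bucket to be used for ML assets. "
            "If not provided, a new S3 bucket will be created."
        ),
        min_length=0,
        max_length=63,
    )


def create_existing_bucket_provided_condition(
    scope: core.Construct, existing_bucket: core.CfnParameter
) -> core.CfnCondition:
    # True when the client supplied a bucket name (non-empty string)
    return core.CfnCondition(
        scope,
        "S3BucketProvided",
        expression=core.Fn.condition_not(
            core.Fn.condition_equals(existing_bucket.value_as_string.strip(), "")
        ),
    )


def create_new_bucket_condition(
    scope: core.Construct, existing_bucket: core.CfnParameter
) -> core.CfnCondition:
    # True when no bucket name was supplied, so the stack creates a new bucket
    return core.CfnCondition(
        scope,
        "CreateS3Bucket",
        expression=core.Fn.condition_equals(existing_bucket.value_as_string.strip(), ""),
    )
```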
client_existing_bucket = s3.Bucket.from_bucket_arn( self, "ClientExistingBucket", @@ -120,6 +115,14 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: # Create the resource if existing_bucket_provided condition is True core.Aspects.of(client_existing_bucket).add(ConditionalResources(existing_bucket_provided)) + # Import user provided Amazon ECR repository + + client_erc_repo = ecr.Repository.from_repository_name( + self, "ClientExistingECRReo", existing_ecr_repo.value_as_string + ) + # Create the resource if existing_ecr_provided condition is True + core.Aspects.of(client_erc_repo).add(ConditionalResources(existing_ecr_provided)) + # Creating assets bucket so that users can upload ML Models to it. assets_bucket = s3.Bucket( self, @@ -133,6 +136,23 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: # Apply secure transport bucket policy apply_secure_bucket_policy(assets_bucket) + s3_actions = ["s3:GetObject", "s3:ListBucket"] + # if multi account + if multi_account: + # add permissions for other accounts to access the assets bucket + + assets_bucket.add_to_resource_policy( + iam.PolicyStatement( + effect=iam.Effect.ALLOW, + actions=s3_actions, + principals=[ + iam.AccountPrincipal(dev_account_id.value_as_string), + iam.AccountPrincipal(staging_account_id.value_as_string), + iam.AccountPrincipal(prod_account_id.value_as_string), + ], + resources=[assets_bucket.bucket_arn, f"{assets_bucket.bucket_arn}/*"], + ) + ) # Create the resource if create_new_bucket condition is True core.Aspects.of(assets_bucket).add(ConditionalResources(create_new_bucket)) @@ -144,6 +164,46 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: assets_bucket.bucket_name, ).to_string() + # Creating Amazon ECR repository + ecr_repo = ecr.Repository(self, "ECRRepo", image_scan_on_push=True) + + # if multi account + if multi_account: + # add permissios to other account to pull images + ecr_repo.add_to_resource_policy( + iam.PolicyStatement( + effect=iam.Effect.ALLOW, + actions=[ + "ecr:DescribeImages", + "ecr:DescribeRepositories", + "ecr:GetDownloadUrlForLayer", + "ecr:BatchGetImage", + "ecr:BatchCheckLayerAvailability", + ], + principals=[ + iam.AccountPrincipal(dev_account_id.value_as_string), + iam.AccountPrincipal(staging_account_id.value_as_string), + iam.AccountPrincipal(prod_account_id.value_as_string), + ], + ) + ) + # Create the resource if create_new_ecr condition is True + core.Aspects.of(ecr_repo).add(ConditionalResources(create_new_ecr_repo)) + + # Get ECR repo's name based on the condition + ecr_repo_name = core.Fn.condition_if( + existing_ecr_provided.logical_id, + client_erc_repo.repository_name, + ecr_repo.repository_name, + ).to_string() + + # Get ECR repo's arn based on the condition + ecr_repo_arn = core.Fn.condition_if( + existing_ecr_provided.logical_id, + client_erc_repo.repository_arn, + ecr_repo.repository_arn, + ).to_string() + blueprints_bucket_name = "blueprint-repository-" + str(uuid.uuid4()) blueprint_repository_bucket = s3.Bucket( self, @@ -156,6 +216,22 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: # Apply secure transport bucket policy apply_secure_bucket_policy(blueprint_repository_bucket) + # if multi account + if multi_account: + # add permissions for other accounts to access the blueprint bucket + blueprint_repository_bucket.add_to_resource_policy( + iam.PolicyStatement( + effect=iam.Effect.ALLOW, + actions=s3_actions, + principals=[ + iam.AccountPrincipal(dev_account_id.value_as_string), + 
iam.AccountPrincipal(staging_account_id.value_as_string), + iam.AccountPrincipal(prod_account_id.value_as_string), + ], + resources=[blueprint_repository_bucket.bucket_arn, f"{blueprint_repository_bucket.bucket_arn}/*"], + ) + ) + # Custom resource to copy source bucket content to blueprints bucket custom_resource_lambda_fn = lambda_.Function( self, @@ -172,16 +248,8 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: }, timeout=core.Duration.seconds(60), ) - custom_resource_lambda_fn.node.default_child.cfn_options.metadata = { - "cfn_nag": { - "rules_to_suppress": [ - { - "id": "W58", - "reason": "The lambda functions role already has permissions to write cloudwatch logs", - } - ] - } - } + + custom_resource_lambda_fn.node.default_child.cfn_options.metadata = suppress_lambda_policies() blueprint_repository_bucket.grant_write(custom_resource_lambda_fn) custom_resource = core.CustomResource( self, @@ -195,6 +263,7 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: "mlopscloudformationrole", assumed_by=iam.ServicePrincipal("cloudformation.amazonaws.com"), ) + lambda_invoke_action = "lambda:InvokeFunction" # Cloudformation policy setup orchestrator_policy = iam.Policy( self, @@ -231,15 +300,12 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: iam.PolicyStatement( actions=[ "ecr:CreateRepository", - "ecr:DeleteRepository", "ecr:DescribeRepositories", ], - # The * is needed in front of awsmlopsmodels because AWS-ECR CDK adds - # the stack's name in front of the ECR repository's name resources=[ ( f"arn:{core.Aws.PARTITION}:ecr:{core.Aws.REGION}:" - f"{core.Aws.ACCOUNT_ID}:repository/{pipeline_stack_name}*-awsmlopsmodels*" + f"{core.Aws.ACCOUNT_ID}:repository/{ecr_repo_name}" ) ], ), @@ -268,7 +334,7 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: actions=[ "lambda:CreateFunction", "lambda:DeleteFunction", - "lambda:InvokeFunction", + lambda_invoke_action, "lambda:PublishLayerVersion", "lambda:DeleteLayerVersion", "lambda:GetLayerVersion", @@ -284,7 +350,7 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: ], ), iam.PolicyStatement( - actions=["s3:GetObject"], + actions=s3_actions, resources=[ blueprint_repository_bucket.bucket_arn, blueprint_repository_bucket.arn_for_objects("*"), @@ -294,6 +360,7 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: iam.PolicyStatement( actions=[ "codepipeline:CreatePipeline", + "codepipeline:UpdatePipeline", "codepipeline:DeletePipeline", "codepipeline:GetPipeline", "codepipeline:GetPipelineState", @@ -338,7 +405,13 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: "s3:PutBucketPublicAccessBlock", "s3:PutBucketLogging", ], - resources=["arn:" + core.Aws.PARTITION + ":s3:::*"], + resources=[f"arn:{core.Aws.PARTITION}:s3:::*"], + ), + iam.PolicyStatement( + actions=[ + "s3:PutObject", + ], + resources=[f"arn:{core.Aws.PARTITION}:s3:::{assets_s3_bucket_name}/*"], ), iam.PolicyStatement( actions=[ @@ -352,12 +425,8 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: resources=[ ( f"arn:{core.Aws.PARTITION}:sns:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" - f"{pipeline_stack_name}*-PipelineNotification*" - ), - ( - f"arn:{core.Aws.PARTITION}:sns:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" - f"{pipeline_stack_name}*-ModelMonitorPipelineNotification*" - ), + f"{pipeline_stack_name}*-*PipelineNotification*" + ) ], ), iam.PolicyStatement( @@ -380,6 +449,10 @@ def __init__(self, scope: core.Construct, 
id: str, **kwargs) -> None: # Lambda function IAM setup lambda_passrole_policy = iam.PolicyStatement(actions=["iam:passrole"], resources=[cloudformation_role.role_arn]) + # create sagemaker layer + sm_layer = sagemaker_layer(self, blueprint_repository_bucket) + # make sure the sagemaker code is uploaded first to the blueprints bucket + sm_layer.node.add_dependency(custom_resource) # API Gateway and lambda setup to enable provisioning pipelines through API calls provisioner_apigw_lambda = aws_apigateway_lambda.ApiGatewayToLambda( self, @@ -388,6 +461,8 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: "runtime": lambda_.Runtime.PYTHON_3_8, "handler": "index.handler", "code": lambda_.Code.from_asset("lambdas/pipeline_orchestration"), + "layers": [sm_layer], + "timeout": core.Duration.minutes(10), }, api_gateway_props={ "defaultMethodOptions": { @@ -399,6 +474,9 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: }, ) + # add lambda supressions + provisioner_apigw_lambda.lambda_function.node.default_child.cfn_options.metadata = suppress_lambda_policies() + provision_resource = provisioner_apigw_lambda.api_gateway.root.add_resource("provisionpipeline") provision_resource.add_method("POST") status_resource = provisioner_apigw_lambda.api_gateway.root.add_resource("pipelinestatus") @@ -406,20 +484,7 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: blueprint_repository_bucket.grant_read(provisioner_apigw_lambda.lambda_function) provisioner_apigw_lambda.lambda_function.add_to_role_policy(lambda_passrole_policy) orchestrator_policy.attach_to_role(provisioner_apigw_lambda.lambda_function.role) - provisioner_apigw_lambda.lambda_function.add_to_role_policy( - iam.PolicyStatement(actions=["xray:PutTraceSegments"], resources=["*"]) - ) - lambda_node = provisioner_apigw_lambda.lambda_function.node.default_child - lambda_node.cfn_options.metadata = { - "cfn_nag": { - "rules_to_suppress": [ - { - "id": "W12", - "reason": "The xray permissions PutTraceSegments is not able to be bound to resources.", - } - ] - } - } + # Environment variables setup provisioner_apigw_lambda.lambda_function.add_environment( key="BLUEPRINT_BUCKET_URL", @@ -439,6 +504,34 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: provisioner_apigw_lambda.lambda_function.add_environment( key="NOTIFICATION_EMAIL", value=notification_email.value_as_string ) + provisioner_apigw_lambda.lambda_function.add_environment(key="REGION", value=core.Aws.REGION) + provisioner_apigw_lambda.lambda_function.add_environment(key="IS_MULTI_ACCOUNT", value=str(multi_account)) + + # if multi account + if multi_account: + provisioner_apigw_lambda.lambda_function.add_environment( + key="DEV_ACCOUNT_ID", value=dev_account_id.value_as_string + ) + provisioner_apigw_lambda.lambda_function.add_environment(key="DEV_ORG_ID", value=dev_org_id.value_as_string) + + provisioner_apigw_lambda.lambda_function.add_environment( + key="STAGING_ACCOUNT_ID", value=staging_account_id.value_as_string + ) + provisioner_apigw_lambda.lambda_function.add_environment( + key="STAGING_ORG_ID", value=staging_org_id.value_as_string + ) + + provisioner_apigw_lambda.lambda_function.add_environment( + key="PROD_ACCOUNT_ID", value=prod_account_id.value_as_string + ) + provisioner_apigw_lambda.lambda_function.add_environment( + key="PROD_ORG_ID", value=prod_org_id.value_as_string + ) + + provisioner_apigw_lambda.lambda_function.add_environment(key="ECR_REPO_NAME", value=ecr_repo_name) + + 
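The `IS_MULTI_ACCOUNT` flag and the account/organization ID environment variables set here feed the orchestration Lambda that assembles CodePipeline parameters, which is exercised by the `get_codepipeline_params` assertions near the top of this diff. That function's implementation is not included in this diff; the following is only a sketch consistent with those test expectations, and the environment-variable names and the empty `common_params` placeholder are assumptions:

```python
# Sketch only -- not the solution's actual implementation. It mirrors the two
# test_get_codepipeline_params assertions earlier in this diff, assuming the
# account/org IDs and blueprint bucket are read from environment variables.
import os
from typing import List, Tuple


def get_codepipeline_params(
    is_multi_account: str, stack_name: str, template_zip_name: str, template_file_name: str
) -> List[Tuple[str, str]]:
    # Shared parameters (built from the stack name and template arguments)
    # are omitted in this sketch.
    common_params: List[Tuple[str, str]] = []

    if is_multi_account == "True":
        # Multi-account deployment: pass per-stage template parameter files plus
        # the target account and organizational-unit IDs to CodePipeline.
        return common_params + [
            ("DEVPARAMSNAME", "dev_template_params.json"),
            ("STAGINGPARAMSNAME", "staging_template_params.json"),
            ("PRODPARAMSNAME", "prod_template_params.json"),
            ("DEVACCOUNTID", os.environ.get("DEV_ACCOUNT_ID", "")),
            ("DEVORGID", os.environ.get("DEV_ORG_ID", "")),
            ("STAGINGACCOUNTID", os.environ.get("STAGING_ACCOUNT_ID", "")),
            ("STAGINGORGID", os.environ.get("STAGING_ORG_ID", "")),
            ("PRODACCOUNTID", os.environ.get("PROD_ACCOUNT_ID", "")),
            ("PRODORGID", os.environ.get("PROD_ORG_ID", "")),
            ("BLUEPRINTBUCKET", os.environ.get("BLUEPRINT_BUCKET", "")),
        ]

    # Single-account deployment uses a single template parameters file.
    return common_params + [("TEMPLATEPARAMSNAME", "template_params.json")]
```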
provisioner_apigw_lambda.lambda_function.add_environment(key="ECR_REPO_ARN", value=ecr_repo_arn) + provisioner_apigw_lambda.lambda_function.add_environment(key="LOG_LEVEL", value="DEBUG") cfn_policy_for_lambda = orchestrator_policy.node.default_child cfn_policy_for_lambda.cfn_options.metadata = { @@ -511,13 +604,13 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: ) codecommit_pipeline.add_to_role_policy( iam.PolicyStatement( - actions=["lambda:InvokeFunction"], + actions=[lambda_invoke_action], resources=[provisioner_apigw_lambda.lambda_function.function_arn], ) ) codebuild_project.add_to_role_policy( iam.PolicyStatement( - actions=["lambda:InvokeFunction"], + actions=[lambda_invoke_action], resources=[provisioner_apigw_lambda.lambda_function.function_arn], ) ) @@ -538,11 +631,11 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: } # custom resource for operational metrics### - metricsMapping = core.CfnMapping(self, "AnonymousData", mapping={"SendAnonymousData": {"Data": "Yes"}}) + metrics_mapping = core.CfnMapping(self, "AnonymousData", mapping={"SendAnonymousData": {"Data": "Yes"}}) metrics_condition = core.CfnCondition( self, "AnonymousDatatoAWS", - expression=core.Fn.condition_equals(metricsMapping.find_in_map("SendAnonymousData", "Data"), "Yes"), + expression=core.Fn.condition_equals(metrics_mapping.find_in_map("SendAnonymousData", "Data"), "Yes"), ) helper_function = lambda_.Function( @@ -554,7 +647,8 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: timeout=core.Duration.seconds(60), ) - createIdFunction = core.CustomResource( + helper_function.node.default_child.cfn_options.metadata = suppress_lambda_policies() + create_id_function = core.CustomResource( self, "CreateUniqueID", service_token=helper_function.function_arn, @@ -562,13 +656,13 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: resource_type="Custom::CreateUUID", ) - sendDataFunction = core.CustomResource( + send_data_function = core.CustomResource( self, "SendAnonymousData", service_token=helper_function.function_arn, properties={ "Resource": "AnonymousMetric", - "UUID": createIdFunction.get_att_string("UUID"), + "UUID": create_id_function.get_att_string("UUID"), "gitSelected": git_address.value_as_string, "Region": core.Aws.REGION, "SolutionId": "SO0136", @@ -578,18 +672,8 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: ) core.Aspects.of(helper_function).add(ConditionalResources(metrics_condition)) - core.Aspects.of(createIdFunction).add(ConditionalResources(metrics_condition)) - core.Aspects.of(sendDataFunction).add(ConditionalResources(metrics_condition)) - helper_function.node.default_child.cfn_options.metadata = { - "cfn_nag": { - "rules_to_suppress": [ - { - "id": "W58", - "reason": "The lambda functions role already has permissions to write cloudwatch logs", - } - ] - } - } + core.Aspects.of(create_id_function).add(ConditionalResources(metrics_condition)) + core.Aspects.of(send_data_function).add(ConditionalResources(metrics_condition)) # If user chooses Git as pipeline provision type, create codepipeline with Git repo as source core.Aspects.of(repo).add(ConditionalResources(git_address_provided)) @@ -597,23 +681,53 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: core.Aspects.of(codebuild_project).add(ConditionalResources(git_address_provided)) # Create Template Interface + paramaters_list = [ + notification_email.logical_id, + git_address.logical_id, + 
existing_bucket.logical_id, + existing_ecr_repo.logical_id, + ] + + # if multi account + if multi_account: + paramaters_list.extend( + [ + dev_account_id.logical_id, + dev_org_id.logical_id, + staging_account_id.logical_id, + staging_org_id.logical_id, + prod_account_id.logical_id, + prod_org_id.logical_id, + ] + ) + + paramaters_labels = { + f"{notification_email.logical_id}": {"default": "Notification Email (Required)"}, + f"{git_address.logical_id}": {"default": "CodeCommit Repo URL Address (Optional)"}, + f"{existing_bucket.logical_id}": {"default": "Name of an Existing S3 Bucket (Optional)"}, + f"{existing_ecr_repo.logical_id}": {"default": "Name of an Existing Amazon ECR repository (Optional)"}, + } + + if multi_account: + paramaters_labels.update( + { + f"{dev_account_id.logical_id}": {"default": "Development Account ID (Required)"}, + f"{dev_org_id.logical_id}": {"default": "Development Account Organizational Unit ID (Required)"}, + f"{staging_account_id.logical_id}": {"default": "Staging Account ID (Required)"}, + f"{staging_org_id.logical_id}": {"default": "Staging Account Organizational Unit ID (Required)"}, + f"{prod_account_id.logical_id}": {"default": "Production Account ID (Required)"}, + f"{prod_org_id.logical_id}": {"default": "Production Account Organizational Unit ID (Required)"}, + } + ) self.template_options.metadata = { "AWS::CloudFormation::Interface": { "ParameterGroups": [ { "Label": {"default": "MLOps Framework Settings"}, - "Parameters": [ - notification_email.logical_id, - git_address.logical_id, - existing_bucket.logical_id, - ], + "Parameters": paramaters_list, } ], - "ParameterLabels": { - f"{notification_email.logical_id}": {"default": "Notification Email (Required)"}, - f"{git_address.logical_id}": {"default": "CodeCommit Repo URL Address (Optional)"}, - f"{existing_bucket.logical_id}": {"default": "Name of an Existing S3 Bucket (Optional)"}, - }, + "ParameterLabels": paramaters_labels, } } # Outputs # @@ -628,4 +742,16 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: id="AssetsBucket", value=f"https://s3.console.aws.amazon.com/s3/buckets/{assets_s3_bucket_name}", description="S3 Bucket to upload model artifact", + ) + core.CfnOutput( + self, + id="ECRRepoName", + value=ecr_repo_name, + description="Amazon ECR repository's name", + ) + core.CfnOutput( + self, + id="ECRRepoArn", + value=ecr_repo_arn, + description="Amazon ECR repository's arn", ) \ No newline at end of file diff --git a/source/lib/aws_sdk_config_aspect.py b/source/lib/aws_sdk_config_aspect.py new file mode 100644 index 0000000..4231fc6 --- /dev/null +++ b/source/lib/aws_sdk_config_aspect.py @@ -0,0 +1,28 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. 
# +# ##################################################################################################################### +import jsii +import json +from aws_cdk.core import IAspect, IConstruct, Construct +from aws_cdk.aws_lambda import Function + + +@jsii.implements(IAspect) +class AwsSDKConfigAspect(Construct): + def __init__(self, scope: Construct, id: str, solution_id: str): + super().__init__(scope, id) + self.solution_id = solution_id + + def visit(self, node: IConstruct): + if isinstance(node, Function): + user_agent = json.dumps({"user_agent_extra": f"AwsSolution/{self.solution_id}/%%VERSION%%"}) + node.add_environment(key="AWS_SDK_USER_AGENT", value=user_agent) diff --git a/source/lib/blueprints/byom/byom_batch_build_container.py b/source/lib/blueprints/byom/byom_batch_build_container.py deleted file mode 100644 index 4c9fcb4..0000000 --- a/source/lib/blueprints/byom/byom_batch_build_container.py +++ /dev/null @@ -1,246 +0,0 @@ -# ##################################################################################################################### -# Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. # -# ##################################################################################################################### -import uuid -from aws_cdk import ( - aws_iam as iam, - aws_s3 as s3, - aws_sns as sns, - aws_sns_subscriptions as subscriptions, - aws_codepipeline as codepipeline, - aws_events_targets as targets, - aws_events as events, - core, -) -from lib.blueprints.byom.pipeline_definitions.source_actions import source_action_custom -from lib.blueprints.byom.pipeline_definitions.build_actions import build_action -from lib.blueprints.byom.pipeline_definitions.deploy_actions import ( - create_model, - batch_transform, - sagemaker_layer, -) -from lib.blueprints.byom.pipeline_definitions.helpers import ( - suppress_assets_bucket, - pipeline_permissions, - suppress_iam_complex, - suppress_pipeline_bucket, - suppress_list_function_policy, - suppress_sns, -) - - -class BYOMBatchBuildStack(core.Stack): - def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: - super().__init__(scope, id, **kwargs) - - # Parameteres # - notification_email = core.CfnParameter( - self, - "NOTIFICATION_EMAIL", - type="String", - description="email for pipeline outcome notifications", - allowed_pattern="^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - constraint_description="Please enter an email address with correct format (example@exmaple.com)", - min_length=5, - max_length=320, - ) - blueprint_bucket_name = core.CfnParameter( - self, - "BLUEPRINT_BUCKET", - type="String", - description="Bucket name for blueprints of different types of ML Pipelines.", - min_length=3, - ) - assets_bucket_name = core.CfnParameter( - self, "ASSETS_BUCKET", type="String", description="Bucket name for access logs.", min_length=3 - ) - custom_container = core.CfnParameter( - self, - "CUSTOM_CONTAINER", - default="", - type="String", - description=( - "Should point to a zip file 
containing dockerfile and assets for building a custom model. " - "If empty it will beusing containers from SageMaker Registry" - ), - ) - model_framework = core.CfnParameter( - self, - "MODEL_FRAMEWORK", - default="", - type="String", - description="The ML framework which is used for training the model. E.g., xgboost, kmeans, etc.", - ) - model_framework_version = core.CfnParameter( - self, - "MODEL_FRAMEWORK_VERSION", - default="", - type="String", - description="The version of the ML framework which is used for training the model. E.g., 1.1-2", - ) - model_name = core.CfnParameter( - self, "MODEL_NAME", type="String", description="An arbitrary name for the model.", min_length=1 - ) - model_artifact_location = core.CfnParameter( - self, - "MODEL_ARTIFACT_LOCATION", - type="String", - description="Path to model artifact inside assets bucket.", - ) - inference_instance = core.CfnParameter( - self, - "INFERENCE_INSTANCE", - type="String", - description="Inference instance that inference requests will be running on. E.g., ml.m5.large", - allowed_pattern="^[a-zA-Z0-9_.+-]+\.[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - min_length=7, - ) - inference_type = core.CfnParameter( - self, - "INFERENCE_TYPE", - type="String", - allowed_values=["batch", "realtime"], - default="realtime", - description="Type of inference. Possible values: batch | realtime", - ) - batch_inference_data = core.CfnParameter( - self, - "BATCH_INFERENCE_DATA", - type="String", - default="", - description=( - "Location of batch inference data if inference type is set to batch. Otherwise, can be left empty." - ), - ) - - # Resources # - assets_bucket = s3.Bucket.from_bucket_name(self, "AssetsBucket", assets_bucket_name.value_as_string) - # getting blueprint bucket object from its name - will be used later in the stack - blueprint_bucket = s3.Bucket.from_bucket_name(self, "BlueprintBucket", blueprint_bucket_name.value_as_string) - - # Defining pipeline stages - # source stage - source_output, source_action_definition = source_action_custom( - model_artifact_location, assets_bucket, custom_container - ) - - # build stage - build_action_definition, container_uri = build_action(self, source_output) - - # deploy stage - sm_layer = sagemaker_layer(self, blueprint_bucket) - # creating a sagemaker model - model_lambda_arn, create_model_definition = create_model( - self, - blueprint_bucket, - assets_bucket, - model_name, - model_artifact_location, - custom_container, - model_framework, - model_framework_version, - container_uri, - sm_layer, - ) - # creating a batch transform job - batch_lambda_arn, batch_transform_definition = batch_transform( - self, - blueprint_bucket, - assets_bucket, - model_name, - inference_instance, - batch_inference_data, - sm_layer, - ) - - # create invoking lambda policy - invoke_lambdas_policy = iam.PolicyStatement( - actions=[ - "lambda:InvokeFunction", - ], - resources=[model_lambda_arn, batch_lambda_arn], - ) - - pipeline_notification_topic = sns.Topic( - self, - "PipelineNotification", - ) - pipeline_notification_topic.node.default_child.cfn_options.metadata = suppress_sns() - pipeline_notification_topic.add_subscription( - subscriptions.EmailSubscription(email_address=notification_email.value_as_string) - ) - - # createing pipeline stages - source_stage = codepipeline.StageProps(stage_name="Source", actions=[source_action_definition]) - build_stage = codepipeline.StageProps(stage_name="Build", actions=[build_action_definition]) - deploy_stage_batch = codepipeline.StageProps( - stage_name="Deploy", - 
actions=[create_model_definition, batch_transform_definition], - ) - batch_build_pipeline = codepipeline.Pipeline( - self, - "BYOMPipelineBatchBuild", - stages=[source_stage, build_stage, deploy_stage_batch], - cross_account_keys=False, - ) - batch_build_pipeline.on_state_change( - "NotifyUser", - description="Notify user of the outcome of the pipeline", - target=targets.SnsTopic( - pipeline_notification_topic, - message=events.RuleTargetInput.from_text( - ( - f"Pipeline {events.EventField.from_path('$.detail.pipeline')} finished executing. " - f"Pipeline execution result is {events.EventField.from_path('$.detail.state')}" - ) - ), - ), - event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), - ) - batch_build_pipeline.add_to_role_policy( - iam.PolicyStatement( - actions=["events:PutEvents"], - resources=[ - f"arn:{core.Aws.PARTITION}:events:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:event-bus/*", - ], - ) - ) - # add lambda permissons - batch_build_pipeline.add_to_role_policy(invoke_lambdas_policy) - - # Enhancement: This is to find CDK object nodes so that unnecessary cfn-nag warnings can be suppressed - # There is room for improving the method in future versions to find CDK nodes without having to use - # hardocded index numbers - pipeline_child_nodes = batch_build_pipeline.node.find_all() - pipeline_child_nodes[1].node.default_child.cfn_options.metadata = suppress_pipeline_bucket() - pipeline_child_nodes[6].node.default_child.cfn_options.metadata = suppress_iam_complex() - pipeline_child_nodes[13].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[19].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[25].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[30].node.default_child.cfn_options.metadata = suppress_list_function_policy() - # attaching iam permissions to the pipelines - pipeline_permissions(batch_build_pipeline, assets_bucket) - - core.CfnOutput( - self, - id="Pipelines", - value=( - f"https://console.aws.amazon.com/codesuite/codepipeline/pipelines/" - f"{batch_build_pipeline.pipeline_name}/view?region={core.Aws.REGION}" - ), - ) - core.CfnOutput( - self, - id="BatchTransformOutputLocation", - value=f"https://s3.console.aws.amazon.com/s3/buckets/{assets_bucket.bucket_name}/batch_transform/output", - description="Output location of the batch transform. Output will be saved under the job name", - ) diff --git a/source/lib/blueprints/byom/byom_batch_builtin_container.py b/source/lib/blueprints/byom/byom_batch_builtin_container.py deleted file mode 100644 index 391c477..0000000 --- a/source/lib/blueprints/byom/byom_batch_builtin_container.py +++ /dev/null @@ -1,232 +0,0 @@ -# ##################################################################################################################### -# Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. 
# -# ##################################################################################################################### -import uuid -from aws_cdk import ( - aws_iam as iam, - aws_s3 as s3, - aws_sns as sns, - aws_sns_subscriptions as subscriptions, - aws_events_targets as targets, - aws_events as events, - aws_codepipeline as codepipeline, - core, -) -from lib.blueprints.byom.pipeline_definitions.source_actions import source_action -from lib.blueprints.byom.pipeline_definitions.deploy_actions import ( - create_model, - batch_transform, - sagemaker_layer, -) -from lib.blueprints.byom.pipeline_definitions.helpers import ( - suppress_assets_bucket, - pipeline_permissions, - suppress_pipeline_bucket, - suppress_iam_complex, - suppress_list_function_policy, - suppress_sns, -) - - -class BYOMBatchBuiltinStack(core.Stack): - def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: - super().__init__(scope, id, **kwargs) - - # Parameteres # - notification_email = core.CfnParameter( - self, - "NOTIFICATION_EMAIL", - type="String", - description="email for pipeline outcome notifications", - allowed_pattern="^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - constraint_description="Please enter an email address with correct format (example@exmaple.com)", - min_length=5, - max_length=320, - ) - blueprint_bucket_name = core.CfnParameter( - self, - "BLUEPRINT_BUCKET", - type="String", - description="Bucket name for blueprints of different types of ML Pipelines.", - min_length=3, - ) - assets_bucket_name = core.CfnParameter( - self, "ASSETS_BUCKET", type="String", description="Bucket name for access logs.", min_length=3 - ) - custom_container = core.CfnParameter( - self, - "CUSTOM_CONTAINER", - default="", - type="String", - description=( - "Should point to a zip file containing dockerfile and assets for building a custom model. " - "If empty it will beusing containers from SageMaker Registry" - ), - ) - model_framework = core.CfnParameter( - self, - "MODEL_FRAMEWORK", - default="", - type="String", - description="The ML framework which is used for training the model. E.g., xgboost, kmeans, etc.", - ) - model_framework_version = core.CfnParameter( - self, - "MODEL_FRAMEWORK_VERSION", - default="", - type="String", - description="The version of the ML framework which is used for training the model. E.g., 1.1-2", - ) - model_name = core.CfnParameter( - self, "MODEL_NAME", type="String", description="An arbitrary name for the model.", min_length=1 - ) - model_artifact_location = core.CfnParameter( - self, - "MODEL_ARTIFACT_LOCATION", - type="String", - description="Path to model artifact inside assets bucket.", - ) - inference_instance = core.CfnParameter( - self, - "INFERENCE_INSTANCE", - type="String", - description="Inference instance that inference requests will be running on. E.g., ml.m5.large", - allowed_pattern="^[a-zA-Z0-9_.+-]+\.[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - min_length=7, - ) - batch_inference_data = core.CfnParameter( - self, - "BATCH_INFERENCE_DATA", - type="String", - default="", - description=( - "Location of batch inference data if inference type is set to batch. Otherwise, can be left empty." 
- ), - ) - - # Resources # - assets_bucket = s3.Bucket.from_bucket_name(self, "AssetsBucket", assets_bucket_name.value_as_string) - # getting blueprint bucket object from its name - will be used later in the stack - blueprint_bucket = s3.Bucket.from_bucket_name(self, "BlueprintBucket", blueprint_bucket_name.value_as_string) - - # Defining pipeline stages - # source stage - source_output, source_action_definition = source_action(model_artifact_location, assets_bucket) - - # deploy stage - sm_layer = sagemaker_layer(self, blueprint_bucket) - # creating a sagemaker model - model_lambda_arn, create_model_definition = create_model( - self, - blueprint_bucket, - assets_bucket, - model_name, - model_artifact_location, - custom_container, - model_framework, - model_framework_version, - "", - sm_layer, - ) - # creating a batch transform job - batch_lambda_arn, batch_transform_definition = batch_transform( - self, - blueprint_bucket, - assets_bucket, - model_name, - inference_instance, - batch_inference_data, - sm_layer, - ) - - # create invoking lambda policy - invoke_lambdas_policy = iam.PolicyStatement( - actions=[ - "lambda:InvokeFunction", - ], - resources=[model_lambda_arn, batch_lambda_arn], - ) - - pipeline_notification_topic = sns.Topic( - self, - "PipelineNotification", - ) - pipeline_notification_topic.node.default_child.cfn_options.metadata = suppress_sns() - pipeline_notification_topic.add_subscription( - subscriptions.EmailSubscription(email_address=notification_email.value_as_string) - ) - - # createing pipeline stages - source_stage = codepipeline.StageProps(stage_name="Source", actions=[source_action_definition]) - deploy_stage_batch = codepipeline.StageProps( - stage_name="Deploy", - actions=[create_model_definition, batch_transform_definition], - ) - - batch_nobuild_pipeline = codepipeline.Pipeline( - self, - "BYOMPipelineBatchBuiltIn", - stages=[source_stage, deploy_stage_batch], - cross_account_keys=False, - ) - pipeline_rule = batch_nobuild_pipeline.on_state_change( - "NotifyUser", - description="Notify user of the outcome of the pipeline", - target=targets.SnsTopic( - pipeline_notification_topic, - message=events.RuleTargetInput.from_text( - ( - f"Pipeline {events.EventField.from_path('$.detail.pipeline')} finished executing. 
" - f"Pipeline execution result is {events.EventField.from_path('$.detail.state')}" - ) - ), - ), - event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), - ) - batch_nobuild_pipeline.add_to_role_policy( - iam.PolicyStatement( - actions=["events:PutEvents"], - resources=[ - f"arn:{core.Aws.PARTITION}:events:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:event-bus/*", - ], - ) - ) - # add lambda permissons - batch_nobuild_pipeline.add_to_role_policy(invoke_lambdas_policy) - - # Enhancement: This is to find CDK object nodes so that unnecessary cfn-nag warnings can be suppressed - # There is room for improving the method in future versions to find CDK nodes without having to use - # hardocded index numbers - pipeline_child_nodes = batch_nobuild_pipeline.node.find_all() - pipeline_child_nodes[1].node.default_child.cfn_options.metadata = suppress_pipeline_bucket() - pipeline_child_nodes[6].node.default_child.cfn_options.metadata = suppress_iam_complex() - pipeline_child_nodes[13].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[19].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[24].node.default_child.cfn_options.metadata = suppress_list_function_policy() - # attaching iam permissions to the pipeline - pipeline_permissions(batch_nobuild_pipeline, assets_bucket) - - # Outputs # - core.CfnOutput( - self, - id="Pipelines", - value=( - f"https://console.aws.amazon.com/codesuite/codepipeline/pipelines/" - f"{batch_nobuild_pipeline.pipeline_name}/view?region={core.Aws.REGION}" - ), - ) - core.CfnOutput( - self, - id="BatchTransformOutputLocation", - value=f"https://s3.console.aws.amazon.com/s3/buckets/{assets_bucket.bucket_name}/batch_transform/output", - description="Output location of the batch transform. Our will be saved under the job name", - ) diff --git a/source/lib/blueprints/byom/byom_batch_pipeline.py b/source/lib/blueprints/byom/byom_batch_pipeline.py new file mode 100644 index 0000000..1c8f32b --- /dev/null +++ b/source/lib/blueprints/byom/byom_batch_pipeline.py @@ -0,0 +1,158 @@ +# ##################################################################################################################### +# Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. 
# +# ##################################################################################################################### +from aws_cdk import ( + aws_s3 as s3, + core, +) +from lib.blueprints.byom.pipeline_definitions.deploy_actions import ( + batch_transform, + sagemaker_layer, + create_invoke_lambda_custom_resource, +) +from lib.blueprints.byom.pipeline_definitions.sagemaker_role import create_sagemaker_role +from lib.blueprints.byom.pipeline_definitions.sagemaker_model import create_sagemaker_model +from lib.blueprints.byom.pipeline_definitions.templates_parameters import ( + create_blueprint_bucket_name_parameter, + create_assets_bucket_name_parameter, + create_algorithm_image_uri_parameter, + create_batch_input_bucket_name_parameter, + create_batch_inference_data_parameter, + create_batch_job_output_location_parameter, + create_custom_algorithms_ecr_repo_arn_parameter, + create_inference_instance_parameter, + create_kms_key_arn_parameter, + create_model_artifact_location_parameter, + create_model_name_parameter, + create_custom_algorithms_ecr_repo_arn_provided_condition, + create_kms_key_arn_provided_condition, +) + + +class BYOMBatchStack(core.Stack): + def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: + super().__init__(scope, id, **kwargs) + + # Parameteres # + blueprint_bucket_name = create_blueprint_bucket_name_parameter(self) + assets_bucket_name = create_assets_bucket_name_parameter(self) + custom_algorithms_ecr_repo_arn = create_custom_algorithms_ecr_repo_arn_parameter(self) + kms_key_arn = create_kms_key_arn_parameter(self) + algorithm_image_uri = create_algorithm_image_uri_parameter(self) + model_name = create_model_name_parameter(self) + model_artifact_location = create_model_artifact_location_parameter(self) + inference_instance = create_inference_instance_parameter(self) + batch_input_bucket = create_batch_input_bucket_name_parameter(self) + batch_inference_data = create_batch_inference_data_parameter(self) + batch_job_output_location = create_batch_job_output_location_parameter(self) + + # Conditions + custom_algorithms_ecr_repo_arn_provided = create_custom_algorithms_ecr_repo_arn_provided_condition( + self, custom_algorithms_ecr_repo_arn + ) + kms_key_arn_provided = create_kms_key_arn_provided_condition(self, kms_key_arn) + + # Resources # + assets_bucket = s3.Bucket.from_bucket_name(self, "AssetsBucket", assets_bucket_name.value_as_string) + # getting blueprint bucket object from its name - will be used later in the stack + blueprint_bucket = s3.Bucket.from_bucket_name(self, "BlueprintBucket", blueprint_bucket_name.value_as_string) + + sm_layer = sagemaker_layer(self, blueprint_bucket) + # creating a sagemaker model + # create Sagemaker role + sagemaker_role = create_sagemaker_role( + self, + "MLOpsSagemakerBatchRole", + custom_algorithms_ecr_arn=custom_algorithms_ecr_repo_arn.value_as_string, + kms_key_arn=kms_key_arn.value_as_string, + assets_bucket_name=assets_bucket_name.value_as_string, + input_bucket_name=batch_input_bucket.value_as_string, + input_s3_location=batch_inference_data.value_as_string, + output_s3_location=batch_job_output_location.value_as_string, + ecr_repo_arn_provided_condition=custom_algorithms_ecr_repo_arn_provided, + kms_key_arn_provided_condition=kms_key_arn_provided, + ) + + # create sagemaker model + sagemaker_model = create_sagemaker_model( + self, + "MLOpsSagemakerModel", + execution_role=sagemaker_role, + primary_container={ + "image": algorithm_image_uri.value_as_string, + "modelDataUrl": 
f"s3://{assets_bucket_name.value_as_string}/{model_artifact_location.value_as_string}", + }, + tags=[{"key": "model_name", "value": model_name.value_as_string}], + ) + + # create batch tranform lambda + batch_transform_lambda = batch_transform( + self, + "BatchTranformLambda", + blueprint_bucket, + assets_bucket, + sagemaker_model.attr_model_name, + inference_instance.value_as_string, + batch_input_bucket.value_as_string, + batch_inference_data.value_as_string, + batch_job_output_location.value_as_string, + core.Fn.condition_if( + kms_key_arn_provided.logical_id, kms_key_arn.value_as_string, core.Aws.NO_VALUE + ).to_string(), + sm_layer, + ) + + # create custom resource to invoke the batch transform lambda + invoke_lambda_custom_resource = create_invoke_lambda_custom_resource( + self, + "InvokeBatchLambda", + batch_transform_lambda.function_arn, + batch_transform_lambda.function_name, + blueprint_bucket, + { + "Resource": "InvokeLambda", + "function_name": batch_transform_lambda.function_name, + "sagemaker_model_name": sagemaker_model.attr_model_name, + "model_name": model_name.value_as_string, + "inference_instance": inference_instance.value_as_string, + "algorithm_image": algorithm_image_uri.value_as_string, + "model_artifact": model_artifact_location.value_as_string, + "assets_bucket": assets_bucket.bucket_name, + "batch_inference_data": batch_inference_data.value_as_string, + "batch_job_output_location": batch_job_output_location.value_as_string, + "custom_algorithms_ecr_arn": custom_algorithms_ecr_repo_arn.value_as_string, + "kms_key_arn": kms_key_arn.value_as_string, + }, + ) + + invoke_lambda_custom_resource.node.add_dependency(batch_transform_lambda) + + core.CfnOutput( + self, + id="ModelName", + value=sagemaker_model.attr_model_name, + description="The name of the SageMaker model used by the batch transform job", + ) + + core.CfnOutput( + self, + id="BatchTransformJobName", + value=f"{sagemaker_model.attr_model_name}-batch-transform-*", + description="The name of the SageMaker batch transform job", + ) + + core.CfnOutput( + self, + id="BatchTransformOutputLocation", + value=f"https://s3.console.aws.amazon.com/s3/buckets/{batch_job_output_location.value_as_string}/", + description="Output location of the batch transform. Our will be saved under the job name", + ) diff --git a/source/lib/blueprints/byom/byom_custom_algorithm_image_builder.py b/source/lib/blueprints/byom/byom_custom_algorithm_image_builder.py new file mode 100644 index 0000000..29d6b1f --- /dev/null +++ b/source/lib/blueprints/byom/byom_custom_algorithm_image_builder.py @@ -0,0 +1,126 @@ +# ##################################################################################################################### +# Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. 
# +# ##################################################################################################################### +from aws_cdk import ( + aws_iam as iam, + aws_s3 as s3, + aws_sns as sns, + aws_sns_subscriptions as subscriptions, + aws_events_targets as targets, + aws_events as events, + aws_codepipeline as codepipeline, + core, +) +from lib.blueprints.byom.pipeline_definitions.source_actions import source_action_custom +from lib.blueprints.byom.pipeline_definitions.build_actions import build_action +from lib.blueprints.byom.pipeline_definitions.helpers import ( + pipeline_permissions, + suppress_pipeline_bucket, + suppress_iam_complex, + suppress_sns, +) +from lib.blueprints.byom.pipeline_definitions.templates_parameters import ( + create_notification_email_parameter, + create_assets_bucket_name_parameter, + create_custom_container_parameter, + create_ecr_repo_name_parameter, + create_image_tag_parameter, +) + + +class BYOMCustomAlgorithmImageBuilderStack(core.Stack): + def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: + super().__init__(scope, id, **kwargs) + + # Parameteres # + notification_email = create_notification_email_parameter(self) + assets_bucket_name = create_assets_bucket_name_parameter(self) + custom_container = create_custom_container_parameter(self) + ecr_repo_name = create_ecr_repo_name_parameter(self) + image_tag = create_image_tag_parameter(self) + + # Resources # + assets_bucket = s3.Bucket.from_bucket_name(self, "AssetsBucket", assets_bucket_name.value_as_string) + + # Defining pipeline stages + # source stage + source_output, source_action_definition = source_action_custom(assets_bucket, custom_container) + + # build stage + build_action_definition, container_uri = build_action( + self, ecr_repo_name.value_as_string, image_tag.value_as_string, source_output + ) + + pipeline_notification_topic = sns.Topic( + self, + "PipelineNotification", + ) + pipeline_notification_topic.node.default_child.cfn_options.metadata = suppress_sns() + pipeline_notification_topic.add_subscription( + subscriptions.EmailSubscription(email_address=notification_email.value_as_string) + ) + + # createing pipeline stages + source_stage = codepipeline.StageProps(stage_name="Source", actions=[source_action_definition]) + build_stage = codepipeline.StageProps(stage_name="Build", actions=[build_action_definition]) + + image_builder_pipeline = codepipeline.Pipeline( + self, + "BYOMPipelineReatimeBuild", + stages=[source_stage, build_stage], + cross_account_keys=False, + ) + image_builder_pipeline.on_state_change( + "NotifyUser", + description="Notify user of the outcome of the pipeline", + target=targets.SnsTopic( + pipeline_notification_topic, + message=events.RuleTargetInput.from_text( + ( + f"Pipeline {events.EventField.from_path('$.detail.pipeline')} finished executing. 
" + f"Pipeline execution result is {events.EventField.from_path('$.detail.state')}" + ) + ), + ), + event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), + ) + + image_builder_pipeline.add_to_role_policy( + iam.PolicyStatement( + actions=["events:PutEvents"], + resources=[ + f"arn:{core.Aws.PARTITION}:events:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:event-bus/*", + ], + ) + ) + + # add cfn nag supressions + pipeline_child_nodes = image_builder_pipeline.node.find_all() + pipeline_child_nodes[1].node.default_child.cfn_options.metadata = suppress_pipeline_bucket() + pipeline_child_nodes[6].node.default_child.cfn_options.metadata = suppress_iam_complex() + # attaching iam permissions to the pipelines + pipeline_permissions(image_builder_pipeline, assets_bucket) + + # Outputs # + core.CfnOutput( + self, + id="Pipelines", + value=( + f"https://console.aws.amazon.com/codesuite/codepipeline/pipelines/" + f"{image_builder_pipeline.pipeline_name}/view?region={core.Aws.REGION}" + ), + ) + core.CfnOutput( + self, + id="CustomAlgorithmImageURI", + value=container_uri, + ) diff --git a/source/lib/blueprints/byom/byom_realtime_build_container.py b/source/lib/blueprints/byom/byom_realtime_build_container.py deleted file mode 100644 index f134d3a..0000000 --- a/source/lib/blueprints/byom/byom_realtime_build_container.py +++ /dev/null @@ -1,248 +0,0 @@ -# ##################################################################################################################### -# Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. 
# -# ##################################################################################################################### -import uuid -from aws_cdk import ( - aws_iam as iam, - aws_s3 as s3, - aws_sns as sns, - aws_sns_subscriptions as subscriptions, - aws_events_targets as targets, - aws_events as events, - aws_codepipeline as codepipeline, - core, -) -from lib.blueprints.byom.pipeline_definitions.source_actions import source_action_custom -from lib.blueprints.byom.pipeline_definitions.build_actions import build_action -from lib.blueprints.byom.pipeline_definitions.deploy_actions import ( - create_model, - create_endpoint, - sagemaker_layer, -) -from lib.blueprints.byom.pipeline_definitions.share_actions import configure_inference -from lib.blueprints.byom.pipeline_definitions.helpers import ( - suppress_assets_bucket, - pipeline_permissions, - suppress_list_function_policy, - suppress_pipeline_bucket, - suppress_iam_complex, - suppress_sns, -) - - -class BYOMRealtimeBuildStack(core.Stack): - def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: - super().__init__(scope, id, **kwargs) - - # Parameteres # - notification_email = core.CfnParameter( - self, - "NOTIFICATION_EMAIL", - type="String", - description="email for pipeline outcome notifications", - allowed_pattern="^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - constraint_description="Please enter an email address with correct format (example@exmaple.com)", - min_length=5, - max_length=320, - ) - blueprint_bucket_name = core.CfnParameter( - self, - "BLUEPRINT_BUCKET", - type="String", - description="Bucket name for blueprints of different types of ML Pipelines.", - min_length=3, - ) - assets_bucket_name = core.CfnParameter( - self, "ASSETS_BUCKET", type="String", description="Bucket name for access logs.", min_length=3 - ) - custom_container = core.CfnParameter( - self, - "CUSTOM_CONTAINER", - default="", - type="String", - description=( - "Should point to a zip file containing dockerfile and assets for building a custom model. " - "If empty it will beusing containers from SageMaker Registry" - ), - ) - model_framework = core.CfnParameter( - self, - "MODEL_FRAMEWORK", - default="", - type="String", - description="The ML framework which is used for training the model. E.g., xgboost, kmeans, etc.", - ) - model_framework_version = core.CfnParameter( - self, - "MODEL_FRAMEWORK_VERSION", - default="", - type="String", - description="The version of the ML framework which is used for training the model. E.g., 1.1-2", - ) - model_name = core.CfnParameter( - self, "MODEL_NAME", type="String", description="An arbitrary name for the model.", min_length=1 - ) - model_artifact_location = core.CfnParameter( - self, - "MODEL_ARTIFACT_LOCATION", - type="String", - description="Path to model artifact inside assets bucket.", - ) - inference_instance = core.CfnParameter( - self, - "INFERENCE_INSTANCE", - type="String", - description="Inference instance that inference requests will be running on. 
E.g., ml.m5.large", - allowed_pattern="^[a-zA-Z0-9_.+-]+\.[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - min_length=7, - ) - # Resources # - - # access_bucket = s3.Bucket.from_bucket_name(self, "AccessBucket", access_bucket_name.value_as_string) - assets_bucket = s3.Bucket.from_bucket_name(self, "AssetsBucket", assets_bucket_name.value_as_string) - # getting blueprint bucket object from its name - will be used later in the stack - blueprint_bucket = s3.Bucket.from_bucket_name(self, "BlueprintBucket", blueprint_bucket_name.value_as_string) - - # Defining pipeline stages - # source stage - source_output, source_action_definition = source_action_custom( - model_artifact_location, assets_bucket, custom_container - ) - - # build stage - build_action_definition, container_uri = build_action(self, source_output) - - # deploy stage - sm_layer = sagemaker_layer(self, blueprint_bucket) - # creating a sagemaker model - model_lambda_arn, create_model_definition = create_model( - self, - blueprint_bucket, - assets_bucket, - model_name, - model_artifact_location, - custom_container, - model_framework, - model_framework_version, - container_uri, - sm_layer, - ) - # creating a sagemaker endpoint - endpoint_lambda_arn, create_endpoint_definition = create_endpoint( - self, blueprint_bucket, assets_bucket, model_name, inference_instance - ) - # Share stage - configure_lambda_arn, configure_inference_definition = configure_inference(self, blueprint_bucket) - - # create invoking lambda policy - invoke_lambdas_policy = iam.PolicyStatement( - actions=[ - "lambda:InvokeFunction", - ], - resources=[model_lambda_arn, endpoint_lambda_arn, configure_lambda_arn], - ) - - pipeline_notification_topic = sns.Topic( - self, - "PipelineNotification", - ) - pipeline_notification_topic.node.default_child.cfn_options.metadata = suppress_sns() - pipeline_notification_topic.add_subscription( - subscriptions.EmailSubscription(email_address=notification_email.value_as_string) - ) - - # createing pipeline stages - source_stage = codepipeline.StageProps(stage_name="Source", actions=[source_action_definition]) - build_stage = codepipeline.StageProps(stage_name="Build", actions=[build_action_definition]) - deploy_stage_realtime = codepipeline.StageProps( - stage_name="Deploy", - actions=[ - create_model_definition, - create_endpoint_definition, - ], - ) - share_stage = codepipeline.StageProps(stage_name="Share", actions=[configure_inference_definition]) - - realtime_build_pipeline = codepipeline.Pipeline( - self, - "BYOMPipelineReatimeBuild", - stages=[source_stage, build_stage, deploy_stage_realtime, share_stage], - cross_account_keys=False, - ) - realtime_build_pipeline.on_state_change( - "NotifyUser", - description="Notify user of the outcome of the pipeline", - target=targets.SnsTopic( - pipeline_notification_topic, - message=events.RuleTargetInput.from_text( - ( - f"Pipeline {events.EventField.from_path('$.detail.pipeline')} finished executing. 
" - f"Pipeline execution result is {events.EventField.from_path('$.detail.state')}" - ) - ), - ), - event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), - ) - realtime_build_pipeline.add_to_role_policy( - iam.PolicyStatement( - actions=["events:PutEvents"], - resources=[ - f"arn:{core.Aws.PARTITION}:events:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:event-bus/*", - ], - ) - ) - # add lambda permissons - realtime_build_pipeline.add_to_role_policy(invoke_lambdas_policy) - # Enhancement: This is to find CDK object nodes so that unnecessary cfn-nag warnings can be suppressed - # There is room for improving the method in future versions to find CDK nodes without having to use - # hardocded index numbers - pipeline_child_nodes = realtime_build_pipeline.node.find_all() - pipeline_child_nodes[1].node.default_child.cfn_options.metadata = suppress_pipeline_bucket() - pipeline_child_nodes[6].node.default_child.cfn_options.metadata = suppress_iam_complex() - pipeline_child_nodes[13].node.default_child.cfn_options.metadata = suppress_iam_complex() - pipeline_child_nodes[19].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[25].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[30].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[36].node.default_child.cfn_options.metadata = suppress_list_function_policy() - # attaching iam permissions to the pipelines - pipeline_permissions(realtime_build_pipeline, assets_bucket) - - # Outputs # - core.CfnOutput( - self, - id="Pipelines", - value=( - f"https://console.aws.amazon.com/codesuite/codepipeline/pipelines/" - f"{realtime_build_pipeline.pipeline_name}/view?region={core.Aws.REGION}" - ), - ) - core.CfnOutput( - self, - id="SageMakerModelName", - value=model_name.value_as_string, - ) - core.CfnOutput( - self, - id="SageMakerEndpointConfigName", - value=f"{model_name.value_as_string}-endpoint-config", - ) - core.CfnOutput( - self, - id="SageMakerEndpointName", - value=f"{model_name.value_as_string}-endpoint", - ) - core.CfnOutput( - self, - id="EndpointDataCaptureLocation", - value=f"https://s3.console.aws.amazon.com/s3/buckets/{assets_bucket.bucket_name}/datacapture", - description="Endpoint data capture location (to be used by Model Monitor)", - ) \ No newline at end of file diff --git a/source/lib/blueprints/byom/byom_realtime_builtin_container.py b/source/lib/blueprints/byom/byom_realtime_builtin_container.py deleted file mode 100644 index 2d96261..0000000 --- a/source/lib/blueprints/byom/byom_realtime_builtin_container.py +++ /dev/null @@ -1,239 +0,0 @@ -# ##################################################################################################################### -# Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. 
# -# ##################################################################################################################### -import uuid -from aws_cdk import ( - aws_iam as iam, - aws_s3 as s3, - aws_sns as sns, - aws_sns_subscriptions as subscriptions, - aws_events_targets as targets, - aws_events as events, - aws_codepipeline as codepipeline, - core, -) -from lib.blueprints.byom.pipeline_definitions.source_actions import source_action -from lib.blueprints.byom.pipeline_definitions.deploy_actions import ( - create_model, - create_endpoint, - sagemaker_layer, -) -from lib.blueprints.byom.pipeline_definitions.share_actions import configure_inference -from lib.blueprints.byom.pipeline_definitions.helpers import ( - suppress_assets_bucket, - pipeline_permissions, - suppress_list_function_policy, - suppress_pipeline_bucket, - suppress_iam_complex, - suppress_sns, -) - - -class BYOMRealtimeBuiltinStack(core.Stack): - def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: - super().__init__(scope, id, **kwargs) - - # Parameteres # - notification_email = core.CfnParameter( - self, - "NOTIFICATION_EMAIL", - type="String", - description="email for pipeline outcome notifications", - allowed_pattern="^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - constraint_description="Please enter an email address with correct format (example@exmaple.com)", - min_length=5, - max_length=320, - ) - blueprint_bucket_name = core.CfnParameter( - self, - "BLUEPRINT_BUCKET", - type="String", - description="Bucket name for blueprints of different types of ML Pipelines.", - min_length=3, - ) - assets_bucket_name = core.CfnParameter( - self, "ASSETS_BUCKET", type="String", description="Bucket name for access logs.", min_length=3 - ) - custom_container = core.CfnParameter( - self, - "CUSTOM_CONTAINER", - default="", - type="String", - description=( - "Should point to path to a zip file containing dockerfile and assets for building " - "a custom model. If empty it will beusing containers from SageMaker Registry" - ), - ) - model_framework = core.CfnParameter( - self, - "MODEL_FRAMEWORK", - type="String", - description="The ML framework which is used for training the model. E.g., xgboost, kmeans, etc.", - ) - model_framework_version = core.CfnParameter( - self, - "MODEL_FRAMEWORK_VERSION", - type="String", - description="The version of the ML framework which is used for training the model. E.g., 1.1-2", - ) - model_name = core.CfnParameter( - self, "MODEL_NAME", type="String", description="An arbitrary name for the model.", min_length=1 - ) - model_artifact_location = core.CfnParameter( - self, - "MODEL_ARTIFACT_LOCATION", - type="String", - description="Path to model artifact inside assets bucket.", - ) - inference_instance = core.CfnParameter( - self, - "INFERENCE_INSTANCE", - type="String", - description="Inference instance that inference requests will be running on. 
E.g., ml.m5.large", - allowed_pattern="^[a-zA-Z0-9_.+-]+\.[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - min_length=7, - ) - # Resources # - assets_bucket = s3.Bucket.from_bucket_name(self, "AssetsBucket", assets_bucket_name.value_as_string) - # getting blueprint bucket object from its name - will be used later in the stack - blueprint_bucket = s3.Bucket.from_bucket_name(self, "BlueprintBucket", blueprint_bucket_name.value_as_string) - - # Defining pipeline stages - # source stage - source_output, source_action_definition = source_action(model_artifact_location, assets_bucket) - - # deploy stage - sm_layer = sagemaker_layer(self, blueprint_bucket) - # creating a sagemaker model - model_lambda_arn, create_model_definition = create_model( - self, - blueprint_bucket, - assets_bucket, - model_name, - model_artifact_location, - custom_container, - model_framework, - model_framework_version, - "", - sm_layer, - ) - # creating a sagemaker endpoint - endpoint_lambda_arn, create_endpoint_definition = create_endpoint( - self, blueprint_bucket, assets_bucket, model_name, inference_instance - ) - # Share stage - configure_lambda_arn, configure_inference_definition = configure_inference(self, blueprint_bucket) - - # create invoking lambda policy - invoke_lambdas_policy = iam.PolicyStatement( - actions=[ - "lambda:InvokeFunction", - ], - resources=[model_lambda_arn, endpoint_lambda_arn, configure_lambda_arn], - ) - - # createing pipeline stages - source_stage = codepipeline.StageProps(stage_name="Source", actions=[source_action_definition]) - deploy_stage_realtime = codepipeline.StageProps( - stage_name="Deploy", - actions=[ - create_model_definition, - create_endpoint_definition, - ], - ) - share_stage = codepipeline.StageProps(stage_name="Share", actions=[configure_inference_definition]) - - pipeline_notification_topic = sns.Topic( - self, - "PipelineNotification", - ) - pipeline_notification_topic.node.default_child.cfn_options.metadata = suppress_sns() - pipeline_notification_topic.add_subscription( - subscriptions.EmailSubscription(email_address=notification_email.value_as_string) - ) - - # constructing pipelines based on batch/realtime and custom build container or built-in - realtime_nobuild_pipeline = codepipeline.Pipeline( - self, - "BYOMPipelineReatimeBuiltIn", - stages=[source_stage, deploy_stage_realtime, share_stage], - cross_account_keys=False, - ) - realtime_nobuild_pipeline.on_state_change( - "NotifyUser", - description="Notify user of the outcome of the pipeline", - target=targets.SnsTopic( - pipeline_notification_topic, - message=events.RuleTargetInput.from_text( - ( - f"Pipeline {events.EventField.from_path('$.detail.pipeline')} finished executing. 
" - f"Pipeline execution result is {events.EventField.from_path('$.detail.state')}" - ) - ), - ), - event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), - ) - realtime_nobuild_pipeline.add_to_role_policy( - iam.PolicyStatement( - actions=["events:PutEvents"], - resources=[ - f"arn:{core.Aws.PARTITION}:events:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:event-bus/*", - ], - ) - ) - - # add lambda permissons - realtime_nobuild_pipeline.add_to_role_policy(invoke_lambdas_policy) - - # Enhancement: This is to find CDK object nodes so that unnecessary cfn-nag warnings can be suppressed - # There is room for improving the method in future versions to find CDK nodes without having to use - # hardocded index numbers - pipeline_child_nodes = realtime_nobuild_pipeline.node.find_all() - pipeline_child_nodes[1].node.default_child.cfn_options.metadata = suppress_pipeline_bucket() - pipeline_child_nodes[6].node.default_child.cfn_options.metadata = suppress_iam_complex() - pipeline_child_nodes[13].node.default_child.cfn_options.metadata = suppress_iam_complex() - pipeline_child_nodes[19].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[24].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[30].node.default_child.cfn_options.metadata = suppress_list_function_policy() - # attaching iam permissions to the pipelines - pipeline_permissions(realtime_nobuild_pipeline, assets_bucket) - - # Outputs # - core.CfnOutput( - self, - id="Pipelines", - value=( - f"https://console.aws.amazon.com/codesuite/codepipeline/pipelines/" - f"{realtime_nobuild_pipeline.pipeline_name}/view?region={core.Aws.REGION}" - ), - ) - core.CfnOutput( - self, - id="SageMakerModelName", - value=model_name.value_as_string, - ) - core.CfnOutput( - self, - id="SageMakerEndpointConfigName", - value=f"{model_name.value_as_string}-endpoint-config", - ) - core.CfnOutput( - self, - id="SageMakerEndpointName", - value=f"{model_name.value_as_string}-endpoint", - ) - core.CfnOutput( - self, - id="EndpointDataCaptureLocation", - value=f"https://s3.console.aws.amazon.com/s3/buckets/{assets_bucket.bucket_name}/datacapture", - description="Endpoint data capture location (to be used by Model Monitor)", - ) \ No newline at end of file diff --git a/source/lib/blueprints/byom/lambdas/batch_transform/main.py b/source/lib/blueprints/byom/lambdas/batch_transform/main.py index d1f8a90..76b3985 100644 --- a/source/lib/blueprints/byom/lambdas/batch_transform/main.py +++ b/source/lib/blueprints/byom/lambdas/batch_transform/main.py @@ -1,5 +1,5 @@ # ##################################################################################################################### -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. A copy of the License is located at # @@ -11,58 +11,48 @@ # and limitations under the License. 
# # ##################################################################################################################### import os -from time import gmtime, strftime -import boto3 import uuid -from shared.wrappers import code_pipeline_exception_handler from shared.logger import get_logger from shared.helper import get_client logger = get_logger(__name__) sm_client = get_client("sagemaker") -cp_client = get_client("codepipeline") -@code_pipeline_exception_handler def handler(event, context): - # todo: change the way to mock boto3 clients for unit tests without passing clients in input - - # Extract the Job ID - job_id = event["CodePipeline.job"]["id"] - prefix = "batch_transform" - model_name = os.environ.get("model_name") - assets_bucket = os.environ.get("assets_bucket") - batch_data = os.environ.get("batch_inference_data") - inference_instance = os.environ.get("inference_instance") - - batch_job_name = f"{model_name}-batch-transform-{str(uuid.uuid4())[:8]}" - output_location = f"s3://{assets_bucket}/{prefix}/output/{batch_job_name}" - - request = { - "TransformJobName": batch_job_name, - "ModelName": model_name, - "TransformOutput": { - "S3OutputPath": output_location, - "Accept": "text/csv", - "AssembleWith": "Line", - }, - "TransformInput": { - "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": f"s3://{assets_bucket}/{batch_data}"}}, - "ContentType": "text/csv", - "SplitType": "Line", - "CompressionType": "None", - }, - "TransformResources": {"InstanceType": inference_instance, "InstanceCount": 1}, - } - - response = sm_client.create_transform_job(**request) - logger.info(f"Response from create transform job request. response: {response}") - logger.info(f"Created Transform job with name: {batch_job_name}") - - if response["ResponseMetadata"]["HTTPStatusCode"] == 200: - cp_client.put_job_success_result(jobId=job_id) - logger.info(f"Sent success message back to codepipeline with job_id: {job_id}") - else: - cp_client.put_job_failure_result( - jobId=job_id, failureDetails={"message": "Job failed. Check the logs for more info.", "type": "JobFailed"} - ) + try: + model_name = os.environ.get("model_name").lower() + batch_inference_data = os.environ.get("batch_inference_data") + batch_job_output_location = os.environ.get("batch_job_output_location") + inference_instance = os.environ.get("inference_instance") + kms_key_arn = os.environ.get("kms_key_arn") + batch_job_name = f"{model_name}-batch-transform-{str(uuid.uuid4())[:8]}" + + request = { + "TransformJobName": batch_job_name, + "ModelName": model_name, + "TransformOutput": { + "S3OutputPath": f"s3://{batch_job_output_location}", + "Accept": "text/csv", + "AssembleWith": "Line", + }, + "TransformInput": { + "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": f"s3://{batch_inference_data}"}}, + "ContentType": "text/csv", + "SplitType": "Line", + "CompressionType": "None", + }, + "TransformResources": {"InstanceType": inference_instance, "InstanceCount": 1}, + } + # add KmsKey if provided by the customer + if kms_key_arn: + request["TransformOutput"].update({"KmsKeyId": kms_key_arn}) + request["TransformResources"].update({"VolumeKmsKeyId": kms_key_arn}) + + response = sm_client.create_transform_job(**request) + logger.info(f"Response from create transform job request. 
response: {response}") + logger.info(f"Created Transform job with name: {batch_job_name}") + + except Exception as e: + logger.error(f"Error creating the batch transform job {batch_job_name}: {str(e)}") + raise e diff --git a/source/lib/blueprints/byom/lambdas/batch_transform/tests/test_batch_transform.py b/source/lib/blueprints/byom/lambdas/batch_transform/tests/test_batch_transform.py index 5a6b4a3..0b3d80b 100644 --- a/source/lib/blueprints/byom/lambdas/batch_transform/tests/test_batch_transform.py +++ b/source/lib/blueprints/byom/lambdas/batch_transform/tests/test_batch_transform.py @@ -27,6 +27,8 @@ def mock_env_variables(): "assets_bucket": "testbucket", "batch_inference_data": "test", "inference_instance": "ml.m5.4xlarge", + "batch_job_output_location": "output-location", + "kms_key_arn": "mykey", } os.environ = {**os.environ, **new_env} @@ -36,18 +38,14 @@ def sm_expected_params(): return { "TransformJobName": ANY, "ModelName": "test", - "TransformOutput": { - "S3OutputPath": ANY, - "Accept": "text/csv", - "AssembleWith": "Line", - }, + "TransformOutput": {"S3OutputPath": ANY, "Accept": "text/csv", "AssembleWith": "Line", "KmsKeyId": "mykey"}, "TransformInput": { "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": ANY}}, "ContentType": "text/csv", "SplitType": "Line", "CompressionType": "None", }, - "TransformResources": {"InstanceType": ANY, "InstanceCount": 1}, + "TransformResources": {"InstanceType": ANY, "InstanceCount": 1, "VolumeKmsKeyId": "mykey"}, } @@ -67,16 +65,6 @@ def sm_response_500(): } -@pytest.fixture -def cp_expected_params_success(): - return {"jobId": "test_job_id"} - - -@pytest.fixture -def cp_expected_params_failure(): - return {"jobId": "test_job_id", "failureDetails": {"message": ANY, "type": "JobFailed"}} - - @pytest.fixture() def event(): return { @@ -85,55 +73,27 @@ def event(): @mock_sts -def test_handler_success(sm_expected_params, sm_response_200, cp_expected_params_success, event): - +def test_handler_success(sm_expected_params, sm_response_200, event): sm_client = get_client("sagemaker") - cp_client = get_client("codepipeline") - sm_stubber = Stubber(sm_client) - cp_stubber = Stubber(cp_client) - - cp_response = {} # success path sm_stubber.add_response("create_transform_job", sm_response_200, sm_expected_params) - cp_stubber.add_response("put_job_success_result", cp_response, cp_expected_params_success) with sm_stubber: - with cp_stubber: - handler(event, {}) - cp_stubber.assert_no_pending_responses() - reset_client() + handler(event, {}) + reset_client() @mock_sts -def test_handler_fail(sm_expected_params, sm_response_500, cp_expected_params_failure, event): +def test_handler_fail(sm_expected_params, sm_response_500, event): sm_client = get_client("sagemaker") - cp_client = get_client("codepipeline") - sm_stubber = Stubber(sm_client) - cp_stubber = Stubber(cp_client) - cp_response = {} # fail path sm_stubber.add_response("create_transform_job", sm_response_500, sm_expected_params) - cp_stubber.add_response("put_job_failure_result", cp_response, cp_expected_params_failure) - with sm_stubber: - with cp_stubber: - handler(event, {}) - cp_stubber.assert_no_pending_responses() - reset_client() - - -def test_handler_exception(): - with patch("boto3.client") as mock_client: - event = { - "CodePipeline.job": {"id": "test_job_id"}, - } - failure_message = { - "message": "Job failed. 
Check the logs for more info.", - "type": "JobFailed", - } - handler(event, context={}) - mock_client().put_job_failure_result.assert_called() + with pytest.raises(Exception): + handler(event, {}) + + reset_client() diff --git a/source/lib/blueprints/byom/lambdas/configure_inference_lambda/main.py b/source/lib/blueprints/byom/lambdas/configure_inference_lambda/main.py deleted file mode 100644 index c6f016c..0000000 --- a/source/lib/blueprints/byom/lambdas/configure_inference_lambda/main.py +++ /dev/null @@ -1,62 +0,0 @@ -# ##################################################################################################################### -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. # -# ##################################################################################################################### -import os -import json -import boto3 -from shared.wrappers import code_pipeline_exception_handler -from shared.logger import get_logger -from shared.helper import get_client - -logger = get_logger(__name__) - -lm_client = get_client("lambda") -cp_client = get_client("codepipeline") - - -@code_pipeline_exception_handler -def handler(event, context): - # todo: change the way to mock boto3 clients for unit tests without passing clients in input - - # Extract the Job ID - job_id = event["CodePipeline.job"]["id"] - - # Extract the Job Data - job_data = event["CodePipeline.job"]["data"] - user_parameters = job_data["actionConfiguration"]["configuration"]["UserParameters"] - - logger.debug("user parameters: %s", user_parameters) - # Get the user parameter that was sent from the last action (create sagemaker endpoint) - # Codepipeline formats output variables from previous stages into {"0": {"variableName": "value"}} - endpoint_name = json.loads(user_parameters)["0"]["endpointName"] - logger.debug("Sagemaker endpoint name: %s", endpoint_name) - inference_lambda_arn = os.environ["inference_lambda_arn"] - logger.debug("Inference Lambda ARN: %s", inference_lambda_arn) - - # Sending request to update environment variables for the inference lambda - # so that it knows which inference endpoint to refer to when it gets inference - # request from api gateway - logger.info("Updating inference lambda configuration") - response = lm_client.update_function_configuration( - FunctionName=inference_lambda_arn, - Environment={"Variables": {"ENDPOINT_NAME": endpoint_name, "LOG_LEVEL": "INFO"}}, - ) - logger.info("finished updating inference lambda") - logger.debug(response) - # Send response back to codepipeline success or fail. - if response["ResponseMetadata"]["HTTPStatusCode"] == 200: - cp_client.put_job_success_result(jobId=job_id) - else: - cp_client.put_job_failure_result( - jobId=job_id, - failureDetails={"message": "Job failed. 
Check the logs for more info.", "type": "JobFailed"}, - ) diff --git a/source/lib/blueprints/byom/lambdas/configure_inference_lambda/tests/test_configure_inference_lambda.py b/source/lib/blueprints/byom/lambdas/configure_inference_lambda/tests/test_configure_inference_lambda.py deleted file mode 100644 index 31193a1..0000000 --- a/source/lib/blueprints/byom/lambdas/configure_inference_lambda/tests/test_configure_inference_lambda.py +++ /dev/null @@ -1,174 +0,0 @@ -####################################################################################################################### -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. # -# ##################################################################################################################### -import os -import json -import datetime -import unittest -from unittest.mock import MagicMock, patch -import pytest -import boto3 -from moto import mock_sts -from botocore.stub import Stubber, ANY -from shared.logger import get_logger -from shared.helper import get_client, reset_client -from main import handler - - -@pytest.fixture(autouse=True) -def mock_env_variables(): - new_env = { - "model_name": "test", - "assets_bucket": "testbucket", - "batch_inference_data": "test", - "inference_instance": "test", - "inference_lambda_arn": "testname", - } - os.environ = {**os.environ, **new_env} - - -@pytest.fixture -def lm_expected_params(): - return { - "FunctionName": ANY, - "Environment": {"Variables": {"ENDPOINT_NAME": "test", "LOG_LEVEL": "INFO"}}, - } - - -@pytest.fixture -def cp_expected_params_success(): - return {"jobId": "test_job_id"} - - -@pytest.fixture -def cp_expected_params_failure(): - return {"jobId": "test_job_id", "failureDetails": {"message": ANY, "type": "JobFailed"}} - - -@pytest.fixture -def lm_response_200(): - return { - "ResponseMetadata": {"HTTPStatusCode": 200}, - "FunctionName": "string", - "FunctionArn": "string", - "Runtime": "nodejs", - "Role": "string", - "Handler": "string", - "CodeSize": 123, - "Description": "string", - "Timeout": 1, - "MemorySize": 129, - "LastModified": "string", - "CodeSha256": "string", - "Version": "string", - "State": "Active", - "StateReason": "string", - "StateReasonCode": "Idle", - "LastUpdateStatus": "Successful", - "LastUpdateStatusReason": "string", - "LastUpdateStatusReasonCode": "EniLimitExceeded", - "FileSystemConfigs": [ - {"Arn": "string", "LocalMountPath": "string"}, - ], - } - - -@pytest.fixture -def lm_response_500(): - return { - "ResponseMetadata": {"HTTPStatusCode": 500}, - "FunctionName": "string", - "FunctionArn": "string", - "Runtime": "nodejs", - "Role": "string", - "Handler": "string", - "CodeSize": 123, - "Description": "string", - "Timeout": 1, - "MemorySize": 129, - "LastModified": "string", - "CodeSha256": "string", - "Version": "string", - "State": "Active", - "StateReason": "string", - "StateReasonCode": "Idle", - "LastUpdateStatus": "Successful", - "LastUpdateStatusReason": "string", - "LastUpdateStatusReasonCode": 
"EniLimitExceeded", - "FileSystemConfigs": [ - {"Arn": "string", "LocalMountPath": "string"}, - ], - } - - -@pytest.fixture -def event(): - return { - "CodePipeline.job": { - "id": "test_job_id", - "data": { - "actionConfiguration": { - "configuration": {"UserParameters": json.dumps({"0": {"endpointName": "test"}})} - } - }, - }, - } - - -@mock_sts -def test_handler_success(lm_expected_params, lm_response_200, cp_expected_params_success, event): - lm_client = get_client("lambda") - lm_stubber = Stubber(lm_client) - cp_client = get_client("codepipeline") - cp_stubber = Stubber(cp_client) - - cp_response = {} - - lm_stubber.add_response("update_function_configuration", lm_response_200, lm_expected_params) - cp_stubber.add_response("put_job_success_result", cp_response, cp_expected_params_success) - - with lm_stubber: - with cp_stubber: - handler(event, {}) - cp_stubber.assert_no_pending_responses() - reset_client() - - -def test_handler_failure(lm_expected_params, lm_response_500, cp_expected_params_failure, event): - lm_client = get_client("lambda") - lm_stubber = Stubber(lm_client) - cp_client = get_client("codepipeline") - cp_stubber = Stubber(cp_client) - - cp_response = {} - - lm_stubber.add_response("update_function_configuration", lm_response_500, lm_expected_params) - cp_stubber.add_response("put_job_failure_result", cp_response, cp_expected_params_failure) - - with lm_stubber: - with cp_stubber: - handler(event, {}) - cp_stubber.assert_no_pending_responses() - reset_client() - - -def test_handler_exception(): - with patch("boto3.client") as mock_client: - event = { - "CodePipeline.job": {"id": "test_job_id"}, - } - failure_message = { - "message": "Job failed. Check the logs for more info.", - "type": "JobFailed", - } - handler(event, context={}) - mock_client().put_job_failure_result.assert_called() diff --git a/source/lib/blueprints/byom/lambdas/create_data_baseline_job/main.py b/source/lib/blueprints/byom/lambdas/create_data_baseline_job/main.py index 51e3855..eb733f4 100644 --- a/source/lib/blueprints/byom/lambdas/create_data_baseline_job/main.py +++ b/source/lib/blueprints/byom/lambdas/create_data_baseline_job/main.py @@ -1,5 +1,5 @@ # ##################################################################################################################### -# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. A copy of the License is located at # @@ -11,25 +11,16 @@ # and limitations under the License. 
# # ##################################################################################################################### import os -import json -import time import botocore import boto3 -from shared.wrappers import code_pipeline_exception_handler from shared.logger import get_logger from shared.helper import get_client, get_built_in_model_monitor_container_uri logger = get_logger(__name__) - sm_client = get_client("sagemaker") -cp_client = get_client("codepipeline") -@code_pipeline_exception_handler def handler(event, context): - # Extract the Job ID - job_id = event["CodePipeline.job"]["id"] - baseline_job_name = os.environ["BASELINE_JOB_NAME"] assets_bucket = os.environ["ASSETS_BUCKET"] training_data_location = os.environ["TRAINING_DATA_LOCATION"] @@ -37,27 +28,15 @@ def handler(event, context): instance_type = os.environ["INSTANCE_TYPE"] instance_volume_size = int(os.environ["INSTANCE_VOLUME_SIZE"]) role_arn = os.environ["ROLE_ARN"] + kms_key_arn = os.environ.get("KMS_KEY_ARN") stack_name = os.environ["STACK_NAME"] - # optional value, if the client did not provide a value, the orchestraion lambda sets it to -1 max_runtime_seconds = int(os.environ["MAX_RUNTIME_SECONDS"]) - if max_runtime_seconds == -1: - max_runtime_seconds = None - logger.info(f"Checking if model monitor's data baseline processing job {baseline_job_name} exists...") try: - existing_baseline_job = sm_client.describe_processing_job(ProcessingJobName=baseline_job_name) - # Checking if data baseline processing job with the same name exists - if existing_baseline_job["ResponseMetadata"]["HTTPStatusCode"] == 200: - logger.info(f"Baseline processing job {baseline_job_name} already exists, skipping job creation") - check_baseline_job_status(job_id, existing_baseline_job) - - except botocore.exceptions.ClientError as error: - logger.info(str(error)) - logger.info(f"Data baseline processing job {baseline_job_name} doesn't exist. 
Creating a new one.") - # Sending request to create data baseline processing job - response = sm_client.create_processing_job( - ProcessingJobName=baseline_job_name, - ProcessingInputs=[ + logger.info(f"Creating data baseline processing job {baseline_job_name} ...") + request = { + "ProcessingJobName": baseline_job_name, + "ProcessingInputs": [ { "InputName": "baseline_dataset_input", "S3Input": { @@ -70,64 +49,54 @@ def handler(event, context): }, } ], - ProcessingOutputConfig={ + "ProcessingOutputConfig": { "Outputs": [ { "OutputName": "baseline_dataset_output", "S3Output": { - "S3Uri": f"s3://{assets_bucket}/{baseline_job_output_location}/{baseline_job_name}", + "S3Uri": f"s3://{baseline_job_output_location}/{baseline_job_name}", "LocalPath": "/opt/ml/processing/output", "S3UploadMode": "EndOfJob", }, }, ], }, - ProcessingResources={ + "ProcessingResources": { "ClusterConfig": { "InstanceCount": 1, "InstanceType": instance_type, "VolumeSizeInGB": instance_volume_size, } }, - StoppingCondition={"MaxRuntimeInSeconds": max_runtime_seconds}, - AppSpecification={ + "AppSpecification": { "ImageUri": get_built_in_model_monitor_container_uri(boto3.session.Session().region_name), }, - Environment={ + "Environment": { "dataset_format": '{"csv": {"header": true, "output_columns_position": "START"}}', "dataset_source": "/opt/ml/processing/input/baseline_dataset_input", "output_path": "/opt/ml/processing/output", "publish_cloudwatch_metrics": "Disabled", }, - RoleArn=role_arn, - Tags=[ + "RoleArn": role_arn, + "Tags": [ {"Key": "stack_name", "Value": stack_name}, ], - ) + } + + # optional value, if the client did not provide a value, the orchestraion lambda sets it to -1 + if max_runtime_seconds != -1: + request.update({"StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds}}) + # add kms key if provided + if kms_key_arn: + request["ProcessingOutputConfig"].update({"KmsKeyId": kms_key_arn}) + request["ProcessingResources"]["ClusterConfig"].update({"VolumeKmsKeyId": kms_key_arn}) + + # Sending request to create data baseline processing job + response = sm_client.create_processing_job(**request) logger.info(f"Finished creating data baseline processing job. respons: {response}") logger.info("Data Baseline Processing JobArn: " + response["ProcessingJobArn"]) - logger.debug(response) - resp = sm_client.describe_processing_job(ProcessingJobName=baseline_job_name) - check_baseline_job_status(job_id, resp) - -def check_baseline_job_status(job_id, baseline_job_response): - job_status = baseline_job_response["ProcessingJobStatus"] - logger.info("ProcessingJob Status: " + job_status) - if job_status == "InProgress": - continuation_token = json.dumps({"previous_job_id": job_id}) - logger.info("Putting job continuation") - cp_client.put_job_success_result(jobId=job_id, continuationToken=continuation_token) - elif job_status == "Completed": - cp_client.put_job_success_result( - jobId=job_id, - ) - else: - cp_client.put_job_failure_result( - jobId=job_id, - failureDetails={ - "message": f"Failed to create Data Baseline Processing Job. 
status: {job_status}", - "type": "JobFailed", - }, - ) + except botocore.exceptions.ClientError as error: + logger.info(str(error)) + logger.info(f"Creation of baseline processing job: {baseline_job_name} faild.") diff --git a/source/lib/blueprints/byom/lambdas/create_data_baseline_job/setup.py b/source/lib/blueprints/byom/lambdas/create_data_baseline_job/setup.py index 6332fa6..7bb64f8 100644 --- a/source/lib/blueprints/byom/lambdas/create_data_baseline_job/setup.py +++ b/source/lib/blueprints/byom/lambdas/create_data_baseline_job/setup.py @@ -1,5 +1,5 @@ -################################################################################################################## -# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +####################################################################################################################### +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. A copy of the License is located at # diff --git a/source/lib/blueprints/byom/lambdas/create_data_baseline_job/tests/fixtures/baseline_fixtures.py b/source/lib/blueprints/byom/lambdas/create_data_baseline_job/tests/fixtures/baseline_fixtures.py index f28c09f..ed4af2e 100644 --- a/source/lib/blueprints/byom/lambdas/create_data_baseline_job/tests/fixtures/baseline_fixtures.py +++ b/source/lib/blueprints/byom/lambdas/create_data_baseline_job/tests/fixtures/baseline_fixtures.py @@ -11,10 +11,8 @@ # and limitations under the License. # # ##################################################################################################################### import os -from datetime import datetime import boto3 import pytest -from botocore.stub import ANY from shared.helper import get_built_in_model_monitor_container_uri @@ -29,7 +27,8 @@ def mock_env_variables(): "INSTANCE_VOLUME_SIZE": "20", "ROLE_ARN": "arn:aws:iam::account:role/myrole", "STACK_NAME": "test-stack", - "MAX_RUNTIME_SECONDS": "2600", + "KMS_KEY_ARN": "mykey", + "MAX_RUNTIME_SECONDS": "3600", } os.environ = {**os.environ, **new_env} @@ -39,6 +38,10 @@ def sm_describe_processing_job_params(): return {"ProcessingJobName": os.environ["BASELINE_JOB_NAME"]} +local_path = "/opt/ml/processing/input/baseline_dataset_input" +output_path = "/opt/ml/processing/output" + + @pytest.fixture def sm_create_baseline_expected_params(): return { @@ -48,7 +51,7 @@ def sm_create_baseline_expected_params(): "InputName": "baseline_dataset_input", "S3Input": { "S3Uri": "s3://" + os.environ["ASSETS_BUCKET"] + "/" + os.environ["TRAINING_DATA_LOCATION"], - "LocalPath": "/opt/ml/processing/input/baseline_dataset_input", + "LocalPath": local_path, "S3DataType": "S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", @@ -62,22 +65,22 @@ def sm_create_baseline_expected_params(): "OutputName": "baseline_dataset_output", "S3Output": { "S3Uri": "s3://" - + os.environ["ASSETS_BUCKET"] - + "/" + os.environ["BASELINE_JOB_OUTPUT_LOCATION"] + "/" + os.environ["BASELINE_JOB_NAME"], - "LocalPath": "/opt/ml/processing/output", + "LocalPath": output_path, "S3UploadMode": "EndOfJob", }, }, ], + "KmsKeyId": "mykey", }, "ProcessingResources": { "ClusterConfig": { "InstanceCount": 1, "InstanceType": os.environ["INSTANCE_TYPE"], "VolumeSizeInGB": int(os.environ["INSTANCE_VOLUME_SIZE"]), + "VolumeKmsKeyId": "mykey", } }, "StoppingCondition": {"MaxRuntimeInSeconds": 
int(os.environ["MAX_RUNTIME_SECONDS"])}, @@ -86,8 +89,8 @@ def sm_create_baseline_expected_params(): }, "Environment": { "dataset_format": '{"csv": {"header": true, "output_columns_position": "START"}}', - "dataset_source": "/opt/ml/processing/input/baseline_dataset_input", - "output_path": "/opt/ml/processing/output", + "dataset_source": local_path, + "output_path": output_path, "publish_cloudwatch_metrics": "Disabled", }, "RoleArn": os.environ["ROLE_ARN"], @@ -105,86 +108,8 @@ def sm_create_job_response_200(): } -@pytest.fixture -def sm_describe_job_response(): - return { - "ProcessingInputs": [ - { - "InputName": "baseline_dataset_input", - "S3Input": { - "S3Uri": "s3://" + os.environ["ASSETS_BUCKET"] + "/" + os.environ["TRAINING_DATA_LOCATION"], - "LocalPath": "/opt/ml/processing/input/baseline_dataset_input", - "S3DataType": "S3Prefix", - "S3InputMode": "File", - "S3DataDistributionType": "FullyReplicated", - "S3CompressionType": "None", - }, - } - ], - "ProcessingOutputConfig": { - "Outputs": [ - { - "OutputName": "baseline_dataset_output", - "S3Output": { - "S3Uri": "s3://" - + os.environ["ASSETS_BUCKET"] - + "/" - + os.environ["BASELINE_JOB_OUTPUT_LOCATION"] - + "/" - + os.environ["BASELINE_JOB_NAME"], - "LocalPath": "/opt/ml/processing/output", - "S3UploadMode": "EndOfJob", - "S3UploadMode": "EndOfJob", - }, - } - ] - }, - "ProcessingJobName": os.environ["BASELINE_JOB_NAME"], - "ProcessingResources": { - "ClusterConfig": {"InstanceCount": 1, "InstanceType": "ml.m5.xlarge", "VolumeSizeInGB": 20} - }, - "StoppingCondition": {"MaxRuntimeInSeconds": 3600}, - "AppSpecification": {"ImageUri": get_built_in_model_monitor_container_uri(boto3.session.Session().region_name)}, - "Environment": { - "dataset_format": '{"csv": {"header": true, "output_columns_position": "START"}}', - "dataset_source": "/opt/ml/processing/input/baseline_dataset_input", - "output_path": "/opt/ml/processing/output", - "publish_cloudwatch_metrics": "Disabled", - }, - "RoleArn": os.environ["ROLE_ARN"], - "ProcessingJobArn": "arn:processing-job/my-baseline-jobe", - "ProcessingJobStatus": "Completed", - "ExitMessage": "Completed: Job completed successfully with no violations.", - "ProcessingEndTime": datetime(2020, 12, 16), - "ProcessingStartTime": datetime(2020, 12, 16), - "LastModifiedTime": datetime(2020, 12, 16), - "CreationTime": datetime(2020, 12, 16), - "ResponseMetadata": { - "RequestId": "3a485433-aea7-428c-8dc1-e70e9e4994d6", - "HTTPStatusCode": 200, - "HTTPHeaders": { - "x-amzn-requestid": "3a485433-aea7-428c-8dc1-e70e9e4994d6", - "content-type": "application/x-amz-json-1.1", - "content-length": "1637", - "date": "Mon, 21 Dec 2020 16:58:35 GMT", - }, - "RetryAttempts": 0, - }, - } - - -@pytest.fixture -def cp_expected_params_success(): - return {"jobId": "test_job_id"} - - -@pytest.fixture -def cp_expected_params_failure(): - return {"jobId": "test_job_id", "failureDetails": {"message": ANY, "type": "JobFailed"}} - - @pytest.fixture() def event(): return { - "CodePipeline.job": {"id": "test_job_id"}, + "message": "Start data baseline job", } diff --git a/source/lib/blueprints/byom/lambdas/create_data_baseline_job/tests/test_create_data_baseline.py b/source/lib/blueprints/byom/lambdas/create_data_baseline_job/tests/test_create_data_baseline.py index dbef71c..26014ee 100644 --- a/source/lib/blueprints/byom/lambdas/create_data_baseline_job/tests/test_create_data_baseline.py +++ b/source/lib/blueprints/byom/lambdas/create_data_baseline_job/tests/test_create_data_baseline.py @@ -10,24 +10,15 @@ # OR CONDITIONS OF 
ANY KIND, express or implied. See the License for the specific language governing permissions # # and limitations under the License. # # ##################################################################################################################### -import os -from unittest.mock import MagicMock, patch -import pytest -import boto3 -from datetime import datetime +from unittest.mock import patch from moto import mock_sts -import botocore.session -from botocore.stub import Stubber, ANY +from botocore.stub import Stubber from main import handler from shared.helper import get_client, reset_client, get_built_in_model_monitor_container_uri from tests.fixtures.baseline_fixtures import ( mock_env_variables, - sm_describe_processing_job_params, sm_create_baseline_expected_params, sm_create_job_response_200, - sm_describe_job_response, - cp_expected_params_success, - cp_expected_params_failure, event, ) @@ -36,55 +27,21 @@ def test_handler_success( sm_create_baseline_expected_params, sm_create_job_response_200, - sm_describe_job_response, - cp_expected_params_success, - sm_describe_processing_job_params, event, ): sm_client = get_client("sagemaker") - cp_client = get_client("codepipeline") - sm_stubber = Stubber(sm_client) - cp_stubber = Stubber(cp_client) - - cp_response = {} - - # job creation - sm_stubber.add_client_error( - "describe_processing_job", - service_error_code="ProcessingJobExists", - service_message="Could not find requested job with name", - http_status_code=400, - expected_params=sm_describe_processing_job_params, - ) # success path sm_stubber.add_response("create_processing_job", sm_create_job_response_200, sm_create_baseline_expected_params) - sm_stubber.add_response( - "describe_processing_job", - sm_describe_job_response, - sm_describe_processing_job_params, - ) - - cp_stubber.add_response("put_job_success_result", cp_response, cp_expected_params_success) - with sm_stubber: - with cp_stubber: - handler(event, {}) - cp_stubber.assert_no_pending_responses() - reset_client() + handler(event, {}) + reset_client() -def test_handler_exception(): - with patch("boto3.client") as mock_client: - event = { - "CodePipeline.job": {"id": "test_job_id"}, - } - failure_message = { - "message": "Job failed. Check the logs for more info.", - "type": "JobFailed", - } +def test_handler_exception(event): + with patch("boto3.client"): handler(event, context={}) - mock_client().put_job_failure_result.assert_called() + reset_client() diff --git a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/main.py b/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/main.py deleted file mode 100644 index 938a68b..0000000 --- a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/main.py +++ /dev/null @@ -1,146 +0,0 @@ -# ##################################################################################################################### -# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. 
# -# ##################################################################################################################### -import os -import json -import time -import botocore -import boto3 -from shared.wrappers import code_pipeline_exception_handler -from shared.logger import get_logger -from shared.helper import get_client, get_built_in_model_monitor_container_uri - -logger = get_logger(__name__) - -sm_client = get_client("sagemaker") -cp_client = get_client("codepipeline") - - -@code_pipeline_exception_handler -def handler(event, context): - # Extract the Job ID - job_id = event["CodePipeline.job"]["id"] - baseline_job_name = os.environ["BASELINE_JOB_NAME"] - baseline_job_output_location = os.environ["BASELINE_JOB_OUTPUT_LOCATION"] - monitoring_schedule_name = os.environ["MONITORING_SCHEDULE_NAME"] - assets_bucket = os.environ["ASSETS_BUCKET"] - endpoint_name = os.environ["SAGEMAKER_ENDPOINT_NAME"] - monitoring_output_location = os.environ["MONITORING_OUTPUT_LOCATION"] - schedule_expression = os.environ["SCHEDULE_EXPRESSION"] - instance_type = os.environ["INSTANCE_TYPE"] - instance_volume_size = int(os.environ["INSTANCE_VOLUME_SIZE"]) - role_arn = os.environ["ROLE_ARN"] - monitoring_type = os.environ["MONITORING_TYPE"] - stack_name = os.environ["STACK_NAME"] - # optional value, if the client did not provide a value, the orchestraion lambda sets it to -1 - max_runtime_seconds = int(os.environ["MAX_RUNTIME_SECONDS"]) - if max_runtime_seconds == -1: - max_runtime_seconds = None - - allowed_monitoring_types = { - "dataquality": "DataQuality", - "modelquality": "ModelQuality", - "modelbias": "ModelBias", - "modelexplainability": "ModelExplainability", - } - - monitoring_type = allowed_monitoring_types[monitoring_type] - logger.info(f"Checking if monitoring schedule {monitoring_schedule_name} exists...") - try: - existing_monitoring_schedule = sm_client.describe_monitoring_schedule( - MonitoringScheduleName=monitoring_schedule_name - ) - # Checking if data baseline processing job with the same name exists - if existing_monitoring_schedule["ResponseMetadata"]["HTTPStatusCode"] == 200: - logger.info(f"Monitoring schedule {monitoring_schedule_name} already exists, skipping job creation") - check_monitoring_schedule_status(job_id, existing_monitoring_schedule) - - except botocore.exceptions.ClientError as error: - logger.info(str(error)) - logger.info(f"Monitoring schedule {monitoring_schedule_name} doesn't exist. 
Creating a new one.") - # Sending request to create Monitoring schedule - response = sm_client.create_monitoring_schedule( - MonitoringScheduleName=monitoring_schedule_name, - MonitoringScheduleConfig={ - "ScheduleConfig": {"ScheduleExpression": schedule_expression}, - "MonitoringJobDefinition": { - "BaselineConfig": { - "ConstraintsResource": { - "S3Uri": f"s3://{assets_bucket}/{baseline_job_output_location}/{baseline_job_name}/constraints.json" - }, - "StatisticsResource": { - "S3Uri": f"s3://{assets_bucket}/{baseline_job_output_location}/{baseline_job_name}/statistics.json" - }, - }, - "MonitoringInputs": [ - { - "EndpointInput": { - "EndpointName": endpoint_name, - "LocalPath": "/opt/ml/processing/input/monitoring_dataset_input", - "S3InputMode": "File", - "S3DataDistributionType": "FullyReplicated", - } - }, - ], - "MonitoringOutputConfig": { - "MonitoringOutputs": [ - { - "S3Output": { - "S3Uri": f"s3://{assets_bucket}/{monitoring_output_location}", - "LocalPath": "/opt/ml/processing/output", - "S3UploadMode": "EndOfJob", - } - }, - ], - }, - "MonitoringResources": { - "ClusterConfig": { - "InstanceCount": 1, - "InstanceType": instance_type, - "VolumeSizeInGB": instance_volume_size, - } - }, - "MonitoringAppSpecification": { - "ImageUri": get_built_in_model_monitor_container_uri(boto3.session.Session().region_name), - }, - "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds}, - "RoleArn": role_arn, - }, - }, - Tags=[ - {"Key": "stack_name", "Value": stack_name}, - ], - ) - logger.info(f"Finished monitoring Schedule. respons: {response}") - logger.info("Monitoring Schedule Arn: " + response["MonitoringScheduleArn"]) - logger.debug(response) - resp = sm_client.describe_monitoring_schedule(MonitoringScheduleName=monitoring_schedule_name) - check_monitoring_schedule_status(job_id, resp) - - -def check_monitoring_schedule_status(job_id, monitoring_schedule_response): - job_status = monitoring_schedule_response["MonitoringScheduleStatus"] - logger.info("MonitoringScheduleStatus Status: " + job_status) - if job_status == "Pending": - continuation_token = json.dumps({"previous_job_id": job_id}) - logger.info("Putting job continuation") - cp_client.put_job_success_result(jobId=job_id, continuationToken=continuation_token) - elif job_status == "Scheduled": - cp_client.put_job_success_result(jobId=job_id) - else: - cp_client.put_job_failure_result( - jobId=job_id, - failureDetails={ - "message": f"Failed to create Monitoring Schedule. status: {job_status}", - "type": "JobFailed", - }, - ) diff --git a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/tests/fixtures/monitoring_fixtures.py b/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/tests/fixtures/monitoring_fixtures.py deleted file mode 100644 index c9b4eba..0000000 --- a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/tests/fixtures/monitoring_fixtures.py +++ /dev/null @@ -1,209 +0,0 @@ -####################################################################################################################### -# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. 
This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. # -# ##################################################################################################################### -import os -from datetime import datetime -import boto3 -import pytest -from botocore.stub import ANY -from shared.helper import get_built_in_model_monitor_container_uri - - -@pytest.fixture(autouse=True) -def mock_env_variables(): - new_env = { - "BASELINE_JOB_NAME": "test-baseline-job", - "MONITORING_SCHEDULE_NAME": "test-monitoring-schedule", - "ASSETS_BUCKET": "testbucket", - "SAGEMAKER_ENDPOINT_NAME": "test-model-endpoint", - "MONITORING_OUTPUT_LOCATION": "monitor_output", - "BASELINE_JOB_OUTPUT_LOCATION": "baseline_output", - "SCHEDULE_EXPRESSION": "cron(0 * ? * * *)", - "INSTANCE_TYPE": "ml.m5.4xlarge", - "INSTANCE_VOLUME_SIZE": "20", - "ROLE_ARN": "arn:aws:iam::account:role/myrole", - "MONITORING_TYPE": "dataquality", - "STACK_NAME": "test-stack", - "MAX_RUNTIME_SECONDS": "2600", - } - os.environ = {**os.environ, **new_env} - - -@pytest.fixture -def sm_describe_monitoring_scheduale_params(): - return {"MonitoringScheduleName": os.environ["MONITORING_SCHEDULE_NAME"]} - - -@pytest.fixture -def sm_create_monitoring_expected_params(): - baseline_job_name = os.environ["BASELINE_JOB_NAME"] - baseline_job_output_location = os.environ["BASELINE_JOB_OUTPUT_LOCATION"] - assets_bucket = os.environ["ASSETS_BUCKET"] - monitoring_output_location = os.environ["MONITORING_OUTPUT_LOCATION"] - return { - "MonitoringScheduleName": os.environ["MONITORING_SCHEDULE_NAME"], - "MonitoringScheduleConfig": { - "ScheduleConfig": {"ScheduleExpression": os.environ["SCHEDULE_EXPRESSION"]}, - "MonitoringJobDefinition": { - "BaselineConfig": { - "ConstraintsResource": { - "S3Uri": f"s3://{assets_bucket}/{baseline_job_output_location}/{baseline_job_name}/constraints.json" - }, - "StatisticsResource": { - "S3Uri": f"s3://{assets_bucket}/{baseline_job_output_location}/{baseline_job_name}/statistics.json" - }, - }, - "MonitoringInputs": [ - { - "EndpointInput": { - "EndpointName": os.environ["SAGEMAKER_ENDPOINT_NAME"], - "LocalPath": "/opt/ml/processing/input/monitoring_dataset_input", - "S3InputMode": "File", - "S3DataDistributionType": "FullyReplicated", - } - }, - ], - "MonitoringOutputConfig": { - "MonitoringOutputs": [ - { - "S3Output": { - "S3Uri": f"s3://{assets_bucket}/{monitoring_output_location}", - "LocalPath": "/opt/ml/processing/output", - "S3UploadMode": "EndOfJob", - } - }, - ], - }, - "MonitoringResources": { - "ClusterConfig": { - "InstanceCount": 1, - "InstanceType": os.environ["INSTANCE_TYPE"], - "VolumeSizeInGB": int(os.environ["INSTANCE_VOLUME_SIZE"]), - } - }, - "MonitoringAppSpecification": { - "ImageUri": get_built_in_model_monitor_container_uri(boto3.session.Session().region_name), - }, - "StoppingCondition": {"MaxRuntimeInSeconds": int(os.environ["MAX_RUNTIME_SECONDS"])}, - "RoleArn": os.environ["ROLE_ARN"], - }, - }, - "Tags": [ - {"Key": "stack_name", "Value": os.environ["STACK_NAME"]}, - ], - } - - -@pytest.fixture -def sm_create_monitoring_response_200(): - return { - "ResponseMetadata": {"HTTPStatusCode": 200}, - "MonitoringScheduleArn": "arn:aws:sagemaker:region:account:monitoring-schedule/name", - } - - -@pytest.fixture -def sm_describe_monitoring_schedule_response(): - baseline_job_name = os.environ["BASELINE_JOB_NAME"] - 
baseline_job_output_location = os.environ["BASELINE_JOB_OUTPUT_LOCATION"] - assets_bucket = os.environ["ASSETS_BUCKET"] - monitoring_output_location = os.environ["MONITORING_OUTPUT_LOCATION"] - return { - "MonitoringScheduleArn": "arn:aws:sagemaker:us-east-1:account:monitoring-schedule/monitoring-schedule", - "MonitoringScheduleName": os.environ["MONITORING_SCHEDULE_NAME"], - "MonitoringScheduleStatus": "Scheduled", - "CreationTime": datetime(2021, 1, 15), - "LastModifiedTime": datetime(2021, 1, 15), - "MonitoringScheduleConfig": { - "ScheduleConfig": {"ScheduleExpression": "cron(0 * ? * * *)"}, - "MonitoringJobDefinition": { - "BaselineConfig": { - "ConstraintsResource": { - "S3Uri": f"s3://{assets_bucket}/{baseline_job_output_location}/{baseline_job_name}/constraints.json" - }, - "StatisticsResource": { - "S3Uri": f"s3://{assets_bucket}/{baseline_job_output_location}/{baseline_job_name}/statistics.json" - }, - }, - "MonitoringInputs": [ - { - "EndpointInput": { - "EndpointName": os.environ["SAGEMAKER_ENDPOINT_NAME"], - "LocalPath": "/opt/ml/processing/input/monitoring_dataset_input", - "S3InputMode": "File", - "S3DataDistributionType": "FullyReplicated", - } - } - ], - "MonitoringOutputConfig": { - "MonitoringOutputs": [ - { - "S3Output": { - "S3Uri": f"s3://{assets_bucket}/{monitoring_output_location}", - "LocalPath": "/opt/ml/processing/output", - "S3UploadMode": "EndOfJob", - } - } - ] - }, - "MonitoringResources": { - "ClusterConfig": { - "InstanceCount": 1, - "InstanceType": os.environ["INSTANCE_TYPE"], - "VolumeSizeInGB": int(os.environ["INSTANCE_VOLUME_SIZE"]), - } - }, - "MonitoringAppSpecification": { - "ImageUri": get_built_in_model_monitor_container_uri(boto3.session.Session().region_name) - }, - "StoppingCondition": {"MaxRuntimeInSeconds": int(os.environ["MAX_RUNTIME_SECONDS"])}, - "RoleArn": os.environ["ROLE_ARN"], - }, - }, - "EndpointName": os.environ["SAGEMAKER_ENDPOINT_NAME"], - "LastMonitoringExecutionSummary": { - "MonitoringScheduleName": os.environ["MONITORING_SCHEDULE_NAME"], - "ScheduledTime": datetime(2021, 1, 15), - "CreationTime": datetime(2021, 1, 15), - "LastModifiedTime": datetime(2021, 1, 15), - "MonitoringExecutionStatus": "CompletedWithViolations", - "EndpointName": os.environ["SAGEMAKER_ENDPOINT_NAME"], - }, - "ResponseMetadata": { - "RequestId": "958ef6e6-f062-44b8-8a9c-c13f25f60050", - "HTTPStatusCode": 200, - "HTTPHeaders": { - "x-amzn-requestid": "958ef6e6-f062-44b8-8a9c-c13f25f60050", - "content-type": "application/x-amz-json-1.1", - "content-length": "2442", - "date": "Fri, 15 Jan 2021 07:33:21 GMT", - }, - "RetryAttempts": 0, - }, - } - - -@pytest.fixture -def cp_expected_params_success(): - return {"jobId": "test_job_id"} - - -@pytest.fixture -def cp_expected_params_failure(): - return {"jobId": "test_job_id", "failureDetails": {"message": ANY, "type": "JobFailed"}} - - -@pytest.fixture() -def event(): - return { - "CodePipeline.job": {"id": "test_job_id"}, - } diff --git a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/tests/test_model_monitoring.py b/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/tests/test_model_monitoring.py deleted file mode 100644 index ed177c4..0000000 --- a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/tests/test_model_monitoring.py +++ /dev/null @@ -1,92 +0,0 @@ -####################################################################################################################### -# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
# -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. # -# ##################################################################################################################### -import os -from unittest.mock import MagicMock, patch -import pytest -import boto3 -from datetime import datetime -from moto import mock_sts -import botocore.session -from botocore.stub import Stubber, ANY -from main import handler -from shared.helper import get_client, reset_client, get_built_in_model_monitor_container_uri -from tests.fixtures.monitoring_fixtures import ( - mock_env_variables, - sm_describe_monitoring_scheduale_params, - sm_create_monitoring_expected_params, - sm_create_monitoring_response_200, - sm_describe_monitoring_schedule_response, - cp_expected_params_success, - cp_expected_params_failure, - event, -) - - -@mock_sts -def test_handler_success( - sm_create_monitoring_expected_params, - sm_create_monitoring_response_200, - sm_describe_monitoring_schedule_response, - cp_expected_params_success, - sm_describe_monitoring_scheduale_params, - event, -): - - sm_client = get_client("sagemaker") - cp_client = get_client("codepipeline") - - sm_stubber = Stubber(sm_client) - cp_stubber = Stubber(cp_client) - - cp_response = {} - - # job creation - sm_stubber.add_client_error( - "describe_monitoring_schedule", - service_error_code="MonitorJobExists", - service_message="Could not find requested job with name", - http_status_code=400, - expected_params=sm_describe_monitoring_scheduale_params, - ) - - # success path - sm_stubber.add_response( - "create_monitoring_schedule", sm_create_monitoring_response_200, sm_create_monitoring_expected_params - ) - - sm_stubber.add_response( - "describe_monitoring_schedule", - sm_describe_monitoring_schedule_response, - sm_describe_monitoring_scheduale_params, - ) - - cp_stubber.add_response("put_job_success_result", cp_response, cp_expected_params_success) - - with sm_stubber: - with cp_stubber: - handler(event, {}) - cp_stubber.assert_no_pending_responses() - reset_client() - - -def test_handler_exception(): - with patch("boto3.client") as mock_client: - event = { - "CodePipeline.job": {"id": "test_job_id"}, - } - failure_message = { - "message": "Job failed. Check the logs for more info.", - "type": "JobFailed", - } - handler(event, context={}) - mock_client().put_job_failure_result.assert_called() diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/main.py b/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/main.py deleted file mode 100644 index 23413cf..0000000 --- a/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/main.py +++ /dev/null @@ -1,110 +0,0 @@ -# ##################################################################################################################### -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. 
A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. # -# ##################################################################################################################### -import os -import json -import botocore -import boto3 -from shared.wrappers import code_pipeline_exception_handler -from shared.logger import get_logger -from shared.helper import get_client - -logger = get_logger(__name__) - -sm_client = get_client("sagemaker") -cp_client = get_client("codepipeline") - - -@code_pipeline_exception_handler -def handler(event, context): - # todo: change the way to mock boto3 clients for unit tests without passing clients in input - # Extract the Job ID - job_id = event["CodePipeline.job"]["id"] - - endpointconfig_name = os.environ["model_name"] + "-endpoint-config" - logger.info(f"Checking if sagemaker endpoint config {endpointconfig_name} exists...") - try: - endpointconfig_old = sm_client.describe_endpoint_config(EndpointConfigName=endpointconfig_name) - # Checking if endpoint config with the same name exists - if endpointconfig_old["ResponseMetadata"]["HTTPStatusCode"] == 200: - logger.info(f"Endpoint config {endpointconfig_name} already exists, skipping endpoint creation") - except botocore.exceptions.ClientError as error: - logger.info(str(error)) - logger.info(f"Endpoint config {endpointconfig_name} doesn't exist. Creating a new one.") - # Sending request to create sagemeker endpoint config - response = sm_client.create_endpoint_config( - EndpointConfigName=os.environ["model_name"] + "-endpoint-config", - ProductionVariants=[ - { - "VariantName": os.environ["model_name"] + "-variant", - "ModelName": os.environ["model_name"], - "InitialInstanceCount": 1, - "InstanceType": os.environ["inference_instance"], - }, - ], - DataCaptureConfig={ - "EnableCapture": True, - "InitialSamplingPercentage": 100, - "DestinationS3Uri": f's3://{os.environ["assets_bucket"]}/datacapture', - "CaptureOptions": [{"CaptureMode": "Output"}, {"CaptureMode": "Input"}], - "CaptureContentTypeHeader": {"CsvContentTypes": ["text/csv"]}, - }, - ) - logger.info(f"Finished creating sagemaker endpoint config. respons: {response}") - - # Sending request to create sagemeker endpoint - endpoint_name = os.environ["model_name"] + "-endpoint" - try: - logger.info(f"Checking if endpoint {endpoint_name} exists...") - endpoint_old = sm_client.describe_endpoint(EndpointName=endpoint_name) - # Checking if endpoint with the same name exists - if endpoint_old["ResponseMetadata"]["HTTPStatusCode"] == 200: - logger.info(f"Endpoint {endpoint_name} already exists, skipping endpoint creation") - check_endpoint_status(job_id, endpoint_old, endpoint_name) - - except botocore.exceptions.ClientError as error: - logger.info(str(error)) - logger.info(f"Endpoint {endpoint_name} doesn't exist. 
Creating a new one") - response = sm_client.create_endpoint( - EndpointName=endpoint_name, - EndpointConfigName=endpointconfig_name, - ) - resp = sm_client.describe_endpoint(EndpointName=endpoint_name) - logger.info("Finished sending request to create sagemaker endpoint") - logger.info("Endpoint Arn: " + resp["EndpointArn"]) - logger.debug(response) - - check_endpoint_status(job_id, resp, endpoint_name) - - -def check_endpoint_status(job_id, endpoint_response, endpoint_name): - endpoint_status = endpoint_response["EndpointStatus"] - logger.info("Endpoint Status: " + endpoint_status) - if endpoint_status == "Creating": - continuation_token = json.dumps({"previous_job_id": job_id}) - logger.info("Putting job continuation") - cp_client.put_job_success_result(jobId=job_id, continuationToken=continuation_token) - elif endpoint_status == "InService": - cp_client.put_job_success_result( - jobId=job_id, - outputVariables={ - "endpointName": endpoint_name, - }, - ) - else: - cp_client.put_job_failure_result( - jobId=job_id, - failureDetails={ - "message": f"Failed to create endpoint. Endpoint status: {endpoint_status}", - "type": "JobFailed", - }, - ) diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/tests/__init__.py b/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/tests/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/tests/test_sagemaker_endpoint.py b/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/tests/test_sagemaker_endpoint.py deleted file mode 100644 index 39d5604..0000000 --- a/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/tests/test_sagemaker_endpoint.py +++ /dev/null @@ -1,274 +0,0 @@ -####################################################################################################################### -# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. 
# -# ##################################################################################################################### -import os -import json -from datetime import datetime -import unittest -from unittest.mock import MagicMock, patch -import pytest -from moto import mock_sts -from botocore.stub import Stubber, ANY -from shared.logger import get_logger -from shared.helper import get_client, reset_client -from main import handler - - -@pytest.fixture(autouse=True) -def mock_env_variables(): - new_env = { - "model_name": "test", - "assets_bucket": "testbucket", - "batch_inference_data": "test", - "inference_instance": "test", - } - os.environ = {**os.environ, **new_env} - - -@pytest.fixture -def sm_describe_endpoint_config_params(): - return {"EndpointConfigName": "test-endpoint-config"} - - -@pytest.fixture -def sm_describe_endpoint_config_response(): - return { - "ResponseMetadata": {"HTTPStatusCode": 200}, - "EndpointConfigName": "string", - "EndpointConfigArn": "arn:aws:sagemaker:region:account:transform-job/name", - "ProductionVariants": [ - { - "VariantName": "string", - "ModelName": "string", - "InitialInstanceCount": 123, - "InstanceType": "ml.t2.medium", - "InitialVariantWeight": 1.0, - "AcceleratorType": "ml.eia1.medium", - }, - ], - "DataCaptureConfig": { - "EnableCapture": True, - "InitialSamplingPercentage": 123, - "DestinationS3Uri": "string", - "KmsKeyId": "string", - "CaptureOptions": [ - {"CaptureMode": "Input"}, - ], - "CaptureContentTypeHeader": { - "CsvContentTypes": [ - "string", - ], - }, - }, - "KmsKeyId": "string", - "CreationTime": datetime(2015, 1, 1), - } - - -@pytest.fixture -def sm_create_endpoint_config_params(): - return { - "EndpointConfigName": "test-endpoint-config", - "ProductionVariants": [ - { - "VariantName": "test-variant", - "ModelName": "test", - "InitialInstanceCount": 1, - "InstanceType": "test", - }, - ], - "DataCaptureConfig": { - "EnableCapture": True, - "InitialSamplingPercentage": 100, - "DestinationS3Uri": f"s3://testbucket/datacapture", - "CaptureOptions": [{"CaptureMode": "Output"}, {"CaptureMode": "Input"}], - "CaptureContentTypeHeader": { - "CsvContentTypes": ["text/csv"], - }, - }, - } - - -@pytest.fixture -def sm_create_endpoint_config_response(): - return { - "ResponseMetadata": {"HTTPStatusCode": 200}, - "EndpointConfigArn": "arn:aws:sagemaker:region:account:transform-job/name", - } - - -@pytest.fixture -def sm_create_endpoint_config_response_500(): - return { - "ResponseMetadata": {"HTTPStatusCode": 200}, - "EndpointConfigArn": "arn:aws:sagemaker:region:account:transform-job/name", - } - - -@pytest.fixture -def sm_describe_endpoint_params(): - return {"EndpointName": "test-endpoint"} - - -@pytest.fixture -def sm_create_endpoint_params(): - return { - "EndpointName": "test-endpoint", - "EndpointConfigName": "test-endpoint-config", - } - - -@pytest.fixture -def sm_create_endpoint_response(): - return { - "ResponseMetadata": {"HTTPStatusCode": 200}, - "EndpointArn": "arn:aws:sagemaker:region:account:endpoint/name", - } - - -@pytest.fixture -def sm_describe_endpoint_response_2(): - return { - "EndpointName": "string", - "EndpointArn": "arn:aws:sagemaker:region:account:endpoint/name", - "EndpointConfigName": "string", - "ProductionVariants": [ - { - "VariantName": "string", - "DeployedImages": [ - { - "SpecifiedImage": "string", - "ResolvedImage": "string", - "ResolutionTime": datetime(2015, 1, 1), - }, - ], - "CurrentWeight": 1.0, - "DesiredWeight": 1.0, - "CurrentInstanceCount": 123, - "DesiredInstanceCount": 123, - }, - ], - 
"DataCaptureConfig": { - "EnableCapture": True, - "CaptureStatus": "Started", - "CurrentSamplingPercentage": 123, - "DestinationS3Uri": "string", - "KmsKeyId": "string", - }, - "EndpointStatus": "InService", - "FailureReason": "string", - "CreationTime": datetime(2015, 1, 1), - "LastModifiedTime": datetime(2015, 1, 1), - } - - -@pytest.fixture -def cp_expected_params(): - return { - "jobId": "test_job_id", - "outputVariables": { - "endpointName": "test-endpoint", - }, - } - - -@pytest.fixture -def cp_expected_params_failure(): - return { - "jobId": "test_job_id", - "failureDetails": {"message": "Job failed. Check the logs for more info.", "type": "JobFailed"}, - } - - -@pytest.fixture -def event(): - return { - "CodePipeline.job": {"id": "test_job_id"}, - } - - -@mock_sts -def test_handler_success( - sm_describe_endpoint_config_params, - sm_create_endpoint_config_params, - sm_create_endpoint_config_response, - cp_expected_params, - sm_describe_endpoint_params, - sm_create_endpoint_params, - sm_create_endpoint_response, - sm_describe_endpoint_response_2, - event, -): - - sm_client = get_client("sagemaker") - cp_client = get_client("codepipeline") - - sm_stubber = Stubber(sm_client) - cp_stubber = Stubber(cp_client) - - # endpoint config creation - sm_describe_endpoint_config_response = {} - - cp_response = {} - - sm_stubber.add_client_error( - "describe_endpoint_config", - service_error_code="EndpointConfigExists", - service_message="Could not find endpoint configuration", - http_status_code=400, - expected_params=sm_describe_endpoint_config_params, - ) - sm_stubber.add_response( - "create_endpoint_config", - sm_create_endpoint_config_response, - sm_create_endpoint_config_params, - ) - - # endpoint creation - sm_stubber.add_client_error( - "describe_endpoint", - service_error_code="EndpointExists", - service_message="Could not find endpoint", - http_status_code=400, - expected_params=sm_describe_endpoint_params, - ) - - sm_stubber.add_response("create_endpoint", sm_create_endpoint_response, sm_create_endpoint_params) - - sm_stubber.add_response( - "describe_endpoint", - sm_describe_endpoint_response_2, - sm_describe_endpoint_params, - ) - - cp_stubber.add_response("put_job_success_result", cp_response, cp_expected_params) - - expected_log_message = "Sent success message back to codepipeline with job_id: test_job_id" - with sm_stubber: - with cp_stubber: - handler(event, {}) - cp_stubber.assert_no_pending_responses() - reset_client() - - -# wrapper exception -def test_handler_exception(): - with patch("boto3.client") as mock_client: - event = { - "CodePipeline.job": {"id": "test_job_id"}, - } - failure_message = { - "message": "Job failed. Check the logs for more info.", - "type": "JobFailed", - } - handler(event, context={}) - mock_client().put_job_failure_result.assert_called() \ No newline at end of file diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/.coveragerc b/source/lib/blueprints/byom/lambdas/create_sagemaker_model/.coveragerc deleted file mode 100644 index 721e6ba..0000000 --- a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/.coveragerc +++ /dev/null @@ -1,11 +0,0 @@ -[run] -omit = - tests/* - setup.py - */.venv-test/* - cdk.out/* - conftest.py - test_*.py - *wrappers.py -source = - . 
\ No newline at end of file diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/main.py b/source/lib/blueprints/byom/lambdas/create_sagemaker_model/main.py deleted file mode 100644 index cc16413..0000000 --- a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/main.py +++ /dev/null @@ -1,100 +0,0 @@ -# ##################################################################################################################### -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. # -# ##################################################################################################################### -import os -import botocore -import boto3 -import sagemaker -from shared.wrappers import code_pipeline_exception_handler -from shared.logger import get_logger -from shared.helper import get_client - -logger = get_logger(__name__) - -sm_client = get_client("sagemaker") -cp_client = get_client("codepipeline") - - -@code_pipeline_exception_handler -def handler(event, context): - # todo: change the way to mock boto3 clients for unit tests without passing clients in input - - # Extract the Job ID - job_id = event["CodePipeline.job"]["id"] - - logger.info("Creating sagemaker model...") - model_name = os.environ["model_name"] - try: - logger.info(f"Checking if model {model_name} exists") - model_old = sm_client.describe_model(ModelName=model_name) - # Checking if endpoint config with the same name exists - if model_old["ResponseMetadata"]["HTTPStatusCode"] == 200: - logger.info(f"Model {model_name} exists. Deleting the model before creating a new one.") - delete_response = sm_client.delete_model(ModelName=model_name) - logger.info(f"Delete model response: {delete_response}") - logger.info(f"Model {model_name} deleted. Creating the new model.") - create_sm_model(job_id) - except botocore.exceptions.ClientError as error: - logger.info(str(error)) - logger.info(f"Model {model_name} does not exist. 
Creating the new model.") - create_sm_model(job_id) - - -def create_sm_model(job_id): - # Get Container image uri - container_image_uri = "" - container_params = {} - if os.environ["container_uri"] == "": # using built in model - - container_image_uri = sagemaker.image_uris.retrieve( - framework=os.environ["model_framework"], - region=boto3.session.Session().region_name, - version=os.environ["model_framework_version"], - ) - container_params = { - "Image": container_image_uri, - "ImageConfig": {"RepositoryAccessMode": "Platform"}, - "Mode": "SingleModel", - "ModelDataUrl": os.environ["model_artifact_location"], - } - else: # using custom model - container_image_uri = os.environ["container_uri"] - container_params = { - "Image": container_image_uri, - "ImageConfig": {"RepositoryAccessMode": "Platform"}, - "Mode": "SingleModel", - "ModelDataUrl": os.environ["model_artifact_location"], - } - - logger.debug(f"Got Container image uri: {container_image_uri}") - - # Sending request to create sagemaker model - response = sm_client.create_model( - ModelName=os.environ["model_name"], - PrimaryContainer=container_params, - ExecutionRoleArn=os.environ["create_model_role_arn"], - EnableNetworkIsolation=False, - ) - logger.info("Sent request to create sagemaker model") - logger.debug(response) - - # Send response back to codepipeline success or fail. - if response["ResponseMetadata"]["HTTPStatusCode"] == 200: - cp_client.put_job_success_result(jobId=job_id) - else: - cp_client.put_job_failure_result( - jobId=job_id, - failureDetails={ - "message": "Job failed. Check the logs for more info.", - "type": "JobFailed", - }, - ) diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/requirements-test.txt b/source/lib/blueprints/byom/lambdas/create_sagemaker_model/requirements-test.txt deleted file mode 100644 index ecf975e..0000000 --- a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/requirements-test.txt +++ /dev/null @@ -1 +0,0 @@ --e . \ No newline at end of file diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/tests/__init__.py b/source/lib/blueprints/byom/lambdas/create_sagemaker_model/tests/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/tests/test_create_sagemaker_model.py b/source/lib/blueprints/byom/lambdas/create_sagemaker_model/tests/test_create_sagemaker_model.py deleted file mode 100644 index df4614f..0000000 --- a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/tests/test_create_sagemaker_model.py +++ /dev/null @@ -1,171 +0,0 @@ -################################################################################################################## -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. 
# -# ##################################################################################################################### -import os -from datetime import datetime -from unittest.mock import MagicMock, patch -import pytest -from moto import mock_sts -from botocore.stub import Stubber, ANY -from shared.helper import get_client, reset_client -from main import handler - - -@pytest.fixture(autouse=True) -def mock_env_variables(): - new_env = { - "model_name": "test", - "assets_bucket": "testbucket", - "batch_inference_data": "test", - "inference_instance": "test", - "container_uri": "test", - "model_artifact_location": "test", - "create_model_role_arn": "arn:aws:sagemaker:region:account:model/name", - } - os.environ = {**os.environ, **new_env} - - -@pytest.fixture -def sm_describe_model_expected_params(): - return {"ModelName": "test"} - - -@pytest.fixture -def sm_describe_model_response(): - return { - "ResponseMetadata": {"HTTPStatusCode": 200}, - "ModelName": "string", - "PrimaryContainer": { - "ContainerHostname": "string", - "Image": "string", - "ImageConfig": {"RepositoryAccessMode": "Platform"}, - "Mode": "SingleModel", - "ModelDataUrl": "string", - "Environment": {"string": "string"}, - "ModelPackageName": "string", - }, - "Containers": [ - { - "ContainerHostname": "string", - "Image": "string", - "ImageConfig": {"RepositoryAccessMode": "Platform"}, - "Mode": "SingleModel", - "ModelDataUrl": "string", - "Environment": {"string": "string"}, - "ModelPackageName": "string", - }, - ], - "ExecutionRoleArn": "arn:aws:sagemaker:region:account:model/name", - "VpcConfig": { - "SecurityGroupIds": [ - "string", - ], - "Subnets": [ - "string", - ], - }, - "CreationTime": datetime(2015, 1, 1), - "ModelArn": "arn:aws:sagemaker:region:account:model/name", - "EnableNetworkIsolation": True, - } - - -@pytest.fixture -def sm_delete_model_expected_params(): - return {"ModelName": "test"} - - -@pytest.fixture -def sm_create_model_expected_params(): - return { - "ModelName": "test", - "PrimaryContainer": { - "Image": "test", - "ImageConfig": {"RepositoryAccessMode": "Platform"}, - "ModelDataUrl": "test", - "Mode": "SingleModel", - }, - "ExecutionRoleArn": ANY, - "EnableNetworkIsolation": False, - } - - -@pytest.fixture -def sm_create_model_response(): - return { - "ResponseMetadata": {"HTTPStatusCode": 200}, - "ModelArn": "arn:aws:sagemaker:region:account:model/name", - } - - -@pytest.fixture -def cp_expected_params(): - return {"jobId": "test_job_id"} - - -@pytest.fixture -def event(): - return { - "CodePipeline.job": {"id": "test_job_id"}, - } - - -@mock_sts -def test_handler_success( - sm_describe_model_expected_params, - sm_describe_model_response, - sm_delete_model_expected_params, - sm_create_model_expected_params, - sm_create_model_response, - cp_expected_params, - event, -): - - sm_client = get_client("sagemaker") - cp_client = get_client("codepipeline") - - sm_stubber = Stubber(sm_client) - cp_stubber = Stubber(cp_client) - - # describe model - - sm_stubber.add_response("describe_model", sm_describe_model_response, sm_describe_model_expected_params) - - # delete model - sm_delete_model_response = {} - sm_stubber.add_response("delete_model", sm_delete_model_response, sm_delete_model_expected_params) - - # create model - sm_stubber.add_response("create_model", sm_create_model_response, sm_create_model_expected_params) - - # codepipeline - cp_response = {} - cp_stubber.add_response("put_job_success_result", cp_response, cp_expected_params) - - with sm_stubber: - with cp_stubber: - handler(event, {}) 
- cp_stubber.assert_no_pending_responses() - reset_client() - - -def test_handler_exception(): - with patch("boto3.client") as mock_client: - event = { - "CodePipeline.job": {"id": "test_job_id"}, - } - failure_message = { - "message": "Job failed. Check the logs for more info.", - "type": "JobFailed", - } - handler(event, context={}) - mock_client().put_job_failure_result.assert_called() diff --git a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/.coveragerc b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/.coveragerc similarity index 100% rename from source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/.coveragerc rename to source/lib/blueprints/byom/lambdas/create_update_cf_stackset/.coveragerc diff --git a/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/main.py b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/main.py new file mode 100644 index 0000000..7b2ff6c --- /dev/null +++ b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/main.py @@ -0,0 +1,109 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +import json +import traceback +from stackset_helpers import ( + find_artifact, + get_template, + put_job_failure, + start_stackset_update_or_create, + check_stackset_update_status, + get_user_params, + setup_s3_client, +) +from shared.logger import get_logger +from shared.helper import get_client + +logger = get_logger(__name__) + +logger.info("Loading stackset helpers...") + +cf_client = get_client("cloudformation") +cp_client = get_client("codepipeline") + + +def lambda_handler(event, context): + """The Lambda function handler + + If a continuing job then checks the CloudFormation stackset and its instances status + and updates the job accordingly. + + If a new job then kick of an update or creation of the target + CloudFormation stackset and its instances. 
+ + Args: + event: The event passed by Lambda + context: The context passed by Lambda + + """ + job_id = None + try: + # Extract the Job ID + job_id = event["CodePipeline.job"]["id"] + + # Extract the Job Data + job_data = event["CodePipeline.job"]["data"] + + # Get user paramameters + # User data is expected to be passed to lambda , for example: + # {"stackset_name": "model2", "artifact":"SourceArtifact", + # "template_file":"realtime-inference-pipeline.yaml", + # "stage_params_file":"staging-config.json", + # "accound_ids":[""], "org_ids":[""], + # "regions":["us-east-1"]} + params = get_user_params(job_data) + + # Get the list of artifacts passed to the function + artifacts = job_data["inputArtifacts"] + # Extract parameters + stackset_name = params["stackset_name"] + artifact = params["artifact"] + template_file = params["template_file"] + stage_params_file = params["stage_params_file"] + accound_ids = params["accound_ids"] + org_ids = params["org_ids"] + regions = params["regions"] + + if "continuationToken" in job_data: + logger.info(f"Ckecking the status of {stackset_name}") + # If we're continuing then the create/update has already been triggered + # we just need to check if it has finished. + check_stackset_update_status(job_id, stackset_name, accound_ids[0], regions[0], cf_client, cp_client) + + else: + logger.info(f"Creating StackSet {stackset_name} and its instances") + # Get the artifact details + artifact_data = find_artifact(artifacts, artifact) + # Get S3 client to access artifact with + s3 = setup_s3_client(job_data) + # Get the JSON template file out of the artifact + template, stage_params = get_template(s3, artifact_data, template_file, stage_params_file) + logger.info(stage_params) + # Kick off a stackset update or create + start_stackset_update_or_create( + job_id, + stackset_name, + template, + json.loads(stage_params), + accound_ids, + org_ids, + regions, + cf_client, + cp_client, + ) + + except Exception as e: + logger.error(f"Error in create_update_cf_stackset lambda functions: {str(e)}") + traceback.print_exc() + put_job_failure(job_id, "Function exception", cp_client) + raise e diff --git a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/requirements-test.txt b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/requirements-test.txt similarity index 100% rename from source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/requirements-test.txt rename to source/lib/blueprints/byom/lambdas/create_update_cf_stackset/requirements-test.txt diff --git a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/setup.py b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/setup.py similarity index 87% rename from source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/setup.py rename to source/lib/blueprints/byom/lambdas/create_update_cf_stackset/setup.py index 684ffa7..7e34da7 100644 --- a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/setup.py +++ b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/setup.py @@ -1,5 +1,5 @@ ####################################################################################################################### -# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. 
A copy of the License is located at # @@ -12,4 +12,4 @@ # ##################################################################################################################### from setuptools import setup, find_packages -setup(name="create_model_monitoring_schedule", packages=find_packages()) \ No newline at end of file +setup(name="create_update_cf_stackset", packages=find_packages()) \ No newline at end of file diff --git a/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/stackset_helpers.py b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/stackset_helpers.py new file mode 100644 index 0000000..c8573b2 --- /dev/null +++ b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/stackset_helpers.py @@ -0,0 +1,441 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +from boto3.session import Session +import json +import zipfile +import tempfile +import botocore +from shared.logger import get_logger +from shared.helper import get_client + +logger = get_logger(__name__) + + +def find_artifact(artifacts, name): + """Finds the artifact 'name' among the 'artifacts' + + Args: + artifacts: The list of artifacts available to the function + name: The artifact we wish to use + Returns: + The artifact dictionary found + Raises: + Exception: If no matching artifact is found + + """ + for artifact in artifacts: + if artifact["name"] == name: + return artifact + + raise Exception(f"Input artifact named {name} not found in lambda's event") + + +def get_template(s3_client, artifact, template_file_in_zip, params_file_in_zip): + """Gets the template artifact + + Downloads the artifact from the S3 artifact store to a temporary file + then extracts the zip and returns the file containing the CloudFormation + template and temaplate parameters. 
+ + Args: + s3_client: boto3 configured S3 client + artifact: The artifact to download + template_file_in_zip: The path to the file within the zip containing the template + params_file_in_zip: The path to the file within the zip containing the template parameters + + Returns: + The (CloudFormation template as a string, template paramaters as json) + + Raises: + Exception: Any exception thrown while downloading the artifact or unzipping it + + """ + bucket = artifact["location"]["s3Location"]["bucketName"] + key = artifact["location"]["s3Location"]["objectKey"] + + with tempfile.NamedTemporaryFile() as tmp_file: + s3_client.download_file(bucket, key, tmp_file.name) + with zipfile.ZipFile(tmp_file.name, "r") as zip: + template = zip.read(template_file_in_zip).decode() + params = zip.read(params_file_in_zip).decode() + return (template, params) + + +def update_stackset(stackset_name, template, parameters, org_ids, regions, cf_client): + """Start a CloudFormation stack update + + Args: + stackset_name: The stackset name to update + template: The template to apply + parameters: template parameters + org_ids: list of target org_ids + regions: list of target regions + cf_client: Boto3 CloudFormation client + + Returns: + True if an update was started, false if there were no changes + to the template since the last update. + + Raises: + Exception: Any exception besides "No updates are to be performed." + + """ + try: + cf_client.update_stack_set( + StackSetName=stackset_name, + TemplateBody=template, + Parameters=parameters, + Capabilities=["CAPABILITY_NAMED_IAM"], + PermissionModel="SERVICE_MANAGED", + # If PermissionModel="SERVICE_MANAGED", "OrganizationalUnitIds" must be used + # If PermissionModel="SELF_MANAGED", "AccountIds" must be used + DeploymentTargets={"OrganizationalUnitIds": org_ids}, + AutoDeployment={"Enabled": False}, + Regions=regions, + ) + return True + + except botocore.exceptions.ClientError as e: + logger.error(f"Error updating CloudFormation StackSet {stackset_name}. Error message: {str(e)}") + raise e + + +def stackset_exists(stackset_name, cf_client): + """Check if a stack exists or not + + Args: + stackset_name: The stackset name to check + cf_client: Boto3 CloudFormation client + + Returns: + True or False depending on whether the stack exists + + Raises: + Any exceptions raised .describe_stack_set() besides that + the stackset doesn't exist. 
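+
+    Example (illustrative; the stackset name and client are placeholders,
+    using the get_client helper already imported in this module):
+        exists = stackset_exists("mlops-stackset", get_client("cloudformation"))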
+ + """ + try: + logger.info(f"Checking if StackSet {stackset_name} exits.") + cf_client.describe_stack_set(StackSetName=stackset_name) + return True + except botocore.exceptions.ClientError as e: + if f"StackSet {stackset_name} not found" in e.response["Error"]["Message"]: + logger.info(f"StackSet {stackset_name} does not exist.") + return False + else: + raise e + + +def create_stackset_and_instances(stackset_name, template, parameteres, org_ids, regions, cf_client): + """Starts a new CloudFormation stackset and its instances creation + + Args: + stackset_name: The stackset to be created + template: The template for the stackset to be created with + parameters: template parameters + org_ids: list of target org_ids + regions: list of target regions + cf_client: Boto3 CloudFormation client + + Throws: + Exception: Any exception thrown by .create_stack_set() or .create_stack_instances() + """ + try: + logger.info(f"creating stackset {stackset_name}") + # create StackSet first + cf_client.create_stack_set( + StackSetName=stackset_name, + TemplateBody=template, + Parameters=parameteres, + Capabilities=["CAPABILITY_NAMED_IAM"], + PermissionModel="SERVICE_MANAGED", + AutoDeployment={"Enabled": False}, + ) + + # Then create StackSet instances + logger.info(f"creating instances for {stackset_name} StckSet") + cf_client.create_stack_instances( + StackSetName=stackset_name, + DeploymentTargets={"OrganizationalUnitIds": org_ids}, + Regions=regions, + ) + + except botocore.exceptions.ClientError as e: + logger.error(f"Error creating StackSet {stackset_name} and its inatances") + raise e + + +def get_stackset_instance_status(stackset_name, stack_instance_account_id, region, cf_client): + """Get the status of an existing CloudFormation stackset's instance + + Args: + stackset_name: The name of the stackset to check + stack_instance_account_id: the account id, where the stack instance is deployed + region: The region of the stackset's instance + cf_client: Boto3 CloudFormation client + + Returns: + The CloudFormation status string of the stackset instance + ('PENDING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'CANCELLED'|'INOPERABLE') + + Raises: + Exception: Any exception thrown by .describe_stack_instance() + + """ + try: + logger.info(f"Checking the status of {stackset_name} instance") + stack_instance_description = cf_client.describe_stack_instance( + StackSetName=stackset_name, StackInstanceAccount=stack_instance_account_id, StackInstanceRegion=region + ) + # Status could be on of 'PENDING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'CANCELLED'|'INOPERABLE' + return stack_instance_description["StackInstance"]["StackInstanceStatus"]["DetailedStatus"] + + except botocore.exceptions.ClientError as e: + logger.error( + f"Error describing StackSet {stackset_name} instance in {region} for account {stack_instance_account_id}" + ) + raise e + + +def put_job_success(job_id, message, cp_client): + """Notify CodePipeline of a successful job + + Args: + job_id: The CodePipeline job ID + message: A message to be logged relating to the job status + cp_client: Boto3 CodePipeline client + + Raises: + Exception: Any exception thrown by .put_job_success_result() + + """ + logger.info(f"Putting job success for jobId: {job_id} with message: {message}") + cp_client.put_job_success_result(jobId=job_id) + + +def put_job_failure(job_id, message, cp_client): + """Notify CodePipeline of a failed job + + Args: + job_id: The CodePipeline job ID + message: A message to be logged relating to the job status + cp_client: Boto3 CodePipeline client + + 
Raises: + Exception: Any exception thrown by .put_job_failure_result() + + """ + logger.info(f"Putting job failure for jobId: {job_id} with message: {message}") + cp_client.put_job_failure_result(jobId=job_id, failureDetails={"message": message, "type": "JobFailed"}) + + +def put_job_continuation(job_id, message, cp_client): + """Notify CodePipeline of a continuing job + + This will cause CodePipeline to invoke the function again with the + supplied continuation token. + + Args: + job_id: The JobID + message: A message to be logged relating to the job status + continuation_token: The continuation token + cp_client: Boto3 CodePipeline client + + Raises: + Exception: Any exception thrown by .put_job_success_result() + + """ + logger.info(f"Putting continuation token for jobId: {job_id} with message: {message}") + # This data will be available when a new job is scheduled to continue the current execution + continuation_token = json.dumps({"previous_job_id": job_id}) + cp_client.put_job_success_result(jobId=job_id, continuationToken=continuation_token) + + +def start_stackset_update_or_create( + job_id, # NOSONAR:S107 this function is designed to take many arguments + stackset_name, + template, + parameteres, + stack_instance_account_ids, + org_ids, + regions, + cf_client, + cp_client, +): + """Starts the stackset update or create process + + If the stackset exists then update, otherwise create. + + Args: + job_id: The ID of the CodePipeline job + stackset_name: The stackset to create or update + template: The template to create/update the stackset with + parameters: template parameters + stack_instance_account_ids: list of target account ids + org_ids: list of target org_ids + regions: list of target regions + cf_client: Boto3 CloudFormation client + cp_client: Boto3 CodePipeline client + + """ + if stackset_exists(stackset_name, cf_client): + logger.info(f"Stackset {stackset_name} exists") + status = get_stackset_instance_status(stackset_name, stack_instance_account_ids[0], regions[0], cf_client) + # If the CloudFormation stackset instance is not in a 'SUCCEEDED' state, it can not be updated + if status != "SUCCEEDED": + # if the StackSet instance in a failed state, fail the job + put_job_failure( + job_id, + ( + f"StackSet cannot be updated when status is: {status}. Delete the faild stackset/instance," + " fix the issue, and retry." + ), + cp_client, + ) + return + + # Update the StackSet and its instances + were_updates = update_stackset(stackset_name, template, parameteres, org_ids, regions, cf_client) + + if were_updates: + # If there were updates then continue the job so it can monitor + # the progress of the update. 
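+            # CodePipeline then re-invokes this Lambda with the continuation token,
+            # and the handler's "continuationToken" branch calls check_stackset_update_status,
+            # which keeps continuing the job while the instance is RUNNING/PENDING,
+            # succeeds it on SUCCEEDED, and fails it on any other status.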
+ logger.info(f"Starting update for {stackset_name} StackSet") + put_job_continuation(job_id, "StackSet update started", cp_client) + + else: + # If there were no updates then succeed the job immediately + logger.info(f"No updates for {stackset_name} StackSet") + put_job_success(job_id, "There were no StackSet updates", cp_client) + else: + # If the StackSet doesn't already exist then create it and its instances + create_stackset_and_instances(stackset_name, template, parameteres, org_ids, regions, cf_client) + logger.info(f"Creatiation of {stackset_name} StackSet and its instances started") + # Continue the job so the pipeline will wait for the StackSet and its instances to be created + put_job_continuation(job_id, "StackSet and its instances creatiation started", cp_client) + + +def check_stackset_update_status(job_id, stackset_name, stack_instance_account_id, region, cf_client, cp_client): + """Monitor an already-running CloudFormation StackSet and its instance update/create + + Succeeds, fails or continues the job depending on the stack status. + + Args: + job_id: The CodePipeline job ID + stackset_name: The stackset to monitor + stack_instance_account_id: the account id + region: The region, where the StackSet's instance is deployed + cf_client: Boto3 CloudFormation client + cp_client: Boto3 CodePipeline client + + """ + status = get_stackset_instance_status(stackset_name, stack_instance_account_id, region, cf_client) + if status == "SUCCEEDED": + # If the update/create finished successfully then + # succeed the job and don't continue. + put_job_success(job_id, "StackSet and its instance update complete", cp_client) + + elif status in [ + "RUNNING", + "PENDING", + ]: + # If the job isn't finished yet then continue it + put_job_continuation(job_id, "StackSet update still in progress", cp_client) + + else: + # The stackSet update/create has failed so end the job with + # a failed result. + put_job_failure(job_id, f"Update failed: {status}", cp_client) + + +def validate_user_params(decoded_params, list_of_required_params): + """Validate user provided parameters via codepipline event + + Raise an exception if one of the required parameters is missing. + + Args: + decoded_params: json object of user parameters passed via codepipline's event + list_of_required_params: list of reqyured parameters + + Raises: + Your UserParameters JSON must include + """ + for param in list_of_required_params: + if param not in decoded_params: + raise Exception(f"Your UserParameters JSON must include {param}") + + +def get_user_params(job_data): + """Decodes the JSON user parameters passed by codepipeline's event. + + Args: + job_data: The job data structure containing the UserParameters string which should be a valid JSON structure + + Returns: + The JSON parameters decoded as a dictionary. + + Raises: + Exception: The JSON can't be decoded. + + """ + required_params = [ + "stackset_name", + "artifact", + "template_file", + "stage_params_file", + "accound_ids", + "org_ids", + "regions", + ] + try: + # Get the user parameters which contain the stackset_name, artifact, template_name, + # stage_params, accound_ids, org_ids, and regions + user_parameters = job_data["actionConfiguration"]["configuration"]["UserParameters"] + decoded_parameters = json.loads(user_parameters) + + except Exception as e: + # We're expecting the user parameters to be encoded as JSON + # so we can pass multiple values. If the JSON can't be decoded + # then fail the job with a helpful message. 
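+        # For reference, a decodable UserParameters value is a JSON string along
+        # the lines of (illustrative values only):
+        # {"stackset_name": "model2", "artifact": "SourceArtifact",
+        #  "template_file": "realtime-inference-pipeline.yaml",
+        #  "stage_params_file": "staging-config.json",
+        #  "accound_ids": ["<account-id>"], "org_ids": ["<org-unit-id>"],
+        #  "regions": ["us-east-1"]}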
+ raise Exception("UserParameters could not be decoded as JSON", e) + + # Validate required params were provided + validate_user_params( + decoded_parameters, + required_params, + ) + + return decoded_parameters + + +def setup_s3_client(job_data): + """Creates an S3 client + + Uses the credentials passed in the event by CodePipeline. These + credentials can be used to access the artifact bucket. + + Args: + job_data: The job data structure + + Returns: + An S3 client with the appropriate credentials + + """ + key_id = job_data["artifactCredentials"]["accessKeyId"] + key_secret = job_data["artifactCredentials"]["secretAccessKey"] + session_token = job_data["artifactCredentials"]["sessionToken"] + + session = Session(aws_access_key_id=key_id, aws_secret_access_key=key_secret, aws_session_token=session_token) + + return session.client("s3", config=botocore.client.Config(signature_version="s3v4")) diff --git a/source/lib/blueprints/byom/lambdas/configure_inference_lambda/tests/__init__.py b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/tests/__init__.py similarity index 100% rename from source/lib/blueprints/byom/lambdas/configure_inference_lambda/tests/__init__.py rename to source/lib/blueprints/byom/lambdas/create_update_cf_stackset/tests/__init__.py diff --git a/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/tests/fixtures/stackset_fixtures.py b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/tests/fixtures/stackset_fixtures.py new file mode 100644 index 0000000..febd2a3 --- /dev/null +++ b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/tests/fixtures/stackset_fixtures.py @@ -0,0 +1,181 @@ +####################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. 
# +# ##################################################################################################################### +import json +import pytest + + +@pytest.fixture() +def stackset_name(): + return "mlops-stackset" + + +@pytest.fixture() +def mocked_org_ids(): + return ["Test-Org-Id"] + + +@pytest.fixture() +def mocked_account_ids(): + return ["Test_Account_Id"] + + +@pytest.fixture() +def mocked_regions(): + return ["us-east-1"] + + +@pytest.fixture() +def mocked_job_id(): + return "mocked_job_id" + + +@pytest.fixture() +def mocked_cp_success_message(): + return "StackSet Job SUCCEEDED" + + +@pytest.fixture() +def mocked_cp_failure_message(): + return "StackSet Job Failed" + + +@pytest.fixture() +def mocked_cp_continuation_message(): + return "StackSet Job is continued" + + +@pytest.fixture() +def required_user_params(): + return [ + "stackset_name", + "artifact", + "template_file", + "stage_params_file", + "accound_ids", + "org_ids", + "regions", + ] + + +@pytest.fixture() +def mocked_decoded_parameters(): + return { + "stackset_name": "model2", + "artifact": "SourceArtifact", + "template_file": "template.yaml", + "stage_params_file": "staging-config-test.json", + "accound_ids": ["moceked_account_id"], + "org_ids": ["mocked_org_unit_id"], + "regions": ["us-east-1"], + } + + +@pytest.fixture() +def mocked_codepipeline_event(mocked_decoded_parameters): + return { + "CodePipeline.job": { + "id": "11111111-abcd-1111-abcd-111111abcdef", + "accountId": "test-account-id", + "data": { + "actionConfiguration": { + "configuration": { + "FunctionName": "stacketset-lambda", + "UserParameters": json.dumps(mocked_decoded_parameters), + } + }, + "inputArtifacts": [ + { + "location": { + "s3Location": { + "bucketName": "test-bucket", + "objectKey": "template.zip", + }, + "type": "S3", + }, + "revision": None, + "name": "SourceArtifact", + } + ], + "outputArtifacts": [], + "artifactCredentials": { + "secretAccessKey": "test-secretkey", + "sessionToken": "test-tockedn", + "accessKeyId": "test-accesskey", + }, + }, + } + } + + +@pytest.fixture() +def mocked_invalid_user_parms(mocked_decoded_parameters): + return { + "CodePipeline.job": { + "id": "11111111-abcd-1111-abcd-111111abcdef", + "accountId": "test-account-id", + "data": { + "actionConfiguration": { + "configuration": { + "FunctionName": "stacketset-lambda", + "UserParameters": mocked_decoded_parameters, + } + } + }, + } + } + + +@pytest.fixture() +def mocked_template_parameters(): + return json.dumps( + [ + {"ParameterKey": "TagDescription", "ParameterValue": "StackSetValue"}, + {"ParameterKey": "TagName", "ParameterValue": "StackSetValue2"}, + ] + ) + + +@pytest.fixture() +def mocked_template(): + template = """--- + AWSTemplateFormatVersion: 2010-09-09 +Description: Stack1 with yaml template +Parameters: + TagDescription: + Type: String + TagName: + Type: String +Resources: + EC2Instance1: + Type: AWS::EC2::Instance + Properties: + ImageId: ami-03cf127a + KeyName: dummy + InstanceType: t2.micro + Tags: + - Key: Description + Value: + Ref: TagDescription + - Key: Name + Value: !Ref TagName + """ + return template + + +@pytest.fixture(scope="function") +def mocked_stackset(cf_client, stackset_name, mocked_template_parameters): + cf_client.create_stack_set( + StackSetName=stackset_name, + TemplateBody=stackset_name, + Parameters=mocked_template_parameters, + ) \ No newline at end of file diff --git a/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/tests/test_create_update_cf_stackset.py 
b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/tests/test_create_update_cf_stackset.py new file mode 100644 index 0000000..4d56b21 --- /dev/null +++ b/source/lib/blueprints/byom/lambdas/create_update_cf_stackset/tests/test_create_update_cf_stackset.py @@ -0,0 +1,540 @@ +####################################################################################################################### +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +import boto3 +import json +import tempfile +import pytest +from botocore.stub import Stubber +import botocore.session +from tests.fixtures.stackset_fixtures import ( + stackset_name, + mocked_template_parameters, + mocked_template, + mocked_org_ids, + mocked_account_ids, + mocked_regions, + mocked_job_id, + mocked_cp_success_message, + mocked_cp_failure_message, + mocked_cp_continuation_message, + required_user_params, + mocked_decoded_parameters, + mocked_codepipeline_event, + mocked_invalid_user_parms, +) +from moto import mock_cloudformation, mock_s3 +from unittest.mock import patch +from stackset_helpers import ( + find_artifact, + get_template, + update_stackset, + stackset_exists, + create_stackset_and_instances, + get_stackset_instance_status, + put_job_success, + put_job_failure, + put_job_continuation, + start_stackset_update_or_create, + check_stackset_update_status, + validate_user_params, + get_user_params, + setup_s3_client, +) +from main import lambda_handler + +cp_job = "CodePipeline.job" +client_to_patch = "boto3.client" + + +@mock_cloudformation +def test_create_stackset_and_instances( + stackset_name, + mocked_template, + mocked_template_parameters, + mocked_org_ids, + mocked_account_ids, + mocked_regions, +): + cf_client = boto3.client("cloudformation", region_name=mocked_regions[0]) + create_stackset_and_instances( + stackset_name, + mocked_template, + json.loads(mocked_template_parameters), + mocked_org_ids, + mocked_regions, + cf_client, + ) + stacksets = cf_client.list_stack_sets() + # print(stacksets) + # assert one StackSet has been created + assert len(stacksets["Summaries"]) == 1 + # assert the created name has the passed name + assert stacksets["Summaries"][0]["StackSetName"] == stackset_name + # assert the status of the stackset is ACTIVE + assert stacksets["Summaries"][0]["Status"] == "ACTIVE" + + # describe stackset instance + instance = cf_client.describe_stack_instance( + StackSetName=stackset_name, + StackInstanceAccount=mocked_account_ids[0], + StackInstanceRegion=mocked_regions[0], + ) + assert instance["ResponseMetadata"]["HTTPStatusCode"] == 200 + + # assert the function will throw an exception + with pytest.raises(Exception): + create_stackset_and_instances( + stackset_name, + mocked_template, + mocked_template_parameters, + mocked_org_ids, + mocked_regions, + cf_client, + ) + + +def test_create_stackset_and_instances_client_error( + stackset_name, + 
mocked_template, + mocked_template_parameters, + mocked_org_ids, + mocked_account_ids, + mocked_regions, +): + cf_client = boto3.client("cloudformation", region_name=mocked_regions[0]) + + with pytest.raises(botocore.exceptions.ClientError): + create_stackset_and_instances( + stackset_name, + mocked_template, + json.loads(mocked_template_parameters), + mocked_org_ids, + mocked_regions, + cf_client, + ) + + +def test_get_stackset_instance_status_client_error( + stackset_name, + mocked_template, + mocked_template_parameters, + mocked_org_ids, + mocked_account_ids, + mocked_regions, +): + + cf_client = boto3.client("cloudformation", region_name=mocked_regions[0]) + with pytest.raises(botocore.exceptions.ClientError): + get_stackset_instance_status(stackset_name, mocked_account_ids[0], mocked_regions[0], cf_client) + + +@mock_cloudformation +def test_get_stackset_instance_status( + stackset_name, + mocked_template, + mocked_template_parameters, + mocked_org_ids, + mocked_account_ids, + mocked_regions, +): + + cf_client = boto3.client("cloudformation", region_name=mocked_regions[0]) + # create a mocked stackset and instance + create_stackset_and_instances( + stackset_name, + mocked_template, + json.loads(mocked_template_parameters), + mocked_org_ids, + mocked_regions, + cf_client, + ) + # should throw an KeyError exception + with pytest.raises(KeyError): + get_stackset_instance_status(stackset_name, mocked_account_ids[0], mocked_regions[0], cf_client) + + +@mock_cloudformation +def test_update_stackset( + stackset_name, + mocked_template, + mocked_template_parameters, + mocked_org_ids, + mocked_account_ids, + mocked_regions, +): + cf_client = boto3.client("cloudformation", region_name=mocked_regions[0]) + # Case 1: stack exists and there is an update + # create a mocked stackset + cf_client.create_stack_set( + StackSetName=stackset_name, + TemplateBody=mocked_template, + Parameters=json.loads(mocked_template_parameters), + ) + res = update_stackset( + stackset_name, + mocked_template, + json.loads(mocked_template_parameters), + mocked_org_ids, + mocked_regions, + cf_client, + ) + assert res is True + + # Case 2: stack exists and there is no update + with pytest.raises(Exception): + update_stackset( + stackset_name, mocked_template, mocked_template_parameters, mocked_org_ids, mocked_regions, cf_client + ) + + +def test_update_stackset_error( + stackset_name, + mocked_template, + mocked_template_parameters, + mocked_org_ids, + mocked_account_ids, + mocked_regions, +): + cf_client = boto3.client("cloudformation", region_name=mocked_regions[0]) + with pytest.raises(Exception): + update_stackset( + stackset_name, + mocked_template, + json.loads(mocked_template_parameters), + mocked_org_ids, + mocked_regions, + cf_client, + ) + + +@mock_cloudformation +def test_stackset_exists(stackset_name, mocked_template, mocked_template_parameters, mocked_regions): + cf_client = boto3.client("cloudformation", region_name=mocked_regions[0]) + # assert the stackset does not exist + with pytest.raises(Exception): + stackset_exists(stackset_name, cf_client) + + # create mocked stackset + cf_client.create_stack_set( + StackSetName=stackset_name, + TemplateBody=mocked_template, + Parameters=json.loads(mocked_template_parameters), + ) + # assert the stackset does exist + assert stackset_exists(stackset_name, cf_client) is True + + # assert for other exceptions (e.g. 
cf_client is None) + with pytest.raises(Exception): + stackset_exists(stackset_name, None) + + +def test_put_job_success(mocked_job_id, mocked_cp_success_message, mocked_regions): + with patch(client_to_patch) as patched_client: + cp_client = boto3.client("codepipeline", region_name=mocked_regions[0]) + put_job_success(mocked_job_id, mocked_cp_success_message, cp_client) + # assert the put_job_success_result is called + patched_client().put_job_success_result.assert_called_once() + # assert the function is called with the passed arguments + patched_client().put_job_success_result.assert_called_with(jobId=mocked_job_id) + + +def test_put_job_failure(mocked_job_id, mocked_cp_failure_message, mocked_regions): + with patch(client_to_patch) as patched_client: + cp_client = boto3.client("codepipeline", region_name=mocked_regions[0]) + put_job_failure(mocked_job_id, mocked_cp_failure_message, cp_client) + # assert the put_job_failure_result is called + patched_client().put_job_failure_result.assert_called_once() + # assert the function is called with the passed arguments + patched_client().put_job_failure_result.assert_called_with( + jobId=mocked_job_id, failureDetails={"message": mocked_cp_failure_message, "type": "JobFailed"} + ) + + +def test_put_job_continuation(mocked_job_id, mocked_cp_continuation_message, mocked_regions): + with patch(client_to_patch) as patched_client: + cp_client = boto3.client("codepipeline", region_name=mocked_regions[0]) + put_job_continuation(mocked_job_id, mocked_cp_continuation_message, cp_client) + # assert the put_job_success_result is called + patched_client().put_job_success_result.assert_called_once() + # assert the function is called with the passed arguments + continuation_token = json.dumps({"previous_job_id": mocked_job_id}) + patched_client().put_job_success_result.assert_called_with( + jobId=mocked_job_id, continuationToken=continuation_token + ) + + +@patch("stackset_helpers.put_job_failure") +@patch("stackset_helpers.put_job_continuation") +@patch("stackset_helpers.put_job_success") +@patch("stackset_helpers.get_stackset_instance_status") +def test_check_stackset_update_status( + mocked_get_stackset_instance_status, # NOSONAR:S107 this test function is designed to take many fixtures + mocked_put_job_success, + mocked_put_job_continuation, + mocked_put_job_failure, + mocked_job_id, + stackset_name, + mocked_account_ids, + mocked_regions, +): + # Case 1: asserting first branch if status == "SUCCEEDED" + mocked_get_stackset_instance_status.return_value = "SUCCEEDED" + check_stackset_update_status(mocked_job_id, stackset_name, mocked_account_ids[0], mocked_regions[0], None, None) + # assert get_stackset_instance_status function is called + mocked_get_stackset_instance_status.assert_called_once() + # assert get_stackset_instance_status is called with the passed arguments + mocked_get_stackset_instance_status.assert_called_with( + stackset_name, mocked_account_ids[0], mocked_regions[0], None + ) + # assert the put_job_success is called + mocked_put_job_success.assert_called_once() + # assert it was called with the expected arguments + mocked_put_job_success.assert_called_with(mocked_job_id, "StackSet and its instance update complete", None) + + # Case 2: asserting for the second branch status in ["RUNNING","PENDING"]: + mocked_get_stackset_instance_status.return_value = "RUNNING" + check_stackset_update_status(mocked_job_id, stackset_name, mocked_account_ids[0], mocked_regions[0], None, None) + # assert get_stackset_instance_status function is called +
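# (using assert_called rather than assert_called_once here, since the mock already carries a call from Case 1) +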
mocked_get_stackset_instance_status.assert_called() + # assert get_stackset_instance_status is called with the passed arguments + mocked_get_stackset_instance_status.assert_called_with( + stackset_name, mocked_account_ids[0], mocked_regions[0], None + ) + # assert the put_job_continuation is called + mocked_put_job_continuation.assert_called_once() + # assert it was called with the expected arguments + mocked_put_job_continuation.assert_called_with(mocked_job_id, "StackSet update still in progress", None) + + # Case 3: asserting for the last branch status not one of ["RUNNING","PENDING", "SUCCEEDED"]: + mocked_get_stackset_instance_status.return_value = "FAILED" + check_stackset_update_status(mocked_job_id, stackset_name, mocked_account_ids[0], mocked_regions[0], None, None) + # assert get_stackset_instance_status function is called + mocked_get_stackset_instance_status.assert_called() + # assert get_stackset_instance_status is called with the passed arguments + mocked_get_stackset_instance_status.assert_called_with( + stackset_name, mocked_account_ids[0], mocked_regions[0], None + ) + # assert the put_job_failure is called + mocked_put_job_failure.assert_called_once() + # assert it was called with the expected arguments + mocked_put_job_failure.assert_called_with(mocked_job_id, "Update failed: FAILED", None) + + +def test_validate_user_params(required_user_params, mocked_decoded_parameters): + # assert function will throw an exception if a required parameter is missing (e.g. template_file) + required_parm = "template_file" + decoded_parameters = mocked_decoded_parameters + # remove the required parameter template_file + del decoded_parameters[required_parm] + with pytest.raises(Exception) as validation_error: + validate_user_params(decoded_parameters, required_user_params) + # assert the error message + assert f"Your UserParameters JSON must include {required_parm}" in str(validation_error.value) + + +def test_get_user_params( + required_user_params, mocked_decoded_parameters, mocked_codepipeline_event, mocked_invalid_user_parms +): + # get the job data + job_data = mocked_codepipeline_event[cp_job]["data"] + # assert the user parameters are decoded correctly + params = get_user_params(job_data) + assert params == mocked_decoded_parameters + # assert the decoding will fail if the input cannot be decoded + with pytest.raises(Exception) as decoding_error: + get_user_params(mocked_invalid_user_parms) + # assert the error message + assert "UserParameters could not be decoded as JSON" in str(decoding_error.value) + + +@patch("stackset_helpers.create_stackset_and_instances") +@patch("stackset_helpers.put_job_success") +@patch("stackset_helpers.put_job_continuation") +@patch("stackset_helpers.update_stackset") +@patch("stackset_helpers.put_job_failure") +@patch("stackset_helpers.get_stackset_instance_status") +@patch("stackset_helpers.stackset_exists") +def test_start_stackset_update_or_create( + mocked_stackset_exists, # NOSONAR:S107 this test function is designed to take many fixtures + mocked_get_stackset_instance_status, + mocked_put_job_failure, + mocked_update_stackset, + mocked_put_job_continuation, + mocked_put_job_success, + mocked_create_stackset_and_instances, + mocked_job_id, + stackset_name, + mocked_template, + mocked_template_parameters, + mocked_org_ids, + mocked_account_ids, + mocked_regions, +): + # Case 1: stack exists and status != "SUCCEEDED" + mocked_stackset_exists.return_value = True + mocked_get_stackset_instance_status.return_value = "FAILED" + # Call the function
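+ # (the trailing None arguments take the place of the boto3 clients; they are not needed here because all stackset_helpers functions are patched above)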
+ start_stackset_update_or_create( + mocked_job_id, + stackset_name, + mocked_template, + mocked_template_parameters, + mocked_account_ids, + mocked_org_ids, + mocked_regions, + None, + None, + ) + mocked_stackset_exists.assert_called() + mocked_get_stackset_instance_status.assert_called() + mocked_put_job_failure.assert_called() + + # Case 2: stack exists, status == "SUCCEEDED" and update_stackset returns True + mocked_get_stackset_instance_status.return_value = "SUCCEEDED" + mocked_update_stackset.return_value = True + # call the function + start_stackset_update_or_create( + mocked_job_id, + stackset_name, + mocked_template, + json.loads(mocked_template_parameters), + mocked_account_ids, + mocked_org_ids, + mocked_regions, + None, + None, + ) + # the update_stackset should be called. Returns True + mocked_update_stackset.assert_called() + # since there are updates to the stackset, put_job_continuation should be called + mocked_put_job_continuation.assert_called() + + # Case 3: stack exists, status == "SUCCEEDED" and update_stackset returns False (no updates to be performed) + mocked_update_stackset.return_value = False + # call the function + start_stackset_update_or_create( + mocked_job_id, + stackset_name, + mocked_template, + json.loads(mocked_template_parameters), + mocked_account_ids, + mocked_org_ids, + mocked_regions, + None, + None, + ) + # the update_stackset should be called. Returns False + mocked_update_stackset.assert_called() + # since there are no updates to the stackset, put_job_success should be called + mocked_put_job_success.assert_called() + + # Case 4: stack does not exist + mocked_stackset_exists.return_value = False + # call the function + start_stackset_update_or_create( + mocked_job_id, + stackset_name, + mocked_template, + json.loads(mocked_template_parameters), + mocked_account_ids, + mocked_org_ids, + mocked_regions, + None, + None, + ) + # The create_stackset_and_instances should be called + mocked_create_stackset_and_instances.assert_called() + # Since stackset and its instance creation has started, put_job_continuation should be called + mocked_put_job_continuation.assert_called() + + +def test_find_artifact(mocked_codepipeline_event, mocked_decoded_parameters): + # Get the list of artifacts passed to the function + artifacts = mocked_codepipeline_event[cp_job]["data"]["inputArtifacts"] + # Case 1: artifact exists in the event + existing_artifact = mocked_decoded_parameters["artifact"] + artifact = find_artifact(artifacts, existing_artifact) + assert artifact == artifacts[0] + + # Case 2: artifact does not exist (should throw an exception) + missing_artifact = "MISSING_ARTIFACT" + with pytest.raises(Exception) as artifact_error: + find_artifact(artifacts, missing_artifact) + # Assert the exception message + assert f"Input artifact named {missing_artifact} not found in lambda's event" in str(artifact_error.value) + + +@patch("main.put_job_failure") +@patch("main.start_stackset_update_or_create") +@patch("main.get_template") +@patch("main.setup_s3_client") +@patch("main.find_artifact") +@patch("main.check_stackset_update_status") +@patch("main.get_user_params") +def test_lambda_handler( + mocked_get_user_params, # NOSONAR:S107 this test function is designed to take many fixtures + mocked_check_stackset_update_status, + mocked_find_artifact, + mocked_setup_s3_client, + mocked_get_template, + mocked_start_stackset_update_or_create, + mocked_put_job_failure, + mocked_codepipeline_event, + mocked_template, + mocked_template_parameters, +): + # Case 1: Lambda was 
called for the first time, no continuationToken in the job data + mocked_get_template.return_value = (mocked_template, mocked_template_parameters) + # call the function + lambda_handler(mocked_codepipeline_event, {}) + # the following functions should be called + mocked_get_user_params.assert_called() + mocked_find_artifact.assert_called() + mocked_setup_s3_client.assert_called() + mocked_get_template.assert_called() + mocked_start_stackset_update_or_create.assert_called() + + # Case 2: ContinuationToken in the job data + # add ContinuationToken + mocked_codepipeline_event[cp_job]["data"].update({"continuationToken": "A continuation token"}) + # call the function + lambda_handler(mocked_codepipeline_event, {}) + # assert the check_stackset_update_status is called + mocked_check_stackset_update_status.assert_called() + + # Case 3: An exception is thrown by one of the functions, and put_job_failure is called + del mocked_codepipeline_event[cp_job] + with pytest.raises(Exception): + lambda_handler(mocked_codepipeline_event, {}) + mocked_put_job_failure.assert_called() + + +@mock_s3 +def test_setup_s3_client(mocked_codepipeline_event): + job_data = mocked_codepipeline_event[cp_job]["data"] + s3_client = setup_s3_client(job_data) + assert s3_client is not None + + +@mock_s3 +@patch("zipfile.ZipFile") +def test_get_template(mocked_zipfile, mocked_codepipeline_event, mocked_regions): + job_data = mocked_codepipeline_event[cp_job]["data"] + temp_file = tempfile.NamedTemporaryFile() + s3_client = boto3.client("s3", region_name=mocked_regions[0]) + s3_client.create_bucket(Bucket="test-bucket") + s3_client.upload_file(temp_file.name, "test-bucket", "template.zip") + artifact = job_data["inputArtifacts"][0] + template, params = get_template(s3_client, artifact, "template.yaml", "staging-config-test.json") + assert template is not None + assert params is not None diff --git a/source/lib/blueprints/byom/lambdas/inference/main.py b/source/lib/blueprints/byom/lambdas/inference/main.py index d38e647..24e5c4b 100644 --- a/source/lib/blueprints/byom/lambdas/inference/main.py +++ b/source/lib/blueprints/byom/lambdas/inference/main.py @@ -15,21 +15,22 @@ import boto3 from shared.wrappers import api_exception_handler from shared.logger import get_logger +from shared.helper import get_client logger = get_logger(__name__) -sagemaker_client = boto3.client("sagemaker-runtime") +sagemaker_client = get_client("sagemaker-runtime") @api_exception_handler def handler(event, context): event_body = json.loads(event["body"]) - endpoint_name = os.environ["ENDPOINT_NAME"] + endpoint_name = os.environ["SAGEMAKER_ENDPOINT_NAME"] return invoke(event_body, endpoint_name) def invoke(event_body, endpoint_name, sm_client=sagemaker_client): response = sm_client.invoke_endpoint( - EndpointName=endpoint_name, Body=event_body["payload"], ContentType=event_body["ContentType"] + EndpointName=endpoint_name, Body=event_body["payload"], ContentType=event_body["content_type"] ) logger.info(response) predictions = response["Body"].read().decode() diff --git a/source/lib/blueprints/byom/lambdas/inference/tests/test_inference.py b/source/lib/blueprints/byom/lambdas/inference/tests/test_inference.py index 2e8d83c..86eb794 100644 --- a/source/lib/blueprints/byom/lambdas/inference/tests/test_inference.py +++ b/source/lib/blueprints/byom/lambdas/inference/tests/test_inference.py @@ -22,26 +22,35 @@ from shared.logger import get_logger from main import handler, invoke -mock_env_variables = { - "ENDPOINT_URI": "test/test", -} +mock_env_variables = 
{"ENDPOINT_URI": "test/test", "SAGEMAKER_ENDPOINT_NAME": "test-endpoint"} -@patch.dict(os.environ, mock_env_variables) -def test_invoke(): +@pytest.fixture +def event(): + return {"body": '{"payload": "test", "content_type": "text/csv"}'} + - sm_invoke_endpoint_expected_params = { - "EndpointName": "test", - "Body": "test", - "ContentType": "text/csv", +@pytest.fixture +def expected_response(): + return { + "statusCode": 200, + "isBase64Encoded": False, + "body": [1, 0, 1, 0], + "headers": {"Content-Type": "plain/text"}, } - event_body = {"payload": "test", "ContentType": "text/csv"} - with patch('boto3.client') as mock_client: - invoke(event_body, "test", sm_client=mock_client) - mock_client.invoke_endpoint.assert_called_with( - EndpointName="test", - Body="test", - ContentType="text/csv" - ) +@patch.dict(os.environ, mock_env_variables) +def test_invoke(event): + with patch("boto3.client") as mock_client: + invoke(json.loads(event["body"]), "test", sm_client=mock_client) + mock_client.invoke_endpoint.assert_called_with(EndpointName="test", Body="test", ContentType="text/csv") + + +@patch("main.invoke") +@patch("boto3.client") +@patch.dict(os.environ, mock_env_variables) +def test_handler(mocked_client, mocked_invoke, event, expected_response): + mocked_invoke.return_value = expected_response + response = handler(event, {}) + assert response == expected_response diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/.coveragerc b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/.coveragerc similarity index 87% rename from source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/.coveragerc rename to source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/.coveragerc index 721e6ba..a3b5310 100644 --- a/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/.coveragerc +++ b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/.coveragerc @@ -6,6 +6,5 @@ omit = cdk.out/* conftest.py test_*.py - *wrappers.py source = . \ No newline at end of file diff --git a/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/index.py b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/index.py new file mode 100644 index 0000000..f47f467 --- /dev/null +++ b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/index.py @@ -0,0 +1,58 @@ +# ##################################################################################################################### +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. 
# +# ##################################################################################################################### +import logging +import uuid +import json +from crhelper import CfnResource +from shared.helper import get_client + +logger = logging.getLogger(__name__) + +lambda_client = get_client("lambda") +helper = CfnResource(json_logging=True, log_level="INFO") + + +def handler(event, context): + helper(event, context) + + +@helper.update +@helper.create +def invoke_lambda(event, _, lm_client=lambda_client): + try: + logger.info(f"Event received: {event}") + resource_properties = event["ResourceProperties"] + resource = resource_properties["Resource"] + if resource == "InvokeLambda": + logger.info("Invoking lambda function is initiated...") + resource_id = str(uuid.uuid4()) + lm_client.invoke( + FunctionName=resource_properties["function_name"], + InvocationType="Event", + Payload=json.dumps({"message": resource_properties["message"]}), + ) + helper.Data.update({"ResourceId": resource_id}) + + return resource_id + + else: + raise Exception(f"The Resource {resource} is unsupported by the Invoke Lambda custom resource.") + + except Exception as e: + logger.error(f"Custom resource failed: {str(e)}") + raise e + + +@helper.delete +def no_op(_, __): + pass # No action is required when stack is deleted diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/requirements-test.txt b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/requirements-test.txt similarity index 100% rename from source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/requirements-test.txt rename to source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/requirements-test.txt diff --git a/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/requirements.txt b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/requirements.txt new file mode 100644 index 0000000..fa256a1 --- /dev/null +++ b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/requirements.txt @@ -0,0 +1 @@ +crhelper==2.0.10 \ No newline at end of file diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/setup.py b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/setup.py similarity index 84% rename from source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/setup.py rename to source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/setup.py index c7067f6..25b4cd6 100644 --- a/source/lib/blueprints/byom/lambdas/create_sagemaker_endpoint/setup.py +++ b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/setup.py @@ -1,5 +1,5 @@ -################################################################################################################## -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +####################################################################################################################### +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. 
A copy of the License is located at # @@ -12,4 +12,4 @@ # ##################################################################################################################### from setuptools import setup, find_packages -setup(name="create_sagemaker_endpoint", packages=find_packages()) \ No newline at end of file +setup(name="invoke_lambda_custom_resource", packages=find_packages()) \ No newline at end of file diff --git a/source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/tests/__init__.py b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/tests/__init__.py similarity index 100% rename from source/lib/blueprints/byom/lambdas/create_model_monitoring_schedule/tests/__init__.py rename to source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/tests/__init__.py diff --git a/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/tests/test_invoke_lambda.py b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/tests/test_invoke_lambda.py new file mode 100644 index 0000000..1119ecc --- /dev/null +++ b/source/lib/blueprints/byom/lambdas/invoke_lambda_custom_resource/tests/test_invoke_lambda.py @@ -0,0 +1,72 @@ +# ##################################################################################################################### +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +import boto3 +import pytest +from unittest.mock import patch +from moto import mock_lambda +from index import invoke_lambda, no_op, handler + + +@pytest.fixture() +def invoke_event(): + return { + "RequestType": "Create", + "ResourceProperties": { + "Resource": "InvokeLambda", + "function_name": "myfunction", + "message": "Start batch transform job", + }, + } + + +@pytest.fixture() +def invoke_bad_event(): + return { + "RequestType": "Create", + "ResourceProperties": { + "Resource": "NotSupported", + }, + } + + +@patch("boto3.client") +def test_invoke_lambda(mocked_client, invoke_event, invoke_bad_event): + response = invoke_lambda(invoke_event, None, mocked_client) + assert response is not None + # unsupported + with pytest.raises(Exception) as error: + invoke_lambda(invoke_bad_event, None, mocked_client) + assert str(error.value) == ( + f"The Resource {invoke_bad_event['ResourceProperties']['Resource']} " + f"is unsupported by the Invoke Lambda custom resource." 
+ ) + + +@mock_lambda +def test_invoke_lambda_error(invoke_event): + mocked_client = boto3.client("lambda") + with pytest.raises(Exception): + invoke_lambda(invoke_event, None, mocked_client) + + +@patch("index.invoke_lambda") +def test_no_op(mocked_invoke, invoke_event): + response = no_op(invoke_event, {}) + assert response is None + mocked_invoke.assert_not_called() + + +@patch("index.helper") +def test_handler(mocked_helper, invoke_event): + handler(invoke_event, {}) + mocked_helper.assert_called_with(invoke_event, {}) diff --git a/source/lib/blueprints/byom/lambdas/sagemaker_layer/requirements.txt b/source/lib/blueprints/byom/lambdas/sagemaker_layer/requirements.txt index 536cf34..0718cb4 100644 --- a/source/lib/blueprints/byom/lambdas/sagemaker_layer/requirements.txt +++ b/source/lib/blueprints/byom/lambdas/sagemaker_layer/requirements.txt @@ -1 +1 @@ -sagemaker==2.15.3 \ No newline at end of file +sagemaker==2.39.0 \ No newline at end of file diff --git a/source/lib/blueprints/byom/model_monitor.py b/source/lib/blueprints/byom/model_monitor.py index 5a6f7c6..937bb10 100644 --- a/source/lib/blueprints/byom/model_monitor.py +++ b/source/lib/blueprints/byom/model_monitor.py @@ -10,33 +10,38 @@ # OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # # and limitations under the License. # # ##################################################################################################################### -import uuid from aws_cdk import ( - aws_iam as iam, aws_s3 as s3, - aws_sns as sns, - aws_sns_subscriptions as subscriptions, - aws_events_targets as targets, - aws_events as events, - aws_codepipeline as codepipeline, core, ) -from lib.blueprints.byom.pipeline_definitions.source_actions import source_action_model_monitor from lib.blueprints.byom.pipeline_definitions.deploy_actions import ( create_data_baseline_job, - create_monitoring_schedule, - sagemaker_layer, + create_invoke_lambda_custom_resource, ) - -from lib.blueprints.byom.pipeline_definitions.helpers import ( - suppress_assets_bucket, - pipeline_permissions, - suppress_list_function_policy, - suppress_pipeline_bucket, - suppress_iam_complex, - suppress_sns, +from lib.blueprints.byom.pipeline_definitions.templates_parameters import ( + create_blueprint_bucket_name_parameter, + create_assets_bucket_name_parameter, + create_baseline_job_name_parameter, + create_monitoring_schedule_name_parameter, + create_endpoint_name_parameter, + create_baseline_job_output_location_parameter, + create_monitoring_output_location_parameter, + create_instance_type_parameter, + create_training_data_parameter, + create_monitoring_type_parameter, + create_instance_volume_size_parameter, + create_max_runtime_seconds_parameter, + create_kms_key_arn_parameter, + create_kms_key_arn_provided_condition, + create_data_capture_bucket_name_parameter, + create_data_capture_location_parameter, + create_schedule_expression_parameter, + create_algorithm_image_uri_parameter, + create_baseline_output_bucket_name_parameter, ) -from time import strftime, gmtime + +from lib.blueprints.byom.pipeline_definitions.sagemaker_monitor_role import create_sagemaker_monitor_role +from lib.blueprints.byom.pipeline_definitions.sagemaker_monitoring_schedule import create_sagemaker_monitoring_scheduale class ModelMonitorStack(core.Stack): @@ -44,216 +49,123 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: super().__init__(scope, id, **kwargs) # Parameteres # - notification_email = 
core.CfnParameter( - self, - "NOTIFICATION_EMAIL", - type="String", - description="email for pipeline outcome notifications", - allowed_pattern="^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - constraint_description="Please enter an email address with correct format (example@exmaple.com)", - min_length=5, - max_length=320, - ) - blueprint_bucket_name = core.CfnParameter( - self, - "BLUEPRINT_BUCKET", - type="String", - description="Bucket name for blueprints of different types of ML Pipelines.", - min_length=3, - ) - assets_bucket_name = core.CfnParameter( - self, "ASSETS_BUCKET", type="String", description="Bucket name for access logs.", min_length=3 - ) - endpoint_name = core.CfnParameter( - self, "ENDPOINT_NAME", type="String", description="The name of the ednpoint to monitor", min_length=1 - ) - baseline_job_output_location = core.CfnParameter( - self, - "BASELINE_JOB_OUTPUT_LOCATION", - type="String", - description="S3 prefix to store the Data Baseline Job's output.", - ) - monitoring_output_location = core.CfnParameter( - self, - "MONITORING_OUTPUT_LOCATION", - type="String", - description="S3 prefix to store the Monitoring Schedule output.", - ) - schedule_expression = core.CfnParameter( - self, - "SCHEDULE_EXPRESSION", - type="String", - description="cron expression to run the monitoring schedule. E.g., cron(0 * ? * * *), cron(0 0 ? * * *), etc.", - allowed_pattern="^cron(\\S+\\s){5}\\S+$", - ) - training_data = core.CfnParameter( - self, - "TRAINING_DATA", - type="String", - description="Location of the training data in PipelineAssets S3 Bucket.", - ) - instance_type = core.CfnParameter( - self, - "INSTANCE_TYPE", - type="String", - description="Inference instance that inference requests will be running on. E.g., ml.m5.large", - allowed_pattern="^[a-zA-Z0-9_.+-]+\.[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", - min_length=7, - ) - instance_volume_size = core.CfnParameter( - self, - "INSTANCE_VOLUME_SIZE", - type="Number", - description="Instance volume size used in model moniroing jobs. E.g., 20", - ) - monitoring_type = core.CfnParameter( - self, - "MONITORING_TYPE", - type="String", - allowed_values=["dataquality", "modelquality", "modelbias", "modelexplainability"], - default="dataquality", - description="Type of model monitoring. Possible values: DataQuality | ModelQuality | ModelBias | ModelExplainability ", - ) - max_runtime_seconds = core.CfnParameter( - self, - "MAX_RUNTIME_SIZE", - type="Number", - description="Max runtime in secodns the job is allowed to run. 
E.g., 3600", - ) - baseline_job_name = core.CfnParameter( - self, - "BASELINE_JOB_NAME", - type="String", - description="Unique name of the data baseline job", - min_length=3, - max_length=63, - ) - monitoring_schedule_name = core.CfnParameter( - self, - "MONITORING_SCHEDULE_NAME", - type="String", - description="Unique name of the monitoring schedule job", - min_length=3, - max_length=63, - ) + blueprint_bucket_name = create_blueprint_bucket_name_parameter(self) + assets_bucket_name = create_assets_bucket_name_parameter(self) + endpoint_name = create_endpoint_name_parameter(self) + baseline_job_output_location = create_baseline_job_output_location_parameter(self) + training_data = create_training_data_parameter(self) + instance_type = create_instance_type_parameter(self) + instance_volume_size = create_instance_volume_size_parameter(self) + monitoring_type = create_monitoring_type_parameter(self) + max_runtime_seconds = create_max_runtime_seconds_parameter(self) + kms_key_arn = create_kms_key_arn_parameter(self) + baseline_job_name = create_baseline_job_name_parameter(self) + monitoring_schedule_name = create_monitoring_schedule_name_parameter(self) + data_capture_bucket = create_data_capture_bucket_name_parameter(self) + baseline_output_bucket = create_baseline_output_bucket_name_parameter(self) + data_capture_s3_location = create_data_capture_location_parameter(self) + monitoring_output_location = create_monitoring_output_location_parameter(self) + schedule_expression = create_schedule_expression_parameter(self) + image_uri = create_algorithm_image_uri_parameter(self) + + # conditions + kms_key_arn_provided = create_kms_key_arn_provided_condition(self, kms_key_arn) + # Resources # assets_bucket = s3.Bucket.from_bucket_name(self, "AssetsBucket", assets_bucket_name.value_as_string) # getting blueprint bucket object from its name - will be used later in the stack blueprint_bucket = s3.Bucket.from_bucket_name(self, "BlueprintBucket", blueprint_bucket_name.value_as_string) - # Defining pipeline stages - # source stage - source_output, source_action_definition = source_action_model_monitor(training_data, assets_bucket) - - # deploy stage # creating data baseline job - baseline_lambda_arn, create_baseline_job_definition = create_data_baseline_job( + baseline_job_lambda = create_data_baseline_job( self, blueprint_bucket, assets_bucket, - baseline_job_name, - training_data, - baseline_job_output_location, - endpoint_name, - instance_type, - instance_volume_size, - max_runtime_seconds, + baseline_job_name.value_as_string, + training_data.value_as_string, + baseline_job_output_location.value_as_string, + endpoint_name.value_as_string, + instance_type.value_as_string, + instance_volume_size.value_as_string, + max_runtime_seconds.value_as_string, + core.Fn.condition_if( + kms_key_arn_provided.logical_id, kms_key_arn.value_as_string, core.Aws.NO_VALUE + ).to_string(), + kms_key_arn_provided, core.Aws.STACK_NAME, ) - # creating monitoring schedule - monitor_lambda_arn, create_monitoring_schedule_definition = create_monitoring_schedule( + + # create custom resource to invoke the batch transform lambda + invoke_lambda_custom_resource = create_invoke_lambda_custom_resource( self, + "InvokeBaselineLambda", + baseline_job_lambda.function_arn, + baseline_job_lambda.function_name, blueprint_bucket, - assets_bucket, - baseline_job_output_location, - baseline_job_name, - monitoring_schedule_name, - monitoring_output_location, - schedule_expression, - endpoint_name, - instance_type, - instance_volume_size, - 
max_runtime_seconds, - monitoring_type, - core.Aws.STACK_NAME, - ) - # create invoking lambda policy - invoke_lambdas_policy = iam.PolicyStatement( - actions=[ - "lambda:InvokeFunction", - ], - resources=[baseline_lambda_arn, monitor_lambda_arn], - ) - # createing pipeline stages - source_stage = codepipeline.StageProps(stage_name="Source", actions=[source_action_definition]) - deploy_stage_model_monitor = codepipeline.StageProps( - stage_name="Deploy", - actions=[ - create_baseline_job_definition, - create_monitoring_schedule_definition, - ], + { + "Resource": "InvokeLambda", + "function_name": baseline_job_lambda.function_name, + "assets_bucket_name": assets_bucket_name.value_as_string, + "endpoint_name": endpoint_name.value_as_string, + "instance_type": instance_type.value_as_string, + "baseline_job_output_location": baseline_job_output_location.value_as_string, + "training_data": training_data.value_as_string, + "instance_volume_size": instance_volume_size.value_as_string, + "monitoring_schedule_name": monitoring_schedule_name.value_as_string, + "baseline_job_name": baseline_job_name.value_as_string, + "max_runtime_seconds": max_runtime_seconds.value_as_string, + "data_capture_s3_location": data_capture_s3_location.value_as_string, + "monitoring_output_location": monitoring_output_location.value_as_string, + "schedule_expression": schedule_expression.value_as_string, + "image_uri": image_uri.value_as_string, + "kms_key_arn": kms_key_arn.value_as_string, + }, ) - pipeline_notification_topic = sns.Topic( - self, - "ModelMonitorPipelineNotification", - ) - pipeline_notification_topic.node.default_child.cfn_options.metadata = suppress_sns() - pipeline_notification_topic.add_subscription( - subscriptions.EmailSubscription(email_address=notification_email.value_as_string) - ) + # add dependency on baseline lambda + invoke_lambda_custom_resource.node.add_dependency(baseline_job_lambda) - # constructing Model Monitor pipelines - model_monitor_pipeline = codepipeline.Pipeline( - self, - "ModelMonitorPipeline", - stages=[source_stage, deploy_stage_model_monitor], - cross_account_keys=False, - ) - model_monitor_pipeline.on_state_change( - "NotifyUser", - description="Notify user of the outcome of the pipeline", - target=targets.SnsTopic( - pipeline_notification_topic, - message=events.RuleTargetInput.from_text( - ( - f"Pipeline {events.EventField.from_path('$.detail.pipeline')} finished executing. 
" - f"Pipeline execution result is {events.EventField.from_path('$.detail.state')}" - ) - ), - ), - event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), + # creating monitoring schedule + sagemaker_role = create_sagemaker_monitor_role( + self, + "MLOpsSagemakerMonitorRole", + kms_key_arn=kms_key_arn.value_as_string, + assets_bucket_name=assets_bucket_name.value_as_string, + data_capture_bucket=data_capture_bucket.value_as_string, + data_capture_s3_location=data_capture_s3_location.value_as_string, + baseline_output_bucket=baseline_output_bucket.value_as_string, + baseline_job_output_location=baseline_job_output_location.value_as_string, + output_s3_location=monitoring_output_location.value_as_string, + kms_key_arn_provided_condition=kms_key_arn_provided, + baseline_job_name=baseline_job_name.value_as_string, + monitoring_schedual_name=monitoring_schedule_name.value_as_string, ) - model_monitor_pipeline.add_to_role_policy( - iam.PolicyStatement( - actions=["events:PutEvents"], - resources=[ - f"arn:{core.Aws.PARTITION}:events:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:event-bus/*", - ], - ) + + # create Sagemaker monitoring Schedule + sagemaker_monitoring_scheduale = create_sagemaker_monitoring_scheduale( + self, + "MonitoringSchedule", + monitoring_schedule_name.value_as_string, + endpoint_name.value_as_string, + baseline_job_name.value_as_string, + baseline_job_output_location.value_as_string, + schedule_expression.value_as_string, + monitoring_output_location.value_as_string, + instance_type.value_as_string, + instance_volume_size.value_as_number, + max_runtime_seconds.value_as_number, + core.Fn.condition_if( + kms_key_arn_provided.logical_id, kms_key_arn.value_as_string, core.Aws.NO_VALUE + ).to_string(), + sagemaker_role.role_arn, + image_uri.value_as_string, + core.Aws.STACK_NAME, ) - # add lambda permissons - model_monitor_pipeline.add_to_role_policy(invoke_lambdas_policy) - pipeline_child_nodes = model_monitor_pipeline.node.find_all() - pipeline_child_nodes[1].node.default_child.cfn_options.metadata = suppress_pipeline_bucket() - pipeline_child_nodes[6].node.default_child.cfn_options.metadata = suppress_iam_complex() - pipeline_child_nodes[13].node.default_child.cfn_options.metadata = suppress_iam_complex() - pipeline_child_nodes[19].node.default_child.cfn_options.metadata = suppress_list_function_policy() - pipeline_child_nodes[24].node.default_child.cfn_options.metadata = suppress_list_function_policy() - # attaching iam permissions to the pipelines - pipeline_permissions(model_monitor_pipeline, assets_bucket) + # add dependency on invoke_lambda_custom_resource + sagemaker_monitoring_scheduale.node.add_dependency(invoke_lambda_custom_resource) # Outputs # - core.CfnOutput( - self, - id="MonitorPipeline", - value=( - f"https://console.aws.amazon.com/codesuite/codepipeline/pipelines/" - f"{model_monitor_pipeline.pipeline_name}/view?region={core.Aws.REGION}" - ), - ) - core.CfnOutput( self, id="DataBaselineJobName", @@ -269,3 +181,32 @@ def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: id="MonitoringScheduleType", value=monitoring_type.value_as_string, ) + core.CfnOutput( + self, + id="BaselineJobOutputLocation", + value=( + f"https://s3.console.aws.amazon.com/s3/buckets/{baseline_job_output_location.value_as_string}" + f"/{baseline_job_name.value_as_string}/" + ), + ) + core.CfnOutput( + self, + id="MonitoringScheduleOutputLocation", + value=( + f"https://s3.console.aws.amazon.com/s3/buckets/{monitoring_output_location.value_as_string}/" + 
f"{endpoint_name.value_as_string}/{monitoring_schedule_name.value_as_string}/" + ), + ) + core.CfnOutput( + self, + id="MonitoredSagemakerEndpoint", + value=endpoint_name.value_as_string, + ) + core.CfnOutput( + self, + id="DataCaptureLocation", + value=( + f"https://s3.console.aws.amazon.com/s3/buckets/{data_capture_s3_location.value_as_string}" + f"/{endpoint_name.value_as_string}/" + ), + ) diff --git a/source/lib/blueprints/byom/multi_account_codepipeline.py b/source/lib/blueprints/byom/multi_account_codepipeline.py new file mode 100644 index 0000000..d0130fa --- /dev/null +++ b/source/lib/blueprints/byom/multi_account_codepipeline.py @@ -0,0 +1,289 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +from aws_cdk import ( + aws_iam as iam, + aws_s3 as s3, + aws_sns as sns, + aws_sns_subscriptions as subscriptions, + aws_events_targets as targets, + aws_events as events, + aws_codepipeline as codepipeline, + core, +) +from lib.blueprints.byom.pipeline_definitions.source_actions import source_action_template +from lib.blueprints.byom.pipeline_definitions.deploy_actions import create_stackset_action, create_cloudformation_action +from lib.blueprints.byom.pipeline_definitions.approval_actions import approval_action +from lib.blueprints.byom.pipeline_definitions.helpers import ( + pipeline_permissions, + suppress_list_function_policy, + suppress_pipeline_bucket, + suppress_iam_complex, + suppress_sns, + suppress_cloudformation_action, +) +from lib.blueprints.byom.pipeline_definitions.templates_parameters import ( + create_notification_email_parameter, + create_template_zip_name_parameter, + create_template_file_name_parameter, + create_stage_params_file_name_parameter, + create_blueprint_bucket_name_parameter, + create_assets_bucket_name_parameter, + create_stack_name_parameter, + create_account_id_parameter, + create_org_id_parameter, +) + + +class MultiAccountCodePipelineStack(core.Stack): + def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: + super().__init__(scope, id, **kwargs) + + # Parameteres # + notification_email = create_notification_email_parameter(self) + template_zip_name = create_template_zip_name_parameter(self) + template_file_name = create_template_file_name_parameter(self) + dev_params_file_name = create_stage_params_file_name_parameter(self, "DEV_PARAMS_NAME", "development") + staging_params_file_name = create_stage_params_file_name_parameter(self, "STAGING_PARAMS_NAME", "staging") + prod_params_file_name = create_stage_params_file_name_parameter(self, "PROD_PARAMS_NAME", "production") + # create development parameters + account_type = "development" + dev_account_id = create_account_id_parameter(self, "DEV_ACCOUNT_ID", account_type) + dev_org_id = create_org_id_parameter(self, 
"DEV_ORG_ID", account_type) + # create staging parameters + account_type = "staging" + staging_account_id = create_account_id_parameter(self, "STAGING_ACCOUNT_ID", account_type) + staging_org_id = create_org_id_parameter(self, "STAGING_ORG_ID", account_type) + # create production parameters + account_type = "production" + prod_account_id = create_account_id_parameter(self, "PROD_ACCOUNT_ID", account_type) + prod_org_id = create_org_id_parameter(self, "PROD_ORG_ID", account_type) + # assets parameters + blueprint_bucket_name = create_blueprint_bucket_name_parameter(self) + assets_bucket_name = create_assets_bucket_name_parameter(self) + stack_name = create_stack_name_parameter(self) + + # Resources # + assets_bucket = s3.Bucket.from_bucket_name(self, "AssetsBucket", assets_bucket_name.value_as_string) + + # getting blueprint bucket object from its name - will be used later in the stack + blueprint_bucket = s3.Bucket.from_bucket_name(self, "BlueprintBucket", blueprint_bucket_name.value_as_string) + + # create sns topic and subscription + pipeline_notification_topic = sns.Topic( + self, + "PipelineNotification", + ) + pipeline_notification_topic.node.default_child.cfn_options.metadata = suppress_sns() + pipeline_notification_topic.add_subscription( + subscriptions.EmailSubscription(email_address=notification_email.value_as_string) + ) + + # Defining pipeline stages + # source stage + source_output, source_action_definition = source_action_template(template_zip_name, assets_bucket) + + # DeployDev stage + dev_deploy_lambda_arn, dev_stackset_action = create_stackset_action( + self, + "DeployDevStackSet", + blueprint_bucket, + source_output, + "Artifact_Source_S3Source", + template_file_name.value_as_string, + dev_params_file_name.value_as_string, + [dev_account_id.value_as_string], + [dev_org_id.value_as_string], + [core.Aws.REGION], + assets_bucket, + f"{stack_name.value_as_string}-dev", + ) + + # DeployStaging manual approval + deploy_staging_approval = approval_action( + "DeployStaging", + pipeline_notification_topic, + [notification_email.value_as_string], + "Please approve to deploy to staging account", + ) + + # DeployStaging stage + staging_deploy_lambda_arn, staging_stackset_action = create_stackset_action( + self, + "DeployStagingStackSet", + blueprint_bucket, + source_output, + "Artifact_Source_S3Source", + template_file_name.value_as_string, + staging_params_file_name.value_as_string, + [staging_account_id.value_as_string], + [staging_org_id.value_as_string], + [core.Aws.REGION], + assets_bucket, + f"{stack_name.value_as_string}-staging", + ) + + # DeployProd manual approval + deploy_prod_approval = approval_action( + "DeployProd", + pipeline_notification_topic, + [notification_email.value_as_string], + "Please approve to deploy to production account", + ) + + # DeployProd stage + prod_deploy_lambda_arn, prod_stackset_action = create_stackset_action( + self, + "DeployProdStackSet", + blueprint_bucket, + source_output, + "Artifact_Source_S3Source", + template_file_name.value_as_string, + prod_params_file_name.value_as_string, + [prod_account_id.value_as_string], + [prod_org_id.value_as_string], + [core.Aws.REGION], + assets_bucket, + f"{stack_name.value_as_string}-prod", + ) + + # create invoking lambda policy + invoke_lambdas_policy = iam.PolicyStatement( + actions=[ + "lambda:InvokeFunction", + ], + resources=[dev_deploy_lambda_arn, staging_deploy_lambda_arn, prod_deploy_lambda_arn], + ) + + # createing pipeline stages + source_stage = codepipeline.StageProps(stage_name="Source", 
actions=[source_action_definition]) + + deploy_dev_stage = codepipeline.StageProps( + stage_name="DeployDev", + actions=[dev_stackset_action, deploy_staging_approval], + ) + + deploy_staging_stage = codepipeline.StageProps( + stage_name="DeployStaging", + actions=[staging_stackset_action, deploy_prod_approval], + ) + + deploy_prod_stage = codepipeline.StageProps( + stage_name="DeployProd", + actions=[prod_stackset_action], + ) + + # constructing multi-account pipeline + multi_account_pipeline = codepipeline.Pipeline( + self, + "MultiAccountPipeline", + stages=[source_stage, deploy_dev_stage, deploy_staging_stage, deploy_prod_stage], + cross_account_keys=False, + ) + # add notification to the development stackset action + dev_stackset_action.on_state_change( + "NotifyUserDevDeployment", + description="Notify user of the outcome of the DeployDev action", + target=targets.SnsTopic( + pipeline_notification_topic, + message=events.RuleTargetInput.from_text( + ( + f"DeployDev action {events.EventField.from_path('$.detail.action')} in the Pipeline " + f"{events.EventField.from_path('$.detail.pipeline')} finished executing. " + f"Action execution result is {events.EventField.from_path('$.detail.state')}" + ) + ), + ), + event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), + ) + + # add notification to the staging stackset action + staging_stackset_action.on_state_change( + "NotifyUserStagingDeployment", + description="Notify user of the outcome of the DeployStaging action", + target=targets.SnsTopic( + pipeline_notification_topic, + message=events.RuleTargetInput.from_text( + ( + f"DeployStaging action {events.EventField.from_path('$.detail.action')} in the Pipeline " + f"{events.EventField.from_path('$.detail.pipeline')} finished executing. " + f"Action execution result is {events.EventField.from_path('$.detail.state')}" + ) + ), + ), + event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), + ) + + # add notification to the production stackset action + prod_stackset_action.on_state_change( + "NotifyUserProdDeployment", + description="Notify user of the outcome of the DeployProd action", + target=targets.SnsTopic( + pipeline_notification_topic, + message=events.RuleTargetInput.from_text( + ( + f"DeployProd action {events.EventField.from_path('$.detail.action')} in the Pipeline " + f"{events.EventField.from_path('$.detail.pipeline')} finished executing. " + f"Action execution result is {events.EventField.from_path('$.detail.state')}" + ) + ), + ), + event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), + ) + + # add notification to the multi-account pipeline + multi_account_pipeline.on_state_change( + "NotifyUser", + description="Notify user of the outcome of the pipeline", + target=targets.SnsTopic( + pipeline_notification_topic, + message=events.RuleTargetInput.from_text( + ( + f"Pipeline {events.EventField.from_path('$.detail.pipeline')} finished executing. 
" + f"Pipeline execution result is {events.EventField.from_path('$.detail.state')}" + ) + ), + ), + event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), + ) + multi_account_pipeline.add_to_role_policy( + iam.PolicyStatement( + actions=["events:PutEvents"], + resources=[ + f"arn:{core.Aws.PARTITION}:events:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:event-bus/*", + ], + ) + ) + + # add lambda permissons + multi_account_pipeline.add_to_role_policy(invoke_lambdas_policy) + + # add cfn supressions + + pipeline_child_nodes = multi_account_pipeline.node.find_all() + pipeline_child_nodes[1].node.default_child.cfn_options.metadata = suppress_pipeline_bucket() + pipeline_child_nodes[6].node.default_child.cfn_options.metadata = suppress_iam_complex() + pipeline_child_nodes[19].node.default_child.cfn_options.metadata = suppress_list_function_policy() + pipeline_child_nodes[32].node.default_child.cfn_options.metadata = suppress_list_function_policy() + pipeline_child_nodes[45].node.default_child.cfn_options.metadata = suppress_list_function_policy() + # attaching iam permissions to the pipelines + pipeline_permissions(multi_account_pipeline, assets_bucket) + + # Outputs # + core.CfnOutput( + self, + id="Pipelines", + value=( + f"https://console.aws.amazon.com/codesuite/codepipeline/pipelines/" + f"{multi_account_pipeline.pipeline_name}/view?region={core.Aws.REGION}" + ), + ) diff --git a/source/lib/blueprints/byom/pipeline_definitions/approval_actions.py b/source/lib/blueprints/byom/pipeline_definitions/approval_actions.py new file mode 100644 index 0000000..af83a2b --- /dev/null +++ b/source/lib/blueprints/byom/pipeline_definitions/approval_actions.py @@ -0,0 +1,34 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. 
# +# ##################################################################################################################### +from aws_cdk import ( + aws_codepipeline_actions as codepipeline_actions, +) + + +def approval_action(approval_name, sns_topic, notification_emails_list, description): + """ + approval_action configures a codepipeline manual approval + + :approval_name: name of the manual approval action + :sns_topic: sns topic to use for notifications + :notification_emails_list: a list of emails to notify to approve the action + :description: description of the manual approval action + :return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage + """ + return codepipeline_actions.ManualApprovalAction( + action_name=approval_name, + notification_topic=sns_topic, + notify_emails=notification_emails_list, + additional_information=description, + run_order=2, + ) diff --git a/source/lib/blueprints/byom/pipeline_definitions/build_actions.py b/source/lib/blueprints/byom/pipeline_definitions/build_actions.py index 6cf570f..fa29c22 100644 --- a/source/lib/blueprints/byom/pipeline_definitions/build_actions.py +++ b/source/lib/blueprints/byom/pipeline_definitions/build_actions.py @@ -12,29 +12,23 @@ # ##################################################################################################################### from aws_cdk import ( aws_iam as iam, - aws_ecr as ecr, aws_codebuild as codebuild, aws_codepipeline as codepipeline, aws_codepipeline_actions as codepipeline_actions, core, ) -from lib.blueprints.byom.pipeline_definitions.helpers import suppress_pipeline_policy, suppress_ecr_scan_on_push +from lib.blueprints.byom.pipeline_definitions.helpers import suppress_pipeline_policy -def build_action(scope, source_output): +def build_action(scope, ecr_repository_name, image_tag, source_output): """ - build_action configures a codepipeline action that takes Dockerfile and creates a container image + build_action configures a codepipeline action with repository name and tag :scope: CDK Construct scope that's needed to create CDK resources - :source_output: output of the source stage in codepipeline - :is_custom_container: a CDK CfnCondition object, if true, it creates resources for the pipeline action + :ecr_repository_name: name of Amazon ECR repository where the image will be stored + :image_tag: docker image tag to be assigned. 
:return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage """ - model_containers = ecr.Repository(scope, "awsmlopsmodels") - # Enable ECR image scanOnPush - model_containers.node.default_child.add_override("Properties.ImageScanningConfiguration.ScanOnPush", "true") - # ECR scanOnPush property has changed to ScanOnPush, bbut seems cfn_nag still checking for scanOnPush - model_containers.node.default_child.cfn_options.metadata = suppress_ecr_scan_on_push() codebuild_role = iam.Role(scope, "codebuildRole", assumed_by=iam.ServicePrincipal("codebuild.amazonaws.com")) @@ -47,7 +41,7 @@ def build_action(scope, source_output): "ecr:UploadLayerPart", ], resources=[ - model_containers.repository_arn, + f"arn:{core.Aws.PARTITION}:ecr:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:repository/{ecr_repository_name}", ], ) codebuild_role.add_to_policy(codebuild_policy) @@ -103,8 +97,8 @@ def build_action(scope, source_output): environment_variables={ "AWS_DEFAULT_REGION": {"value": core.Aws.REGION}, "AWS_ACCOUNT_ID": {"value": core.Aws.ACCOUNT_ID}, - "IMAGE_REPO_NAME": {"value": model_containers.repository_name}, - "IMAGE_TAG": {"value": "latest"}, + "IMAGE_REPO_NAME": {"value": ecr_repository_name}, + "IMAGE_TAG": {"value": image_tag}, }, privileged=True, ), @@ -116,7 +110,5 @@ def build_action(scope, source_output): input=source_output, outputs=[codepipeline.Artifact()], ) - container_uri = ( - f"{core.Aws.ACCOUNT_ID}.dkr.ecr.{core.Aws.REGION}.amazonaws.com/{model_containers.repository_name}:latest" - ) + container_uri = f"{core.Aws.ACCOUNT_ID}.dkr.ecr.{core.Aws.REGION}.amazonaws.com/{ecr_repository_name}:{image_tag}" return build_action_definition, container_uri diff --git a/source/lib/blueprints/byom/pipeline_definitions/deploy_actions.py b/source/lib/blueprints/byom/pipeline_definitions/deploy_actions.py index 36a7f43..c62dd21 100644 --- a/source/lib/blueprints/byom/pipeline_definitions/deploy_actions.py +++ b/source/lib/blueprints/byom/pipeline_definitions/deploy_actions.py @@ -10,20 +10,36 @@ # OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # # and limitations under the License. 
# # ##################################################################################################################### +import uuid from aws_cdk import ( aws_iam as iam, aws_lambda as lambda_, aws_codepipeline_actions as codepipeline_actions, + aws_cloudformation as cloudformation, core, ) from lib.blueprints.byom.pipeline_definitions.helpers import ( codepipeline_policy, - suppress_cloudwatch_policy, + suppress_lambda_policies, suppress_pipeline_policy, - suppress_ecr_policy, add_logs_policy, ) -from time import gmtime, strftime +from lib.conditional_resource import ConditionalResources +from lib.blueprints.byom.pipeline_definitions.iam_policies import ( + create_service_role, + sagemaker_baseline_job_policy, + sagemaker_logs_metrics_policy_document, + batch_transform_policy, + s3_policy_write, + s3_policy_read, + cloudformation_stackset_policy, + cloudformation_stackset_instances_policy, + kms_policy_document, +) + + +lambda_service = "lambda.amazonaws.com" +lambda_handler = "main.handler" def sagemaker_layer(scope, blueprint_bucket): @@ -44,224 +60,21 @@ def sagemaker_layer(scope, blueprint_bucket): ) -def create_model( - scope, - blueprint_bucket, - assets_bucket, - model_name, - model_artifact_location, - custom_container, - model_framework, - model_framework_version, - container_uri, - sm_layer, -): - """ - create_model creates a sagemaker model in a lambda invoked codepipeline action - - :scope: CDK Construct scope that's needed to create CDK resources - :blueprint_bucket: CDK object of the blueprint bucket that contains resources for BYOM pipeline - :assets_bucket: the bucket cdk object where pipeline assets are stored - :model_name: name of the sagemaker model to be created, in the form of a CDK CfnParameter object - :model_artifact_location: path to the model artifact in the S3 bucket: assets_bucket - :custom_container: whether to the model is a custom algorithm or a sagemaker algorithmm, in the form of - a CDK CfnParameter object - :model_framework: name of the framework if the model is a sagemaker algorithm, in the form of - a CDK CfnParameter object - :model_framework_version: version of the framework if the model is a sagemaker algorithm, in the form of - a CDK CfnParameter object - :container_uri: URI for the container registry that stores the model if the model is a custom algorithm - :sm_layer: sagemaker lambda layer - :return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage - """ - create_model_policy = iam.PolicyStatement( - actions=[ - "sagemaker:CreateModel", - "sagemaker:DescribeModel", - "sagemaker:DeleteModel", - ], - resources=[ - # Lambda that uses this polict requires access to all objects in the assets bucket - f"arn:{core.Aws.PARTITION}:s3:::{assets_bucket.bucket_name}/*", - ( - f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}" - f":model/{model_name.value_as_string}" - ), - ], - ) - s3_policy = iam.PolicyStatement( - actions=[ - "s3:GetObject", - "s3:PutObject", - "s3:ListBucket", - ], - resources=[assets_bucket.arn_for_objects("*"), assets_bucket.bucket_arn], - ) - # creating this policy for sagemaker create endpoint in custom model - ecr_policy = iam.PolicyStatement( - actions=[ - "ecr:BatchGetImage", - "ecr:BatchCheckLayerAvailability", - "ecr:DescribeImages", - "ecr:DescribeRepositories", - "ecr:GetDownloadUrlForLayer", - ], - resources=[ - ( - f"arn:{core.Aws.PARTITION}:ecr:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}" - f":repository/mlops-pipeline*-awsmlopsmodels*" - ) - ], - ) - 
ecr_token_policy = iam.PolicyStatement( - actions=["ecr:GetAuthorizationToken"], - resources=["*"], # GetAuthorizationToken can not be bound to resources other than * - ) - # creating a role for the lambda function so that it can create a model in sagemaker - sagemaker_role = iam.Role( - scope, - "create_model_sagemaker_role", - assumed_by=iam.ServicePrincipal("sagemaker.amazonaws.com"), - description="Role that is create sagemaker model Lambda function assumes to create a model in the pipeline.", - ) - lambda_role = iam.Role( - scope, - "create_model_lambda_role", - assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"), - description="Role that is create sagemaker model Lambda function assumes to create a model in the pipeline.", - ) - sagemaker_role.add_to_policy(create_model_policy) - sagemaker_role.add_to_policy(s3_policy) - sagemaker_role.add_to_policy(ecr_policy) - sagemaker_role.add_to_policy(ecr_token_policy) - sagemaker_role_nodes = sagemaker_role.node.find_all() - sagemaker_role_nodes[2].node.default_child.cfn_options.metadata = suppress_ecr_policy() - lambda_role.add_to_policy(iam.PolicyStatement(actions=["iam:PassRole"], resources=[sagemaker_role.role_arn])) - lambda_role.add_to_policy(create_model_policy) - lambda_role.add_to_policy(s3_policy) - add_logs_policy(lambda_role) - - # defining the lambda function that gets invoked by codepipeline in this step - create_sagemaker_model = lambda_.Function( - scope, - "create_sagemaker_model", - runtime=lambda_.Runtime.PYTHON_3_8, - handler="main.handler", - timeout=core.Duration.seconds(60), - code=lambda_.Code.from_bucket(blueprint_bucket, "blueprints/byom/lambdas/create_sagemaker_model.zip"), - layers=[sm_layer], - role=lambda_role, - environment={ - "custom_container": custom_container.value_as_string, - "model_framework": model_framework.value_as_string, - "model_framework_version": model_framework_version.value_as_string, - "model_name": model_name.value_as_string, - "model_artifact_location": assets_bucket.s3_url_for_object(model_artifact_location.value_as_string), - "create_model_role_arn": sagemaker_role.role_arn, - "container_uri": container_uri, - "LOG_LEVEL": "INFO", - }, - ) - create_sagemaker_model.node.default_child.cfn_options.metadata = suppress_cloudwatch_policy() - role_child_nodes = create_sagemaker_model.role.node.find_all() - role_child_nodes[2].node.default_child.cfn_options.metadata = suppress_pipeline_policy() - - # creating the codepipeline action that invokes create model lambda - create_sagemaker_model_action = codepipeline_actions.LambdaInvokeAction( - action_name="create_sagemaker_model", - inputs=[], - outputs=[], - lambda_=create_sagemaker_model, - run_order=1, # runs first in the Deploy stage - ) - return (create_sagemaker_model.function_arn, create_sagemaker_model_action) - - -def create_endpoint(scope, blueprint_bucket, assets_bucket, model_name, inference_instance): - """ - create_endpoint creates a sagemaker inference endpoint in a lambda invoked codepipeline action - - :scope: CDK Construct scope that's needed to create CDK resources - :blueprint_bucket: CDK object of the blueprint bucket that contains resources for BYOM pipeline - :assets_bucket: the bucket cdk object where pipeline assets are stored - :model_name: name of the sagemaker model to be created, in the form of a CDK CfnParameter object - :inference_instance: compute instance type for the sagemaker inference endpoint, in the form of - a CDK CfnParameter object - :is_realtime_inference: a CDK CfnCondition object that says if inference 
type is realtime or not - :return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage - """ - create_endpoint_policy = iam.PolicyStatement( - actions=[ - "sagemaker:CreateEndpoint", - "sagemaker:CreateEndpointConfig", - "sagemaker:DeleteEndpointConfig", - "sagemaker:DescribeEndpointConfig", - "sagemaker:DescribeEndpoint", - ], - resources=[ - ( - f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" - f"endpoint/{model_name.value_as_string}-endpoint" - ), - ( - f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" - f"endpoint-config/{model_name.value_as_string}-endpoint-config" - ), - ], - ) - # creating a role so that this lambda can create a sagemaker endpoint and endpoint config - lambda_role = iam.Role( - scope, - "create_endpoint_lambda_role", - assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"), - description="Role that is create sagemaker model Lambda function assumes to create a model in the pipeline.", - ) - lambda_role.add_to_policy(create_endpoint_policy) - add_logs_policy(lambda_role) - - # defining the lambda function that gets invoked in this stage - create_sagemaker_endpoint = lambda_.Function( - scope, - "create_sagemaker_endpoint", - runtime=lambda_.Runtime.PYTHON_3_8, - handler="main.handler", - role=lambda_role, - code=lambda_.Code.from_bucket(blueprint_bucket, "blueprints/byom/lambdas/create_sagemaker_endpoint.zip"), - environment={ - "model_name": model_name.value_as_string, - "inference_instance": inference_instance.value_as_string, - "assets_bucket": assets_bucket.bucket_name, - "LOG_LEVEL": "INFO", - }, - timeout=core.Duration.minutes(10), - ) - create_sagemaker_endpoint.node.default_child.cfn_options.metadata = suppress_cloudwatch_policy() - role_child_nodes = create_sagemaker_endpoint.role.node.find_all() - role_child_nodes[2].node.default_child.cfn_options.metadata = suppress_pipeline_policy() - - # create_endpoint_action = core.Fn.condition_if("isRealtimeInference", - create_endpoint_action = codepipeline_actions.LambdaInvokeAction( - action_name="create_sagemaker_endpoint", - inputs=[], - outputs=[], - variables_namespace="sagemaker_endpoint", - lambda_=create_sagemaker_endpoint, - run_order=2, # this runs second in the deploy stage - ) - return (create_sagemaker_endpoint.function_arn, create_endpoint_action) - - def batch_transform( - scope, + scope, # NOSONAR:S107 this function is designed to take many arguments + id, blueprint_bucket, assets_bucket, model_name, inference_instance, + batch_input_bucket, batch_inference_data, + batch_job_output_location, + kms_key_arn, sm_layer, ): """ - batch_transform creates a sagemaker batch transform job in a lambda invoked codepipeline action + batch_transform creates a sagemaker batch transform job in a lambda :scope: CDK Construct scope that's needed to create CDK resources :blueprint_bucket: CDK object of the blueprint bucket that contains resources for BYOM pipeline @@ -269,75 +82,75 @@ def batch_transform( :model_name: name of the sagemaker model to be created, in the form of a CDK CfnParameter object :inference_instance: compute instance type for the sagemaker inference endpoint, in the form of a CDK CfnParameter object + :batch_input_bucket: bucket name where the batch data is stored :batch_inference_data: location of the batch inference data in assets bucket, in the form of a CDK CfnParameter object - :is_batch_transform: a CDK CfnCondition object that says if inference type is batch transform or not + 
:batch_job_output_location: S3 bucket location where the result of the batch job will be stored + :kms_key_arn: optionl kmsKeyArn used to encrypt job's output and instance volume. :sm_layer: sagemaker lambda layer - :return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage + :return: Lambda function """ - batch_transform_policy = iam.PolicyStatement( - actions=[ - "sagemaker:CreateTransformJob", - "s3:ListBucket", - "s3:GetObject", - "s3:PutObject", - ], - resources=[ - assets_bucket.bucket_arn, - assets_bucket.arn_for_objects("*"), - ( - f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" - f"transform-job/{model_name.value_as_string}-*" - ), - ], + s3_read = s3_policy_read( + list( + set( + [ + f"arn:aws:s3:::{assets_bucket.bucket_name}", + f"arn:aws:s3:::{assets_bucket.bucket_name}/*", + f"arn:aws:s3:::{batch_input_bucket}", + f"arn:aws:s3:::{batch_inference_data}", + ] + ) + ) + ) + s3_write = s3_policy_write( + [ + f"arn:aws:s3:::{batch_job_output_location}/*", + ] ) - lambda_role = iam.Role( + + batch_transform_permissions = batch_transform_policy() + + lambda_role = create_service_role( scope, "batch_transform_lambda_role", - assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"), - description=( + "lambda.amazonaws.com", + ( "Role that creates a lambda function assumes to create a sagemaker batch transform " "job in the aws mlops pipeline." ), ) - lambda_role.add_to_policy(batch_transform_policy) - lambda_role.add_to_policy(codepipeline_policy()) + + lambda_role.add_to_policy(batch_transform_permissions) + lambda_role.add_to_policy(s3_read) + lambda_role.add_to_policy(s3_write) add_logs_policy(lambda_role) - # defining batch transform lambda function - batch_transform = lambda_.Function( + batch_transform_lambda = lambda_.Function( scope, - "batch_transform", + id, runtime=lambda_.Runtime.PYTHON_3_8, handler="main.handler", layers=[sm_layer], role=lambda_role, code=lambda_.Code.from_bucket(blueprint_bucket, "blueprints/byom/lambdas/batch_transform.zip"), environment={ - "model_name": model_name.value_as_string, - "inference_instance": inference_instance.value_as_string, + "model_name": model_name, + "inference_instance": inference_instance, "assets_bucket": assets_bucket.bucket_name, - "batch_inference_data": batch_inference_data.value_as_string, + "batch_inference_data": batch_inference_data, + "batch_job_output_location": batch_job_output_location, + "kms_key_arn": kms_key_arn, "LOG_LEVEL": "INFO", }, ) - batch_transform.node.default_child.cfn_options.metadata = suppress_cloudwatch_policy() - role_child_nodes = batch_transform.role.node.find_all() - role_child_nodes[2].node.default_child.cfn_options.metadata = suppress_pipeline_policy() - batch_transform_action = codepipeline_actions.LambdaInvokeAction( - action_name="batch_transform", - inputs=[], - outputs=[], - variables_namespace="batch_transform", - lambda_=batch_transform, - run_order=2, # this runs second in the deploy stage - ) - return (batch_transform.function_arn, batch_transform_action) + batch_transform_lambda.node.default_child.cfn_options.metadata = suppress_lambda_policies() + + return batch_transform_lambda def create_data_baseline_job( - scope, + scope, # NOSONAR:S107 this function is designed to take many arguments blueprint_bucket, assets_bucket, baseline_job_name, @@ -347,6 +160,8 @@ def create_data_baseline_job( instance_type, instance_volume_size, max_runtime_seconds, + kms_key_arn, + kms_key_arn_provided_condition, stack_name, ): """ @@ 
-362,70 +177,60 @@ def create_data_baseline_job( :instance_type: compute instance type for the baseline job, in the form of a CDK CfnParameter object :instance_volume_size: volume size of the EC2 instance :max_runtime_seconds: max time the job is allowd to run + :kms_key_arn: kms key arn to encrypt the baseline job's output :stack_name: model monitor stack name :return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage """ - create_baseline_job_policy = iam.PolicyStatement( - actions=[ - "sagemaker:CreateProcessingJob", - "sagemaker:DescribeProcessingJob", - "sagemaker:StopProcessingJob", - "sagemaker:DeleteProcessingJob", - ], - resources=[ - ( - f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" - f"processing-job/{baseline_job_name.value_as_string}" - ), - ], + s3_read = s3_policy_read( + [ + f"arn:aws:s3:::{assets_bucket.bucket_name}", + f"arn:aws:s3:::{assets_bucket.bucket_name}/{training_data_location}", + ] ) - - s3_policy = iam.PolicyStatement( - actions=[ - "s3:ListBucket", - "s3:GetObject", - "s3:PutObject", - ], - resources=[ - assets_bucket.bucket_arn, - assets_bucket.arn_for_objects("*"), - ], + s3_write = s3_policy_write( + [ + f"arn:aws:s3:::{baseline_job_output_location}/*", + ] ) - sagemaker_logs_policy = iam.PolicyStatement( - actions=[ - "cloudwatch:PutMetricData", - "logs:CreateLogStream", - "logs:PutLogEvents", - "logs:CreateLogGroup", - "logs:DescribeLogStreams", - ], - resources=["*"], - ) + create_baseline_job_policy = sagemaker_baseline_job_policy(baseline_job_name) + sagemaker_logs_policy = sagemaker_logs_metrics_policy_document(scope, "BaselineLogsMetrcis") + + # Kms Key permissions + kms_policy = kms_policy_document(scope, "BaselineKmsPolicy", kms_key_arn) + # add conditions to KMS and ECR policies + core.Aspects.of(kms_policy).add(ConditionalResources(kms_key_arn_provided_condition)) + # create sagemaker role - sagemaker_role = iam.Role( + sagemaker_role = create_service_role( scope, "create_baseline_sagemaker_role", - assumed_by=iam.ServicePrincipal("sagemaker.amazonaws.com"), - description="Role that is create sagemaker model Lambda function assumes to create a model in the pipeline.", + "sagemaker.amazonaws.com", + "Role that is create sagemaker model Lambda function assumes to create a baseline job.", ) + # attach the conditional policies + kms_policy.attach_to_role(sagemaker_role) + # create a trust relation to assume the Role sagemaker_role.add_to_policy(iam.PolicyStatement(actions=["sts:AssumeRole"], resources=[sagemaker_role.role_arn])) # creating a role so that this lambda can create a baseline job - lambda_role = iam.Role( + lambda_role = create_service_role( scope, "create_baseline_job_lambda_role", - assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"), - description="Role that is create_data_baseline_job Lambda function assumes to create a baseline job in the pipeline.", + lambda_service, + "Role that is create_data_baseline_job Lambda function assumes to create a baseline job in the pipeline.", ) + + sagemaker_logs_policy.attach_to_role(sagemaker_role) sagemaker_role.add_to_policy(create_baseline_job_policy) - sagemaker_role.add_to_policy(sagemaker_logs_policy) - sagemaker_role.add_to_policy(s3_policy) + sagemaker_role.add_to_policy(s3_read) + sagemaker_role.add_to_policy(s3_write) sagemaker_role_nodes = sagemaker_role.node.find_all() sagemaker_role_nodes[2].node.default_child.cfn_options.metadata = suppress_pipeline_policy() 
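The conditional KMS policy attachment above relies on the solution's ConditionalResources aspect. A minimal sketch of that pattern, assuming a kms_key_arn_param CfnParameter and a some_role IAM role defined elsewhere in the stack (those names are illustrative, not the solution's):

from aws_cdk import core
from lib.conditional_resource import ConditionalResources
from lib.blueprints.byom.pipeline_definitions.iam_policies import kms_policy_document

# condition that is true only when the optional KMS key ARN parameter is non-empty
kms_key_provided = core.CfnCondition(
    scope,
    "KmsKeyProvided",
    expression=core.Fn.condition_not(core.Fn.condition_equals(kms_key_arn_param.value_as_string, "")),
)
kms_policy = kms_policy_document(scope, "ExampleKmsPolicy", kms_key_arn_param.value_as_string)
# the aspect adds the condition to every CfnResource under kms_policy,
# so the policy is only created when a key ARN was supplied
core.Aspects.of(kms_policy).add(ConditionalResources(kms_key_provided))
kms_policy.attach_to_role(some_role)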
lambda_role.add_to_policy(iam.PolicyStatement(actions=["iam:PassRole"], resources=[sagemaker_role.role_arn])) lambda_role.add_to_policy(create_baseline_job_policy) - lambda_role.add_to_policy(s3_policy) + lambda_role.add_to_policy(s3_write) + lambda_role.add_to_policy(s3_read) add_logs_policy(lambda_role) # defining the lambda function that gets invoked in this stage @@ -433,189 +238,199 @@ def create_data_baseline_job( scope, "create_data_baseline_job", runtime=lambda_.Runtime.PYTHON_3_8, - handler="main.handler", + handler=lambda_handler, role=lambda_role, code=lambda_.Code.from_bucket(blueprint_bucket, "blueprints/byom/lambdas/create_data_baseline_job.zip"), environment={ - "BASELINE_JOB_NAME": baseline_job_name.value_as_string, + "BASELINE_JOB_NAME": baseline_job_name, "ASSETS_BUCKET": assets_bucket.bucket_name, - "SAGEMAKER_ENDPOINT_NAME": f"{endpoint_name.value_as_string}", - "TRAINING_DATA_LOCATION": training_data_location.value_as_string, - "BASELINE_JOB_OUTPUT_LOCATION": baseline_job_output_location.value_as_string, - "INSTANCE_TYPE": instance_type.value_as_string, - "INSTANCE_VOLUME_SIZE": instance_volume_size.value_as_string, - "MAX_RUNTIME_SECONDS": max_runtime_seconds.value_as_string, + "SAGEMAKER_ENDPOINT_NAME": endpoint_name, + "TRAINING_DATA_LOCATION": training_data_location, + "BASELINE_JOB_OUTPUT_LOCATION": baseline_job_output_location, + "INSTANCE_TYPE": instance_type, + "INSTANCE_VOLUME_SIZE": instance_volume_size, + "MAX_RUNTIME_SECONDS": max_runtime_seconds, "ROLE_ARN": sagemaker_role.role_arn, + "KMS_KEY_ARN": kms_key_arn, "STACK_NAME": stack_name, "LOG_LEVEL": "INFO", }, timeout=core.Duration.minutes(10), ) - create_baseline_job_lambda.node.default_child.cfn_options.metadata = suppress_cloudwatch_policy() + + create_baseline_job_lambda.node.default_child.cfn_options.metadata = suppress_lambda_policies() role_child_nodes = create_baseline_job_lambda.role.node.find_all() role_child_nodes[2].node.default_child.cfn_options.metadata = suppress_pipeline_policy() - # Create codepipeline action - create_baseline_job_action = codepipeline_actions.LambdaInvokeAction( - action_name="create_data_baseline_job", - inputs=[], - outputs=[], - variables_namespace="data_baseline_job", - lambda_=create_baseline_job_lambda, - run_order=1, # this runs first in the deploy stage - ) - return (create_baseline_job_lambda.function_arn, create_baseline_job_action) + return create_baseline_job_lambda -def create_monitoring_schedule( - scope, +def create_stackset_action( + scope, # NOSONAR:S107 this function is designed to take many arguments + action_name, blueprint_bucket, + source_output, + artifact, + template_file, + stage_params_file, + accound_ids, + org_ids, + regions, assets_bucket, - baseline_job_output_location, - baseline_job_name, - monitoring_schedual_name, - monitoring_output_location, - schedule_expression, - endpoint_name, - instance_type, - instance_volume_size, - max_runtime_seconds, - monitoring_type, stack_name, ): """ - create_monitoring_schedule creates a model monitoring job in a lambda invoked codepipeline action + create_stackset_action an invokeLambda action to be added to AWS Codepipeline stage :scope: CDK Construct scope that's needed to create CDK resources + :action_name: name of the StackSet action :blueprint_bucket: CDK object of the blueprint bucket that contains resources for BYOM pipeline + :source_output: CDK object of the Source action's output + :artifact: name of the input aritifcat to the StackSet action + :template_file: name of the Cloudformation 
template to be deployed + :stage_params_file: name of the template parameters for the satge + :accound_ids: list of AWS acounts where the stack with be deployed + :org_ids: list of AWS orginizational ids where the stack with be deployed + :regions: list of regions where the stack with be deployed :assets_bucket: the bucket cdk object where pipeline assets are stored - :baseline_job_output_location: S3 prefix in the S3 assets bucket to store the output of the job - :baseline_job_name: name of the baseline job - :monitoring_schedual_name: name of the monitoring job to be created - :schedule_expression cron job expression - :endpoint_name: name of the deployed SageMaker endpoint to be monitored - :instance_type: compute instance type for the baseline job, in the form of a CDK CfnParameter object - :instance_volume_size: volume size of the EC2 instance - :monitoring_type: type of monitoring to be created - :max_runtime_seconds: max time the job is allowd to run - :stack_name: name of the model monitoring satck - :return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage + :stack_name: name of the stack to be deployed + :return: codepipeline invokeLambda action in a form of a CDK object that can be attached to a codepipeline stage """ - create_monitoring_schedule_policy = iam.PolicyStatement( - actions=[ - "sagemaker:DescribeEndpointConfig", - "sagemaker:DescribeEndpoint", - "sagemaker:CreateMonitoringSchedule", - "sagemaker:DescribeMonitoringSchedule", - "sagemaker:StopMonitoringSchedule", - "sagemaker:DeleteMonitoringSchedule", - "sagemaker:DescribeProcessingJob", - ], - resources=[ - ( - f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" - f"endpoint/{endpoint_name.value_as_string}*" - ), - ( - f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" - f"endpoint-config/{endpoint_name.value_as_string}*" - ), - ( - f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" - f"monitoring-schedule/{monitoring_schedual_name.value_as_string}" - ), - ( - f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" - f"processing-job/{baseline_job_name.value_as_string}" - ), - ], - ) - - s3_policy = iam.PolicyStatement( - actions=[ - "s3:ListBucket", - "s3:GetObject", - "s3:PutObject", - ], - resources=[ - assets_bucket.bucket_arn, - assets_bucket.arn_for_objects("*"), - ], - ) - - sagemaker_logs_policy = iam.PolicyStatement( - actions=[ - "cloudwatch:PutMetricData", - "logs:CreateLogStream", - "logs:PutLogEvents", - "logs:CreateLogGroup", - "logs:DescribeLogStreams", - ], - resources=["*"], - ) - # create sagemaker role - sagemaker_role = iam.Role( - scope, - "create_monitoring_scheduale_sagemaker_role", - assumed_by=iam.ServicePrincipal("sagemaker.amazonaws.com"), - description="Role that is create sagemaker model Lambda function assumes to create a model in the pipeline.", - ) - # create a trust relation to assume the Role - sagemaker_role.add_to_policy(iam.PolicyStatement(actions=["sts:AssumeRole"], resources=[sagemaker_role.role_arn])) # creating a role so that this lambda can create a baseline job - lambda_role = iam.Role( + lambda_role = create_service_role( scope, - "create_monitoring_scheduale_role", - assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"), - description="Role that is create_data_baseline_job Lambda function assumes to create a baseline job in the pipeline.", + f"{action_name}_role", + lambda_service, + "The role that is assumed by 
create_update_cf_stackset Lambda function.", ) - sagemaker_role.add_to_policy(create_monitoring_schedule_policy) - sagemaker_role.add_to_policy(sagemaker_logs_policy) - sagemaker_role.add_to_policy(s3_policy) - sagemaker_role_nodes = sagemaker_role.node.find_all() - sagemaker_role_nodes[2].node.default_child.cfn_options.metadata = suppress_pipeline_policy() - lambda_role.add_to_policy(iam.PolicyStatement(actions=["iam:PassRole"], resources=[sagemaker_role.role_arn])) - lambda_role.add_to_policy(create_monitoring_schedule_policy) - lambda_role.add_to_policy(s3_policy) + # make the stackset name unique + stack_name = f"{stack_name}-{str(uuid.uuid4())[:8]}" + # cloudformation stackset permissions + cloudformation_stackset_permissions = cloudformation_stackset_policy(stack_name) + cloudformation_stackset_instances_permissions = cloudformation_stackset_instances_policy(stack_name) + + lambda_role.add_to_policy(cloudformation_stackset_permissions) + lambda_role.add_to_policy(cloudformation_stackset_instances_permissions) add_logs_policy(lambda_role) # defining the lambda function that gets invoked in this stage - create_moniroring_schedule_lambda = lambda_.Function( + create_update_cf_stackset_lambda = lambda_.Function( scope, - "create_moniroring_schedule", + f"{action_name}_stackset_lambda", runtime=lambda_.Runtime.PYTHON_3_8, - handler="main.handler", + handler="main.lambda_handler", role=lambda_role, - code=lambda_.Code.from_bucket(blueprint_bucket, "blueprints/byom/lambdas/create_model_monitoring_schedule.zip"), - environment={ - "BASELINE_JOB_NAME": baseline_job_name.value_as_string, - "BASELINE_JOB_OUTPUT_LOCATION": baseline_job_output_location.value_as_string, - "ASSETS_BUCKET": assets_bucket.bucket_name, - "SAGEMAKER_ENDPOINT_NAME": f"{endpoint_name.value_as_string}", - "MONITORING_SCHEDULE_NAME": monitoring_schedual_name.value_as_string, - "MONITORING_OUTPUT_LOCATION": monitoring_output_location.value_as_string, - "SCHEDULE_EXPRESSION": schedule_expression.value_as_string, - "INSTANCE_TYPE": instance_type.value_as_string, - "INSTANCE_VOLUME_SIZE": instance_volume_size.value_as_string, - "MAX_RUNTIME_SECONDS": max_runtime_seconds.value_as_string, - "ROLE_ARN": sagemaker_role.role_arn, - "MONITORING_TYPE": monitoring_type.value_as_string, - "STACK_NAME": stack_name, - "LOG_LEVEL": "INFO", - }, - timeout=core.Duration.minutes(10), + code=lambda_.Code.from_bucket(blueprint_bucket, "blueprints/byom/lambdas/create_update_cf_stackset.zip"), + timeout=core.Duration.minutes(15), ) - create_moniroring_schedule_lambda.node.default_child.cfn_options.metadata = suppress_cloudwatch_policy() - role_child_nodes = create_moniroring_schedule_lambda.role.node.find_all() + + create_update_cf_stackset_lambda.node.default_child.cfn_options.metadata = suppress_lambda_policies() + role_child_nodes = create_update_cf_stackset_lambda.role.node.find_all() role_child_nodes[2].node.default_child.cfn_options.metadata = suppress_pipeline_policy() # Create codepipeline action - create_moniroring_schedule_action = codepipeline_actions.LambdaInvokeAction( - action_name="create_monitoring_schedule", - inputs=[], - outputs=[], - variables_namespace="monitoring_schedule", - lambda_=create_moniroring_schedule_lambda, - run_order=2, # this runs second in the deploy stage + create_stackset_action = codepipeline_actions.LambdaInvokeAction( + action_name=action_name, + inputs=[source_output], + variables_namespace=f"{action_name}-namespace", + lambda_=create_update_cf_stackset_lambda, + user_parameters={ + "stackset_name": 
stack_name, + "artifact": artifact, + "template_file": template_file, + "stage_params_file": stage_params_file, + "accound_ids": accound_ids, + "org_ids": org_ids, + "regions": regions, + }, + run_order=1, + ) + return (create_update_cf_stackset_lambda.function_arn, create_stackset_action) + + +def create_cloudformation_action( + scope, action_name, stack_name, source_output, template_file, template_parameters_file, run_order=1 +): + """ + create_cloudformation_actio a CloudFormation action to be added to AWS Codepipeline stage + + :scope: CDK Construct scope that's needed to create CDK resources + :action_name: name of the StackSet action + :stack_name: name of the stack to be deployed + :source_output: CDK object of the Source action's output + :template_file: name of the Cloudformation template to be deployed + :template_parameters_file: name of the template parameters + :return: codepipeline CloudFormation action in a form of a CDK object that can be attached to a codepipeline stage + """ + + # Create codepipeline's cloudformation action + create_cloudformation_action = codepipeline_actions.CloudFormationCreateUpdateStackAction( + action_name=action_name, + stack_name=stack_name, + capabilities=[cloudformation.CloudFormationCapabilities.NAMED_IAM], + template_path=source_output.at_path(template_file), + # Admin permissions are added to the deployement role used by the CF action for simplicity + # and deploy different resources by different MLOps pipelines. Roles are defined by the + # pipelines' cloudformation templates. + admin_permissions=True, + template_configuration=source_output.at_path(template_parameters_file), + variables_namespace=f"{action_name}-namespace", + replace_on_failure=True, + run_order=run_order, + ) + + return create_cloudformation_action + + +def create_invoke_lambda_custom_resource( + scope, # NOSONAR:S107 this function is designed to take many arguments + id, + lambda_function_arn, + lambda_function_name, + blueprint_bucket, + custom_resource_properties, +): + """ + create_invoke_lambda_custom_resource creates a custom resource to invoke lambda function + + :scope: CDK Construct scope that's needed to create CDK resources + :id: the logicalId of teh CDK resource + :lambda_function_arn: arn of the lambda function to be invoked (str) + :lambda_function_name: name of the lambda function to be invoked (str) + :blueprint_bucket: CDK object of the blueprint bucket that contains resources for BYOM pipeline + :custom_resource_properties: user provided properties (dict) + + :return: CDK Custom Resource + """ + custom_resource_lambda_fn = lambda_.Function( + scope, + id, + code=lambda_.Code.from_bucket(blueprint_bucket, "blueprints/byom/lambdas/invoke_lambda_custom_resource.zip"), + handler="index.handler", + runtime=lambda_.Runtime.PYTHON_3_8, + timeout=core.Duration.minutes(5), + ) + + custom_resource_lambda_fn.add_to_role_policy( + iam.PolicyStatement( + actions=[ + "lambda:InvokeFunction", + ], + resources=[lambda_function_arn], + ) ) - return (create_moniroring_schedule_lambda.function_arn, create_moniroring_schedule_action) + custom_resource_lambda_fn.node.default_child.cfn_options.metadata = suppress_lambda_policies() + + invoke_lambda_custom_resource = core.CustomResource( + scope, + f"{id}CustomeResource", + service_token=custom_resource_lambda_fn.function_arn, + properties={ + "function_name": lambda_function_name, + "message": f"Invoking lambda function: {lambda_function_name}", + **custom_resource_properties, + }, + resource_type="Custom::InvokeLambda", + ) + 
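For context, a hedged sketch of the kind of handler the invoke_lambda_custom_resource.zip package is expected to provide; this is not the solution's packaged code, but it illustrates the custom resource protocol: invoke the Lambda named in the resource properties (the "function_name" key set above), then report SUCCESS or FAILED back to the CloudFormation response URL.

import json
import urllib.request
import boto3


def handler(event, context):
    status, reason = "SUCCESS", ""
    try:
        if event["RequestType"] in ("Create", "Update"):
            # invoke the target Lambda passed in via the custom resource properties
            boto3.client("lambda").invoke(
                FunctionName=event["ResourceProperties"]["function_name"],
                InvocationType="RequestResponse",
            )
    except Exception as exc:  # report failures so the stack does not hang waiting for a response
        status, reason = "FAILED", str(exc)
    body = json.dumps(
        {
            "Status": status,
            "Reason": reason,
            "PhysicalResourceId": event.get("PhysicalResourceId", context.log_stream_name),
            "StackId": event["StackId"],
            "RequestId": event["RequestId"],
            "LogicalResourceId": event["LogicalResourceId"],
        }
    ).encode("utf-8")
    # CloudFormation expects a PUT to the pre-signed ResponseURL
    request = urllib.request.Request(
        event["ResponseURL"], data=body, method="PUT", headers={"Content-Type": ""}
    )
    urllib.request.urlopen(request)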
+ return invoke_lambda_custom_resource diff --git a/source/lib/blueprints/byom/pipeline_definitions/helpers.py b/source/lib/blueprints/byom/pipeline_definitions/helpers.py index e7731da..caf2145 100644 --- a/source/lib/blueprints/byom/pipeline_definitions/helpers.py +++ b/source/lib/blueprints/byom/pipeline_definitions/helpers.py @@ -12,6 +12,8 @@ # ##################################################################################################################### from aws_cdk import aws_iam as iam, core +logs_str = ":logs:" + def pipeline_permissions(pipeline, assets_bucket): """ @@ -32,7 +34,7 @@ def pipeline_permissions(pipeline, assets_bucket): resources=[ assets_bucket.arn_for_objects("*"), "arn:" + core.Aws.PARTITION + ":lambda:" + core.Aws.REGION + ":" + core.Aws.ACCOUNT_ID + ":function:*", - "arn:" + core.Aws.PARTITION + ":logs:" + core.Aws.REGION + ":" + core.Aws.ACCOUNT_ID + ":log-group:*", + "arn:" + core.Aws.PARTITION + logs_str + core.Aws.REGION + ":" + core.Aws.ACCOUNT_ID + ":log-group:*", ], ) ) @@ -65,14 +67,14 @@ def add_logs_policy(function_role): resources=[ "arn:" + core.Aws.PARTITION - + ":logs:" + + logs_str + core.Aws.REGION + ":" + core.Aws.ACCOUNT_ID + ":log-group:/aws/lambda/*", "arn:" + core.Aws.PARTITION - + ":logs:" + + logs_str + core.Aws.REGION + ":" + core.Aws.ACCOUNT_ID @@ -83,24 +85,11 @@ def add_logs_policy(function_role): function_role.add_to_policy( iam.PolicyStatement( actions=["logs:CreateLogGroup"], - resources=["arn:" + core.Aws.PARTITION + ":logs:" + core.Aws.REGION + ":" + core.Aws.ACCOUNT_ID + ":*"], + resources=["arn:" + core.Aws.PARTITION + logs_str + core.Aws.REGION + ":" + core.Aws.ACCOUNT_ID + ":*"], ) ) -def suppress_cloudwatch_policy(): - return { - "cfn_nag": { - "rules_to_suppress": [ - { - "id": "W58", - "reason": "The lambda functions role already has permissions to write cloudwatch logs", - } - ] - } - } - - def suppress_pipeline_policy(): return { "cfn_nag": { @@ -216,22 +205,49 @@ def suppress_ecr_policy(): } -# The supression is needed because there is a bug in cfn_nag ECR repository rule W79, -# where the rule still checks for scanOnPush instead of the new property's name ScanOnPush -# link to the bug https://github.com/stelligent/cfn_nag/issues/533 -def suppress_ecr_scan_on_push(): +def suppress_cloudwatch_policy(): return { "cfn_nag": { "rules_to_suppress": [ { - "id": "W79", - "reason": "scanOnPush is enabled", + "id": "W12", + "reason": "The cloudwatch:PutMetricData can not have a restricted resource.", } ] } } +def suppress_cloudformation_action(): + return { + "cfn_nag": { + "rules_to_suppress": [ + { + "id": "F4", + "reason": ( + "The cloudformation action is granted PassRole action with * resources to deploy " + "different resources by different MLOps pipelines." + ), + }, + { + "id": "F39", + "reason": ( + "The cloudformation action is granted admin permissions to deploy different resources by " + "different MLOps pipelines. Roles are defined by the pipelines' cloudformation templates." + ), + }, + { + "id": "W12", + "reason": ( + "This cloudformation action's deployement roel needs * resource to deploy different resources" + " by MLOps pipelines. Specific resources are declared in the roles defined by each pipeline." 
+ ), + }, + ] + } + } + + def apply_secure_bucket_policy(bucket): bucket.add_to_resource_policy( iam.PolicyStatement( @@ -243,3 +259,37 @@ def apply_secure_bucket_policy(bucket): conditions={"Bool": {"aws:SecureTransport": "false"}}, ) ) + + +def suppress_lambda_policies(): + return { + "cfn_nag": { + "rules_to_suppress": [ + { + "id": "W89", + "reason": "The lambda function does not need to be attached to a vpc.", + }, + { + "id": "W58", + "reason": "The lambda functions role already has permissions to write cloudwatch logs", + }, + { + "id": "W92", + "reason": "The lambda function does need to define ReservedConcurrentExecutions", + }, + ] + } + } + + +def suppress_lambda_event_mapping(): + return { + "cfn_nag": { + "rules_to_suppress": [ + { + "id": "W12", + "reason": "IAM permissions, lambda:*EventSourceMapping can not be bound to specific resources.", + } + ] + } + } diff --git a/source/lib/blueprints/byom/pipeline_definitions/iam_policies.py b/source/lib/blueprints/byom/pipeline_definitions/iam_policies.py new file mode 100644 index 0000000..c3e5acf --- /dev/null +++ b/source/lib/blueprints/byom/pipeline_definitions/iam_policies.py @@ -0,0 +1,278 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. 
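These helpers return plain metadata dictionaries; they only take effect when attached to the underlying CloudFormation resource. A minimal sketch, assuming an existing lambda_.Function named my_function (the same pattern deploy_actions.py now uses):

my_function.node.default_child.cfn_options.metadata = suppress_lambda_policies()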
# +# ##################################################################################################################### +from aws_cdk import aws_iam as iam, core +from lib.blueprints.byom.pipeline_definitions.helpers import suppress_ecr_policy, suppress_cloudwatch_policy + + +def sagemaker_policiy_statement(): + return iam.PolicyStatement( + actions=[ + "sagemaker:CreateModel", + "sagemaker:DescribeModel", + "sagemaker:DeleteModel", + "sagemaker:CreateModel", + "sagemaker:CreateEndpointConfig", + "sagemaker:DescribeEndpointConfig", + "sagemaker:DeleteEndpointConfig", + "sagemaker:CreateEndpoint", + "sagemaker:DescribeEndpoint", + "sagemaker:DeleteEndpoint", + ], + resources=[ + ( + f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:model/" + f"mlopssagemakermodel*" + ), + ( + f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:endpoint-config/" + f"mlopssagemakerendpointconfig*" + ), + ( + f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:endpoint/" + f"mlopssagemakerendpoint*" + ), + ], + ) + + +def sagemaker_baseline_job_policy(baseline_job_name): + return iam.PolicyStatement( + actions=[ + "sagemaker:CreateProcessingJob", + "sagemaker:DescribeProcessingJob", + "sagemaker:StopProcessingJob", + "sagemaker:DeleteProcessingJob", + ], + resources=[ + ( + f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" + f"processing-job/{baseline_job_name}" + ), + ], + ) + + +def batch_transform_policy(): + return iam.PolicyStatement( + actions=[ + "sagemaker:CreateTransformJob", + ], + resources=[ + ( + f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" + f"transform-job/mlopssagemakermodel-*-batch-transform-*" + ), + ], + ) + + +def create_service_role(scope, id, service, description): + return iam.Role( + scope, + id, + assumed_by=iam.ServicePrincipal(service), + description=description, + ) + + +def sagemaker_monitor_policiy_statement(baseline_job_name, monitoring_schedual_name): + return iam.PolicyStatement( + actions=[ + "sagemaker:DescribeEndpointConfig", + "sagemaker:DescribeEndpoint", + "sagemaker:CreateMonitoringSchedule", + "sagemaker:DescribeMonitoringSchedule", + "sagemaker:StopMonitoringSchedule", + "sagemaker:DeleteMonitoringSchedule", + "sagemaker:DescribeProcessingJob", + ], + resources=[ + ( + f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:endpoint-config/" + f"mlopssagemakerendpointconfig*" + ), + ( + f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:endpoint/" + f"mlopssagemakerendpoint*" + ), + ( + f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" + f"monitoring-schedule/{monitoring_schedual_name}" + ), + ( + f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:" + f"processing-job/{baseline_job_name}" + ), + ], + ) + + +def sagemaker_tags_policy_statement(): + return iam.PolicyStatement( + actions=[ + "sagemaker:AddTags", + "sagemaker:DeleteTags", + ], + resources=[f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:*"], + ) + + +def sagemaker_logs_metrics_policy_document(scope, id): + policy = iam.Policy( + scope, + id, + statements=[ + iam.PolicyStatement( + actions=[ + "logs:CreateLogGroup", + "logs:CreateLogStream", + "logs:DescribeLogStreams", + "logs:GetLogEvents", + "logs:PutLogEvents", + ], + resources=[ + f"arn:{core.Aws.PARTITION}:logs:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:log-group:/aws/sagemaker/*" + ], 
+ ), + iam.PolicyStatement( + actions=[ + "cloudwatch:PutMetricData", + ], + resources=["*"], + ), + ], + ) + policy.node.default_child.cfn_options.metadata = suppress_cloudwatch_policy() + + return policy + + +def s3_policy_read_write(resources_list): + return iam.PolicyStatement( + actions=["s3:GetObject", "s3:PutObject", "s3:ListBucket"], + resources=resources_list, + ) + + +def s3_policy_read(resources_list): + return iam.PolicyStatement( + actions=["s3:GetObject", "s3:ListBucket"], + resources=resources_list, + ) + + +def s3_policy_write(resources_list): + return iam.PolicyStatement( + actions=["s3:PutObject"], + resources=resources_list, + ) + + +def pass_role_policy_statement(role): + return iam.PolicyStatement( + actions=["iam:PassRole"], + resources=[ + role.role_arn, + ], + conditions={ + "StringLike": {"iam:PassedToService": "sagemaker.amazonaws.com"}, + }, + ) + + +def get_role_policy_statement(role): + return iam.PolicyStatement( + actions=["iam:GetRole"], + resources=[ + role.role_arn, + ], + ) + + +def ecr_policy_document(scope, id, repo_arn): + ecr_policy = iam.Policy( + scope, + id, + statements=[ + iam.PolicyStatement( + actions=[ + "ecr:BatchCheckLayerAvailability", + "ecr:GetDownloadUrlForLayer", + "ecr:DescribeRepositories", + "ecr:DescribeImages", + "ecr:BatchGetImage", + ], + resources=[repo_arn], + ), + iam.PolicyStatement( + actions=[ + "ecr:GetAuthorizationToken", + ], + # it can not be bound to resources other than * + resources=["*"], + ), + ], + ) + # add supression for * + ecr_policy.node.default_child.cfn_options.metadata = suppress_ecr_policy() + + return ecr_policy + + +def kms_policy_document(scope, id, kms_key_arn): + return iam.Policy( + scope, + id, + statements=[ + iam.PolicyStatement( + actions=[ + "kms:Encrypt", + "kms:Decrypt", + "kms:CreateGrant", + "kms:ReEncrypt*", + "kms:GenerateDataKey*", + "kms:DescribeKey", + ], + resources=[kms_key_arn], + ) + ], + ) + + +def cloudformation_stackset_policy(stack_name): + return iam.PolicyStatement( + actions=[ + "cloudformation:DescribeStackSet", + "cloudformation:DescribeStackInstance", + "cloudformation:CreateStackSet", + ], + resources=[ + f"arn:aws:cloudformation:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:stackset/{stack_name}:*", + "arn:aws:cloudformation:*::type/resource/*", + ], + ) + + +def cloudformation_stackset_instances_policy(stack_name): + return iam.PolicyStatement( + actions=[ + "cloudformation:CreateStackInstances", + "cloudformation:DeleteStackInstances", + "cloudformation:UpdateStackSet", + ], + resources=[ + f"arn:aws:cloudformation::{core.Aws.ACCOUNT_ID}:stackset-target/{stack_name}:*", + f"arn:aws:cloudformation:{core.Aws.REGION}::type/resource/*", + f"arn:aws:cloudformation:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:stackset/{stack_name}:*", + ], + ) \ No newline at end of file diff --git a/source/lib/blueprints/byom/pipeline_definitions/sagemaker_endpoint.py b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_endpoint.py new file mode 100644 index 0000000..d524c8b --- /dev/null +++ b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_endpoint.py @@ -0,0 +1,28 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. 
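An illustrative use of the new S3 policy helpers, mirroring how batch_transform and the SageMaker roles consume them elsewhere in this change; the bucket names and some_role below are placeholders, not solution defaults:

from lib.blueprints.byom.pipeline_definitions.iam_policies import s3_policy_read, s3_policy_write

some_role.add_to_policy(
    s3_policy_read(["arn:aws:s3:::example-assets-bucket", "arn:aws:s3:::example-assets-bucket/*"])
)
some_role.add_to_policy(s3_policy_write(["arn:aws:s3:::example-output-bucket/output-prefix/*"]))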
A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +from aws_cdk import ( + aws_sagemaker as sagemaker, +) + + +def create_sagemaker_endpoint(scope, id, endpoint_config_name, model_name, **kwargs): + # create Sagemaker endpoint + sagemaker_endpoint = sagemaker.CfnEndpoint( + scope, + id, + endpoint_config_name=endpoint_config_name, + tags=[{"key": "endpoint-name", "value": f"{model_name}-endpoint"}], + **kwargs, + ) + + return sagemaker_endpoint diff --git a/source/lib/blueprints/byom/pipeline_definitions/sagemaker_endpoint_config.py b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_endpoint_config.py new file mode 100644 index 0000000..473662f --- /dev/null +++ b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_endpoint_config.py @@ -0,0 +1,63 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +from aws_cdk import ( + aws_sagemaker as sagemaker, +) + + +def create_sagemaker_endpoint_config( + scope, # NOSONAR:S107 this function is designed to take many arguments + id, + sagemaker_model_name, + model_name, + inference_instance, + data_capture_location, + kms_key_arn, + **kwargs, +): + # Create the sagemaker endpoint config + sagemaker_endpoint_config = sagemaker.CfnEndpointConfig( + scope, + id, + production_variants=[ + { + "variantName": "AllTraffic", + "modelName": sagemaker_model_name, + "initialVariantWeight": 1, + "initialInstanceCount": 1, + "instanceType": inference_instance, + } + ], + data_capture_config={ + "enableCapture": True, + "initialSamplingPercentage": 100, + "destinationS3Uri": f"s3://{data_capture_location}", + "captureOptions": [{"captureMode": "Output"}, {"captureMode": "Input"}], + "captureContentTypeHeader": {"csvContentTypes": ["text/csv"]}, + # The key specfied here is used to encrypt data on S3 captured by the endpoint. If you don't provide + # a KMS key ID, Amazon SageMaker uses the default KMS key for Amazon S3 for your role's account. + # for more info see DataCaptureConfig + # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-endpointconfig.html + "kmsKeyId": kms_key_arn, + }, + # The key specified here is used to encrypt data on the storage volume attached to the + # ML compute instance that hosts the endpoint. 
Note: a key can not be specified here when + # using an instance type with local storage (e.g. certain Nitro-based instances) + # for more info see the KmsKeyId doc at + # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-endpointconfig.html + kms_key_id=kms_key_arn, + tags=[{"key": "endpoint-config-name", "value": f"{model_name}-endpoint-config"}], + **kwargs, + ) + + return sagemaker_endpoint_config diff --git a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/setup.py b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_model.py similarity index 69% rename from source/lib/blueprints/byom/lambdas/create_sagemaker_model/setup.py rename to source/lib/blueprints/byom/pipeline_definitions/sagemaker_model.py index 86b27a0..74ba7d0 100644 --- a/source/lib/blueprints/byom/lambdas/create_sagemaker_model/setup.py +++ b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_model.py @@ -1,5 +1,5 @@ -################################################################################################################## -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # # # # Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # # with the License. A copy of the License is located at # @@ -10,6 +10,16 @@ # OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # # and limitations under the License. # # ##################################################################################################################### -from setuptools import setup, find_packages +from aws_cdk import ( + aws_sagemaker as sagemaker, +) + + +def create_sagemaker_model(scope, id, execution_role, **kwargs): + # Create the model + model = sagemaker.CfnModel(scope, id, execution_role_arn=execution_role.role_arn, **kwargs) + + # add dependency on the Sagemaker execution role + model.node.add_dependency(execution_role) -setup(name="create_sagemaker_model", packages=find_packages()) \ No newline at end of file + return model diff --git a/source/lib/blueprints/byom/pipeline_definitions/sagemaker_monitor_role.py b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_monitor_role.py new file mode 100644 index 0000000..d933bfa --- /dev/null +++ b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_monitor_role.py @@ -0,0 +1,99 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. 
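Taken together, the model, endpoint config, and endpoint helpers replace the Lambda-invoked create-model/create-endpoint pipeline actions with plain CloudFormation resources. A sketch of the assumed wiring (sagemaker_role, container_uri, model_artifact_url, model_name, inference_instance, data_capture_location, and kms_key_arn are placeholders supplied by the deploying stack):

model = create_sagemaker_model(
    scope,
    "MLOpsSagemakerModel",
    execution_role=sagemaker_role,
    model_name=model_name,
    primary_container={"image": container_uri, "modelDataUrl": model_artifact_url},
)
endpoint_config = create_sagemaker_endpoint_config(
    scope,
    "MLOpsSagemakerEndpointConfig",
    model.attr_model_name,
    model_name,
    inference_instance,
    data_capture_location,
    kms_key_arn,
)
endpoint_config.add_depends_on(model)
endpoint = create_sagemaker_endpoint(
    scope, "MLOpsSagemakerEndpoint", endpoint_config.attr_endpoint_config_name, model_name
)
endpoint.add_depends_on(endpoint_config)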
# +# ##################################################################################################################### +from aws_cdk import ( + aws_iam as iam, + core, +) +from lib.conditional_resource import ConditionalResources + +from lib.blueprints.byom.pipeline_definitions.iam_policies import ( + kms_policy_document, + sagemaker_monitor_policiy_statement, + sagemaker_tags_policy_statement, + sagemaker_logs_metrics_policy_document, + s3_policy_read, + s3_policy_write, + pass_role_policy_statement, + get_role_policy_statement, +) + + +def create_sagemaker_monitor_role( + scope, # NOSONAR:S107 this function is designed to take many arguments + id, + kms_key_arn, + assets_bucket_name, + data_capture_bucket, + data_capture_s3_location, + baseline_output_bucket, + baseline_job_output_location, + output_s3_location, + kms_key_arn_provided_condition, + baseline_job_name, + monitoring_schedual_name, +): + # create optional polocies + kms_policy = kms_policy_document(scope, "MLOpsKmsPolicy", kms_key_arn) + + # add conditions to KMS and ECR policies + core.Aspects.of(kms_policy).add(ConditionalResources(kms_key_arn_provided_condition)) + + # create sagemaker role + role = iam.Role(scope, id, assumed_by=iam.ServicePrincipal("sagemaker.amazonaws.com")) + + # permissions to create sagemaker resources + sagemaker_policy = sagemaker_monitor_policiy_statement(baseline_job_name, monitoring_schedual_name) + + # sagemaker tags permissions + sagemaker_tags_policy = sagemaker_tags_policy_statement() + # logs/metrics permissions + logs_metrics_policy = sagemaker_logs_metrics_policy_document(scope, "SagemakerLogsMetricsPolicy") + # S3 permissions + s3_read = s3_policy_read( + list( + set( + [ + f"arn:aws:s3:::{assets_bucket_name}", + f"arn:aws:s3:::{assets_bucket_name}/*", + f"arn:aws:s3:::{data_capture_bucket}", + f"arn:aws:s3:::{data_capture_s3_location}/*", + f"arn:aws:s3:::{baseline_output_bucket}", + f"arn:aws:s3:::{baseline_job_output_location}/*", + ] + ) + ) + ) + s3_write = s3_policy_write( + [ + f"arn:aws:s3:::{output_s3_location}/*", + ] + ) + # IAM PassRole permission + pass_role_policy = pass_role_policy_statement(role) + # IAM GetRole permission + get_role_policy = get_role_policy_statement(role) + + # add policy statments + role.add_to_policy(sagemaker_policy) + role.add_to_policy(sagemaker_tags_policy) + role.add_to_policy(s3_read) + role.add_to_policy(s3_write) + role.add_to_policy(pass_role_policy) + role.add_to_policy(get_role_policy) + + # attach he logs/metrics policy document + logs_metrics_policy.attach_to_role(role) + # attach the conditional policies + kms_policy.attach_to_role(role) + + return role diff --git a/source/lib/blueprints/byom/pipeline_definitions/sagemaker_monitoring_schedule.py b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_monitoring_schedule.py new file mode 100644 index 0000000..812b6c7 --- /dev/null +++ b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_monitoring_schedule.py @@ -0,0 +1,131 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. 
This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +from aws_cdk import aws_sagemaker as sagemaker, core + + +def create_sagemaker_monitoring_scheduale( + scope, # NOSONAR:S107 this function is designed to take many arguments + id, + monitoring_schedule_name, + endpoint_name, + baseline_job_name, + baseline_job_output_location, + schedule_expression, + monitoring_output_location, + instance_type, + instance_volume_size, + max_runtime_seconds, + kms_key_arn, + role_arn, + image_uri, + stack_name, +): + """ + create_sagemaker_monitoring_scheduale creates a monitoring schedule using CDK + + :scope: CDK Construct scope that's needed to create CDK resources + :monitoring_schedual_name: name of the monitoring job to be created + :endpoint_name: name of the deployed SageMaker endpoint to be monitored + :baseline_job_name: name of the baseline job + :baseline_job_output_location: S3 prefix in the S3 assets bucket to store the output of the job + :schedule_expression: cron job expression + :monitoring_output_location: S3 location where the output will be stored + :instance_type: compute instance type for the baseline job, in the form of a CDK CfnParameter object + :instance_volume_size: volume size of the EC2 instance + :max_runtime_seconds: max time the job is allowd to run + :kms_key_arn": optional arn of the kms key used to encrypt datacapture and to encrypt job's output + :role_arn: Sagemaker role's arn to be used to create the monitoring schedule + :image_uri: the name of the stack where the schedule will be created + :return: return an sagemaker.CfnMonitoringSchedule object + + """ + schedule = sagemaker.CfnMonitoringSchedule( + scope, + id, + monitoring_schedule_name=monitoring_schedule_name, + monitoring_schedule_config=sagemaker.CfnMonitoringSchedule.MonitoringScheduleConfigProperty( + schedule_config=sagemaker.CfnMonitoringSchedule.ScheduleConfigProperty( + schedule_expression=schedule_expression + ), + monitoring_job_definition=sagemaker.CfnMonitoringSchedule.MonitoringJobDefinitionProperty( + baseline_config=sagemaker.CfnMonitoringSchedule.BaselineConfigProperty( + constraints_resource=sagemaker.CfnMonitoringSchedule.ConstraintsResourceProperty( + s3_uri=f"s3://{baseline_job_output_location}/{baseline_job_name}/constraints.json" + ), + statistics_resource=sagemaker.CfnMonitoringSchedule.StatisticsResourceProperty( + s3_uri=f"s3://{baseline_job_output_location}/{baseline_job_name}/statistics.json" + ), + ), + monitoring_inputs=sagemaker.CfnMonitoringSchedule.MonitoringInputsProperty( + monitoring_inputs=[ + sagemaker.CfnMonitoringSchedule.MonitoringInputProperty( + endpoint_input=sagemaker.CfnMonitoringSchedule.EndpointInputProperty( + endpoint_name=endpoint_name, + local_path="/opt/ml/processing/input/monitoring_dataset_input", + s3_input_mode="File", + s3_data_distribution_type="FullyReplicated", + ) + ) + ] + ), + monitoring_output_config=sagemaker.CfnMonitoringSchedule.MonitoringOutputConfigProperty( + monitoring_outputs=[ + sagemaker.CfnMonitoringSchedule.MonitoringOutputProperty( + s3_output=sagemaker.CfnMonitoringSchedule.S3OutputProperty( + s3_uri=f"s3://{monitoring_output_location}", + local_path="/opt/ml/processing/output", + s3_upload_mode="EndOfJob", + ) + ) + ], + kms_key_id=kms_key_arn, 
+ ), + monitoring_resources=sagemaker.CfnMonitoringSchedule.MonitoringResourcesProperty( + cluster_config=sagemaker.CfnMonitoringSchedule.ClusterConfigProperty( + instance_count=1.0, + instance_type=instance_type, + volume_size_in_gb=core.Token.as_number(instance_volume_size), + volume_kms_key_id=kms_key_arn, + ) + ), + monitoring_app_specification=sagemaker.CfnMonitoringSchedule.MonitoringAppSpecificationProperty( + image_uri=image_uri + ), + stopping_condition=sagemaker.CfnMonitoringSchedule.StoppingConditionProperty( + max_runtime_in_seconds=core.Token.as_number(max_runtime_seconds) + ), + role_arn=role_arn, + ), + ), + tags=[ + {"key": "stack_name", "value": stack_name}, + ], + ) + + # This is a workaround the current bug in CDK aws-sagemaker, where the MonitoringInputs property + # is duplicated. link to the bug https://github.com/aws/aws-cdk/issues/12208 + schedule.add_property_override( + "MonitoringScheduleConfig.MonitoringJobDefinition.MonitoringInputs", + [ + { + "EndpointInput": { + "EndpointName": {"Ref": "ENDPOINTNAME"}, + "LocalPath": "/opt/ml/processing/input/monitoring_dataset_input", + "S3DataDistributionType": "FullyReplicated", + "S3InputMode": "File", + } + } + ], + ) + + return schedule diff --git a/source/lib/blueprints/byom/pipeline_definitions/sagemaker_role.py b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_role.py new file mode 100644 index 0000000..693c181 --- /dev/null +++ b/source/lib/blueprints/byom/pipeline_definitions/sagemaker_role.py @@ -0,0 +1,102 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. 
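One small CDK detail used above: CfnParameter values resolve as strings, so the schedule converts them with core.Token.as_number where the L1 construct expects a number. For example (the parameter name here is illustrative):

volume_size = core.CfnParameter(scope, "InstanceVolumeSize", type="Number", default=20)
volume_size_number = core.Token.as_number(volume_size.value_as_string)  # usable for volume_size_in_gb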
# +# ##################################################################################################################### +from aws_cdk import ( + aws_iam as iam, + core, +) +from lib.conditional_resource import ConditionalResources + +from lib.blueprints.byom.pipeline_definitions.iam_policies import ( + ecr_policy_document, + kms_policy_document, + sagemaker_policiy_statement, + sagemaker_monitor_policiy_statement, + sagemaker_tags_policy_statement, + sagemaker_logs_metrics_policy_document, + s3_policy_read, + s3_policy_write, + pass_role_policy_statement, + get_role_policy_statement, +) +from lib.blueprints.byom.pipeline_definitions.templates_parameters import ( + create_custom_algorithms_ecr_repo_arn_provided_condition, +) + + +def create_sagemaker_role( + scope, # NOSONAR:S107 this function is designed to take many arguments + id, + custom_algorithms_ecr_arn, + kms_key_arn, + assets_bucket_name, + input_bucket_name, + input_s3_location, + output_s3_location, + ecr_repo_arn_provided_condition, + kms_key_arn_provided_condition, +): + # create optional polocies + ecr_policy = ecr_policy_document(scope, "MLOpsECRPolicy", custom_algorithms_ecr_arn) + kms_policy = kms_policy_document(scope, "MLOpsKmsPolicy", kms_key_arn) + + # add conditions to KMS and ECR policies + core.Aspects.of(kms_policy).add(ConditionalResources(kms_key_arn_provided_condition)) + core.Aspects.of(ecr_policy).add(ConditionalResources(ecr_repo_arn_provided_condition)) + + # create sagemaker role + role = iam.Role(scope, id, assumed_by=iam.ServicePrincipal("sagemaker.amazonaws.com")) + + # permissions to create sagemaker resources + sagemaker_policy = sagemaker_policiy_statement() + + # sagemaker tags permissions + sagemaker_tags_policy = sagemaker_tags_policy_statement() + # logs permissions + logs_policy = sagemaker_logs_metrics_policy_document(scope, "LogsMetricsPolicy") + # S3 permissions + s3_read = s3_policy_read( + list( + set( + [ + f"arn:aws:s3:::{assets_bucket_name}", + f"arn:aws:s3:::{assets_bucket_name}/*", + f"arn:aws:s3:::{input_bucket_name}", + f"arn:aws:s3:::{input_s3_location}", + ] + ) + ) + ) + s3_write = s3_policy_write( + [ + f"arn:aws:s3:::{output_s3_location}/*", + ] + ) + # IAM PassRole permission + pass_role_policy = pass_role_policy_statement(role) + # IAM GetRole permission + get_role_policy = get_role_policy_statement(role) + + # add policy statments + role.add_to_policy(sagemaker_policy) + role.add_to_policy(sagemaker_tags_policy) + logs_policy.attach_to_role(role) + role.add_to_policy(s3_read) + role.add_to_policy(s3_write) + role.add_to_policy(pass_role_policy) + role.add_to_policy(get_role_policy) + + # attach the conditional policies + kms_policy.attach_to_role(role) + ecr_policy.attach_to_role(role) + + return role diff --git a/source/lib/blueprints/byom/pipeline_definitions/share_actions.py b/source/lib/blueprints/byom/pipeline_definitions/share_actions.py deleted file mode 100644 index b1906f4..0000000 --- a/source/lib/blueprints/byom/pipeline_definitions/share_actions.py +++ /dev/null @@ -1,117 +0,0 @@ -# ##################################################################################################################### -# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # -# # -# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # -# with the License. A copy of the License is located at # -# # -# http://www.apache.org/licenses/LICENSE-2.0 # -# # -# or in the 'license' file accompanying this file. 
This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # -# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # -# and limitations under the License. # -# ##################################################################################################################### -from aws_cdk import ( - aws_iam as iam, - aws_lambda as lambda_, - aws_apigateway as apigw, - aws_codepipeline_actions as codepipeline_actions, - core, -) -from aws_solutions_constructs import aws_apigateway_lambda -from lib.blueprints.byom.pipeline_definitions.helpers import ( - codepipeline_policy, - suppress_cloudwatch_policy, -) - - -# configure inference lambda step in the pipeline -def configure_inference(scope, blueprint_bucket): - """ - configure_inference updates inference lambda function's environment variables and puts the value - for Sagemaker endpoint URI as a lambda invoked codepipeline action - - :scope: CDK Construct scope that's needed to create CDK resources - :blueprint_bucket: CDK object of the blueprint bucket that contains resources for BYOM pipeline - :is_realtime_inference: a CDK CfnCondition object that says if inference type is realtime or not - :return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage - """ - # provision api gateway and lambda for inference using solution constructs - inference_api_gateway = aws_apigateway_lambda.ApiGatewayToLambda( - scope, - "BYOMInference", - lambda_function_props={ - "runtime": lambda_.Runtime.PYTHON_3_8, - "handler": "main.handler", - "code": lambda_.Code.from_bucket(blueprint_bucket, "blueprints/byom/lambdas/inference.zip"), - }, - api_gateway_props={ - "defaultMethodOptions": { - "authorizationType": apigw.AuthorizationType.IAM, - }, - "restApiName": f"{core.Aws.STACK_NAME}-inference", - "proxy": False, - }, - ) - - provision_resource = inference_api_gateway.api_gateway.root.add_resource("inference") - provision_resource.add_method("POST") - inference_api_gateway.lambda_function.add_to_role_policy( - iam.PolicyStatement( - actions=[ - "sagemaker:InvokeEndpoint", - ], - resources=[ - f"arn:{core.Aws.PARTITION}:sagemaker:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:endpoint/*", - ], - ) - ) - - # lambda function that gets invoked from codepipeline - configure_inference_lambda = lambda_.Function( - scope, - "configure_inference_lambda", - runtime=lambda_.Runtime.PYTHON_3_8, - handler="main.handler", - code=lambda_.Code.from_bucket(blueprint_bucket, "blueprints/byom/lambdas/configure_inference_lambda.zip"), - environment={ - "inference_lambda_arn": inference_api_gateway.lambda_function.function_arn, - "LOG_LEVEL": "INFO", - }, - ) - configure_inference_lambda.node.default_child.cfn_options.metadata = suppress_cloudwatch_policy() - # iam permissions to respond to codepipeline and update inference lambda - configure_inference_lambda.add_to_role_policy( - iam.PolicyStatement( - actions=[ - "lambda:UpdateFunctionConfiguration", - ], - resources=[inference_api_gateway.lambda_function.function_arn], - ) - ) - configure_inference_lambda.add_to_role_policy(codepipeline_policy()) - - role_child_nodes = configure_inference_lambda.role.node.find_all() - role_child_nodes[2].node.default_child.cfn_options.metadata = { - "cfn_nag": { - "rules_to_suppress": [ - { - "id": "W12", - "reason": ( - "The codepipeline permissions PutJobSuccessResult and PutJobFailureResult " - "are not able to be bound to resources." 
-                    ),
-                }
-            ]
-        }
-    }
-    # configuring codepipeline action to invoke the lambda
-    configure_inference_action = codepipeline_actions.LambdaInvokeAction(
-        action_name="configure_inference_lambda",
-        inputs=[],
-        outputs=[],
-        # passing the parameter from the last stage in pipeline
-        user_parameters=[{"endpointName": "#{sagemaker_endpoint.endpointName}"}],
-        lambda_=configure_inference_lambda,
-    )
-
-    return (configure_inference_lambda.function_arn, configure_inference_action)
diff --git a/source/lib/blueprints/byom/pipeline_definitions/source_actions.py b/source/lib/blueprints/byom/pipeline_definitions/source_actions.py
index 2ef997d..c1081fc 100644
--- a/source/lib/blueprints/byom/pipeline_definitions/source_actions.py
+++ b/source/lib/blueprints/byom/pipeline_definitions/source_actions.py
@@ -16,11 +16,11 @@
 )
 
 
-def source_action(model_artifact_location, assets_bucket):
+def source_action(artifact_location, assets_bucket):
     """
     source_action configures a codepipeline action with S3 as source
 
-    :model_artifact_location: path to the model artifact in the S3 bucket: assets_bucket
+    :artifact_location: path to the artifact (model/inference data) in the S3 bucket: assets_bucket
     :assets_bucket: the bucket cdk object where pipeline assets are stored
     :return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage
     """
@@ -28,16 +28,16 @@ def source_action(model_artifact_location, assets_bucket):
     return source_output, codepipeline_actions.S3SourceAction(
         action_name="S3Source",
         bucket=assets_bucket,
-        bucket_key=model_artifact_location.value_as_string,
+        bucket_key=artifact_location.value_as_string,
         output=source_output,
     )
 
 
-def source_action_model_monitor(training_data_location, assets_bucket):
+def source_action_model_monitor(template_zip_file, assets_bucket):
     """
     source_action_model_monitor configures a codepipeline action with S3 as source
 
-    :training_data_location: path to the training data in the S3 bucket: assets_bucket
+    :template_zip_file: path to the template zip file in the S3 bucket: assets_bucket, containing the model monitor template and parameters
     :assets_bucket: the bucket cdk object where pipeline assets are stored
     :return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage
     """
@@ -45,12 +45,12 @@ def source_action_model_monitor(training_data_location, assets_bucket):
     return source_output, codepipeline_actions.S3SourceAction(
         action_name="S3Source",
         bucket=assets_bucket,
-        bucket_key=training_data_location.value_as_string,
+        bucket_key=template_zip_file.value_as_string,
         output=source_output,
     )
 
 
-def source_action_custom(model_artifact_location, assets_bucket, custom_container):
+def source_action_custom(assets_bucket, custom_container):
     """
     source_action configures a codepipeline action with S3 as source
 
@@ -66,3 +66,20 @@ def source_action_custom(model_artifact_location, assets_bucket, custom_containe
         bucket_key=custom_container.value_as_string,
         output=source_output,
     )
+
+
+def source_action_template(template_location, assets_bucket):
+    """
+    source_action_template configures a codepipeline action with S3 as source
+
+    :template_location: path to the zip file containing the CF template and stages configuration in the S3 bucket: assets_bucket
+    :assets_bucket: the bucket cdk object where pipeline assets are stored
+    :return: codepipeline action in a form of a CDK object that can be attached to a codepipeline stage
+    """
+    source_output = codepipeline.Artifact()
+    return source_output, codepipeline_actions.S3SourceAction(
+        action_name="S3Source",
+        bucket=assets_bucket,
+        bucket_key=template_location.value_as_string,
+        output=source_output,
+    )
\ No newline at end of file
diff --git a/source/lib/blueprints/byom/pipeline_definitions/templates_parameters.py b/source/lib/blueprints/byom/pipeline_definitions/templates_parameters.py
new file mode 100644
index 0000000..2c2acc4
--- /dev/null
+++ b/source/lib/blueprints/byom/pipeline_definitions/templates_parameters.py
@@ -0,0 +1,457 @@
+# #####################################################################################################################
+# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. #
+# #
+# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance #
+# with the License. A copy of the License is located at #
+# #
+# http://www.apache.org/licenses/LICENSE-2.0 #
+# #
+# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES #
+# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions #
+# and limitations under the License. #
+# #####################################################################################################################
+from aws_cdk import core
+
+
+def create_notification_email_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "NOTIFICATION_EMAIL",
+        type="String",
+        description="email for pipeline outcome notifications",
+        allowed_pattern="^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$",
+        constraint_description="Please enter an email address with correct format (example@example.com)",
+        min_length=5,
+        max_length=320,
+    )
+
+
+def create_git_address_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "CodeCommit Repo Address",
+        type="String",
+        description="AWS CodeCommit repository clone URL to connect to the framework.",
+        allowed_pattern=(
+            "^(((https:\/\/|ssh:\/\/)(git\-codecommit)\.[a-zA-Z0-9_.+-]+(amazonaws\.com\/)[a-zA-Z0-9-.]"
+            "+(\/)[a-zA-Z0-9-.]+(\/)[a-zA-Z0-9-.]+$)|^$)"
+        ),
+        min_length=0,
+        max_length=320,
+        constraint_description=(
+            "CodeCommit address must follow the pattern: ssh or "
+            "https://git-codecommit.REGION.amazonaws.com/version/repos/REPONAME"
+        ),
+    )
+
+
+def create_existing_bucket_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "ExistingS3Bucket",
+        type="String",
+        description="Name of an existing S3 bucket to be used for ML assets. The S3 bucket must be in the same region as the deployed stack and have versioning enabled. If not provided, a new S3 bucket will be created.",
+        allowed_pattern="((?=^.{3,63}$)(?!^(\d+\.)+\d+$)(^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$)|^$)",
+        min_length=0,
+        max_length=63,
+    )
+
+
+def create_existing_ecr_repo_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "ExistingECRRepo",
+        type="String",
+        description="Name of an existing Amazon ECR repository for custom algorithms. If not provided, a new ECR repo will be created.",
+        allowed_pattern="((?:[a-z0-9]+(?:[._-][a-z0-9]+)*/)*[a-z0-9]+(?:[._-][a-z0-9]+)*|^$)",
+        min_length=0,
+        max_length=63,
+    )
+
+
+def create_account_id_parameter(scope, id, account_type):
+    return core.CfnParameter(
+        scope,
+        id,
+        type="String",
+        description=f"AWS {account_type} account number where the CF template will be deployed",
+        allowed_pattern="^\d{12}$",
+    )
+
+
+def create_org_id_parameter(scope, id, account_type):
+    return core.CfnParameter(
+        scope,
+        id,
+        type="String",
+        description=f"AWS {account_type} organizational unit id where the CF template will be deployed",
+        allowed_pattern="^ou-[0-9a-z]{4,32}-[a-z0-9]{8,32}$",
+    )
+
+
+def create_blueprint_bucket_name_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "BLUEPRINT_BUCKET",
+        type="String",
+        description="Bucket name for blueprints of different types of ML Pipelines.",
+        min_length=3,
+    )
+
+
+def create_data_capture_bucket_name_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "DATA_CAPTURE_BUCKET",
+        type="String",
+        description="Bucket name where the data captured from SageMaker endpoint will be stored.",
+        min_length=3,
+    )
+
+
+def create_baseline_output_bucket_name_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "BASELINE_OUTPUT_BUCKET",
+        type="String",
+        description="Bucket name where the output of the baseline job will be stored.",
+        min_length=3,
+    )
+
+
+def create_batch_input_bucket_name_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "BATCH_INPUT_BUCKET",
+        type="String",
+        description="Bucket name where the input data of the batch transform is stored.",
+        min_length=3,
+    )
+
+
+def create_assets_bucket_name_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "ASSETS_BUCKET",
+        type="String",
+        description="Bucket name where the model and training data are stored.",
+        min_length=3,
+    )
+
+
+def create_custom_algorithms_ecr_repo_arn_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "CUSTOM_ALGORITHMS_ECR_REPO_ARN",
+        type="String",
+        description="The ARN of the Amazon ECR repository where the custom algorithm image is stored (optional)",
+        allowed_pattern="(^arn:aws:ecr:(us(-gov)?|ap|ca|cn|eu|sa)-(central|(north|south)?(east|west)?)-\\d:\\d{12}:repository/.+|^$)",
+        constraint_description="Please enter a valid ECR repo ARN",
+        min_length=0,
+        max_length=2048,
+    )
+
+
+def create_kms_key_arn_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "KMS_KEY_ARN",
+        type="String",
+        description="The KMS key ARN to encrypt the output of the batch transform job and instance volume (optional).",
+        allowed_pattern="(^arn:aws:kms:(us(-gov)?|ap|ca|cn|eu|sa)-(central|(north|south)?(east|west)?)-\d:\d{12}:key/.+|^$)",
+        constraint_description="Please enter a valid KMS key ARN",
+        min_length=0,
+        max_length=2048,
+    )
+
+
+def create_algorithm_image_uri_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "IMAGE_URI",
+        type="String",
+        description="The algorithm image URI (built-in or custom)",
+    )
+
+
+def create_model_name_parameter(scope):
+    return core.CfnParameter(
+        scope, "MODEL_NAME", type="String", description="An arbitrary name for the model.", min_length=1
+    )
+
+
+def create_stack_name_parameter(scope):
+    return core.CfnParameter(
+        scope, "STACK_NAME", type="String", description="The name to assign to the deployed CF stack.", min_length=1
+    )
+
+
+def create_endpoint_name_parameter(scope):
+    return core.CfnParameter(
+        scope, "ENDPOINT_NAME", type="String", description="The name of the endpoint to monitor", min_length=1
+    )
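As a rough usage sketch (illustrative only, not part of this diff): each factory above returns a `core.CfnParameter`, and the `*_provided_condition` helpers defined later in this file wrap the optional parameters in a `core.CfnCondition`. A blueprint stack, here a hypothetical `ExampleBlueprintStack`, would typically combine the two and fall back to `AWS::NoValue` when the optional value is left empty:

```python
from aws_cdk import core

from lib.blueprints.byom.pipeline_definitions.templates_parameters import (
    create_kms_key_arn_parameter,
    create_kms_key_arn_provided_condition,
)


class ExampleBlueprintStack(core.Stack):
    """Hypothetical stack showing how the parameter/condition factories are consumed."""

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        # Optional CloudFormation parameter surfaced at deploy time
        kms_key_arn = create_kms_key_arn_parameter(self)
        # Condition that is true only when an ARN was actually supplied
        kms_key_arn_provided = create_kms_key_arn_provided_condition(self, kms_key_arn)
        # Use the ARN if provided, otherwise omit the property via AWS::NoValue
        self.resolved_kms_key = core.Fn.condition_if(
            kms_key_arn_provided.logical_id, kms_key_arn.value_as_string, core.Aws.NO_VALUE
        ).to_string()
```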
+
+
+def create_model_artifact_location_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "MODEL_ARTIFACT_LOCATION",
+        type="String",
+        description="Path to model artifact inside assets bucket.",
+    )
+
+
+def create_inference_instance_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "INFERENCE_INSTANCE",
+        type="String",
+        description="Inference instance that inference requests will be running on. E.g., ml.m5.large",
+        allowed_pattern="^[a-zA-Z0-9_.+-]+\.[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$",
+        min_length=7,
+    )
+
+
+def create_batch_inference_data_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "BATCH_INFERENCE_DATA",
+        type="String",
+        description="S3 bucket path (including bucket name) to batch inference data file.",
+    )
+
+
+def create_batch_job_output_location_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "BATCH_OUTPUT_LOCATION",
+        type="String",
+        description="S3 path (including bucket name) to store the results of the batch job.",
+    )
+
+
+def create_data_capture_location_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "DATA_CAPTURE_LOCATION",
+        type="String",
+        description="S3 path (including bucket name) to store captured data from the Sagemaker endpoint.",
+        min_length=3,
+    )
+
+
+def create_baseline_job_output_location_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "BASELINE_JOB_OUTPUT_LOCATION",
+        type="String",
+        description="S3 path (including bucket name) to store the Data Baseline Job's output.",
+        min_length=3,
+    )
+
+
+def create_monitoring_output_location_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "MONITORING_OUTPUT_LOCATION",
+        type="String",
+        description="S3 path (including bucket name) to store the output of the Monitoring Schedule.",
+        min_length=3,
+    )
+
+
+def create_schedule_expression_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "SCHEDULE_EXPRESSION",
+        type="String",
+        description="cron expression to run the monitoring schedule. E.g., cron(0 * ? * * *), cron(0 0 ? * * *), etc.",
+        allowed_pattern="^cron(\\S+\\s){5}\\S+$",
+    )
+
+
+def create_training_data_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "TRAINING_DATA",
+        type="String",
+        description="Location of the training data in Assets S3 Bucket.",
+    )
+
+
+def create_instance_type_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "INSTANCE_TYPE",
+        type="String",
+        description="EC2 instance type that model monitoring jobs will be running on. E.g., ml.m5.large",
+        allowed_pattern="^[a-zA-Z0-9_.+-]+\.[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$",
+        min_length=7,
+    )
+
+
+def create_instance_volume_size_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "INSTANCE_VOLUME_SIZE",
+        type="Number",
+        description="Instance volume size used in model monitoring jobs. E.g., 20",
+    )
+
+
+def create_monitoring_type_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "MONITORING_TYPE",
+        type="String",
+        allowed_values=["dataquality", "modelquality", "modelbias", "modelexplainability"],
+        default="dataquality",
+        description="Type of model monitoring. Possible values: DataQuality | ModelQuality | ModelBias | ModelExplainability ",
+    )
+
+
+def create_max_runtime_seconds_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "MAX_RUNTIME_SECONDS",
+        type="Number",
+        description="Max runtime in seconds the job is allowed to run. E.g., 3600",
+    )
+
+
+def create_baseline_job_name_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "BASELINE_JOB_NAME",
+        type="String",
+        description="Unique name of the data baseline job",
+        min_length=3,
+        max_length=63,
+    )
+
+
+def create_monitoring_schedule_name_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "MONITORING_SCHEDULE_NAME",
+        type="String",
+        description="Unique name of the monitoring schedule job",
+        min_length=3,
+        max_length=63,
+    )
+
+
+def create_template_zip_name_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "TEMPLATE_ZIP_NAME",
+        type="String",
+        allowed_pattern="^.*\.zip$",
+        description="The zip file's name containing the CloudFormation template and its parameters files",
+    )
+
+
+def create_template_file_name_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "TEMPLATE_FILE_NAME",
+        type="String",
+        allowed_pattern="^.*\.yaml$",
+        description="CloudFormation template's file name",
+    )
+
+
+def create_stage_params_file_name_parameter(scope, id, stage_type):
+    return core.CfnParameter(
+        scope,
+        id,
+        type="String",
+        allowed_pattern="^.*\.json$",
+        description=f"parameters json file's name for the {stage_type} stage",
+    )
+
+
+def create_custom_container_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "CUSTOM_CONTAINER",
+        default="",
+        type="String",
+        description=(
+            "Should point to a zip file containing dockerfile and assets for building a custom model. "
+            "If empty, it will be using containers from SageMaker Registry"
+        ),
+    )
+
+
+def create_ecr_repo_name_parameter(scope):
+    return core.CfnParameter(
+        scope,
+        "ECR_REPO_NAME",
+        type="String",
+        description="Name of the Amazon ECR repository. This repo will be used to store custom algorithm images.",
+        allowed_pattern="(?:[a-z0-9]+(?:[._-][a-z0-9]+)*/)*[a-z0-9]+(?:[._-][a-z0-9]+)*",
+        min_length=1,
+    )
+
+
+def create_image_tag_parameter(scope):
+    return core.CfnParameter(
+        scope, "IMAGE_TAG", type="String", description="Docker image tag for the custom algorithm", min_length=1
+    )
+
+
+def create_custom_algorithms_ecr_repo_arn_provided_condition(scope, custom_algorithms_ecr_repo_arn):
+    return core.CfnCondition(
+        scope,
+        "CustomECRRepoProvided",
+        expression=core.Fn.condition_not(core.Fn.condition_equals(custom_algorithms_ecr_repo_arn, "")),
+    )
+
+
+def create_kms_key_arn_provided_condition(scope, kms_key_arn):
+    return core.CfnCondition(
+        scope,
+        "KMSKeyProvided",
+        expression=core.Fn.condition_not(core.Fn.condition_equals(kms_key_arn, "")),
+    )
+
+
+def create_git_address_provided_condition(scope, git_address):
+    return core.CfnCondition(
+        scope,
+        "GitAddressProvided",
+        expression=core.Fn.condition_not(core.Fn.condition_equals(git_address, "")),
+    )
+
+
+def create_existing_bucket_provided_condition(scope, existing_bucket):
+    return core.CfnCondition(
+        scope,
+        "S3BucketProvided",
+        expression=core.Fn.condition_not(core.Fn.condition_equals(existing_bucket.value_as_string, "")),
+    )
+
+
+def create_existing_ecr_provided_condition(scope, existing_ecr_repo):
+    return core.CfnCondition(
+        scope,
+        "ECRProvided",
+        expression=core.Fn.condition_not(core.Fn.condition_equals(existing_ecr_repo.value_as_string, "")),
+    )
+
+
+def create_new_bucket_condition(scope, existing_bucket):
+    return core.CfnCondition(
+        scope,
+        "CreateS3Bucket",
+        expression=core.Fn.condition_equals(existing_bucket.value_as_string, ""),
+    )
+
+
+def create_new_ecr_repo_condition(scope, existing_ecr_repo):
+    return core.CfnCondition(
+        scope,
+        "CreateECRRepo",
+        
expression=core.Fn.condition_equals(existing_ecr_repo.value_as_string, ""), + ) diff --git a/source/lib/blueprints/byom/realtime_inference_pipeline.py b/source/lib/blueprints/byom/realtime_inference_pipeline.py new file mode 100644 index 0000000..83d991a --- /dev/null +++ b/source/lib/blueprints/byom/realtime_inference_pipeline.py @@ -0,0 +1,171 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +from aws_cdk import ( + aws_lambda as lambda_, + aws_s3 as s3, + aws_apigateway as apigw, + core, +) + +from aws_solutions_constructs.aws_lambda_sagemakerendpoint import LambdaToSagemakerEndpoint +from aws_solutions_constructs import aws_apigateway_lambda +from lib.blueprints.byom.pipeline_definitions.sagemaker_role import create_sagemaker_role +from lib.blueprints.byom.pipeline_definitions.sagemaker_model import create_sagemaker_model +from lib.blueprints.byom.pipeline_definitions.sagemaker_endpoint_config import create_sagemaker_endpoint_config +from lib.blueprints.byom.pipeline_definitions.sagemaker_endpoint import create_sagemaker_endpoint +from lib.blueprints.byom.pipeline_definitions.helpers import suppress_lambda_policies +from lib.blueprints.byom.pipeline_definitions.templates_parameters import ( + create_blueprint_bucket_name_parameter, + create_assets_bucket_name_parameter, + create_algorithm_image_uri_parameter, + create_custom_algorithms_ecr_repo_arn_parameter, + create_inference_instance_parameter, + create_kms_key_arn_parameter, + create_model_artifact_location_parameter, + create_model_name_parameter, + create_data_capture_location_parameter, + create_custom_algorithms_ecr_repo_arn_provided_condition, + create_kms_key_arn_provided_condition, +) + + +class BYOMRealtimePipelineStack(core.Stack): + def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: + super().__init__(scope, id, **kwargs) + + # Parameteres # + assets_bucket_name = create_assets_bucket_name_parameter(self) + blueprint_bucket_name = create_blueprint_bucket_name_parameter(self) + custom_algorithms_ecr_repo_arn = create_custom_algorithms_ecr_repo_arn_parameter(self) + kms_key_arn = create_kms_key_arn_parameter(self) + algorithm_image_uri = create_algorithm_image_uri_parameter(self) + model_name = create_model_name_parameter(self) + model_artifact_location = create_model_artifact_location_parameter(self) + data_capture_location = create_data_capture_location_parameter(self) + inference_instance = create_inference_instance_parameter(self) + + # Conditions + custom_algorithms_ecr_repo_arn_provided = create_custom_algorithms_ecr_repo_arn_provided_condition( + self, custom_algorithms_ecr_repo_arn + ) + kms_key_arn_provided = create_kms_key_arn_provided_condition(self, kms_key_arn) + + # Resources # + # getting blueprint bucket object from 
its name - will be used later in the stack + blueprint_bucket = s3.Bucket.from_bucket_name(self, "BlueprintBucket", blueprint_bucket_name.value_as_string) + + # provision api gateway and lambda for inference using solution constructs + inference_api_gateway = aws_apigateway_lambda.ApiGatewayToLambda( + self, + "BYOMInference", + lambda_function_props={ + "runtime": lambda_.Runtime.PYTHON_3_8, + "handler": "main.handler", + "code": lambda_.Code.from_bucket(blueprint_bucket, "blueprints/byom/lambdas/inference.zip"), + }, + api_gateway_props={ + "defaultMethodOptions": { + "authorizationType": apigw.AuthorizationType.IAM, + }, + "restApiName": f"{core.Aws.STACK_NAME}-inference", + "proxy": False, + }, + ) + # add supressions + inference_api_gateway.lambda_function.node.default_child.cfn_options.metadata = suppress_lambda_policies() + provision_resource = inference_api_gateway.api_gateway.root.add_resource("inference") + provision_resource.add_method("POST") + + # create Sagemaker role + sagemaker_role = create_sagemaker_role( + self, + "MLOpsRealtimeSagemakerRole", + custom_algorithms_ecr_arn=custom_algorithms_ecr_repo_arn.value_as_string, + kms_key_arn=kms_key_arn.value_as_string, + assets_bucket_name=assets_bucket_name.value_as_string, + input_bucket_name=assets_bucket_name.value_as_string, + input_s3_location=assets_bucket_name.value_as_string, + output_s3_location=data_capture_location.value_as_string, + ecr_repo_arn_provided_condition=custom_algorithms_ecr_repo_arn_provided, + kms_key_arn_provided_condition=kms_key_arn_provided, + ) + + # create sagemaker model + sagemaker_model = create_sagemaker_model( + self, + "MLOpsSagemakerModel", + execution_role=sagemaker_role, + primary_container={ + "image": algorithm_image_uri.value_as_string, + "modelDataUrl": f"s3://{assets_bucket_name.value_as_string}/{model_artifact_location.value_as_string}", + }, + tags=[{"key": "model_name", "value": model_name.value_as_string}], + ) + + # Create Sagemaker EndpointConfg + sagemaker_endpoint_config = create_sagemaker_endpoint_config( + self, + "MLOpsSagemakerEndpointConfig", + sagemaker_model.attr_model_name, + model_name.value_as_string, + inference_instance.value_as_string, + data_capture_location.value_as_string, + core.Fn.condition_if( + kms_key_arn_provided.logical_id, kms_key_arn.value_as_string, core.Aws.NO_VALUE + ).to_string(), + ) + + # create a dependency on the model + sagemaker_endpoint_config.add_depends_on(sagemaker_model) + + # create Sagemaker endpoint + sagemaker_endpoint = create_sagemaker_endpoint( + self, + "MLOpsSagemakerEndpoint", + sagemaker_endpoint_config.attr_endpoint_config_name, + model_name.value_as_string, + ) + + # add dependency on endpoint config + sagemaker_endpoint.add_depends_on(sagemaker_endpoint_config) + + # Create Lambda - sagemakerendpoint + LambdaToSagemakerEndpoint( + self, + "LambdaSagmakerEndpoint", + existing_sagemaker_endpoint_obj=sagemaker_endpoint, + existing_lambda_obj=inference_api_gateway.lambda_function, + ) + + # Outputs # + core.CfnOutput( + self, + id="SageMakerModelName", + value=sagemaker_model.attr_model_name, + ) + core.CfnOutput( + self, + id="SageMakerEndpointConfigName", + value=sagemaker_endpoint_config.attr_endpoint_config_name, + ) + core.CfnOutput( + self, + id="SageMakerEndpointName", + value=sagemaker_endpoint.attr_endpoint_name, + ) + core.CfnOutput( + self, + id="EndpointDataCaptureLocation", + value=f"https://s3.console.aws.amazon.com/s3/buckets/{data_capture_location.value_as_string}/", + description="Endpoint data capture location 
(to be used by Model Monitor)", + ) diff --git a/source/lib/blueprints/byom/single_account_codepipeline.py b/source/lib/blueprints/byom/single_account_codepipeline.py new file mode 100644 index 0000000..4d3dc01 --- /dev/null +++ b/source/lib/blueprints/byom/single_account_codepipeline.py @@ -0,0 +1,137 @@ +# ##################################################################################################################### +# Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved. # +# # +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance # +# with the License. A copy of the License is located at # +# # +# http://www.apache.org/licenses/LICENSE-2.0 # +# # +# or in the 'license' file accompanying this file. This file is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES # +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions # +# and limitations under the License. # +# ##################################################################################################################### +from aws_cdk import ( + aws_iam as iam, + aws_s3 as s3, + aws_sns as sns, + aws_sns_subscriptions as subscriptions, + aws_events_targets as targets, + aws_events as events, + aws_codepipeline as codepipeline, + core, +) +from lib.blueprints.byom.pipeline_definitions.source_actions import source_action_template +from lib.blueprints.byom.pipeline_definitions.deploy_actions import create_cloudformation_action +from lib.blueprints.byom.pipeline_definitions.helpers import ( + pipeline_permissions, + suppress_pipeline_bucket, + suppress_iam_complex, + suppress_sns, + suppress_cloudformation_action, +) +from lib.blueprints.byom.pipeline_definitions.templates_parameters import ( + create_notification_email_parameter, + create_template_zip_name_parameter, + create_template_file_name_parameter, + create_stage_params_file_name_parameter, + create_assets_bucket_name_parameter, + create_stack_name_parameter, +) + + +class SingleAccountCodePipelineStack(core.Stack): + def __init__(self, scope: core.Construct, id: str, **kwargs) -> None: + super().__init__(scope, id, **kwargs) + + # Parameteres # + notification_email = create_notification_email_parameter(self) + template_zip_name = create_template_zip_name_parameter(self) + template_file_name = create_template_file_name_parameter(self) + template_params_file_name = create_stage_params_file_name_parameter(self, "TEMPLATE_PARAMS_NAME", "main") + assets_bucket_name = create_assets_bucket_name_parameter(self) + stack_name = create_stack_name_parameter(self) + + # Resources # + assets_bucket = s3.Bucket.from_bucket_name(self, "AssetsBucket", assets_bucket_name.value_as_string) + + # create sns topic and subscription + pipeline_notification_topic = sns.Topic( + self, + "SinglePipelineNotification", + ) + pipeline_notification_topic.node.default_child.cfn_options.metadata = suppress_sns() + pipeline_notification_topic.add_subscription( + subscriptions.EmailSubscription(email_address=notification_email.value_as_string) + ) + + # Defining pipeline stages + # source stage + source_output, source_action_definition = source_action_template(template_zip_name, assets_bucket) + + # create cloudformation action + cloudformation_action = create_cloudformation_action( + self, + "deploy_stack", + stack_name.value_as_string, + source_output, + template_file_name.value_as_string, + template_params_file_name.value_as_string, + ) + + source_stage = 
codepipeline.StageProps(stage_name="Source", actions=[source_action_definition]) + deploy = codepipeline.StageProps( + stage_name="DeployCloudFormation", + actions=[cloudformation_action], + ) + + single_account_pipeline = codepipeline.Pipeline( + self, + "SingleAccountPipeline", + stages=[source_stage, deploy], + cross_account_keys=False, + ) + + # Add CF suppressions to the action + deployment_policy = cloudformation_action.deployment_role.node.find_all()[2] + deployment_policy.node.default_child.cfn_options.metadata = suppress_cloudformation_action() + + # add notification to the single-account pipeline + single_account_pipeline.on_state_change( + "NotifyUser", + description="Notify user of the outcome of the pipeline", + target=targets.SnsTopic( + pipeline_notification_topic, + message=events.RuleTargetInput.from_text( + ( + f"Pipeline {events.EventField.from_path('$.detail.pipeline')} finished executing. " + f"Pipeline execution result is {events.EventField.from_path('$.detail.state')}" + ) + ), + ), + event_pattern=events.EventPattern(detail={"state": ["SUCCEEDED", "FAILED"]}), + ) + single_account_pipeline.add_to_role_policy( + iam.PolicyStatement( + actions=["events:PutEvents"], + resources=[ + f"arn:{core.Aws.PARTITION}:events:{core.Aws.REGION}:{core.Aws.ACCOUNT_ID}:event-bus/*", + ], + ) + ) + + # add cfn supressions + pipeline_child_nodes = single_account_pipeline.node.find_all() + pipeline_child_nodes[1].node.default_child.cfn_options.metadata = suppress_pipeline_bucket() + pipeline_child_nodes[6].node.default_child.cfn_options.metadata = suppress_iam_complex() + # attaching iam permissions to the pipelines + pipeline_permissions(single_account_pipeline, assets_bucket) + + # Outputs # + core.CfnOutput( + self, + id="Pipelines", + value=( + f"https://console.aws.amazon.com/codesuite/codepipeline/pipelines/" + f"{single_account_pipeline.pipeline_name}/view?region={core.Aws.REGION}" + ), + ) diff --git a/source/requirements-test.txt b/source/requirements-test.txt index dab34f7..8a8c3cd 100644 --- a/source/requirements-test.txt +++ b/source/requirements-test.txt @@ -1,7 +1,7 @@ -sagemaker==2.15.3 -boto3==1.14.62 +sagemaker==2.39.0 +boto3==1.17.23 crhelper==2.0.6 urllib3==1.25.10 pytest==6.1.2 pytest-cov==2.10.1 -moto==1.3.16 \ No newline at end of file +moto[all]==2.0.2 \ No newline at end of file diff --git a/source/requirements.txt b/source/requirements.txt index d29bdf0..09c336b 100644 --- a/source/requirements.txt +++ b/source/requirements.txt @@ -1,32 +1,34 @@ -aws-cdk.assets==1.83.0 -aws-cdk.aws-apigateway==1.83.0 -aws-cdk.aws-cloudformation==1.83.0 -aws-cdk.aws-cloudwatch==1.83.0 -aws-cdk.aws-codebuild==1.83.0 -aws-cdk.aws-codecommit==1.83.0 -aws-cdk.aws-codedeploy==1.83.0 -aws-cdk.aws-codepipeline==1.83.0 -aws-cdk.aws-codepipeline-actions==1.83.0 -aws-cdk.core==1.83.0 -aws-cdk.aws-ecr==1.83.0 -aws-cdk.aws-ecr-assets==1.83.0 -aws-cdk.aws-events==1.83.0 -aws-cdk.aws-events-targets==1.83.0 -aws-cdk.aws-iam==1.83.0 -aws-cdk.aws-kms==1.83.0 -aws-cdk.aws-lambda==1.83.0 -aws-cdk.aws-lambda-event-sources==1.83.0 -aws-cdk.aws-logs==1.83.0 -aws-cdk.aws-s3==1.83.0 -aws-cdk.aws-s3-assets==1.83.0 -aws-cdk.aws-s3-deployment==1.83.0 -aws-cdk.aws-s3-notifications==1.83.0 -aws-cdk.aws-sagemaker==1.83.0 -aws-cdk.aws-sns==1.83.0 -aws-cdk.aws-sns-subscriptions==1.83.0 -aws-cdk.core==1.83.0 -aws-cdk.custom-resources==1.83.0 -aws-cdk.region-info==1.83.0 -aws-solutions-constructs.aws-apigateway-lambda==1.83.0 -aws-solutions-constructs.core==1.83.0 - +aws-cdk.assets==1.96.0 
+aws-cdk.aws-apigateway==1.96.0 +aws-cdk.aws-cloudformation==1.96.0 +aws-cdk.aws-cloudwatch==1.96.0 +aws-cdk.aws-codebuild==1.96.0 +aws-cdk.aws-codecommit==1.96.0 +aws-cdk.aws-codedeploy==1.96.0 +aws-cdk.aws-codepipeline==1.96.0 +aws-cdk.aws-codepipeline-actions==1.96.0 +aws-cdk.core==1.96.0 +aws-cdk.aws-ecr==1.96.0 +aws-cdk.aws-ecr-assets==1.96.0 +aws-cdk.aws-events==1.96.0 +aws-cdk.aws-events-targets==1.96.0 +aws-cdk.aws-iam==1.96.0 +aws-cdk.aws-kms==1.96.0 +aws-cdk.aws-lambda==1.96.0 +aws-cdk.aws-lambda-event-sources==1.96.0 +aws-cdk.aws-logs==1.96.0 +aws-cdk.aws-s3==1.96.0 +aws-cdk.aws-s3-assets==1.96.0 +aws-cdk.aws-s3-deployment==1.96.0 +aws-cdk.aws-s3-notifications==1.96.0 +aws-cdk.aws-sagemaker==1.96.0 +aws-cdk.aws-sns==1.96.0 +aws-cdk.aws-sns-subscriptions==1.96.0 +aws-cdk.core==1.96.0 +aws-cdk.custom-resources==1.96.0 +aws-cdk.region-info==1.96.0 +aws-solutions-constructs.aws-apigateway-lambda==1.96.0 +aws-solutions-constructs.aws-lambda-sagemakerendpoint==1.96.0 +aws-solutions-constructs.core==1.96.0 +aws-cdk.cloudformation-include==1.96.0 +aws-cdk.aws-cloudformation==1.96.0 diff --git a/source/run-all-tests.sh b/source/run-all-tests.sh index 03e6c3d..07147be 100755 --- a/source/run-all-tests.sh +++ b/source/run-all-tests.sh @@ -119,11 +119,11 @@ run_framework_lambda_test() { for folder in */ ; do cd "$folder" function_name=${PWD##*/} - if [ "$folder" != "custom_resource/" ]; then - pip install -r requirements-test.txt - run_python_test $(basename $folder) - rm -rf *.egg-info - fi + + pip install -r requirements-test.txt + run_python_test $(basename $folder) + rm -rf *.egg-info + cd .. done }
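
For context on the updated test tooling (`moto[all]==2.0.2`, `pytest`, and the test script now running every Lambda folder, including `custom_resource`): the sketch below is a minimal, hypothetical unit test in the style these pins support. The bucket and key names are made up and are not taken from the solution's test suite.

```python
import boto3
from moto import mock_s3  # moto[all] still ships the per-service decorators


@mock_s3
def test_model_artifact_roundtrip():
    # All S3 calls below hit moto's in-memory backend, not real AWS.
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="example-assets-bucket")
    s3.put_object(Bucket="example-assets-bucket", Key="model.tar.gz", Body=b"artifact-bytes")

    body = s3.get_object(Bucket="example-assets-bucket", Key="model.tar.gz")["Body"].read()
    assert body == b"artifact-bytes"
```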