This project is a subset of my work at McKesson. The bucket name, file paths, and credentials shown here are placeholders, not real values. Replace them with your own when setting up the project.
This repository contains the infrastructure code to set up a data processing environment on AWS. The infrastructure includes an EC2 instance for data processing, an S3 bucket for data storage, and CloudWatch Logs for monitoring.
- Prerequisites
- Infrastructure Components
- Setup Instructions
- Dockerization
- Jenkins CI/CD Pipeline
- Terraform Automation
- Usage
- Contributing
- License
Before setting up the infrastructure, make sure you have the following:
- AWS Account
- AWS CLI installed and configured
- Docker installed
- Jenkins installed and configured
An EC2 instance is provisioned to handle data processing tasks.
- AMI: Amazon Machine Image (replace your-ami-id with the desired AMI ID)
- Instance Type: t2.micro
An S3 bucket is created to store data files securely.
- Bucket Name: your-s3-bucket-name
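As a rough illustration of how a processing job might write a result file to this bucket, the sketch below uses the `upload_file` method of the boto3 S3 client. The `build_s3_key` and `upload_result` helpers, the `processed` prefix, and the date-partitioned key layout are assumptions made for the example and are not part of this repository.

```python
from datetime import date
from pathlib import PurePosixPath


def build_s3_key(filename: str, run_date: date, prefix: str = "processed") -> str:
    """Return a date-partitioned object key (hypothetical layout for this example)."""
    return str(PurePosixPath(prefix, f"{run_date:%Y/%m/%d}", filename))


def upload_result(local_path: str, bucket: str, key: str, s3_client=None) -> None:
    """Upload one local file to the given bucket under the given key."""
    if s3_client is None:
        # Imported lazily so the key helper can be used without boto3 installed.
        import boto3

        # Credentials are resolved from the environment or the AWS CLI config.
        s3_client = boto3.client("s3")
    s3_client.upload_file(local_path, bucket, key)
```

For example, `build_s3_key("out.csv", date(2024, 1, 31))` returns `processed/2024/01/31/out.csv`, which could then be passed to `upload_result` together with the placeholder bucket name above.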
CloudWatch Logs are configured to monitor the EC2 instance logs.
- Log Group Name: /aws/ec2/data_processing_logs
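To check that the instance is actually writing to this log group, something like the following sketch could be used. It relies on the `filter_log_events` call of the boto3 CloudWatch Logs client; the `recent_messages` helper, the 15-minute window, and the 50-event limit are assumptions chosen for illustration.

```python
import time


def recent_messages(logs_client, log_group: str, minutes: int = 15, limit: int = 50):
    """Return the messages of log events from roughly the last `minutes` minutes."""
    # CloudWatch Logs expects timestamps as milliseconds since the Unix epoch.
    start_ms = int((time.time() - minutes * 60) * 1000)
    response = logs_client.filter_log_events(
        logGroupName=log_group,
        startTime=start_ms,
        limit=limit,
    )
    return [event["message"].rstrip() for event in response.get("events", [])]
```

A caller would typically create the client with `boto3.client("logs")` and pass `/aws/ec2/data_processing_logs` as the log group. Only the first page of results is read here; a longer window would need pagination.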
git clone https://github.com/your-username/your-repo.git
cd your-repo
Create a requirements.txt file with the required Python packages:
pandas==1.3.5
snowflake-connector-python==2.7.8
boto3==1.18.5
Install the dependencies:
pip install -r requirements.txt
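After installation, it can be useful to confirm that the environment matches the pinned versions. The following is a minimal sketch using only the standard library (`importlib.metadata`); the `parse_pins` and `find_mismatches` helpers are hypothetical names introduced here, and only simple `name==version` lines are handled.

```python
from importlib import metadata


def parse_pins(text: str) -> dict:
    """Parse 'name==version' lines, skipping blank lines and comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if sep:
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: dict) -> dict:
    """Return {package: (pinned, installed_or_None)} for each unsatisfied pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Reading requirements.txt, passing its contents to `parse_pins`, and then calling `find_mismatches` on the result yields an empty dictionary when every pinned package is installed at the expected version.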
Set the necessary environment variables for AWS credentials:
export AWS_ACCESS_KEY_ID=your-access-key-id
export AWS_SECRET_ACCESS_KEY=your-secret-access-key
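A script can fail early with a clear message if these variables are not set, rather than failing later inside an AWS call. The sketch below shows one way to do that; the `missing_aws_vars` helper is a hypothetical name, and it checks only the two variables listed above.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_aws_vars()` at startup and exiting when the returned list is non-empty keeps the error close to its cause. Note that boto3 can also pick up credentials from other sources, such as the AWS CLI configuration or an instance role, so this check is only relevant when environment variables are the chosen mechanism.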
The application is containerized using Docker for consistent deployment.
- Dockerfile: Contains the instructions to build the Docker image.
Jenkins is used for continuous integration and continuous deployment (CI/CD).
- Jenkinsfile: Defines the pipeline stages:
  - Checkout
  - Build Docker Image
  - Run Tests
  - Deploy to AWS using Terraform
Terraform is used for infrastructure as code (IaC) to provision and manage AWS resources.
- Terraform Files:
  - providers.tf: AWS provider configuration
  - ec2_instance.tf: EC2 instance resource
  - s3_bucket.tf: S3 bucket resource
  - cloudwatch_logs.tf: CloudWatch Logs resource
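The deploy step typically comes down to running `terraform init`, `terraform plan`, and `terraform apply` in the directory that holds these files. The following sketch wraps that sequence in Python; it is not taken from the Jenkinsfile, and the `terraform_commands` and `deploy` helpers and the `tfplan` file name are assumptions made for the example.

```python
import subprocess


def terraform_commands(plan_file: str = "tfplan"):
    """Return the init, plan, and apply commands in the order they should run."""
    return [
        ["terraform", "init", "-input=false"],
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],
        # Applying a saved plan file does not prompt for confirmation.
        ["terraform", "apply", "-input=false", plan_file],
    ]


def deploy(workdir: str, plan_file: str = "tfplan", runner=subprocess.run) -> None:
    """Run each Terraform command in `workdir`, stopping at the first failure."""
    for command in terraform_commands(plan_file):
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(command, cwd=workdir, check=True)
```

Saving the plan to a file and applying that exact file means the changes that were planned are the changes that get applied, which suits a non-interactive CI job. The `runner` parameter exists only so the sequence can be exercised without invoking Terraform.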
- Run the Jenkins pipeline to automate the infrastructure setup and deployment.
- Monitor the infrastructure in the AWS Console and the logs in CloudWatch.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.