Orchestrate Amazon EMR Serverless jobs with AWS Step Functions
In this project, based on a real-world scenario, I acted as the Cloud Engineer and ran a Spark job on EMR Serverless that processes data in an Amazon Simple Storage Service (Amazon S3) bucket and stores the aggregated results back in Amazon S3.
Below are a few screenshots:
Create CloudFormation Stack
Stack creation complete
The S3 bucket, EMR Serverless role, and Step Functions role are created
The PySpark script and data are uploaded to the S3 bucket folders
Create the Step Function
Select Blank
Create the workflow in the Step Function
Step Function workflow creation is complete
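The workflow built in the console can also be written directly in Amazon States Language. The following is a minimal sketch, not the exact definition used in this project. It assumes a single Task state that uses the Step Functions EMR Serverless integration (`startJobRun.sync`). The application ID, role ARN, and S3 paths are hypothetical placeholders.

```python
import json

# Hypothetical placeholders; replace with the values created by the stack.
APPLICATION_ID = "00example1234"
EXECUTION_ROLE_ARN = "arn:aws:iam::111122223333:role/emr-serverless-job-role"
SCRIPT_URI = "s3://example-bucket/scripts/job.py"
OUTPUT_URI = "s3://example-bucket/output/"


def build_definition() -> dict:
    """Return a state machine definition with one EMR Serverless task."""
    return {
        "Comment": "Run a Spark job on EMR Serverless and wait for it to finish",
        "StartAt": "RunSparkJob",
        "States": {
            "RunSparkJob": {
                "Type": "Task",
                # The .sync suffix makes the state wait until the job run ends.
                "Resource": "arn:aws:states:::emr-serverless:startJobRun.sync",
                "Parameters": {
                    "ApplicationId": APPLICATION_ID,
                    "ExecutionRoleArn": EXECUTION_ROLE_ARN,
                    "JobDriver": {
                        "SparkSubmit": {
                            "EntryPoint": SCRIPT_URI,
                            # Assumes the script takes the output path as its argument.
                            "EntryPointArguments": [OUTPUT_URI],
                        }
                    },
                },
                "End": True,
            }
        },
    }


definition_json = json.dumps(build_definition(), indent=2)
print(definition_json)
```

The printed JSON can be pasted into the code view of the Step Functions workflow editor.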
Start the execution
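The execution can also be started programmatically instead of from the console. This is a rough sketch that assumes a client behaving like `boto3.client("stepfunctions")`. The helper name `start_run` and its arguments are illustrative and not part of the original project.

```python
import json
import uuid


def start_run(sfn_client, state_machine_arn: str, payload: dict) -> str:
    """Start one execution of the state machine and return its ARN.

    sfn_client is expected to behave like boto3.client("stepfunctions").
    """
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        # A random suffix avoids reusing an earlier execution name.
        name=f"run-{uuid.uuid4()}",
        # Step Functions expects the execution input as a JSON string.
        input=json.dumps(payload),
    )
    return response["executionArn"]
```

With boto3 installed and credentials configured, a call might look like `start_run(boto3.client("stepfunctions"), state_machine_arn, {})`, where `state_machine_arn` is the ARN of the state machine created above.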
The Step Functions workflow failed with the following permission error
Restarted the workflow execution after granting the necessary permission
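The exact missing permission is visible only in the screenshot, so the following is an assumption rather than the fix that was actually applied. A common cause is that the Step Functions role is not allowed to start the EMR Serverless job run or to pass the job execution role. Under that assumption, a policy document could be built like this (both ARNs are hypothetical placeholders):

```python
import json


def build_policy(application_arn: str, job_role_arn: str) -> dict:
    """Build an IAM policy letting Step Functions run jobs on one application."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "emr-serverless:StartJobRun",
                    "emr-serverless:GetJobRun",
                    "emr-serverless:CancelJobRun",
                ],
                # Covers the application itself and its job runs.
                "Resource": [application_arn, f"{application_arn}/jobruns/*"],
            },
            {
                # Step Functions must be allowed to hand the job role to EMR Serverless.
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": job_role_arn,
                "Condition": {
                    "StringLike": {
                        "iam:PassedToService": "emr-serverless.amazonaws.com"
                    }
                },
            },
        ],
    }


# Hypothetical ARNs, for illustration only.
policy = build_policy(
    "arn:aws:emr-serverless:us-east-1:111122223333:/applications/00example1234",
    "arn:aws:iam::111122223333:role/emr-serverless-job-role",
)
print(json.dumps(policy, indent=2))
```

A `.sync` task may need further permissions beyond these, so the error message in the failed execution remains the best guide to what to grant.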
The execution completed successfully
View the workflow run details
The output is stored in a folder in the S3 bucket
Download the result
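The download can be scripted as well. This is a minimal sketch that assumes a client behaving like `boto3.client("s3")` and that the job writes its files under a single prefix. The helper name `download_prefix`, the bucket, and the prefix are hypothetical.

```python
from pathlib import Path


def download_prefix(s3_client, bucket: str, prefix: str, dest_dir: str) -> list:
    """Download every object under prefix into dest_dir and return the local paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    # Paginate so that prefixes with many objects are listed completely.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                # Skip the zero-byte "folder" marker objects.
                continue
            local_path = dest / Path(key).name
            s3_client.download_file(bucket, key, str(local_path))
            saved.append(str(local_path))
    return saved
```

Spark usually writes one or more `part-*.csv` files plus a `_SUCCESS` marker, so the returned list may contain more than the single CSV file of interest.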
We can open the CSV file to see the result