Mohan Golla
3 min read · Jun 5, 2024

Orchestrate Amazon EMR Serverless jobs with AWS Step Functions

In this project, based on a real-world scenario, I acted as the Cloud Engineer and ran a Spark job on EMR Serverless that processes data in an Amazon Simple Storage Service (Amazon S3) bucket and stores the aggregated results in Amazon S3.

Below are a few screenshots:

Create CloudFormation Stack

Stack creation complete

The S3 bucket, EMR Serverless role, and Step Functions role are created

The PySpark script and data are uploaded to the S3 bucket folders

Create the Step Function

Select Blank

Create the workflow in the Step Function
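The screenshot of the workflow is not reproduced here, so as a rough illustration, a workflow like this could be described in Amazon States Language along the following lines. This is a minimal sketch, not the exact workflow from the project: it assumes a single Task state that uses the Step Functions EMR Serverless integration (`emr-serverless:startJobRun.sync`) against an application that already exists. The helper name `build_state_machine_definition` and every ID, ARN, and S3 path below are hypothetical placeholders.

```python
import json


def build_state_machine_definition(application_id, execution_role_arn,
                                   script_uri, input_uri, output_uri):
    """Return an Amazon States Language definition as a Python dict.

    Assumption: one Task state that starts a Spark job on an existing
    EMR Serverless application and waits for it to finish (.sync).
    """
    return {
        "Comment": "Run one Spark job on EMR Serverless and wait for it",
        "StartAt": "Run Spark Job",
        "States": {
            "Run Spark Job": {
                "Type": "Task",
                "Resource": "arn:aws:states:::emr-serverless:startJobRun.sync",
                "Parameters": {
                    "ApplicationId": application_id,
                    "ExecutionRoleArn": execution_role_arn,
                    "JobDriver": {
                        "SparkSubmit": {
                            "EntryPoint": script_uri,
                            # Passed to the PySpark script as sys.argv[1:]
                            "EntryPointArguments": [input_uri, output_uri],
                        }
                    },
                },
                "End": True,
            }
        },
    }


# Hypothetical placeholder values, not taken from the project.
definition = build_state_machine_definition(
    application_id="00example1234567",
    execution_role_arn="arn:aws:iam::111122223333:role/ExampleEmrServerlessRole",
    script_uri="s3://example-bucket/scripts/job.py",
    input_uri="s3://example-bucket/data/",
    output_uri="s3://example-bucket/output/",
)

# The JSON text can be pasted into the Code view of the workflow editor.
print(json.dumps(definition, indent=2))
```

Building the definition as a dict and serializing it with `json.dumps` keeps the quoting correct and makes it easy to swap in the real application ID and role ARN created by the CloudFormation stack.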

Step Function workflow creation is complete

Start the execution

The Step Function workflow failed with the permission error below

Restarted the workflow execution after granting the necessary permission
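The text does not name the exact permission that was missing, so the following is only an illustration of the kind of IAM policy a Step Functions role typically needs to start and track an EMR Serverless job run. It is a sketch under assumptions: the helper name `build_step_functions_policy` and the ARNs are hypothetical, and the list of actions may be narrower or broader than what this project actually required.

```python
import json


def build_step_functions_policy(application_arn, execution_role_arn):
    """Return an illustrative IAM policy document as a Python dict.

    Assumption: the state machine starts a job run on one EMR Serverless
    application and passes one execution role to it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Start a job run on the application
                "Effect": "Allow",
                "Action": ["emr-serverless:StartJobRun"],
                "Resource": [application_arn],
            },
            {
                # Poll and cancel job runs of that application
                "Effect": "Allow",
                "Action": [
                    "emr-serverless:GetJobRun",
                    "emr-serverless:CancelJobRun",
                ],
                "Resource": [application_arn + "/jobruns/*"],
            },
            {
                # Hand the job execution role to EMR Serverless only
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": [execution_role_arn],
                "Condition": {
                    "StringEquals": {
                        "iam:PassedToService": "emr-serverless.amazonaws.com"
                    }
                },
            },
        ],
    }


# Hypothetical placeholder ARNs, not taken from the project.
policy = build_step_functions_policy(
    application_arn=(
        "arn:aws:emr-serverless:us-east-1:111122223333:"
        "/applications/00example1234567"
    ),
    execution_role_arn="arn:aws:iam::111122223333:role/ExampleEmrServerlessRole",
)

print(json.dumps(policy, indent=2))
```

The error message shown by the failed execution names the denied action and resource, so in practice the policy should be adjusted to match that message rather than copied from this sketch.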

The execution completed successfully

View the workflow run details

The output is stored in a folder in the S3 bucket

Download the result

We can open the CSV file and see the result
