Create and manage Amazon EMR Clusters from SageMaker Studio to run interactive Spark and ML workloads – Part 2

Security groups
Finally, the security group attached to your Studio domain must allow outbound traffic, and the security group of the Amazon EMR primary node must allow inbound TCP traffic from the Studio instances' security group.
The following screenshot shows the inbound rules set up in your SageMaker account.
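As a rough sketch of the inbound rule described above, the following script prints (rather than runs) an AWS CLI call that lets the Studio security group reach the Amazon EMR primary node over TCP. The security group IDs, the Studio account ID, and the function name print_ingress_command are hypothetical placeholders, and opening the full TCP port range is an assumption made for brevity; in practice you would likely narrow it to the ports your cluster exposes.

```shell
#!/bin/bash
# Sketch only: prints the command instead of calling AWS.
# All IDs are hypothetical placeholders, not values from the post.
set -eu

print_ingress_command() {
  local emr_sg="$1"          # security group of the EMR primary node
  local studio_sg="$2"       # security group attached to the Studio domain
  local studio_account="$3"  # account that owns the Studio security group

  # Cross-account group references need the owning account (UserId) as well.
  printf '%s\n' "aws ec2 authorize-security-group-ingress --group-id $emr_sg --ip-permissions 'IpProtocol=tcp,FromPort=0,ToPort=65535,UserIdGroupPairs=[{GroupId=$studio_sg,UserId=$studio_account}]'"
}

# Dry run with placeholder values; run the printed command from the EMR account.
print_ingress_command "sg-0emr0000000000000" "sg-0studio0000000000" "111122223333"
```

Printing the command keeps the sketch safe to run anywhere; copy the output into a shell that has credentials for the Amazon EMR account once the IDs are replaced.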

In Part 1 of this series, we provided detailed guidance for creating, connecting to, stopping, and debugging Amazon EMR clusters from Amazon SageMaker Studio in a single-account setup.
In this post, we dive deep into how you can use the same functionality in certain enterprise-ready, multi-account setups. As described in the AWS Well-Architected Framework, separating workloads across accounts enables your organization to set common guardrails while isolating environments. This can be particularly useful for certain security requirements, and it can also simplify cost management across teams and projects.
Solution overview
In this post, we walk through the process to achieve the following architectural setup. We present the same simple user interface that we saw in Part 1 to our data workers, abstracting away multi-account details from their daily workflow when they are not needed.

After you make the peering request, the admin can accept this request from the second account.

We first describe how to set up your cross-account network in order to connect to Amazon EMR from Studio. To start, we need to ensure that some prerequisites are set up properly. For our example, a DevOps admin needs to configure an Amazon SageMaker domain with an elastic network interface to a private VPC and specify the security group ID to attach.
Set up the network
After we set up the Studio domain, we need to configure our network settings to allow communication between accounts.
VPC peering
We begin with VPC peering between the accounts to facilitate traffic back and forth.

The following route table of an Amazon EMR subnet shows traffic outbound from the Amazon EMR account to Studio for 10.0.20.0/24 through a peering connection.

The following screenshot shows the inbound rules configuration in your Amazon EMR account.

From our Studio account, on the Amazon Virtual Private Cloud (Amazon VPC) console, choose Peering connections.
Choose Create peering connection.
Create your request to peer the Studio VPC with the Amazon EMR account's VPC.

When peering private subnets, you must enable private IP DNS resolution at the VPC peering connection level.
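The console steps for VPC peering can also be sketched with the AWS CLI. The script below only prints the commands so that it is safe to run without credentials; the VPC IDs, peering connection ID, and the function name print_peering_commands are hypothetical placeholders, and 123456789012 stands in for the Amazon EMR account.

```shell
#!/bin/bash
# Sketch only: prints the AWS CLI calls for the peering steps instead of running them.
# All IDs are hypothetical placeholders, not values from the post.
set -eu

STUDIO_VPC_ID="vpc-0studio0000000000"   # hypothetical Studio VPC
EMR_VPC_ID="vpc-0emr0000000000000"      # hypothetical Amazon EMR VPC
EMR_ACCOUNT_ID="123456789012"           # remote (Amazon EMR) account
PEERING_ID="pcx-0example000000000"      # in practice, returned by the create call

print_peering_commands() {
  # 1. From the Studio account: request the peering connection.
  printf '%s\n' "aws ec2 create-vpc-peering-connection --vpc-id $STUDIO_VPC_ID --peer-vpc-id $EMR_VPC_ID --peer-owner-id $EMR_ACCOUNT_ID"
  # 2. From the Amazon EMR account: accept the request.
  printf '%s\n' "aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id $PEERING_ID"
  # 3. Enable DNS resolution to private IPs; each account configures its own side.
  printf '%s\n' "aws ec2 modify-vpc-peering-connection-options --vpc-peering-connection-id $PEERING_ID --requester-peering-connection-options AllowDnsResolutionFromRemoteVpc=true"
  printf '%s\n' "aws ec2 modify-vpc-peering-connection-options --vpc-peering-connection-id $PEERING_ID --accepter-peering-connection-options AllowDnsResolutionFromRemoteVpc=true"
}

print_peering_commands
```

The requester options are set from the Studio account and the accepter options from the Amazon EMR account, because each side of a cross-account peering connection can only change its own settings.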
Route tables
After you establish the peering connection, you must enable the flow of traffic by manually adding routes to the private subnet route tables in both accounts. We do this to enable creation of and connection to EMR clusters from the Studio account to the remote account's private subnet.
These routes point to the IP address ranges of the peered VPC's private subnets and are set on the Route Tables tab of the subnet page. Here the admin of each account can edit the routes.
The following route table of a Studio subnet shows traffic outbound from the Studio account to 2.0.1.0/24 through a peering connection.
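The two route table entries described in this post (Studio to 2.0.1.0/24 and Amazon EMR to 10.0.20.0/24) could be added from the command line roughly as follows. This is a sketch that prints the commands instead of running them; the route table IDs, the peering connection ID, and the function name print_route_commands are hypothetical.

```shell
#!/bin/bash
# Sketch only: prints the route commands instead of calling AWS.
# Route table and peering IDs are hypothetical placeholders.
set -eu

print_route_commands() {
  local studio_rtb="$1"  # route table of the Studio private subnet
  local emr_rtb="$2"     # route table of the Amazon EMR private subnet
  local pcx="$3"         # VPC peering connection shared by both accounts

  # Studio account: send traffic for the EMR subnet through the peering connection.
  printf '%s\n' "aws ec2 create-route --route-table-id $studio_rtb --destination-cidr-block 2.0.1.0/24 --vpc-peering-connection-id $pcx"
  # Amazon EMR account: send traffic for the Studio subnet back the same way.
  printf '%s\n' "aws ec2 create-route --route-table-id $emr_rtb --destination-cidr-block 10.0.20.0/24 --vpc-peering-connection-id $pcx"
}

print_route_commands "rtb-0studio0000000000" "rtb-0emr0000000000000" "pcx-0example000000000"
```

Each printed command has to be run with credentials for the account that owns the route table it modifies.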

Set up permissions
We need to create an AWS Identity and Access Management (IAM) role in the secondary Amazon EMR account that has the same Amazon EMR visibility permissions as we saw in Part 1.
The following code shows the specific permissions for the IAM role. It's the same as in Part 1, but includes the policy AllowRoleAssumptionForCrossAccountDiscovery:

"Version": "2012-10-17",
"Statement": [
...
]

This assumable role also requires a trust relationship with the Studio account; make sure to use your own account ID in the trust policy.

Discover EMR clusters across accounts
To enable cluster discovery across accounts, we need to provide the previously created remote IAM role ARN to the Studio execution role. The Studio execution role assumes that remote role to discover and connect to EMR clusters in the remote account. When the Jupyter server starts, lifecycle configurations (LCCs) run before the role ARNs written in the config file are read. This enables administrators to overwrite and fully control which cross-account ARNs are used at runtime. After the LCC runs and the files are written, the server reads the file that the script created under /home/sagemaker-user/.

#!/bin/bash

# This script creates the file that informs SageMaker Studio that the role "arn:aws:iam::123456789012:role/ASSUMABLE-ROLE" in remote account "123456789012" must be assumed to list and describe EMR clusters in the remote account.

set -eux

FILE_DIRECTORY="/home/sagemaker-user/.cross-account-configuration-DO_NOT_DELETE"
FILE_NAME="emr-discovery-iam-role-arns-DO_NOT_DELETE.json"
FILE="$FILE_DIRECTORY/$FILE_NAME"

mkdir -p $FILE_DIRECTORY

The script finishes by using cat to write the remote role ARN into $FILE.

User journey
The following diagram shows the user journey for a unified notebook experience after you connect your different accounts. Just as in the previous post, the DevOps persona creates an AWS Service Catalog product and portfolio within the Studio account, from which data workers can provision templated EMR clusters.
Again, it's worth noting that you can customize the full set of properties for Amazon EMR when creating AWS CloudFormation templates that can be launched through Studio. This means that you can enable Spot, auto scaling, and other popular configurations through your Service Catalog product. You can parameterize the preset CloudFormation template, which creates the EMR cluster, so that end-users can customize different aspects of the cluster to match their workloads. For example, the data scientist or data engineer may want to specify the number of core nodes on the cluster, and the creator of the template can specify AllowedValues to set guardrails.
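As a sketch of the cross-account wiring described in this post, the following shell functions print a trust policy for the assumable role and write the discovery file that the lifecycle configuration produces. Several things here are assumptions rather than the post's own code: the function names are hypothetical, the trust policy trusts the whole Studio account through its root principal (a narrower principal such as the Studio execution role is usually preferable), and the discovery file is assumed to be a JSON object that maps the remote account ID to the role ARN.

```shell
#!/bin/bash
# Sketch only. Function names, the root principal, and the JSON layout of the
# discovery file are assumptions, not taken from the original post.
set -eu

# Prints a trust policy that lets the Studio account assume the remote role.
print_trust_policy() {
  local studio_account_id="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${studio_account_id}:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Writes the discovery file under base_dir and prints the path it wrote.
write_discovery_config() {
  local base_dir="$1"           # /home/sagemaker-user inside Studio
  local remote_account_id="$2"  # account that hosts the EMR clusters
  local remote_role_arn="$3"    # role the Studio execution role assumes
  local dir="$base_dir/.cross-account-configuration-DO_NOT_DELETE"
  local file="$dir/emr-discovery-iam-role-arns-DO_NOT_DELETE.json"

  mkdir -p "$dir"
  # Assumed layout: one "account ID": "role ARN" pair per remote account.
  cat > "$file" <<EOF
{
  "${remote_account_id}": "${remote_role_arn}"
}
EOF
  printf '%s\n' "$file"
}
```

Inside a lifecycle configuration, a call such as write_discovery_config /home/sagemaker-user 123456789012 arn:aws:iam::123456789012:role/ASSUMABLE-ROLE would recreate the file on every Jupyter server start, which is what lets administrators control the ARNs used at runtime. The output of print_trust_policy with your Studio account ID can be saved to a file and attached to the remote role as its trust relationship.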
