Customize Amazon SageMaker Studio using Lifecycle Configurations

Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. It provides all the tools you need to take your models from experimentation to production while boosting your productivity. You can write code, track experiments, visualize data, and perform debugging and monitoring within a single, integrated visual interface.
We're excited to announce Lifecycle Configuration for Studio, a new capability that enables developers to automate customization for their Studio development environments.
Lifecycle configurations are shell scripts triggered by Studio lifecycle events, such as starting a new Studio notebook. You can use these shell scripts to automate customization for your Studio environments, such as installing JupyterLab extensions, preloading datasets, and setting up source code repositories.
Previously, customizations to Studio environments were possible, but you had to reapply them manually whenever apps were deleted or recreated. Lifecycle configuration provides a way to apply your customizations automatically and repeatably.
In this post, we show you how to use lifecycle configurations for three common customization use cases:

Installing custom packages
Configuring auto-shutdown of inactive notebook apps
Setting up Git configuration

For more examples, go to the SageMaker Studio Lifecycle Configuration Samples repository on GitHub.
Install custom packages on base kernel images
One common use case for lifecycle configuration is to install custom libraries so they're available immediately whenever you start a new kernel app. Lifecycle configuration allows you to automate this process without the need to build a custom Studio image.
Say that you need to install pyarrow in your notebook environment so that you can work with a Parquet-formatted training dataset for your ML model. Let's see how to use lifecycle configuration to automate the installation of this dependency in the kernel.
The following is the typical workflow for using lifecycle configuration in your apps:

Write the script.
Convert the script to a base64 encoded string.
Create a lifecycle configuration entity through the AWS Command Line Interface (AWS CLI).
Associate the lifecycle configuration to a domain or user profile.
Start the Studio app with the specified lifecycle configuration.

Write the script.
The following sample script installs pyarrow using the pip package manager. You can modify this script to install the dependencies you need for your own notebooks:

pip install --upgrade $PACKAGE

#!/bin/bash
# This script installs a single pip package on a SageMaker Studio kernel application

KernelGateway: Enables access to the code run environment and kernels for your Studio notebooks and terminals.

LCC_CONTENT=$(openssl base64 -A -in install-package.sh)

set -eux

The base64-encoded script content is now saved in the LCC_CONTENT variable.
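The encoding step can be sanity-checked locally before anything is sent to AWS. The following is a minimal sketch, assuming openssl is installed; it writes a copy of the sample script to a temporary directory (the directory and the decoded.sh file name are illustrative), encodes it on a single line with -A, decodes it again, and compares the result with the original file:

```shell
#!/bin/bash
# Work in a temporary directory so nothing outside it is touched.
workdir=$(mktemp -d)

# A copy of the sample install script, written out for the round-trip check.
cat > "$workdir/install-package.sh" <<'EOF'
#!/bin/bash
set -eux
PACKAGE=pyarrow
pip install --upgrade $PACKAGE
EOF

# Encode the script as a single base64 line (-A).
LCC_CONTENT=$(openssl base64 -A -in "$workdir/install-package.sh")

# Decode the string again and compare it byte for byte with the original.
printf '%s' "$LCC_CONTENT" | openssl base64 -d -A > "$workdir/decoded.sh"
if cmp -s "$workdir/install-package.sh" "$workdir/decoded.sh"; then
    echo "round trip OK"
else
    echo "round trip FAILED"
fi
```

If the two files differ, the encoded string was likely altered on the way, for example by line wrapping when -A is omitted.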
Create a lifecycle configuration entity via the AWS CLI.
Now we can create the lifecycle configuration entity using the AWS CLI, specifying the base64-encoded content stored in the LCC_CONTENT variable as the lifecycle configuration content.
At this point, you need to decide whether the lifecycle configuration should belong to the JupyterServer or KernelGateway app type:

aws sagemaker create-studio-lifecycle-config \
--studio-lifecycle-config-content $LCC_CONTENT \
--studio-lifecycle-config-name install-pip-package-on-kernel \
--studio-lifecycle-config-app-type KernelGateway

# PARAMETERS
PACKAGE=pyarrow

After you create the lifecycle configuration, note the lifecycle configuration ARN returned in the response:

In this case, because we want to customize the kernel environment that the notebook code runs in by installing additional custom packages, we should specify KernelGateway for the lifecycle configuration app type. In the following code, we name the created entity install-pip-package-on-kernel, but you are free to use your own name:

JupyterServer: Enables access to the visual interface for Studio.

One helpful practice when developing and debugging your own scripts is to use set -eux, which helps you see in the logs where a failure occurred. It prints each command as it runs and stops the script immediately when a command fails.
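To see the effect of set -eux outside Studio, the following sketch runs a tiny script in a child shell and captures its trace and exit status. The two echo steps and the deliberately failing false command are illustrative only and are not part of any lifecycle configuration script:

```shell
#!/bin/bash
# Run the demo in a child shell so its failure does not stop this script.
output=$(bash -c '
set -eux
echo "step 1"
false            # fails, so -e stops the script here
echo "step 2"    # never reached
' 2>&1) && status=0 || status=$?

# A nonzero status shows that the script stopped at the failing command.
echo "exit status: $status"

# The captured trace (-x) shows each command that actually ran, prefixed with "+".
echo "$output"
```

The trace ends at the failing command, which is the kind of signal to look for in the logs when a lifecycle configuration script does not complete. The -u flag additionally turns a reference to an unset variable into an error.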
Let's save the preceding script as a file called install-package.sh.
Convert the script to a base64 encoded string.
When creating the lifecycle config, we pass the script contents as a base64 encoded string. In a terminal, use the following command:

"StudioLifecycleConfigArn": "arn:aws:sagemaker:us-east-2:123456789012:studio-lifecycle-config

set -eux

For per-user overrides, you can specify a default lifecycle configuration in the user profile, which overrides any specified for the domain.
Set up Git configuration.
Developers often keep their code or notebooks in version-controlled Git repositories to collaborate with others. Usually, this requires developers to set up user information or credentials in their development environment.
Before making any Git commits from your Studio environment, you want to set up the email and user name that are associated with commits. Typically, the commands look like the following code:

aws sagemaker update-user-profile --domain-id d-abc123 \
--user-profile-name my-existing-user \
--user-settings

Start the app.
After you attach the lifecycle configuration to the domain or user, in the Studio UI, go to the Launcher where you create new notebooks. Next to the image selection option (Select a SageMaker image), you can see the Select a start-up script option.

Before you can use the lifecycle configuration, you need to associate it with a Studio domain or user profile. The set of lifecycle configurations defined in the domain or user profile settings determines which lifecycle configurations are available for the domain or user profile to use. Note that lifecycle configurations attached to a domain are inherited by all users of the domain, but those attached to a user profile are scoped specifically to that user.
You can use the AWS CLI to create a user profile that can use our new lifecycle configuration. Because this lifecycle configuration is associated with the KernelGateway app type, we add it to the list of lifecycle config ARNs under KernelGatewayAppSettings.

Alternatively, you can update an existing user profile to include the lifecycle configuration:

aws sagemaker create-studio-lifecycle-config \
--studio-lifecycle-config-name install-autoshutdown-extension \
--studio-lifecycle-config-content $LCC_CONTENT \
--studio-lifecycle-config-app-type JupyterServer

aws sagemaker create-user-profile --domain-id d-abc123 \
--user-profile-name my-new-user \
--user-settings
"KernelGatewayAppSettings": install-pip-package-on-kernel"]

git config --global user.email "[email protected]"
git config --global user.name "Your Name"

The new notebook now uses the specified script.
Configure auto-shutdown of inactive kernels.
Let's say you're an administrator for a Studio domain and want to save costs by having notebook apps shut down automatically after long periods of inactivity. You can create a lifecycle configuration that installs the Studio auto-shutdown JupyterLab extension by default on users' JupyterServer apps, so users don't have to install it manually, and it stays enabled even if the JupyterServer app is restarted.
The following script from the Studio lifecycle configuration example scripts repository (install-autoshutdown-extension) installs the extension:

The lifecycle configuration entity is immutable. If you need to update a lifecycle configuration entity, you must instead create a new lifecycle configuration entity, update apps to use the new entity, and delete the old one.
Associate the lifecycle configuration to a domain or user profile.
Before you can use the lifecycle configuration, you need to associate it with a Studio domain or user profile. The set of lifecycle configurations defined in the domain or user profile settings determines which lifecycle configurations are available for the domain or user profile to use.
You can use the AWS CLI to create a user profile that can use our new lifecycle configuration. We add it to the list of lifecycle config ARNs under KernelGatewayAppSettings because this lifecycle configuration is associated with the KernelGateway app type. Make sure to replace the domain ID in the following script. You can find your domain ID in the Studio Control Panel under Studio Summary.
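The user settings payload described here can be assembled in the shell before calling the AWS CLI. The following is a sketch rather than a definitive command: the ARN is a placeholder built from the example account, Region, and configuration name used in this post, so substitute the ARN returned when you created your own lifecycle configuration. The aws call is left commented out because it needs AWS credentials and a real domain ID:

```shell
#!/bin/bash
# Placeholder ARN (assumption): use the ARN returned by create-studio-lifecycle-config.
LCC_ARN="arn:aws:sagemaker:us-east-2:123456789012:studio-lifecycle-config/install-pip-package-on-kernel"

# Build the JSON payload. The ARN goes into the LifecycleConfigArns list
# under KernelGatewayAppSettings, matching the KernelGateway app type.
USER_SETTINGS=$(printf '{"KernelGatewayAppSettings": {"LifecycleConfigArns": ["%s"]}}' "$LCC_ARN")
echo "$USER_SETTINGS"

# With AWS credentials configured, the payload could then be passed as follows
# (d-abc123 and my-new-user are the example values used in this post):
# aws sagemaker create-user-profile --domain-id d-abc123 \
#     --user-profile-name my-new-user \
#     --user-settings "$USER_SETTINGS"
```

Building the payload in a variable keeps the quoting in one place and lets you inspect the JSON before it is sent.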

# restarts jupyter server
nohup supervisorctl -c /etc/supervisor/conf.d/supervisord.conf restart jupyterlabserver

You can verify the contents of the script after you choose it.

Choose the script from the available ones for your user or domain on the drop-down menu.

About the Authors
Andrew Ang is a Deep Learning Architect at the Amazon ML Solutions Lab, where he helps AWS customers identify and build AI/ML solutions to address their business problems.
Sumit Thakur is a Senior Product Manager for Amazon Machine Learning where he loves working on products that make it easy for customers to get started with machine learning on the cloud. In his spare time, he likes connecting with nature and watching sci-fi TV series.
Ram Vegiraju is an ML Architect with the SageMaker Service team. He focuses on helping customers build and optimize their AI/ML solutions on Amazon SageMaker. In his spare time, he loves writing and traveling.
Rama Thamman is a Software Development Manager with the AI Platforms team, leading the ML Migrations team.

When you specify a default lifecycle configuration in the user profile, it overrides any default JupyterServer lifecycle configuration defined at the domain level. See the following code:

#!/bin/bash
# Installs SageMaker Studio's Auto Shutdown Idle Kernel Sessions extension

Base64 encode the script.
Create a lifecycle configuration entity. For Git configuration scripts, use JupyterServer as the app type.
Attach it to the Studio entity that you want to make the lifecycle configuration available to. Because these scripts set up user-specific credentials, attach them at the user profile level.

sudo yum -y install wget
wget https://github.com/aws-samples/sagemaker-studio-auto-shutdown-extension/raw/main/sagemaker_studio_autoshutdown-0.1.1.tar.gz
pip install sagemaker_studio_autoshutdown-0.1.1.tar.gz
jlpm config set cache-folder /tmp/yarncache
jupyter lab build --debug --minimize=False

To have these settings persist whenever the Jupyter Server restarts, you can use the following script (set-git-config) from the example scripts repository (make sure you modify it to add your own user name and email) as a JupyterServer lifecycle configuration script for your user profile.
Another frequent use case is setting up Git credentials for authentication to remote repositories. Although you can use the AWS Identity and Access Management (IAM) execution role used by Studio to automatically authenticate to AWS CodeCommit repositories, developers may also need to manually set up a password or developer token to connect to other repository sources such as GitHub.
The following script (set-git-credentials) from the example scripts repository shows you how to set up a workflow that retrieves a password or developer token from AWS Secrets Manager directly when authenticating to your remote repository. Storing passwords and tokens in Secrets Manager removes the need to store any sensitive information on the Amazon Elastic File System (Amazon EFS) storage backing your Studio domain. Make sure that you modify the script with your own user name, secret name and key, and Region.
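To illustrate the idea, here is a minimal sketch of the two building blocks such a workflow needs. It is not the repository's set-git-credentials script. The user name, secret name, and Region are hypothetical placeholders, and the sketch assumes the secret is stored as a plain string rather than as a JSON document with a key. The fetch_secret function is defined but not called, because it requires AWS credentials at run time:

```shell
#!/bin/bash
# Hypothetical values (assumptions): replace with your own user name, secret name, and Region.
GIT_USER="my-git-user"
SECRET_NAME="my-git-credentials"
AWS_REGION="us-east-2"

# Fetch the secret string from Secrets Manager (needs AWS credentials when called).
fetch_secret() {
    aws secretsmanager get-secret-value \
        --secret-id "$SECRET_NAME" \
        --region "$AWS_REGION" \
        --query SecretString \
        --output text
}

# Print credentials in the key=value format that Git expects from a credential helper.
emit_credentials() {
    local user="$1" password="$2"
    printf 'username=%s\npassword=%s\n' "$user" "$password"
}

# Offline demonstration with a dummy token. In a real helper, the second
# argument would be the output of fetch_secret.
emit_credentials "$GIT_USER" "dummy-token"
```

Because the token is fetched each time Git asks for credentials, nothing sensitive needs to be written to disk in the Studio environment.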
Similar to the previous example, to set up the lifecycle configuration, you complete the following steps:

LCC_CONTENT=$(openssl base64 -A -in install-autoshutdown.sh)  # install-autoshutdown.sh is a file with the above script contents

aws sagemaker update-user-profile --domain-id d-abc123 \
--user-profile-name my-user \
--user-settings
"JupyterServerAppSettings": set-git-config"
]

Because we're customizing the JupyterServer app by installing a JupyterLab extension, this lifecycle configuration should be associated with the JupyterServer app type when creating the SageMaker entity:

Conclusion
In this post, we highlighted three use cases that represent common automation tasks for data scientists and developers, but you can find more examples of what you can do with lifecycle configurations in the public repository of lifecycle configuration scripts.
You can start using lifecycle configuration for Studio today, in all Regions where Studio is available.
For more information, see the following resources:

aws sagemaker update-domain --domain-id d-abc123 \
--default-user-settings
"JupyterServerAppSettings": install-autoshutdown-extension"
]

We want to make this the default lifecycle configuration for all users in the domain. We can accomplish this by adding the lifecycle configuration to the default settings of the domain using the DefaultResourceSpec settings. This way, the script runs by default whenever users in the domain log in to Studio for the first time or restart Studio:
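A sketch of what such a default-settings payload could look like follows. The ARN is a placeholder assembled from the example account, Region, and configuration name in this post, and the "system" instance type is an assumption about the JupyterServer app. The same ARN appears twice: once in DefaultResourceSpec to make it the default, and once in LifecycleConfigArns to make it available. The aws call is commented out because it needs AWS credentials and a real domain ID:

```shell
#!/bin/bash
# Placeholder ARN (assumption): use the ARN returned by create-studio-lifecycle-config.
LCC_ARN="arn:aws:sagemaker:us-east-2:123456789012:studio-lifecycle-config/install-autoshutdown-extension"

# DefaultResourceSpec.LifecycleConfigArn selects the default script;
# LifecycleConfigArns lists the scripts available to the app type.
DEFAULT_USER_SETTINGS=$(printf '{"JupyterServerAppSettings": {"DefaultResourceSpec": {"LifecycleConfigArn": "%s", "InstanceType": "system"}, "LifecycleConfigArns": ["%s"]}}' "$LCC_ARN" "$LCC_ARN")
echo "$DEFAULT_USER_SETTINGS"

# With AWS credentials configured, the payload could then be applied as follows
# (d-abc123 is the example domain ID used in this post):
# aws sagemaker update-domain --domain-id d-abc123 \
#     --default-user-settings "$DEFAULT_USER_SETTINGS"
```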
