Run AlphaFold v2.0 on Amazon EC2

Configure the appropriate Amazon Virtual Private Cloud (Amazon VPC) settings based on your AWS environment requirements. Consider using the default Amazon VPC, and review Get started with Amazon VPC if this is your first time configuring a VPC.
Set the system volume to 200 GiB, and add one new data volume of 3 TB (3072 GiB) in size.

Install AlphaFold
You're now ready to install AlphaFold.

After the paper in Nature and DeepMind's open-source release of AlphaFold v2.0 on GitHub, many in the scientific and research community have wanted to try DeepMind's AlphaFold implementation firsthand. With compute resources available through Amazon Elastic Compute Cloud (Amazon EC2) instances with Nvidia GPUs, you can quickly get AlphaFold running and try it out yourself.
In this post, I provide step-by-step instructions for installing AlphaFold on an EC2 instance with an Nvidia GPU.
Overview of solution
The process starts with a Deep Learning Amazon Machine Image (DLAMI). After installation, we run predictions using the AlphaFold model on CASP14 samples on the instance. I also show how to create an Amazon Elastic Block Store (Amazon EBS) snapshot for future use, which reduces the effort of setting everything up again and saves costs.
To run AlphaFold without setting up a new EC2 instance from scratch, go to the last section of this post. You can create a new EC2 instance from the provided public EBS snapshots in a short time.
The total cost of the AWS resources used in this post is less than $100 if you complete all the steps and shut down all resources within 24 hours. If you create an EBS snapshot and store it in your AWS account, the snapshot storage cost is about $150 per month.
Launch an EC2 instance with a DLAMI
In this section, I show how to set up an EC2 instance using a DLAMI from AWS. The DLAMI already has many of AlphaFold's dependencies preinstalled, which saves setup time.

Make sure that the security group settings allow you to access the EC2 instance with SSH, and that the instance can reach the internet to install AlphaFold and other packages.
Launch the EC2 instance.
Wait for the EC2 instance to become ready, then use SSH to access the instance's terminal.
Optionally, if you have other required software for the new EC2 instance, install it now.

Select p3.2xlarge, which has one GPU, as the instance type. If you don't have sufficient quota for p3.2xlarge instances, you can request an Amazon EC2 quota increase for your AWS account.

On the Amazon EC2 console, select your preferred AWS Region.
Launch a new EC2 instance with a DLAMI by searching for Deep Learning AMI. I use DLAMI version 48.0 based on Ubuntu 18.04, which is the latest version at the time of this writing.

After you use SSH to access the instance's terminal, first update all packages:
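A minimal sketch of this step, assuming the default Ubuntu package manager on the DLAMI; the optional upgrade line is my addition and can be skipped:

sudo apt update
# Optionally bring already-installed packages up to date as well.
sudo apt upgrade -y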

Mount the data volume to the /data folder. For more details, refer to Make an Amazon EBS volume available for use on Linux.

Use the lsblk command to view your available disk devices and their mount points (if applicable) to help you determine the correct device name to use:
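For illustration only, the output on a p3.2xlarge configured as described above might look roughly like the following; the device names and sizes are assumptions and yours may differ:

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  200G  0 disk
└─xvda1 202:1    0  200G  0 part /
xvdb    202:16   0    3T  0 disk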

About the Author
The author is a Solutions Architect on the Global Healthcare and Life Sciences team at AWS, where he works closely with life science customers transforming drug discovery, clinical trials, and drug commercialization.

Use the following command to run a prediction on the protein sequence in /data/input/T1050.fasta:
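A sketch of that command, mirroring the T1024 invocation shown later in this post and assuming the prediction is started from /data so the nohup log lands there:

cd /data
nohup python3 /data/alphafold/docker/run_docker.py --fasta_paths=/data/input/T1050.fasta --max_template_date=2020-05-14 &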

# Path to a directory that will store the results.
output_dir = '/data/output/alphafold'

You can safely shut down your EC2 instance now. At this point, all the data on the data volume is stored in the snapshot for future use.
Recreate the EC2 instance from a snapshot
To recreate an EC2 instance with AlphaFold, the first steps are similar to what you did earlier when creating an EC2 instance from scratch. Instead of building the data volume from scratch, you attach a new volume restored from the Amazon EBS snapshot.

cd /data
mkdir -p /data/af_download_data
mkdir -p /data/output/alphafold
mkdir -p /data/input
git clone https://github.com/deepmind/alphafold.git

pip3 install -r /data/alphafold/docker/requirements.txt

sudo apt install aria2 rsync git vim wget tmux tree -y

Use the following command to run a prediction on the protein sequence in /data/input/T1024.fasta:

Mount the new data volume to the /data folder:

Open the Amazon EC2 console in the AWS Region of your choice, and launch a new EC2 instance with a DLAMI by searching for Deep Learning AMI.
Select the DLAMI based on Ubuntu 18.04.

Use the tail command to monitor the prediction progress:
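Assuming the prediction was started with nohup from the /data folder as shown earlier, the log ends up in /data/nohup.out and can be followed like this:

tail -f /data/nohup.out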
The whole prediction takes a few hours to complete. When the prediction is complete, you should see the following in the output folder. In this case, <target_name> is T1050.


Update all packages on the system and install the dependencies. You do need to rebuild the AlphaFold Docker image.

sudo mkfs.xfs /dev/xvdb
sudo mkdir /data
sudo mount /dev/xvdb /data
sudo chown ubuntu:ubuntu -R /data
df -h /data

sudo mkdir /data
sudo mount /dev/xvdf /data
sudo chown ubuntu:ubuntu -R /data

#### USER CONFIGURATION ####

source activate python3.
python3 ~/tools/GPUCloudWatchMonitor/gpumon.py &

Use AlphaFold for prediction
We're now ready to run predictions with AlphaFold.

lsblk

sudo file -s /dev/xvdf

sudo chown ubuntu:ubuntu /data/output/alphafold/ -R

cd /data/alphafold
docker build -f docker/Dockerfile -t alphafold .
docker images

Clean up
You can safely shut down the EC2 instance and delete the EBS data volume if it wasn't deleted when the instance shut down. When you need to use AlphaFold again, you can follow the same process to spin up a new EC2 instance and run new predictions in a matter of minutes.
Conclusion
With Amazon EC2 instances with Nvidia GPUs and the Deep Learning AMI, you can install the new AlphaFold implementation from DeepMind and run predictions on CASP14 samples. Because you back up the data on the data volume to point-in-time snapshots, you avoid paying for EC2 instances and EBS volumes when you don't need them. Creating an EBS volume from the previous snapshot significantly reduces the time needed to recreate the EC2 instance with AlphaFold, so you can start running your predictions in a short amount of time.

Determine whether there is a file system on the volume. New volumes are raw block devices, and you need to create a file system on them before you can mount and use them. In this case, the device is an empty volume.

The whole download process could take more than 10 hours, so wait for it to finish. You can use the following command to monitor the download and unzip progress:
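One way to check progress is to watch the sizes of the database folders grow (the same command appears again later in this post):

du -sh /data/af_download_data/*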

I use a different protein sequence because the snapshot already contains the result for T1050. If you want to run the prediction for T1050 again, first delete or rename the existing T1050 result folder before running the new prediction.

cd /data/alphafold
docker build -f docker/Dockerfile -t alphafold .
docker images

cd /data
nohup python3 /data/alphafold/docker/run_docker.py --fasta_paths=/data/input/T1024.fasta --max_template_date=2020-05-14 &

Select p3.2xlarge, which has one GPU, as the instance type. You can request an Amazon EC2 quota increase for your AWS account if you don't have enough quota for p3.2xlarge instances.

Create a snapshot from the data volume
It takes time to set up AlphaFold on an EC2 instance, but the P3 instance and the EBS volume can become expensive if you keep them running all the time. You may want an EC2 instance that is ready quickly without spending time rebuilding the environment each time you need it. An EBS snapshot helps you save both time and cost.

# Set to target of scripts/download_all_databases.sh
DOWNLOAD_DIR = '/data/af_download_data'

On the Amazon EC2 console, choose Snapshots.
Select the snapshot you created earlier, or use the public snapshot provided.
On the Actions menu, choose Create Volume. For this post, we provide public snapshots in the us-east-1, us-west-2, and eu-west-1 Regions. You can search for the public snapshots by snapshot ID: snap-0d736c6e22d0110d0 in us-east-1, snap-080e5bbdfe190ee7e in us-west-2, and snap-08d06a7c7c3295567 in eu-west-1.
Set up the new data volume settings accordingly. Make sure that the Availability Zone is the same as that of the newly created EC2 instance; otherwise, you can't attach the volume to the new EC2 instance.
Choose Create Volume to create the new data volume.
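If you prefer the AWS CLI over the console, a minimal sketch of the same step looks like the following; the Availability Zone is an assumption and must match your new EC2 instance:

aws ec2 create-volume \
    --snapshot-id snap-0d736c6e22d0110d0 \
    --availability-zone us-east-1a \
    --region us-east-1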

Confirm that the Nvidia container toolkit is installed:

You should see the new Docker image after the build is complete.

You can use this same process to create a few more .fasta files for testing under the /data/input folder.

Update /data/alphafold/docker/run_docker.py to make the setup match the local paths:

On the Amazon EC2 console, select Volumes in the navigation pane under Elastic Block Store.
Filter by the EC2 instance ID. Two volumes should be listed.
Select the data volume that is 3072 GiB in size.
On the Actions menu, choose Create snapshot. The snapshot takes a few hours to complete.
When the snapshot is complete, choose Snapshots, and your new snapshot should appear in the list.
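As an alternative to the console steps above, a minimal AWS CLI sketch for the same task follows; the volume ID and description are placeholders:

aws ec2 create-snapshot \
    --volume-id <data-volume-id> \
    --description "AlphaFold databases and results"
# Check the snapshot state until it reports "completed".
aws ec2 describe-snapshots --snapshot-ids <snapshot-id> --query "Snapshots[0].State"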

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Use the protein 3D viewer from NIH to view the predicted 3D structure from your result folder.
Select ranked_0.pdb, which contains the prediction with the highest confidence. The following is a 3D view of the structure predicted for T1050 by AlphaFold.

When the download process is complete, you should have the following files in your /data/af_download_data folder:

Use SSH to access the instance's terminal and run lsblk. You should see that the new data volume is not yet mounted. In this case, it is /dev/xvdf.

Choose Volumes, and you should see the newly created data volume. Its state should be available.
Select the volume, and on the Actions menu, choose Attach volume.
Choose the newly created EC2 instance and attach the volume.

Configure the appropriate Amazon VPC settings based on your AWS environment requirements. Consider using the default Amazon VPC, and review Get started with Amazon VPC if this is your first time configuring a VPC.
Set the system volume to 200 GiB, but do not add a new data volume.

Go to the CASP14 target list and copy the sequence from the plaintext link for T1050.

You should see output similar to the following screenshot.

<target_name>/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniclust_hits.a3m
        mgnify_hits.sto
        pdb70_hits.hhr
        uniref90_hits.sto

vim ~/tools/GPUCloudWatchMonitor/gpumon.py

Launch gpumon and start sending GPU metrics to CloudWatch:

You should see output like the following screenshot.

Copy the content into a new T1050.fasta file and save it under the /data/input folder.
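A quick way to do this from the terminal, assuming you have the sequence on your clipboard, is a heredoc; the header line here is illustrative:

cat > /data/input/T1050.fasta <<'EOF'
>T1050
<paste the copied amino acid sequence here>
EOF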

Use pip to install all Python dependencies needed by AlphaFold:

Change the owner of the output folder from root so that you can copy the files:

# Name of the AlphaFold Docker image.
docker_image_name = 'alphafold'

sudo apt update
sudo apt install aria2 rsync git vim wget tmux tree -y
pip3 install -r /data/alphafold/docker/requirements.txt

Install the AlphaFold dependencies and any other required tools:

Use scp to copy the results from the prediction output folder to your local folder:

Download the data using the provided scripts in the background. AlphaFold requires multiple genetic (sequence) databases and model parameters.
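A sketch of the download step, assuming the download script in the cloned repo is scripts/download_all_data.sh and that it takes the download directory as its argument:

cd /data/alphafold
nohup scripts/download_all_data.sh /data/af_download_data &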

Make sure that the security group settings allow you to access the EC2 instance, and that the instance can reach the internet to install Python and Docker packages.
Launch the EC2 instance.
Take note of the Availability Zone the instance is in and the instance ID, because you use them in a later step.

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Use the tail command to monitor the prediction progress:
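As before, assuming nohup wrote its log to /data/nohup.out:

tail -f /data/nohup.out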
The entire prediction takes a few hours to complete.

du -sh /data/af_download_data/*

Change the Region in gpumon.py if your instance is in another Region, and provide a new namespace such as AlphaFold as the CloudWatch namespace:
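For reference, the relevant lines in gpumon.py look roughly like the following; the exact variable names depend on the gpumon version shipped with your DLAMI, so treat them as assumptions:

# In ~/tools/GPUCloudWatchMonitor/gpumon.py (names may differ in your copy):
EC2_REGION = 'us-east-1'        # set to the Region your instance runs in
my_NameSpace = 'AlphaFold'      # CloudWatch namespace for the GPU metrics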

sudo file -s /dev/xvdb

With the folders you've created, the settings look like the following. Adjust them accordingly if you set up a different folder structure on your EC2 instance.
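Put together, the user configuration section of run_docker.py might look like this sketch, based on the folders created above; the variable names follow the version of run_docker.py current at the time of writing and may differ in later versions:

#### USER CONFIGURATION ####

# Set to target of scripts/download_all_databases.sh
DOWNLOAD_DIR = '/data/af_download_data'

# Name of the AlphaFold Docker image.
docker_image_name = 'alphafold'

# Path to a directory that will store the results.
output_dir = '/data/output/alphafold'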

Set up CloudWatch monitoring for GPU (optional)
Optionally, you can set up Amazon CloudWatch monitoring for the GPU. This requires an AWS Identity and Access Management (IAM) role.

Create a file system on the volume and mount the volume to the /data folder:

Create an IAM role for CloudWatch and attach it to the EC2 instance.
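A minimal AWS CLI sketch for this step; the role, profile, and file names are illustrative, and the CloudWatchAgentServerPolicy managed policy is one way to grant the PutMetricData permission that gpumon needs:

# Trust policy that lets EC2 assume the role.
cat > ec2-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role --role-name GPUMonRole --assume-role-policy-document file://ec2-trust.json
aws iam attach-role-policy --role-name GPUMonRole \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
aws iam create-instance-profile --instance-profile-name GPUMonProfile
aws iam add-role-to-instance-profile --instance-profile-name GPUMonProfile --role-name GPUMonRole
aws ec2 associate-iam-instance-profile \
    --instance-id <instance-id> \
    --iam-instance-profile Name=GPUMonProfile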

Create the working folders, and clone the AlphaFold code from the GitHub repo:

You use the new volume exclusively for AlphaFold data, so the snapshot you create later has all the necessary data.

Build the AlphaFold Docker image. Because a .dockerignore file is under that folder, make sure that the local path is /data/alphafold.

Verify that the Nvidia container toolkit is installed:

vim /data/alphafold/docker/run_docker.py

Determine whether there is a file system on the volume. The data volume created from the snapshot already has an XFS file system on it.

$DOWNLOAD_DIR/                   # Total: ~2.2 TB (download: 438 GB)
    bfd/                         # ~1.7 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                      # ~64 GB (download: 32.9 GB)
        mgy_clusters_2018_12.fa
    params/                      # ~3.5 GB (download: 3.5 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # LICENSE,
        # = 11 files.
    pdb70/                       # ~56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                   # ~206 GB (download: 46 GB)
        mmcif_files/
            # About 180,000 .cif files.
        obsolete.dat
    small_bfd/                   # ~17 GB (download: 9.6 GB)
        bfd-first_non_consensus_sequences.fasta
    uniclust30/                  # ~86 GB (download: 24.9 GB)
        uniclust30_2018_08/
            # 13 files.
    uniref90/                    # ~58 GB (download: 29.7 GB)
        uniref90.fasta

scp -i <ec2-key-path>.pem -r ubuntu@<ec2-public-dns>:/data/output/alphafold/T1050 ~/Downloads/
