Bring Your Amazon SageMaker model into Amazon Redshift for remote inference

Get the SageMaker model endpoint
On the Amazon SageMaker console, under Inference in the navigation pane, choose Endpoints to find your model name. You use this when you create the remote inference model in Amazon Redshift.

Choose bring-your-own-model-remote-inference.ipynb.

Deploy the model
To deploy the model, go to the SageMaker console and open the notebook that was created by the CloudFormation template.

Set the parameters as shown in the following screenshot and then run all cells.

Amazon Redshift, a fast, fully managed, widely used cloud data warehouse, natively integrates with Amazon SageMaker for machine learning (ML). Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. Data analysts and database developers want to use this data to train ML models, which can then be used to generate insights for use cases such as forecasting revenue, predicting customer churn, and detecting anomalies.
Amazon Redshift ML makes it easy for SQL users to create, train, and deploy ML models using familiar SQL commands. In a previous post, we covered how Amazon Redshift ML allows you to use your data in Amazon Redshift with SageMaker, a fully managed ML service, without requiring you to become an expert in ML. In an earlier post, we also discussed how Amazon Redshift ML enables ML experts to create XGBoost or MLP models. Additionally, Amazon Redshift ML allows data scientists to either import existing SageMaker models into Amazon Redshift for in-database inference or remotely invoke a SageMaker endpoint.
This post shows how you can enable your data warehouse users to use SQL to invoke a remote SageMaker endpoint for prediction. We first train and deploy a Random Cut Forest model in SageMaker, and show how you can create a model with SQL that invokes the SageMaker endpoint remotely for predictions. Then, we show how end users can invoke the model.
Prerequisites
To get started, we need an Amazon Redshift cluster with the Amazon Redshift ML feature enabled. For an introduction to Amazon Redshift ML and instructions on setting it up, see Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML.
You also have to make sure that the SageMaker model is deployed and you have the endpoint. You can use the following AWS CloudFormation template to provision all the required resources in your AWS account automatically.
Solution overview
Amazon Redshift ML supports text and CSV inference formats. For more information about the various SageMaker algorithms and their inference formats, see Random Cut Forest (RCF) Algorithm.
Amazon SageMaker Random Cut Forest (RCF) is an algorithm designed to detect anomalous data points within a dataset. Examples of anomalies that are important to detect include when website activity uncharacteristically spikes, when temperature data diverges from a periodic behavior, or when changes to public transit ridership reflect the occurrence of a special event.
In this post, we use the SageMaker RCF algorithm to train an RCF model on the Numenta Anomaly Benchmark (NAB) NYC Taxi dataset, using the notebook created by the CloudFormation template.
We downloaded the data and stored it in an Amazon Simple Storage Service (Amazon S3) bucket. The data consists of the number of New York City taxi passengers over the course of 6 months, aggregated into 30-minute buckets. We naturally expect to find anomalous events occurring during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and on the day of a snowstorm.
We then use this model to predict anomalous events by generating an anomaly score for each data point.
The following figure illustrates how we use Amazon Redshift ML to create a model using the SageMaker endpoint.

Prepare data to create a remote inference model using Amazon Redshift ML
Create the schema and load the data in Amazon Redshift using the following SQL:

About the Authors
Phil Bates is a Senior Analytics Specialist Solutions Architect at AWS with over 25 years of data warehouse experience.
Debu Panda, a principal product manager at AWS, is an industry leader in analytics, application platform, and database technologies and has more than 25 years of experience in the IT world.
Nikos Koulouris is a Software Development Engineer at AWS. He received his PhD from University of California, San Diego and he has been working in the areas of databases and analytics.
Murali Narayanaswamy is a principal machine learning scientist at AWS. He received his PhD from Carnegie Mellon University and works at the intersection of ML, AI, optimization, learning and inference to combat uncertainty in real-world applications including personalization, forecasting, supply chains and large scale systems.

Amazon Redshift now supports attaching the default IAM role. If you have enabled the default IAM role in your cluster, you can use it as follows:

with score_cutoff as
(select stddev(public.remote_fn_rcf(nbr_passengers)) as std, avg(public.remote_fn_rcf(nbr_passengers)) as mean, (mean + 3 * std) as score_cutoff_value
from public.rcf_taxi_data)

DROP TABLE IF EXISTS public.rcf_taxi_data CASCADE;
CREATE TABLE public.rcf_taxi_data
(
ride_timestamp timestamp,
nbr_passengers int
);
COPY public.rcf_taxi_data
FROM 's3://sagemaker-sample-files/datasets/tabular/anomaly_benchmark_taxi/NAB_nyc_taxi.csv'
iam_role 'arn:aws:iam::<accountid>:role/RedshiftML' ignoreheader 1 csv delimiter ',';

Create a model
Create a model in Amazon Redshift ML using the SageMaker endpoint you previously captured:

select ride_timestamp, nbr_passengers, public.remote_fn_rcf(nbr_passengers) as score
from public.rcf_taxi_data;

You get output like the following screenshot, which shows the endpoint and function name.

Now that we have our anomaly scores, we need to check for higher-than-normal anomalies.
Amazon Redshift ML has batching optimizations to minimize the communication cost with SageMaker and offers high-performance remote inference.
Check for high anomalies
The following code runs a query for any data points with scores greater than three standard deviations (approximately the 99.9th percentile) from the mean score:

Check model status
You can use the show model command to view the status of the model:

You can use the Amazon Redshift query editor v2 to run these commands.

show model public.remote_random_cut_forest;

select ride_timestamp, nbr_passengers, public.remote_fn_rcf(nbr_passengers) as score
from public.rcf_taxi_data
where score > (select score_cutoff_value from score_cutoff)
order by 2 desc;
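
The cutoff rule used in this query (mean score plus three standard deviations) can also be sketched outside the database. The following Python snippet is only an illustration, not part of the original walkthrough: the helper names `score_cutoff` and `high_anomalies` and the sample rows are hypothetical, and it assumes the scores have already been fetched as (ride_timestamp, nbr_passengers, score) tuples. It uses the sample standard deviation, which is what STDDEV computes in Amazon Redshift.

```python
from statistics import mean, stdev


def score_cutoff(scores, n_sigmas=3.0):
    """Return mean + n_sigmas * sample standard deviation of the scores."""
    return mean(scores) + n_sigmas * stdev(scores)


def high_anomalies(rows, n_sigmas=3.0):
    """Keep rows whose score exceeds the cutoff, ordered by passenger count.

    rows: iterable of (ride_timestamp, nbr_passengers, score) tuples
    (a hypothetical in-memory stand-in for the query result).
    """
    rows = list(rows)
    cutoff = score_cutoff([score for _, _, score in rows], n_sigmas)
    flagged = [row for row in rows if row[2] > cutoff]
    # Mirrors "order by 2 desc": sort by nbr_passengers, highest first.
    return sorted(flagged, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Synthetic rows for illustration only; these are not real NAB scores.
    sample = [("t%02d" % i, 100 + i, 1.0) for i in range(19)]
    sample.append(("spike", 500, 10.0))
    for row in high_anomalies(sample):
        print(row)
```

A fixed three-sigma threshold is a simple convention; depending on how the scores are distributed, a different multiplier or a percentile-based cutoff may suit your data better.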

Compute anomaly scores across the entire taxi dataset
Now, run the inference query using the function name from the create model statement:

You can also use the default IAM role with your CREATE MODEL command as follows:

The following screenshot shows our outcomes.

COPY public.rcf_taxi_data
FROM 's3://sagemaker-sample-files/datasets/tabular/anomaly_benchmark_taxi/NAB_nyc_taxi.csv'
iam_role default ignoreheader 1 csv delimiter ',';

CREATE MODEL public.remote_random_cut_forest
FUNCTION remote_fn_rcf(int)
RETURNS decimal(10,6)
SAGEMAKER 'randomcutforest-xxxxxxxxx'
IAM_ROLE 'arn:aws:iam::<accountid>:role/RedshiftML';

CREATE MODEL public.remote_random_cut_forest
FUNCTION remote_fn_rcf(int)
RETURNS decimal(10,6)
SAGEMAKER 'randomcutforest-xxxxxxxxx'
IAM_ROLE default;

Conclusion
In this post, we used SageMaker Random Cut Forest to detect anomalous data points in a taxi ridership dataset. In this data, the anomalies occurred when ridership was uncharacteristically high or low. However, the RCF algorithm is also capable of detecting when, for example, data breaks periodicity or uncharacteristically changes global behavior.
We then used Amazon Redshift ML to demonstrate how you can make inferences on unsupervised algorithms (such as Random Cut Forest). This allows you to democratize ML by making predictions with Amazon Redshift SQL commands.
For more information about building different models with Amazon Redshift ML, see the Amazon Redshift ML documentation.

The data in the following screenshot shows that the biggest spike in ridership occurs on November 2, 2014, which was the annual NYC marathon. We also see spikes on Labor Day weekend, New Year's Day, and the July 4th holiday weekend.
