In the era of Big Data, businesses are faced with a deluge of time series data. Amazon Forecast deep learning algorithms such as DeepAR+ and CNN-QR build representations that effectively capture common patterns and trends across these many time series.
In some cases, it might be possible to further improve Amazon Forecast accuracy by training the models on similarly behaving subsets of the time series dataset. Items with smooth, regular demand are comparatively easy to forecast. In contrast, forecasts for items such as light bulbs, which have intermittent demand, can be challenging to get right. By preprocessing the time series dataset to separate these products into different groups, we give the Amazon Forecast models the ability to learn stronger representations for the different demand patterns.
In this blog post, we'll learn about one such preprocessing technique called clustering and how it can help you improve your forecasts.
Clustering overview
Clustering is an unsupervised machine learning technique that groups items based on some measure of similarity, typically a distance metric. Clustering algorithms seek to split items into groups such that most items within a group are close to each other while being well separated from those in other groups. In the context of time series clustering, Dynamic Time Warping (DTW) is a commonly used distance metric that measures similarity between two sequences based on optimal matching along a non-linear warping path in the time dimension.
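To make the distance concrete, here is a minimal, self-contained sketch using the tslearn package that this post relies on later; the toy series values are made up purely for illustration:

import numpy as np
from tslearn.metrics import dtw

# Two series with the same shape, one lagged by a step: DTW tolerates the
# shift, while a point-by-point Euclidean comparison penalizes it.
s1 = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
s2 = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0])

print(dtw(s1, s2))              # small: the warped alignment absorbs the lag
print(np.linalg.norm(s1 - s2))  # larger: rigid t-to-t matching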
This post shows you how to preprocess your Target Time Series (TTS) data using the K-means algorithm with DTW as the distance metric, to produce clusters of homogeneous time series with which to train your Amazon Forecast models. This is an experimental technique, and your mileage may vary depending on the structure of your data and your expectations around the clustering configuration that best captures its variability. In this demonstration, we set num_clusters=3 with the intent of finding subsets of the time series dataset that correspond to collections of time series that are "smooth" (a very regular distribution of target values over time), "intermittent" (high variation in frequency with relatively consistent target values), and "erratic" (wide variation in both target values and frequency).
The post is accompanied by two demo notebooks: the first, an optional notebook covering cleaning of the open-source UCI Online Retail II dataset; and the main notebook covering time series clustering. The dataset comprises time series data related to business-to-business online sales of gift-ware in the UK over a two-year period. We use the tslearn.clustering module of the Python tslearn package to cluster this time series data using DTW Barycenter Averaging (DBA) K-means.
In the following sections, we dive into the experiment setup and walk through the accompanying notebooks available in the GitHub Clustering Preprocessing notebook repository. The experiment is set up using Jupyter-based Python notebooks. Beginner-level proficiency in Python and Jupyter/IPython is expected.
(Optional) Data preparation
The optional notebook on data cleaning and preparation processes the UCI Online Retail II dataset, applying a series of data cleaning operations to the columns relevant to this experiment (viz. StockCode, InvoiceDate, and Quantity), corresponding to the (item_id, timestamp, target_value) schema in Amazon Forecast. We roll up the data to daily frequency and resample the time series to fill missing values with zeros. At this stage, we also drop a few time series with unusual stock codes and transpose the data to match the format required for clustering.
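A hypothetical pandas sketch of these steps follows; the file name and the details of the cleaning (such as which stock codes get dropped) are assumptions, and only the roll-up, zero-fill, and transpose described above are shown:

import pandas as pd

# Load the raw transactions (UCI Online Retail II ships as an Excel file).
df = pd.read_excel("online_retail_II.xlsx", parse_dates=["InvoiceDate"])

# Roll up to daily frequency per item, filling days with no sales with zeros.
daily = (
    df.groupby([pd.Grouper(key="InvoiceDate", freq="D"), "StockCode"])["Quantity"]
      .sum()
      .unstack(fill_value=0)      # one column per StockCode
      .asfreq("D", fill_value=0)  # complete daily index with zero-filled gaps
)

# Transpose so that each row is one item's time series, the layout the
# clustering step expects.
tts_matrix = daily.T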
DBA K-means clustering
Let us begin the discussion of time series clustering with a quick introduction to DTW distances. The DTW algorithm finds a distance between two time series by finding a non-linear, "warped" path along the time dimension that minimizes the cost of matching pairs of time points on the two series.
For demonstration, the following plots were created for pairs of time series in our dataset. The plot on the left presents the DTW path between the first and fifth time series, and the one on the right, between the sixth and tenth time series:
As seen here, matches between each pair of time series are aligned along a warped temporal path (for example, the time t0 instance of the top-left time series is matched with multiple time points of the bottom-left time series, rather than strictly t0 to t0, t1 to t1, and so on).
This optimal path is found by the DTW algorithm by minimizing an appropriate metric under a number of constraints. The following plot demonstrates how the DTW algorithm selects the optimal path (shown in white) based on the cost matrix of matching the two pairs of time series from the preceding example. We see in the left plot that t0 of the time series along the Y axis is matched with many time points of the time series along the X axis, an exact replica of the warped path from the preceding plots.
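If you want to inspect such a path programmatically, tslearn exposes it directly; this snippet reuses the toy series from the earlier sketch:

from tslearn.metrics import dtw_path

# dtw_path returns the list of matched (i, j) index pairs that form the
# optimal warping path, along with the DTW distance itself.
path, dist = dtw_path(s1, s2)
print(dist)
print(path)  # one-to-many matches are allowed, e.g. (0, 0) and (0, 1)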
In the Time Series Clustering notebook, we train a K-means clustering algorithm based on DTW distance with Barycenter Averaging. First, we convert the dataframe to a tslearn time_series_dataset object and normalize the time series to zero mean and unit variance. We then train with the TimeSeriesKMeans method from the tslearn.clustering module, applying the clustering configuration shown in the sketch that follows.
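The following sketch shows how this step might look end to end; the n_clusters, metric, and n_jobs values come from the post's configuration, while max_iter, random_state, and the tts_matrix variable (from the data preparation sketch above) are illustrative assumptions:

from tslearn.clustering import TimeSeriesKMeans
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.utils import to_time_series_dataset

# Convert the items-by-time matrix to a tslearn time series dataset.
X = to_time_series_dataset(tts_matrix.values)

# Normalize each series to zero mean and unit variance so that clustering
# is driven by shape rather than amplitude.
X = TimeSeriesScalerMeanVariance(mu=0.0, std=1.0).fit_transform(X)

model = TimeSeriesKMeans(
    n_clusters=3,      # num_clusters=3 per the discussion above
    metric="dtw",      # DBA K-means: DTW distance with Barycenter Averaging
    n_jobs=-1,         # use all available cores
    max_iter=10,       # assumed; tune for your dataset
    random_state=42,   # assumed, for reproducibility
)
labels = model.fit_predict(X)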
Setting n_jobs=-1 ensures that training uses all available cores on your machine. On a machine with 12 CPU cores, training the clustering model took approximately half an hour of wall time; your mileage may vary depending on your machine's configuration. An alternative to the "dtw" distance metric is "softdtw", which may at times produce better separation, at a higher compute cost.
The following figure gives a visual representation of the composition of the different clusters found by the DBA K-means algorithm, along with the counts of constituent time series (which add up to the total count of time series in the un-clustered dataset). This representation also gives a good sense of the homogeneity of the different clusters, as the constituent time series are plotted together in an overlapping fashion:
You may also notice that the data range is bounded. This is due to the preceding normalization step, which enables comparisons in an amplitude-invariant manner. On closer examination, we find that the composition of each cluster is homogeneous, and the distribution of time series across clusters is well balanced (roughly in the proportion 4:5:2).
With the clusters identified, we now split the TTS into subsets based on the labels assigned to the different time series in the dataset. We also reformat the data from columnar time to indexed time to match a schema consistent with the Amazon Forecast service APIs. These TTS files are now ready to be used to train Amazon Forecast models.
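A hypothetical sketch of this split follows, reusing the tts_matrix and labels names from the earlier sketches; the long-format reshaping and CSV naming are assumptions, not the notebook's exact code:

import pandas as pd

# Attach the cluster label to each item and reshape from one-row-per-item
# (columnar time) to long format (item_id, timestamp, target_value).
long_tts = (
    tts_matrix.assign(cluster=labels)
              .reset_index()
              .melt(id_vars=["StockCode", "cluster"],
                    var_name="timestamp", value_name="target_value")
              .rename(columns={"StockCode": "item_id"})
)
long_tts["timestamp"] = pd.to_datetime(long_tts["timestamp"]).dt.strftime("%Y-%m-%d")

# Write one Forecast-ready TTS file per cluster.
for c, subset in long_tts.groupby("cluster"):
    subset[["item_id", "timestamp", "target_value"]].to_csv(
        f"tts_cluster_{c}.csv", index=False, header=False
    )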
Results
In the following plots, we present a comparison between the baseline approach (all data taken together) and the clustered approach (num_clusters=3), where we split the dataset and train AutoML models for each cluster separately. The plots show a few representative samples and compare the p50 forecasts obtained from these models against one week's worth of hold-out data that the models have not previously seen:
As seen here, the green line, representing the forecast from the clustered approach, appears to track the actual target values from the hold-out set (in blue) somewhat better than the baseline model (in orange) trained on all the data together.
Because the risk of overfitting exists with very high cluster counts, we suggest setting num_clusters=3, as we have found this setting to work well in experiments with real-world datasets, which typically have a mix of smooth, lumpy, and intermittent demand. If you observe much more diversity in your dataset, you may pick a more appropriate value for this parameter based on your experience. Clustering techniques are not advised for datasets with fewer than a thousand time series, because splitting the dataset could have a limiting effect on deep learning models, which thrive on large volumes of data.
Conclusion
Clustering can be a valuable addition to your target time series data preprocessing pipeline. If you have item metadata (IM) and related time series (RTS) data, you can also include those in Amazon Forecast training as before.
About the Authors
Vivek M Agrawal is a Data Scientist on the Amazon Forecast team. Over the years, he has applied his passion for machine learning to help customers build state-of-the-art AI/ML solutions in the cloud. In his free time, he enjoys exploring different cuisines, documentaries, and playing board games.