Accelerating property-based ML model delivery with Amazon SageMaker

This post was created in collaboration with Mohammed Alauddin, Data Engineering and Data Science Regional Manager, and Kamal Hossain, Lead Data Scientist at the company, now part of PropertyGuru Group. The company is the market-leading property website in Malaysia. It provides a search experience that makes it possible for property seekers to go through thousands of property listings available on the market. Although the search function serves its purpose in narrowing down possible properties for consumers, the team continues to look for new ways to improve the customer search experience.
The significant driving force of reinvention for customers is anchored on data and machine learning (ML), with ML models being trained, retrained, and deployed for their customers practically every day. These innovations include home viewing and location-based recommendations, which show a set of listings based on search behavior and user profiles.
However, with more ML workloads deployed, challenges related to scale began to surface. In this post, we discuss those challenges and how the Data Science team automated their workflows using Amazon SageMaker.
Challenges running ML projects at scale
When the Data Science team began their ML journey, their main focus was identifying and rolling out ML features that would benefit their customers. Experimenting and validating newly defined hypotheses quickly is a typical practice at the company. However, as their ML footprint grew, the team's focus gradually shifted from discovering new experiences to undifferentiated heavy lifting. The following are some of the challenges they encountered:

Lack of automation and self-service capabilities – ML projects involved many different teams, such as data engineering, data science, platform engineering, and product teams. As more projects crept in, the wait time between teams increased, affecting the time to deliver a feature to market.

Operational overhead – Over time, they realized they had a range of tools and frameworks to maintain, such as scikit-learn, TensorFlow, and PyTorch. Different ML frameworks were used for different use cases. The team resorted to managing these framework updates through multiple self-managed container images, which was very time-consuming. To keep up with the latest updates for each of these ML frameworks, regular updates to the container images had to be made. This led to higher levels of maintenance, taking the team's focus away from building new experiences for their customers.

High cost – ML is an iterative process that requires retraining to keep models relevant. Depending on the use cases and the volume of data, training can be costly because it requires powerful virtual machines. Another issue was that every deployed ML model had its own inference instance, which meant that as more ML models were deployed, the cost increased linearly.

Because of these challenges, the team concluded they needed to rethink their process to build, train, and deploy models. They also identified the need to reevaluate their tooling to improve operational efficiency and manage their cost effectively.
Automating ML delivery with SageMaker
After much research, the team concluded that Amazon SageMaker was the most comprehensive ML platform to address their challenges. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. It offers self-service access to integrated Jupyter notebooks for easy access to the data sources for exploration and analysis, without the need to manage servers. With prebuilt and native support for many ML frameworks, such as PyTorch, TensorFlow, and MXNet, SageMaker offers flexible distributed training options that adapt to specific workflows. Training and hosting are billed by the second, with no minimum fees and no upfront commitments. SageMaker also offers other attractive cost-optimization features such as managed spot training, which can reduce cost by up to 90%, SageMaker Savings Plans, and multi-model endpoints that allow a single host to serve multiple models.
The final piece that tied everything together was the integration of SageMaker with continuous integration and continuous delivery (CI/CD) tooling.
To automate their ML delivery, the team revamped their ML workflows with SageMaker as the underlying service for model development, training, and hosting, paired with CI/CD tooling to automate the steps needed to release new ML application updates. In the following sections, we discuss the revamped workflows.
Data preparation workflow
With the introduction of SageMaker, the SageMaker notebook provided self-service environments with access to preprocessed data, which enabled data scientists to move faster with the CPU or GPU resources they needed.
The team relied on the service for data preparation and curation. It provided a unified, web-based visual interface offering full access, control, and visibility into each step required to build, train, and deploy models, without the need to set up compute instances and file storage.
The team also used Apache Airflow as the workflow engine to schedule and run their complex data pipelines. They use Apache Airflow to automate the initial data preprocessing workflow that provides access to curated data.
The following diagram illustrates the updated workflow.

The data preparation workflow has the following steps:

The Data Science team examines sample data (on their laptops) from the data lake and builds extract, transform, and load (ETL) scripts to prepare the data for downstream exploration. These scripts are uploaded to Apache Airflow.
Multiple datasets extracted from the data lake go through several steps of data transformation (including joins, filtering, and enrichment). The initial data preprocessing workflow is orchestrated and run by Apache Airflow.
The preprocessed data, in Parquet format, is stored and made available in an Amazon Simple Storage Service (Amazon S3) engagement data bucket.
On the SageMaker notebook instance, the Data Science team downloads the data from the engagement data S3 bucket into Amazon Elastic File System (Amazon EFS) to perform local exploration and testing.
Further data exploration and preprocessing activities take place to transform the engagement data into features that better represent the underlying problems to the predictive models.
The curated data is stored in the curated data S3 bucket.
After the data is prepared, the team performs local ML training and inference testing on the SageMaker notebook instance. A subset of the curated data is used during this stage.
Steps 5, 6, and 7 are repeated iteratively until acceptable results are achieved.
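A transformation step in this kind of preprocessing workflow might look like the following minimal pandas sketch. The column names, the join/filter/enrich logic, and the bucket path are illustrative assumptions, not the team's actual pipeline code; in practice a function like this would be wrapped in an Airflow task.

```python
import pandas as pd

def transform_engagement_data(listings: pd.DataFrame, events: pd.DataFrame) -> pd.DataFrame:
    """Join raw search events to listings, filter out inactive listings,
    and enrich the result with an engagement count per listing."""
    joined = events.merge(listings, on="listing_id", how="inner")   # join
    active = joined[joined["status"] == "active"]                   # filter
    enriched = (
        active.groupby(["listing_id", "region"], as_index=False)
        .agg(engagements=("event_type", "count"))                   # enrich
    )
    return enriched

# An Airflow task would then persist the output as Parquet, for example:
# enriched.to_parquet("s3://engagement-data-bucket/curated/engagements.parquet")
```

In a DAG, each transformation like this becomes one task, so Airflow can retry and schedule the steps independently.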

ML model training and deployment workflow
The ML model training and deployment workflow relied on the team's private Git repository to trigger the workflow implemented on the CI/CD pipeline.
The approach taken was that Git served as the one and only source of truth for configuration settings and source code. This meant that both the infrastructure and the application are now versioned through code and can be audited using standard software development and delivery methods.
The following diagram shows this workflow.

The workflow has the following steps:

With the data curated from the data preparation workflow, local training and inference testing is performed iteratively on the SageMaker notebook instance.
When the desired results are achieved, the Data Science team commits the configuration settings into Git. The configuration consists of the following:

Data source location
Cluster instance type and size
SageMaker prebuilt container image to select the ML framework, such as PyTorch, TensorFlow, or scikit-learn
Pricing model to choose either Spot or On-Demand Instances

The Git commit triggers the CI/CD pipeline. The CI/CD pipeline fires up a Python Boto3 script to provision the SageMaker infrastructure.
In the development AWS account, a new SageMaker training job is provisioned with the committed configuration settings using Spot Instances. The dataset from the curated data S3 bucket is downloaded into the training cluster, and training begins immediately.
After the ML training job is complete, a model artifact is created and stored in Amazon S3. Every epoch, evaluation metric, and log from the training job is saved in Amazon CloudWatch Logs.
When a model artifact is stored in Amazon S3, it triggers an event that invokes an AWS Lambda function to create a Slack notification that the training job is complete. The notification includes a link to the training job's CloudWatch Logs for review.
If the Data Science team is satisfied with the evaluation report, the team unblocks the pipeline through an approval step in the CI/CD pipeline, which kicks off a Python Boto3 script to deploy the ML model onto the SageMaker hosting infrastructure for further inference testing. The SageMaker hosting infrastructure is provisioned, and the CI/CD workflow runs a health check script against the SageMaker inference endpoint to verify its health.
After validation, the team raises a Git pull request to have ML engineers perform the final review. The ML engineers may run more tests against the development environment's SageMaker inference endpoint to verify the results.
If everything works as expected, the ML engineer merges the pull request, which triggers the CI/CD pipeline to release the new model into the production environment. The CI/CD pipeline runs a Python script to deploy the model on the SageMaker multi-model endpoint. If there are problems with the inference results, the pull request is rejected with feedback provided.
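A provisioning script of this kind might translate the Git-committed configuration into a SageMaker CreateTrainingJob request along the following lines. This is a minimal sketch: the configuration keys, job naming, and timeout values are assumptions for illustration, not the team's actual script.

```python
import time

def build_training_job_request(config: dict) -> dict:
    """Translate the Git-committed configuration settings into a
    SageMaker CreateTrainingJob request with managed spot training."""
    use_spot = config["pricing"] == "spot"
    request = {
        "TrainingJobName": f"{config['model_name']}-{int(time.time())}",
        "AlgorithmSpecification": {
            "TrainingImage": config["container_image"],  # prebuilt framework image
            "TrainingInputMode": "File",
        },
        "RoleArn": config["role_arn"],
        "InputDataConfig": [{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": config["data_source"],  # curated data S3 location
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": config["output_path"]},
        "ResourceConfig": {
            "InstanceType": config["instance_type"],
            "InstanceCount": config["instance_count"],
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
        "EnableManagedSpotTraining": use_spot,
    }
    if use_spot:
        # For spot jobs, MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds
        request["StoppingCondition"]["MaxWaitTimeInSeconds"] = 172800
    return request

# The CI/CD pipeline would then submit the job, for example:
# boto3.client("sagemaker").create_training_job(**build_training_job_request(config))
```

Keeping the request-building logic as a pure function makes the pipeline script easy to test without touching AWS.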

ML model serving and API layer workflow
For any ML use case, appropriate business logic must be applied before the ML models' output is served to customers. The business logic wraps the ML inference output (from SageMaker) with various computations and calculations to fulfill the use case requirements.
Lambda lets you run code without provisioning or managing servers, with scaling and availability handled by the service. You pay only for the compute time you consume, and there is no charge when your code isn't running.
To manage serverless application development, the team uses the Serverless Framework (SLS) to develop and maintain their business logic on Lambda. The CI/CD pipeline releases new updates to Lambda.
The Lambda functions are exposed to customers through GraphQL APIs built on Amazon Elastic Kubernetes Service (Amazon EKS) with AWS Fargate.
The following diagram shows this workflow.

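A Lambda function in this layer might look like the following minimal sketch, which queries a SageMaker multi-model endpoint and layers business logic on the inference output. The endpoint name, model artifact name, and business rules here are illustrative assumptions, not the team's actual code.

```python
import json

def apply_business_logic(scored_listings, top_k=10):
    """Business rules layered on top of the model output: drop
    inactive listings and return the top-k by predicted score."""
    active = [l for l in scored_listings if l["active"]]
    return sorted(active, key=lambda l: l["score"], reverse=True)[:top_k]

def handler(event, context):
    """Lambda entry point: query a SageMaker multi-model endpoint,
    then wrap the inference output with business logic."""
    import boto3  # available in the Lambda runtime
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="listing-recommender",       # hypothetical multi-model endpoint
        ContentType="application/json",
        Body=json.dumps(event["features"]),
        TargetModel="recommender-v3.tar.gz",      # one of the models hosted on the endpoint
    )
    scored = json.loads(response["Body"].read())
    return {"recommendations": apply_business_logic(scored)}
```

The `TargetModel` parameter is what lets a single SageMaker endpoint serve many models, which is how the team kept inference cost from growing linearly with the number of models.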
The workflow consists of the following steps:

Continuing from the ML model training and deployment workflow, multiple ML models may be deployed on the SageMaker hosting infrastructure. The ML models are overlaid with relevant business logic (implemented on Lambda) before being served to customers.
If there are any updates to the business logic, the data scientist updates the source code in the Serverless Framework project and commits it to the Git repository.
The Git commit triggers the CI/CD pipeline to update the Lambda function with the latest changes. This activity runs on the development account and is validated before being repeated on the production account.
Multiple Lambda functions are deployed with associated business logic that queries the SageMaker inference endpoints.

For each API request made to the API layer, the GraphQL API processes the request and forwards it to the corresponding Lambda function. An invoked function may query multiple SageMaker inference endpoints and process the business logic before providing a response to the requestor.
To evaluate the effectiveness of the deployed ML models, the team created a dashboard that tracks a metric (such as clickthrough rate or open rate) for every ML model to visualize performance in production. These metrics serve as a guiding light for how the team continues to iterate on and improve the ML models.
Business results
The team observed valuable results from the improved workflows.
"By implementing our data science workflows across SageMaker and our existing CI/CD tools, the automation and reduction in operational overhead enabled us to focus on ML model improvement activities, accelerating our ML models' time to market by 60%," says Mohammad Alauddin, Head of Data Science and Engineering. "Not only that, with SageMaker Spot Instances, enabled with a simple switch, we were also able to reduce our data science infrastructure cost by 75%. By improving our ML models' time to market, the ability to gather our customers' feedback was also accelerated, allowing us to tweak and improve our listing recommendations' clickthrough rate by 250%."
Summary and next steps
Although the team was deeply encouraged by the business results, there is still plenty of room to improve their customers' experience. They have plans to further enhance the ML model serving workflow, including A/B testing and model monitoring features.
To further reduce undifferentiated work, the team is also exploring SageMaker Projects to streamline management and maintenance of their ML workflows, and SageMaker Pipelines to automate steps such as data loading, data transformation, training and tuning, and deployment at scale.
About the Authors
Mohammad Alauddin is the Engineering Manager for Data at PropertyGuru Group. Over the last 15 years, he has contributed to data analytics, data engineering, and machine learning projects in the telco, airline, and PropTech digital industries.
Md Kamal Hossain is the Lead Data Scientist at PropertyGuru Group. He leads the Data Science Centre of Excellence (DS CoE) for ideation, design, and productionizing of end-to-end AI/ML solutions using cloud services.
Fabian Tan is a Principal Solutions Architect at Amazon Web Services. He has a strong passion for software development, databases, data analytics, and machine learning. He works closely with the Malaysian developer community to help them bring their ideas to life.
About PropertyGuru Group
The company is headquartered in Kuala Lumpur, Malaysia and employs over 200 people. It is the market-leading property portal in Malaysia, offering a search experience in both English and Bahasa Malaysia. It also provides consumer services such as LoanCare, a home loan eligibility indicator; a News & Lifestyle channel with content to enhance consumers' home journey; events to connect property seekers with developers and agents offline; and much more. The business is part of PropertyGuru Group, Southeast Asia's leading property technology company1.
PropertyGuru is Southeast Asia's leading property technology company. Established in 2007, PropertyGuru has grown to become Southeast Asia's #1 digital property marketplace with leading positions in Singapore, Vietnam, Malaysia, and Thailand. The company currently hosts more than 2.8 million monthly real estate listings and serves over 50 million monthly property seekers and over 50,000 active property agents across the five largest economies in Southeast Asia: Indonesia, Malaysia, Singapore, Thailand, and Vietnam.
1 In terms of relative engagement market share based on SimilarWeb data.
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or precision of this post.
