AWS Machine Learning and Automation
“With 1Strategy’s guidance, we were able to take a big step forward in developing and operationalizing ML models within the AWS environment. This project yielded a roughly 50% improvement to the success metric when compared to the current process,”
-John Borgo, Vice President of Analytics, Angie’s List
About ANGI Homeservices
Through its brands including Angie’s List and HomeAdvisor, ANGI Homeservices is creating the world’s largest digital marketplace for home services, connecting millions of homeowners across the globe with home service professionals. Headquartered in Denver, Colorado, ANGI Homeservices connects homeowners with trusted home service professionals across the United States, Canada, and Europe.
The Angie’s List digital marketplace platform enables service providers (such as plumbers, roofers, and carpenters) from all over the United States to publicize their services to homeowners in their locale by creating a membership on the platform. Each new member joins at a free tier and can pay more to unlock premium advantages such as advertising.
Angie’s List was seeking to improve its ability to identify which service providers are likely to upgrade their account tier so that its sales force can focus on these customers and drive revenue. Prior to working with 1Strategy, the primary production model for identifying service providers to target was a static heuristic based on deep domain knowledge. Angie’s List had also developed machine learning (ML) models based on the Random Forest and XGBoost algorithms, which were competitive with this benchmark heuristic.
One of the big challenges for ML in this environment is class imbalance: most customers fall into the free subscriber class. Because there are so many of them, misclassifying even a small percentage of free subscribers as premium subscribers produces a large number of “false positives,” making it difficult to pick out the customers who are truly most likely to opt for premium benefits.
Another challenge is personnel opportunity cost: both the heuristic and ML processes require several hours of manual work to put them into production on a biweekly basis. Angie’s List engaged 1Strategy to improve its yield and to automate its process by implementing it on AWS.
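The effect of class imbalance is easiest to see with a little arithmetic. The sketch below uses hypothetical counts and error rates chosen only for illustration (they are not Angie’s List figures), and the function name `precision_at_rates` is likewise an assumption.

```python
def precision_at_rates(n_free, n_upgraders, false_positive_rate, true_positive_rate):
    """Return (false_positives, true_positives, precision) for a classifier."""
    # Free-tier providers wrongly flagged as likely upgraders.
    false_positives = n_free * false_positive_rate
    # Real upgraders that the classifier correctly flags.
    true_positives = n_upgraders * true_positive_rate
    precision = true_positives / (true_positives + false_positives)
    return false_positives, true_positives, precision


# Hypothetical population: 100,000 free-tier providers and 1,000 likely upgraders.
fp, tp, precision = precision_at_rates(
    n_free=100_000,
    n_upgraders=1_000,
    false_positive_rate=0.05,  # only 5% of free-tier providers are misclassified
    true_positive_rate=0.80,   # 80% of real upgraders are caught
)
print(f"false positives: {fp:.0f}, true positives: {tp:.0f}, precision: {precision:.1%}")
```

Under these assumed numbers, a 5% error rate on the large free class yields roughly 5,000 false positives against 800 true positives, so fewer than one in seven flagged providers would actually upgrade.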
Why Amazon Web Services
Angie’s List recognized the strategic advantages of an enterprise-level AWS approach and was already in the process of shifting resources to AWS. This project helped Angie’s List move beyond merely exploring AWS capabilities toward putting AWS tools to use and expediting the migration process. Angie’s List sought to leverage 1Strategy’s understanding and proven capacity to accelerate the operationalization of its machine learning models on AWS. John Borgo, Angie’s List Vice President of Analytics, said, “One of our tenets is to be a data-driven company, and a key component of that is to automate insights into business processes. AWS offers a comprehensive toolset that enables this, and 1Strategy helped to steepen our learning curve.”
Angie’s List’s process for implementing its ML model involved manually exporting historical data to an R programming environment for training and batch prediction, then manually moving these predictions back to production for use on a subsample of the data for live testing.
1Strategy helped the Angie’s List data science team redeploy their model in Python and move it to Amazon SageMaker’s Jupyter environment, where the team could collaborate on further model development. Angie’s List designed an architecture that would automate the flow from disparate data sources to model creation and implementation (diagram below). A third-party service uses the AWS Software Development Kit (SDK) to collect data from various sources, runs extraction, transformation, and loading (ETL) on it, and lands the data in Amazon Simple Storage Service (S3). S3 was chosen to host both the ML model training data and the artifacts resulting from model training because of its reliability and elasticity for object storage. When new processed data arrives, an AWS Step Functions workflow triggers an AWS Lambda function for the model training update procedure. Amazon SageMaker then creates a new model and uses it to make predictions with SageMaker Batch Transform. Amazon Simple Queue Service (SQS) provides notification and data quality metrics to determine the validity of the new model’s predictions before they are put into production. After the predictions pass the data quality checks, the execution results are sent to an SQS queue, which is then ingested by the orchestration layer for use by the Angie’s List sales force.
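The retraining trigger described above might look roughly like the following Lambda handler. This is a minimal sketch, not the code Angie’s List deployed: the event fields (`bucket`, `key`, `role_arn`, `image_uri`), the job name prefix, the instance type, and the output prefix are all hypothetical. The only AWS call used is the boto3 SageMaker client’s `create_training_job`.

```python
import time


def build_training_request(bucket, key, role_arn, image_uri, output_prefix):
    """Build a CreateTrainingJob request for newly arrived training data."""
    # Hypothetical naming scheme: a timestamp keeps job names unique.
    job_name = f"upgrade-model-{int(time.time())}"
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "ContentType": "text/csv",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": f"s3://{bucket}/{key}",
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        # Model artifacts land back in S3, as in the architecture above.
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/{output_prefix}"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",  # assumed instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def handler(event, context, sagemaker_client=None):
    """Lambda entry point: start a training job for the data named in the event."""
    if sagemaker_client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS

        sagemaker_client = boto3.client("sagemaker")
    request = build_training_request(
        bucket=event["bucket"],
        key=event["key"],
        role_arn=event["role_arn"],
        image_uri=event["image_uri"],
        output_prefix="model-artifacts/",
    )
    sagemaker_client.create_training_job(**request)
    return {"training_job_name": request["TrainingJobName"]}
```

Passing the client as an optional argument is a design choice for this sketch: it lets the handler be tested with a stand-in object, while Lambda itself only supplies `event` and `context`.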
The SageMaker Jupyter environment provided flexibility for exploring competing models using the scikit-learn library and proved to be an ideal environment for Angie’s List’s integrated ML needs. When the team explored the data using the pandas and matplotlib libraries in a SageMaker notebook, they discovered a critical step in data pre-processing and training/validation data set selection that improved algorithm performance from 73% to 81% as measured by Area Under the Curve (AUC), as shown in Figure 1 below. AUC quantifies the probability that the algorithm ranks a randomly chosen customer who upgrades above a randomly chosen customer who does not. The resulting algorithm gets this ranking correct in 81% of cases. In addition, this discovery delivered a model that was more stable in terms of which input features were most important to the prediction, giving the data science team more confidence in its predictions.
Figure 1 – (left) Original algorithm performance Area Under the Curve chart
(right) Improved algorithm performance Area Under the Curve chart
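The ranking interpretation of Area Under the Curve can be made concrete with a short sketch. It computes the metric directly as the fraction of (upgrader, non-upgrader) pairs that the scores rank correctly. The toy labels and scores are hypothetical, and `pairwise_auc` is an illustrative helper rather than a function from the project.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """AUC as the fraction of (upgrader, non-upgrader) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    correct = 0.0
    for pos, neg in product(positives, negatives):
        if pos > neg:
            correct += 1.0
        elif pos == neg:
            correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = upgraded, 0 = stayed on the free tier.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.8]
auc = pairwise_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy data, eight of the nine upgrader/non-upgrader pairs are ordered correctly. For real data sets, a library routine such as scikit-learn’s `roc_auc_score` computes the same quantity far more efficiently than this quadratic loop.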
After leveraging the SageMaker Jupyter environment for model formation, Angie’s List chose to implement SageMaker’s built-in XGBoost algorithm as its primary model for its convenience and maintainability. SageMaker’s Hyperparameter Optimization (HPO) provides an automated and efficient process for selecting the optimal model from the most recent training data. The team was able to replace a complicated HPO implementation with a few lines of code, and the Bayesian optimization approach reduced HPO computation time by approximately 50%. Once a model is created, SageMaker’s integrated environment allows the data scientists to store the model artifacts for later use with either an on-demand inference endpoint or batch inference.
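As an illustration of how compact a Bayesian HPO setup can be, the sketch below builds the tuning configuration that the low-level SageMaker API accepts. The specific hyperparameter ranges, job limits, and the choice of `validation:auc` as the objective are assumptions made for this example, not the settings Angie’s List used, and the team may well have used the higher-level SageMaker Python SDK instead.

```python
def build_tuning_config(max_jobs=20, max_parallel_jobs=2):
    """Return a HyperParameterTuningJobConfig dict that uses Bayesian search."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:auc",  # metric emitted by built-in XGBoost
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel_jobs,
        },
        # The API expects range bounds as strings. These ranges are hypothetical.
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3"},
            ],
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"},
            ],
        },
    }


tuning_config = build_tuning_config()
print(tuning_config["Strategy"], tuning_config["ResourceLimits"])
```

A dict like this would be passed as `HyperParameterTuningJobConfig` to the boto3 SageMaker client’s `create_hyper_parameter_tuning_job`, together with a job name and a training job definition.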
The project helped Angie’s List develop AWS expertise on its team, especially in SageMaker’s suite of machine learning services, and paved the way for future ML implementations. By the end of the project, the ML algorithms deployed by the team were outperforming the original heuristic in production, and the team had successfully proven the automated model retraining path that runs once data is received from the data warehouse. “With 1Strategy’s guidance, we were able to take a big step forward in developing and operationalizing ML models within the AWS environment. This project yielded a roughly 50% improvement to the success metric when compared to the current process,” said Borgo.
1Strategy is an Amazon Partner Network (APN) Premier Consulting Partner, focusing exclusively on Amazon Web Services (AWS). 1Strategy helps businesses architect, migrate, and optimize their workloads on AWS, creating scalable, cost-effective, secure, and reliable solutions. 1Strategy also helps customers get real value from their data using comprehensive machine learning models and artificial intelligence. 1Strategy holds the AWS DevOps, Migration, Data & Analytics, and Machine Learning Competencies and is a partner of the AWS Public Sector Program. 1Strategy was one of the initial ten AWS Partners globally to be qualified and authorized by AWS to conduct a Well-Architected Review and is among the top Well-Architected partners in the AWS ecosystem. With experts who have deployed AWS solutions since 2007, 1Strategy is a leader in custom training, providing customers with the knowledge, tools, and best practices to manage those solutions over time. 1Strategy is a TEKsystems Global Services company with teams in Seattle and Salt Lake City, supporting customers throughout the US and across every vertical.