December 19th, 2017
AWS SageMaker
By Aaron Caldiero

AWS SageMaker is a full-featured, fully managed machine learning service announced at this year’s re:Invent.

 

This year at re:Invent, AWS announced many new offerings related to Machine Learning (ML) and Artificial Intelligence (AI). ML and AI are clearly a big priority for AWS, which is both growing its current services and creating new ones in the ML/AI space. One of the standout services announced this year is Amazon SageMaker.

The SageMaker product page offers this brief description: “Amazon SageMaker is a fully-managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale.”

“Machine learning and AI is a horizontal enabling layer. It will empower and improve every business, every government organization, every philanthropy—basically there’s no institution in the world that cannot be improved with machine learning,” Jeff Bezos, Amazon’s Artificial Intelligence and Machine Learning Strategy.

To make Bezos’ vision of machine learning in every business a reality, ML needs to be accessible to application developers and easy to integrate with any business application. AWS SageMaker is the strongest effort yet to fulfill that promise. It is not just for app developers and integration, though; it has enough features to satisfy even the most demanding Data Scientists.

Before SageMaker, AWS offered Amazon Machine Learning (AML). AML was a minimum viable product: AWS’s first attempt to make machine learning accessible to developers, easy to integrate with applications, and scalable. AML only minimally met those lofty goals, and as a result saw minimal use.

SageMaker, on the other hand, is a full-service offering in a whole different ballpark from AML. With AML, you were stuck with three very basic algorithms and few options for configuring or customizing them. AML was really only suited for very simple regressions and classifications that could just as easily be done in Excel.

SageMaker is full-featured enough to serve as a Data Scientist’s or Developer’s complete suite of tools for developing and deploying machine learning models. The first thing SageMaker offers is three different ways to build models: in TensorFlow or MXNet, with Spark MLlib, or as a custom model packaged in a Docker container. That last option lets even the most academic of Data Scientists deploy any custom ML model they can come up with and still get scalability and integration with other apps.

Another reason that SageMaker is a complete suite is its integrated Jupyter notebooks for developing and documenting ML models. Jupyter notebooks are fantastic for keeping revisions and for making ML modeling results reproducible. They have become the de facto standard for Data Science development, so having them integrated with the model deployment service is a great feature.

With SageMaker, AWS not only created a service that can be used as a complete ML suite, but was also forward-thinking enough to add features that you may not even know you need. SageMaker allows you to build multiple custom ML models, deploy them, and then A/B test them to determine which performs best. Model comparison is typically an afterthought for most ML developers, and this feature can make a very time-consuming task much more efficient.
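To make the A/B testing idea concrete, here is a minimal sketch of how two models might be placed behind one endpoint with a weighted traffic split. It only builds a request shaped like the input to SageMaker's CreateEndpointConfig API and does not call AWS. The config, variant, and model names are hypothetical, and the 90/10 split and instance type are assumptions chosen for illustration.

```python
def build_ab_endpoint_config(config_name, variants, instance_type="ml.m4.xlarge"):
    """Build a request shaped like SageMaker's CreateEndpointConfig input.

    `variants` is a list of (variant_name, model_name, weight) tuples.
    All names passed in below are hypothetical placeholders.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
                "InitialVariantWeight": weight,
            }
            for variant_name, model_name, weight in variants
        ],
    }


def traffic_shares(request):
    """Return the fraction of requests each variant receives.

    Traffic is split in proportion to each variant's weight
    divided by the sum of all variant weights.
    """
    variants = request["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical example: send 90% of traffic to the current model
# and 10% to a challenger model.
request = build_ab_endpoint_config(
    "fraud-ab-test",
    [
        ("model-a", "fraud-model-v1", 9.0),
        ("model-b", "fraud-model-v2", 1.0),
    ],
)
shares = traffic_shares(request)
print(shares)
```

With AWS credentials in place, a dictionary like `request` could be passed to `boto3.client("sagemaker").create_endpoint_config(**request)`, assuming models with those names had already been created in the account.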

Typically, Data Science teams need an entirely separate team to build and maintain their infrastructure. Because SageMaker is serverless and managed by AWS, there is no longer a need for a separate team to build and maintain the infrastructure and tools that a Data Science team requires.

There is one item to keep in mind: in the SageMaker introduction session at re:Invent 2017, the presenter mentioned that SageMaker is not ideally suited for a lift-and-shift of existing ML workloads. He said that SageMaker is best suited for a migration or refactoring. I anticipate that AWS will eventually make lifting and shifting existing workloads into SageMaker feasible and create some tooling around that use case.

The use case presented at re:Invent 2017 came from Intuit. Intuit's Data Science Manager described how her team used SageMaker to develop and deploy its fraud detection models. She was so impressed with the features and performance that Intuit plans to integrate more ML into other areas of its products.

Overall, SageMaker is a huge leap forward for ML Developers and Data Scientists. It has the potential to make the ML development process much simpler, and increase efficiency for Data Scientists.

 

References

SageMaker usage is billed by the second:

https://aws.amazon.com/sagemaker/pricing/

https://aws.amazon.com/sagemaker/pricing/instance-types/
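
As a rough illustration of what per-second billing means in practice, the sketch below compares it with rounding up to whole hours. The $0.36 hourly rate and the 15-minute job are made-up numbers, not actual SageMaker prices, and the calculation ignores any minimum charge or storage and data costs; the pricing pages above are the source for real figures.

```python
import math


def per_second_cost(hourly_rate_usd, seconds_used):
    """Cost when usage is billed by the second at a given hourly rate."""
    return hourly_rate_usd * seconds_used / 3600


def whole_hour_cost(hourly_rate_usd, seconds_used):
    """Cost if usage were instead rounded up to whole hours."""
    return hourly_rate_usd * math.ceil(seconds_used / 3600)


# Hypothetical numbers: a 15-minute training job on an instance
# that costs $0.36 per hour.
HOURLY_RATE_USD = 0.36
JOB_SECONDS = 15 * 60

by_second = per_second_cost(HOURLY_RATE_USD, JOB_SECONDS)
by_hour = whole_hour_cost(HOURLY_RATE_USD, JOB_SECONDS)
print(f"billed by the second: ${by_second:.2f}")
print(f"rounded up to an hour: ${by_hour:.2f}")
```

For short jobs like this one, billing by the second means paying for a quarter of an hour rather than a full one.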

 

 

About the Author

Aaron Caldiero is the Senior Big Data Solutions Architect at 1Strategy. Prior to starting at 1Strategy, Aaron was the Manager of Enterprise Data Science for Zions Bancorporation. Aaron helps customers architect their data solutions in AWS.

Aaron has 16 years of experience in the Financial Services industry and has built data platforms and solutions from the ground up. He has worked in Accounting, Finance, Marketing, Fraud, and Security departments within financial institutions.