November 15th, 2019
Amazon Personalize and Forecast – Machine Learning for the Rest of Us
By Will Nave

Machine Learning and Artificial Intelligence (ML/AI) are being adopted by businesses across all industries at an ever-increasing rate. An organization’s ability to leverage ML/AI technology depends on numerous factors. For starters, wrangling all of your data is no small feat. Recruiting and retaining talent with the necessary domain knowledge to put your data to work is also a big challenge. Combine all of that with the specialized hardware and systems required to develop and train models, and the endeavor quickly becomes unapproachable for many businesses that don’t have a tech giant’s resources and budget.

Amazon Web Services (AWS) is hoping to bridge that gap and ease the entry to ML/AI with its newest services: Amazon Personalize and Amazon Forecast. With these two services, you no longer need to have the technical expertise and domain knowledge to get started with Machine Learning. If you can manage to organize your data into a CSV file and store it in S3, you can leverage either service to use ML models for generating dynamic campaigns and performing time series forecasting.

Instead of maintaining a team of data scientists and ML experts to lead the development of algorithms and models, Amazon Personalize and Amazon Forecast let you benefit from the years of research and development Amazon has used for its own business. You provide the data; they provide automated processes for hyperparameter tuning, model selection, training, and campaign/forecast generation. At the end of a successful deployment, you are left with a queryable endpoint to access your results.

The AWS Console provides an intuitive interface and easy-to-navigate visualizations for interacting with your data in both services. Each service operates on the concept of Dataset Groups. Within an established Dataset Group, one to three datasets are stored containing the data that will be used to train the models. Based on the content of the datasets you provide, and the options you choose during configuration, both services are capable of automating the process of algorithm selection and training.

Amazon Personalize

Getting started with Amazon Personalize requires the creation of a Dataset Group and the association of up to three datasets. The only required dataset is the user-item interaction data. At a minimum, the user-item interaction dataset requires the following three fields, with these names and datatypes:

  • user_id(string)
  • item_id(string)
  • timestamp(long)

Additionally, you can choose to include the users and items datasets. The users dataset is the first optional dataset and only requires the user_id field. The items dataset is the second optional dataset and only requires the item_id field. In addition to the required fields, each of the datasets can include up to five metadata fields. Age and gender are examples of metadata fields that could be included in the users dataset. Genre, color, and brand are examples of metadata fields that could be included in the items dataset.

To associate the datasets with the Dataset Group, a schema for each dataset must be created. The schemas for Amazon Personalize conform to the Apache Avro format. When creating your schemas, you must comply with the following rules:

  • The schema fields can be in any order, but they must match the order of the corresponding header columns in the dataset CSV file.
  • All required fields for a specific dataset type must be included in the dataset and must conform to the datatype associated with the field.
  • Metadata fields that are added in addition to the required fields can be a string or non-string type. If the metadata field is a string, it must also include the categorical attribute.
  • Each schema and its related dataset can contain up to five metadata fields.
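These rules are easy to check before you upload anything. The following is a minimal sketch, not an official AWS tool: the function name check_schema and the way required fields are passed in are my own assumptions, and it only handles simple (non-union) Avro types.

```python
MAX_METADATA_FIELDS = 5  # limit stated in the rules above


def check_schema(schema, required):
    """Return a list of rule violations for a Personalize-style schema.

    `required` maps each required field name to its expected type,
    for example {"USER_ID": "string"} for a users dataset.
    """
    problems = []
    fields = {field["name"]: field for field in schema["fields"]}

    # Rule: required fields must be present with the expected datatype.
    for name, expected_type in required.items():
        if name not in fields:
            problems.append("missing required field " + name)
        elif fields[name]["type"] != expected_type:
            problems.append(name + " must be of type " + expected_type)

    # Everything that is not a required field counts as metadata.
    metadata = [f for f in schema["fields"] if f["name"] not in required]

    # Rule: at most five metadata fields per schema.
    if len(metadata) > MAX_METADATA_FIELDS:
        problems.append("more than 5 metadata fields")

    # Rule: string metadata fields need the categorical attribute.
    for field in metadata:
        if field["type"] == "string" and not field.get("categorical", False):
            problems.append(field["name"] + " is a string and must be categorical")

    return problems


example_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "AGE", "type": "int"},
        {"name": "CITY", "type": "string"},  # hypothetical field, missing "categorical"
    ],
    "version": "1.0",
}

print(check_schema(example_schema, {"USER_ID": "string"}))
```

An empty list means the schema passes these particular checks; it does not guarantee that the import itself will succeed.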

The following, taken from the official Amazon Personalize documentation, is an example of a schema created for a users dataset:

{
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "AGE",
            "type": "int"
        },
        {
            "name": "GENDER",
            "type": "string",
            "categorical": true
        }
    ],
    "version": "1.0"
}

The contents of a corresponding CSV file representing the users dataset should look similar to this example:

USER_ID,AGE,GENDER
john_doe,23,male
jane_doe,39,female
jeff_vader,62,male
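Because the CSV header has to line up with the schema, it can be worth comparing the two locally before importing. Here is a small sketch using only the Python standard library; the helper name header_matches_schema is hypothetical, and the sample data simply mirrors the users example above.

```python
import csv
import io


def header_matches_schema(csv_text, schema):
    """Return True when the CSV header lists the schema fields in the same order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    schema_names = [field["name"] for field in schema["fields"]]
    return header == schema_names


users_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "AGE", "type": "int"},
        {"name": "GENDER", "type": "string", "categorical": True},
    ],
    "version": "1.0",
}

users_csv = "USER_ID,AGE,GENDER\njohn_doe,23,male\njane_doe,39,female\n"

print(header_matches_schema(users_csv, users_schema))
```

Swapping two columns in the CSV header, or misspelling one, makes the check return False.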

After all of the datasets have been created and associated with schemas, they can be imported into the Dataset Group. Upon successful import, the process of creating a Solution can begin. Solutions are the representation of trained ML models used to make recommendations to customers. Amazon Personalize uses the concept of “recipes” to simplify the configuration and creation of personalized Solutions. Each selectable recipe represents a differently configured combination of algorithm and hyperparameters. Amazon Personalize provides three types of recipes, each with differing requirements. Here are the three recipe types and the individual recipes that fall into each one. (For more information, see the documentation on Using Predefined Recipes.)

  1. USER_PERSONALIZATION Recipes – predicts the items a user will interact with.
    1. HRNN
    2. HRNN-Metadata
    3. HRNN-Coldstart
    4. Popularity-Count
  2. PERSONALIZED_RANKING Recipes – personalizes ranked results.
    1. Personalized-Ranking
  3. RELATED_ITEMS Recipes – returns items similar to a given item.
    1. SIMS

Creating the Solution involves selecting the Create Solution option from the console, providing a name, choosing Automatic to allow AutoML to select the appropriate recipe, and then selecting whether to allow Amazon Personalize to perform hyperparameter optimization (HPO). Click “Next” to proceed and “Finish” on the next screen to confirm the configuration and create the Solution. That is all there is to it!

Once your Solution has been built, a solution version will need to be created to start the training process. Select your Solution in the console and click on the “Create solution version” button in the top right of the screen.
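If you would rather script these console steps, the same two actions are available through the AWS SDK. The sketch below assumes a client that behaves like boto3.client("personalize"); the function name start_training and the placeholder ARN are hypothetical, and error handling is left out.

```python
def start_training(personalize, dataset_group_arn, solution_name):
    """Create a Solution with AutoML enabled, then start training a version.

    `personalize` is expected to behave like boto3.client("personalize").
    """
    solution = personalize.create_solution(
        name=solution_name,
        datasetGroupArn=dataset_group_arn,
        performAutoML=True,  # the "Automatic" recipe choice from the console
    )
    version = personalize.create_solution_version(
        solutionArn=solution["solutionArn"]
    )
    # Training is asynchronous: the version is usable once the status
    # reported by describe_solution_version reaches ACTIVE.
    return solution["solutionArn"], version["solutionVersionArn"]


# Hypothetical usage (requires boto3 and AWS credentials):
# import boto3
# arns = start_training(boto3.client("personalize"), "<dataset-group-arn>", "my-solution")
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object before pointing it at a real account.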

As soon as the Solution version finishes training, it is time to create a Campaign and start generating recommendations! Testing your Campaign by generating a recommendation is as easy as going to the newly created Campaign in the console and entering a user_id in the User ID field found under the Test campaign results section.
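The same test can be run from code. This is a rough sketch that assumes two clients behaving like boto3.client("personalize") and boto3.client("personalize-runtime"); the helper names, the placeholder ARNs, and the choice of one provisioned transaction per second are my own assumptions.

```python
def create_campaign(personalize, name, solution_version_arn):
    """Deploy a trained solution version as a Campaign and return its ARN."""
    response = personalize.create_campaign(
        name=name,
        solutionVersionArn=solution_version_arn,
        minProvisionedTPS=1,  # assumed minimal throughput for testing
    )
    return response["campaignArn"]


def recommend(runtime, campaign_arn, user_id, count=10):
    """Return the recommended item IDs for one user."""
    response = runtime.get_recommendations(
        campaignArn=campaign_arn,
        userId=user_id,
        numResults=count,
    )
    return [item["itemId"] for item in response["itemList"]]


# Hypothetical usage (requires boto3 and AWS credentials):
# import boto3
# campaign_arn = create_campaign(boto3.client("personalize"), "demo", "<solution-version-arn>")
# print(recommend(boto3.client("personalize-runtime"), campaign_arn, "john_doe"))
```

As with training, campaign creation is asynchronous, so recommendations are only available once the Campaign becomes active.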

Amazon Forecast

Working with Amazon Forecast follows a workflow similar to the one used in Amazon Personalize. To begin, the creation of a Dataset Group is required. As in Amazon Personalize, up to three datasets in the CSV file format can be added to the Dataset Group. Much like the Recipes in Amazon Personalize, Amazon Forecast employs Forecasting Domains to help automate the process of selecting, tuning, and training the best model for your purpose. The creation of the Dataset Group requires a name and a Forecasting Domain designation. The following seven Forecasting Domains are available (for a more detailed description of the forecasting domains, see Predefined Dataset Domains and Dataset Types):

  • Retail – used to forecast retail demand.
  • Inventory planning – used to forecast demand for raw materials and determine how much inventory of any given item is in stock.
  • EC2 capacity – used to forecast Amazon Elastic Compute Cloud capacity.
  • Work force – used to plan and identify the amount of work force that is required.
  • Web traffic – used to forecast web traffic to a web property or set of web properties.
  • Metrics – used to forecast metrics such as revenue, sales, and cashflow.
  • Custom – used to generate forecasts that do not fit into any of the other pre-defined forecasting domains.

The only required dataset in Amazon Forecast is the target time series dataset. The target time series requires the following fields with associated datatypes:

  • item_id(string)
  • timestamp(timestamp)
  • target_value(float)

The name of the required target_value field of the target time series dataset will vary based upon the selected Forecasting Domain of the Dataset Group. For instance, if you select Inventory planning as the Forecasting Domain, the target value represents demand and is labeled “demand” in the CSV column header and schema in place of “target_value”.

The related time series dataset is the first of the two optional datasets. The related time series has two required fields: item_id and timestamp. Price, stockout_days, inventory_onhand, revenue, and in_stock are all recommended fields for the related time series. It is important to note that any field added to the related time series that is not one of the required fields must be of the datatype integer or float.

The item metadata dataset is the second of the two optional datasets. The item metadata dataset only requires the item_id field. Category, brand, lead_time, order_cycle, and safety_stock are all examples of additional fields that could be included in the item metadata dataset. All fields in the item metadata dataset must be of the string datatype.
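The datatype restrictions on the two optional datasets are simple enough to verify with a few lines of code. The sketch below assumes schemas written in the Attributes format shown in this post; the function names and the sample attributes are hypothetical.

```python
RELATED_REQUIRED = {"item_id", "timestamp"}
NUMERIC_TYPES = {"integer", "float"}


def invalid_related_attributes(schema):
    """Names of non-required related time series fields that are not numeric."""
    return [
        attr["AttributeName"]
        for attr in schema["Attributes"]
        if attr["AttributeName"] not in RELATED_REQUIRED
        and attr["AttributeType"] not in NUMERIC_TYPES
    ]


def invalid_metadata_attributes(schema):
    """Names of item metadata fields that are not strings."""
    return [
        attr["AttributeName"]
        for attr in schema["Attributes"]
        if attr["AttributeType"] != "string"
    ]


related_schema = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "price", "AttributeType": "float"},
        {"AttributeName": "in_stock", "AttributeType": "string"},  # deliberately wrong
    ]
}

print(invalid_related_attributes(related_schema))
```

Both helpers return an empty list when the schema respects the rule they check.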

To import your datasets to the Dataset Group, Amazon Forecast requires the creation of a schema. Similar to the schema associated with Dataset Groups in Amazon Personalize, the schemas used to import datasets to Amazon Forecast Dataset Groups must match the column headers both in name and order. Below is an example of a target time series schema:

{
    "Attributes": [
        {
            "AttributeName": "item_id",
            "AttributeType": "string"
        },
        {
            "AttributeName": "timestamp",
            "AttributeType": "timestamp"
        },
        {
            "AttributeName": "demand",
            "AttributeType": "float"
        }
    ]
}

The following is an example of what a corresponding target time series CSV file might look like:

ITEM_ID,TIMESTAMP,DEMAND
11111,2019-10-05,10.7
22222,2019-10-05,42.0
33333,2019-10-05,3.12

Two of the three datasets involve a third configuration step that requires selecting the frequency of the data. Data frequency for the target time series dataset and the related time series datasets is based on the timestamp values of the timestamp fields. Both datasets should have parity between them in regard to the timestamp values and accordingly should share the same data frequency.

Another important detail to note is the limitation of available formats that can be used to represent timestamp fields in the forecast datasets. All timestamp datatype values must be of the yyyy-MM-dd format, or the yyyy-MM-dd HH:mm:ss format. Any other timestamp format will result in a failure during import to the Dataset Group. If the data frequency of your dataset is minutes or hours, you must use the yyyy-MM-dd HH:mm:ss format.
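Since a single badly formatted timestamp can fail an import, a quick local check can save a round trip. This is a minimal sketch using Python's datetime module; the function name and the sub_daily flag (for minute or hourly data) are my own assumptions.

```python
from datetime import datetime

DATE_ONLY = "%Y-%m-%d"           # yyyy-MM-dd
DATE_TIME = "%Y-%m-%d %H:%M:%S"  # yyyy-MM-dd HH:mm:ss


def is_valid_timestamp(value, sub_daily=False):
    """Check a timestamp string against the two accepted formats.

    Set sub_daily=True for minute or hourly data, where only the
    full date-and-time format is acceptable.
    """
    formats = [DATE_TIME] if sub_daily else [DATE_ONLY, DATE_TIME]
    for fmt in formats:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # strptime tolerates missing zero padding, so round-trip the value
        # to insist on the exact layout.
        if parsed.strftime(fmt) == value:
            return True
    return False


for sample in ["2019-10-05", "2019-10-05 13:30:00", "10/05/2019"]:
    print(sample, is_valid_timestamp(sample))
```

Running a check like this over the timestamp column of each CSV file catches formatting problems before the import job does.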

Once the datasets are all successfully imported into the Dataset Group, a Predictor can be created. The Predictor, conceptually similar to a Solution in Amazon Personalize, is a simplified and automated means of selecting, configuring, and training your forecast model. There are a few configuration values to be mindful of to ensure a successful training:

  • Forecast horizon – tells Amazon Forecast how far into the future to predict your data and is set in units that should have a direct correlation to the data frequency of your target time series dataset.
  • Forecast frequency – the frequency at which your forecasts are generated. This value must be greater than or equal to the target time series dataset frequency.
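To make the relationship between horizon and frequency concrete, here is a small sketch that lists the timestamps a forecast would cover. The frequency codes and their mapping to time steps are illustrative assumptions for this example, not a complete list of what Amazon Forecast accepts.

```python
from datetime import datetime, timedelta

# Illustrative mapping from a frequency code to the length of one step.
STEP = {
    "H": timedelta(hours=1),
    "D": timedelta(days=1),
    "W": timedelta(weeks=1),
}


def forecast_timestamps(last_observed, horizon, frequency):
    """Return the `horizon` timestamps that follow the last observed one."""
    step = STEP[frequency]
    return [last_observed + step * (i + 1) for i in range(horizon)]


# With daily data ending on 2019-10-05, a horizon of 14 covers two weeks.
covered = forecast_timestamps(datetime(2019, 10, 5), 14, "D")
print(covered[0].date(), "to", covered[-1].date())
```

The same horizon of 14 on hourly data would cover only 14 hours, which is why the horizon should be chosen with the data frequency in mind.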

Once the Predictor finishes training it can be used to create a Forecast. And once the Forecast has been created it can be queried via a forecast lookup to generate a forecast for a specific item.
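A forecast lookup can also be done through the SDK. The sketch below assumes a client that behaves like boto3.client("forecastquery"); the helper name lookup_forecast and the placeholder ARN are hypothetical.

```python
def lookup_forecast(forecastquery, forecast_arn, item_id):
    """Return the predictions for one item, keyed by quantile (e.g. "p50")."""
    response = forecastquery.query_forecast(
        ForecastArn=forecast_arn,
        Filters={"item_id": item_id},
    )
    return response["Forecast"]["Predictions"]


# Hypothetical usage (requires boto3 and AWS credentials):
# import boto3
# predictions = lookup_forecast(boto3.client("forecastquery"), "<forecast-arn>", "11111")
# for point in predictions["p50"]:
#     print(point["Timestamp"], point["Value"])
```

Each quantile in the result holds a list of timestamp and value pairs covering the forecast horizon.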

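With that, all that is left is to interpret the results. The P10, P50, and P90 values are quantile forecasts: the actual value is expected to fall at or below them 10%, 50%, and 90% of the time, respectively. Put another way, they have a 10%, 50%, and 90% probability of satisfying actual demand, with the P50 value coming in closest to what the actual demand should be.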
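One way to build intuition for these values is to compare past forecasts with what actually happened. The sketch below counts how often actual demand was at or below a given quantile forecast; the numbers are made up purely for illustration and do not come from Amazon Forecast.

```python
def coverage(actuals, quantile_forecasts):
    """Fraction of periods in which actual demand was at or below the forecast."""
    hits = sum(
        1
        for actual, predicted in zip(actuals, quantile_forecasts)
        if actual <= predicted
    )
    return hits / len(actuals)


# Made-up demand for five periods and flat quantile forecasts.
actual_demand = [10.0, 12.0, 9.0, 15.0, 11.0]
p10 = [9.5] * 5
p50 = [11.0] * 5
p90 = [13.0] * 5

for label, forecast in [("P10", p10), ("P50", p50), ("P90", p90)]:
    print(label, coverage(actual_demand, forecast))
```

Over a long enough history, a well-calibrated P90 forecast should cover actual demand in roughly 90% of periods, which is why it suits cases where running short is costly.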

Before diving in . . .

I highly recommend taking advantage of the demos and tutorials that the Amazon Forecast and Amazon Personalize teams have put together. Start with the console-based tutorials, and then move on to the workshops that provide a deeper understanding of both services and use Jupyter notebooks to demonstrate the process of automating all of the steps required to create Campaigns and Forecasts.

Have questions about Amazon Forecast and Amazon Personalize? Ready to take the next steps and deploy your Campaigns and Forecasts to production? Schedule a consultation with our AWS Experts, or reach out to us at info@1strategy.com.