May 16th, 2017
Turn-key, Move-in-Ready Data Solutions
By Aaron Caldiero

We recently had a professional sports team come to us as a potential client. As with any professional sports team, they had a variety of data sources and a need to perform analytics on that data. Their data sources were both internal and external, in a variety of formats.

The primary sources of their data included their CRM running in Microsoft Dynamics, Ticketmaster daily extracts, Point-of-Sale transaction data, and a variety of miscellaneous internal application data. They wanted to bring all these data sources together to perform analytics to determine the key drivers of season ticket sales, just to start out with.

They also wanted a turn-key solution that they would not need to babysit themselves or hire people with tech skills to handle. They needed to have dashboards for executives that would point to a centralized data repository. The dashboards would need to be able to show high level summaries of the data, as well as provide drill-down capabilities. The final requirement was that this all needed to be done yesterday.

While discovering what the client needed, we also discovered something about professional sports teams: they have very large name recognition, but function more like small businesses. With such a large name recognition one would think they are like large enterprises with accompanying large budgets. However, they actually have a small back office and IT staff that supports the bare minimum of functionality. Most of their expenditures are for players, so expenses for other things—like analytics and databases—need to be kept to a minimum.

Image

This all seems like a very tall order, and there are very few options that can provide a complete, turn-key solution for their needs, especially in such a short amount of time—and especially on a small budget. The client essentially was looking for a move-in-ready data environment, and they wanted to be handed the keys right away.

Now, typically one can’t have their cake and eat it, too. Usually it’s said you can have fast, quality, or cheap—pick any two, because you can’t have all three. However, with a winning combination of Amazon Web Services (AWS) providing the building blocks, and 1Strategy performing the building and gluing together of those building blocks, it is possible to achieve all three—fast, quality, and cheap.

The AWS building blocks we chose to use were S3 and RDS, as well as some custom scripting to bring the pieces together. S3 was used as a landing zone for all of their data sources, and RDS was used as place to house the data after some processing and formatting. We decided to use RDS Aurora; since it is an AWS Managed Service, the client would not need to worry about database maintenance, upgrades, or servicing. Aurora also provides a MySQL interface which made it easy to use with Excel, Tableau, and any other analysis tools the client may want to use.

Below are the steps we took to build out the data environment.

Build Steps

  • Step 1 – Create an Aurora Cluster
  • Step 2 – Configure Aurora Connection
  • Step 3 – Connect to the Aurora Cluster
  • Step 4 – Load Data from S3 to Aurora
  • Step 5 – Format Data to Required Data Types
  • Step 6 – Query the Data
  • Step 7 – Connect to Data from Excel
  • Step 8 – Connect to Data from Tableau and Visualize

Step 1 – Create an Aurora Cluster

The first step is to create an Aurora cluster. This was done through the AWS Console.

  • Select RDS from the Services Menu
  • Select the database engine, in this case we chose Aurora for the reasons described above.
  • Next are the database details.
  •   • For this demo we started out with the defaults for DB Instance Class and Multi-AZ Deployment.
      • For DB Instance Identifier, specify a name that is unique for all DB instances owned by your AWS account in the current region. DB instance identifier is case insensitive, but stored as all lower-case, as in “mydbinstance”
      • Then specify the Master Username and Password

  • Next are configuring the Advanced Settings.
  •   • Leave most of the Advanced Settings with the default values with the exception of the Database Name value. For that specify a string of up to 64 alpha-numeric characters that define the name given to a database that Amazon RDS creates when it creates the DB instance, as in “mydb”.

  • Then click Launch DB Instance, and the Aurora database will be created.

Step 2 – Configure Aurora Connection

The next step is to configure a Role and Security Group for connecting to the Aurora Cluster, and then to associate the Role and Security Group to the Aurora Cluster.

  • Select EC2 from the Services Menu, and then select Security Groups from the sidebar.
  • Select Create Security Group, and then add an inbound rule for MYSQL/Aurora (Type: MYSQL/Aurora, Protocol: TCP, Port: 3306, Source: IP Address), and finally click create.
  • Next select IAM from the Services Menu, and then select Roles from the sidebar.
  • Select Create New Role:
  •   • For Role Name provide an appropriate name, like “my-aurora-role.”
      • For Role Type select AWS Service Roles.
      • From the list select Amazon RDS (allows RDS to call AWS services on your behalf).
      • For Attach Policy just skip for now and click Next Step.
      • Then Review and click Create Role.

  • Next, go back to Roles on the sidebar, search for the role you just created, and select it (this Role is needed in order to move data from S3 to Aurora as part of the ETL process later on).
  •   • On the Permissions tab select Attach Policy.
      • Search for S3 and then select the “AmazonS3FullAccess” Policy.
      • Search for RDS and then select the “AmazonRDSFullAccess” Policy.
      • Select Attach Policy.

  • Next, select RDS from the Services Menu and then select Instances from the sidebar.
  • Select the instance used for the Cluster Endpoint. It should be the one with the Cluster Instance name without the AZ as part of the name.
  •   • Select Instance Actions and from the drop down select Modify.
      • Under the Network & Security section and in the Security Group box select the Security Group previously created.
      • Scroll to the bottom and check the box for “Apply Immediately.”
      • Then select Continue and then Modify DB Instance.

  • Next select Clusters from the sidebar, and select the cluster previously created.
  •   • Select Manage IAM Roles.
      • Select the Role previously created for Aurora from the drop down.
      • Select Done.

Step 3 – Connect to the Aurora Cluster

In this step you will connect to the Aurora Cluster using a JDBC driver and the following connection string pattern: jdbc:mysql://<your-aurora-cluster-endpoint>.rds.amazonaws.com:3306/<your-db-name>

For this example I use SQL Squirrel, but any application that uses JDBC will be able to connect in a similar way.

Step 4 – Load Data from S3 to Aurora

For this step I used SQL Squirrel to submit the code below to the Aurora Cluster to load sample data that was previously extracted from source and saved in S3.

It is also necessary to set up permissions for Aurora to access S3 (for instructions on how to set up permissions see: link to AWS Documentation).

The code example below uses two different methods of loading data from S3. The first method loads all files from an S3 path, and the second one loads data from a single file on S3.


Step 5 – Format Data to Required Data Types

When the sample data has been loaded in the previous step, it is all loaded as strings. This was done to ease the loading of data and to make it easier to do data typing in a way that works best with MySQL/Aurora.

Step 6 – Query the Data

Querying data from Aurora.

Step 7 – Connect to Data from Excel

To connect to the Aurora database from Excel is a simple process of using Excel’s built in query tools for connecting to external data sources using ODBC connections.

Once the connection to the Aurora database has been created, you can use the MSQuery to write SQL and query your data similar to the examples in the previous section. Once your query has been run, the results from the query will be in your Excel spreadsheet. From there you can use all the powers of Excel like Pivot Tables and Charts to visualize and interact with the data.

Step 8 – Connect to Data from Tableau and Visualize

This section describes how to connect Tableau to a MySQL or Aurora database and set up the data source. Before you begin, gather this connection information:

  • Name of the server that hosts the database you want to connect to
  • User name and password
  • Are you connecting to an SSL server? Make the connection and set up the data source.

On the start page, under Connect, click MySQL, and then do the following:

  1. Enter the name of the server that hosts the database.
  2. Enter the user name and password, and then click Sign In.
  3. Select the Require SSL check box when connecting to an SSL server. From the Database drop-down list, select a database or use the text box to search for a database by name.

Under Table, select a table or use the text box to search for a table by name.

Drag the table to the canvas, then click the sheet tab to start your analysis.

Use custom SQL to connect to a specific query rather than the entire data source.

Conclusion

In the end, a customer can get up and running, using an end-to-end data solution very quickly and easily. Using AWS’s services and a few short scripts they can have an end-to-end solution.

For sample data used in this example, click HERE.