What if you could predict whether your stock of choice would rise or fall during the next month? Or whether your favorite football team would win or lose its next match? How can you make such predictions? Perhaps machine learning can provide part of the answer. Cortana, the new digital personal assistant powered by Bing that comes with Windows Phone 8.1, accurately predicted 15 out of 16 matches in the 2014 FIFA World Cup.
In this Azure tutorial, we will explore Azure Machine Learning features and capabilities through solving one of the problems that we face in our everyday lives.
From the machine learning developer’s point of view, problems can be divided into two groups: those that can be solved using standard methods, and those that cannot. Unfortunately, most real-life problems belong to the second group. This is where machine learning comes into play. The basic idea is to use machines to find meaningful patterns in historical data and use those patterns to solve the problem.
Gas prices are probably one of the items already in most people’s budgets. A steady increase or decrease can influence the prices of other goods and services as well. Many factors can influence gas prices, from weather conditions to political decisions and administrative fees, to totally unpredictable events such as natural disasters or wars.
The plan for this Azure machine learning tutorial is to investigate some accessible data and find correlations that can be exploited to create a prediction model.
Azure Machine Learning Studio
Azure Machine Learning Studio is a web-based integrated development environment (IDE) for developing data experiments. It is closely integrated with the rest of Azure’s cloud services, which simplifies development and deployment of machine learning models and services.
Creating the Experiment
There are five basic steps to creating a machine learning example. We will examine each of these steps through developing our own prediction model for gas prices.
Obtaining the Data
Gathering data is one of the most important steps in this process. Relevance and clarity of the data are the basis for creating good prediction models. Azure Machine Learning Studio provides a number of sample data sets. Another great collection of datasets can be found at archive.ics.uci.edu/ml/datasets.html.
After collecting the data, we need to upload it to the Studio through its simple data upload mechanism:
Once uploaded, we can preview the data. The following picture shows part of our data that we just uploaded. Our goal here is to predict the price under the column labeled E95.
Our next step is to create a new experiment by dragging and dropping modules from the panel on the left into the working area.
Preprocessing involves adjusting the available data to your needs. The first module that we will use here is “Descriptive Statistics”, which computes summary statistics from the available data. Besides the “Descriptive Statistics” module, another commonly used module is “Clean Missing Data”. The aim of this step is to deal with missing (null) values, either by replacing them with some other value or by removing them entirely.
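To make these two preprocessing ideas concrete outside the Studio, here is a minimal Python sketch. The rows are made-up numbers for illustration only, not the tutorial's dataset, and the helper names (`describe`, `fill_missing_with_mean`, `drop_rows_with_missing`) are hypothetical. Replacing by the column mean is just one of several possible strategies.

```python
import statistics

# Made-up rows for illustration only; None marks a missing value.
rows = [
    {"Oil": 100.0, "USD/HRK": 5.5, "E95": 10.0},
    {"Oil": None, "USD/HRK": 5.6, "E95": 10.2},
    {"Oil": 104.0, "USD/HRK": None, "E95": 10.4},
    {"Oil": 102.0, "USD/HRK": 5.7, "E95": 10.3},
]


def describe(rows, column):
    """Summarize the non-missing values of one column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def fill_missing_with_mean(rows, column):
    """Return new rows with missing values in `column` replaced by the column mean."""
    mean = statistics.mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]


def drop_rows_with_missing(rows):
    """Return only the rows that have no missing value in any column."""
    return [r for r in rows if all(v is not None for v in r.values())]


stats = describe(rows, "Oil")
filled = fill_missing_with_mean(rows, "Oil")
dropped = drop_rows_with_missing(rows)
```

Filling keeps every row but invents a plausible value, while dropping keeps only fully observed rows. Which one is better depends on how much data you can afford to lose.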
Another module applied at this step in our tutorial is the “Filter Based Feature Selection” module. This module determines the features of the dataset that are most relevant to the results that we want to predict. In this case, as you can see in the picture below, the four most relevant features for “E95” values are “EDG BS”, “Oil”, “USD/HRK”, and “EUR/USD”.
Since “EDG BS” is another “output” value that cannot be used for making predictions, we will select only two of the remaining important features: the price of oil and the exchange rate in the USD/HRK column.
A sample of the dataset after processing is shown below:
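One common way to score features against a target is the absolute Pearson correlation. The sketch below assumes that scoring method (the module offers several, and the source does not say which one was used) and runs on made-up columns. The “Noise” column and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_features(columns, target):
    """Rank every non-target column by |correlation| with the target, strongest first."""
    scores = {
        name: abs(pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up columns for illustration only.
columns = {
    "Oil": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Noise": [3.0, 1.0, 2.0, 1.0, 3.0],
    "E95": [2.0, 4.0, 6.0, 8.0, 10.0],
}
ranked = rank_features(columns, target="E95")
```

A high score only says that a column moves together with the target. As with “EDG BS” above, we still have to decide by hand whether a column is a legitimate input.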
Choosing and Applying an Algorithm
Our next step is to split the available data using the “Split” module. The first part of the data (in our case 80%) will be used to train the model, and the rest will be used to score the trained model.
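The same 80/20 idea can be sketched in a few lines of Python. This assumes a randomized split with a fixed seed for repeatability; the function name and the seed value are our own choices, not something taken from the Studio.

```python
import random


def split_rows(rows, fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into (training, scoring) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder rows stand in for the real dataset.
train_rows, score_rows = split_rows(list(range(10)))
```

Shuffling before cutting matters when the rows are ordered, for example by date. Without it, the model would be trained on one period and scored only on another.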
The following steps are the most important in the entire Azure machine learning process. The “Train Model” module accepts two inputs. The first is the raw training data, and the other is the learning algorithm. Here, we will be using the “Linear Regression” algorithm. The output of the “Train Model” module is one of the inputs of the “Score Model” module. The other is the rest of the available data. “Score Model” adds a new column to our dataset, “Scored Labels”. The better the learning algorithm fits the available data, the closer the values in the “Scored Labels” column are to their corresponding E95 values.
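To show what training and scoring amount to for linear regression, here is a minimal sketch that fits ordinary least squares through the normal equations. It is not the Studio's implementation. The feature values and the coefficients used to generate the labels are made up, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def train_linear_regression(features, labels):
    """Return least-squares weights, intercept first (the "Train Model" step)."""
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, labels)) for i in range(k)]
    return solve(xtx, xty)


def score(weights, features):
    """Return one predicted label per row (the "Scored Labels" column)."""
    return [
        weights[0] + sum(w * v for w, v in zip(weights[1:], row))
        for row in features
    ]


# Made-up (Oil, USD/HRK) rows; labels follow an invented linear rule.
features = [(100.0, 5.0), (110.0, 5.5), (90.0, 6.0), (105.0, 5.2)]
labels = [1.0 + 0.05 * oil + 0.5 * rate for oil, rate in features]

weights = train_linear_regression(features, labels)
scored_labels = score(weights, [(100.0, 5.5)])
```

Because the made-up labels are exactly linear in the two features, the fit recovers the invented rule almost perfectly. Real price data is noisy, so the scored labels will only approximate the actual values.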
The “Evaluate Model” module gives us an evaluation of the trained model expressed as statistical metrics. If we look at the “Coefficient of Determination”, a value of around 0.8 means that the model explains roughly 80% of the variance in the E95 price. It is a measure of fit, not a probability of predicting the correct price.
Now, it is worth trying the “Neural Network Regression” module. We will need to add new “Train Model” and “Score Model” modules and connect the output of the new “Score Model” module to the existing “Evaluate Model” module.
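For reference, the coefficient of determination is conventionally defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that standard definition and uses made-up actual and predicted values.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


# Made-up values for illustration only.
actual = [10.0, 11.0, 12.0, 13.0]
predicted = [10.5, 10.5, 12.5, 12.5]
fit = r_squared(actual, predicted)  # residual SS = 1.0, total SS = 5.0
```

Here every prediction is off by 0.5, yet the score is still high because the predictions track most of the spread in the actual values. A score of 1 would mean the predictions match the actual values exactly.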
The “Neural Network Regression” module requires a bit more configuration. Since this is the most important module of the entire experiment, it is where we should focus our efforts, tweaking and experimenting with the settings and selection of the appropriate learning algorithm as a whole.
In this case, the “Evaluate Model” module gives us a comparison of our two trained models. Again, based on the Coefficient of Determination, we see that the neural network provides slightly less accurate predictions.
At this point we can save the selected trained models for future use.
Once we have a trained model, we can proceed with creating a “Scoring Experiment”. That can be done by creating a new experiment from scratch or by using the Azure Machine Learning Studio helper: simply select the trained model and click on “Create Scoring Experiment”. The new modules that we need here are “Web service input” and “Web service output”. We will also add a “Project Columns” module to select our input and output values. The input values are Oil and USD/HRK, and the output is the predicted value in the “Scored Labels” column of the “Score Model” output.
The picture below shows our scoring experiment after these few adjustments and after connecting the “Web service input” and “Web service output” modules accordingly.
Another nifty helper feature comes into play at this point. With “Publish Web Service” you can create a simple web service hosted on Azure’s cloud infrastructure.
Predicting New Data
Finally, we can test our prediction web service using a simple test form.
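Besides the test form, the published service can also be called from code. The sketch below only builds the HTTP request and does not send it. The URL and API key are placeholders, and the JSON layout (“Inputs”, “input1”, “ColumnNames”, “Values”) is an assumption about the request format; the API help page of your own published service shows the authoritative schema.

```python
import json
import urllib.request

# Placeholders, not real values: copy the actual URL and key from your service.
SERVICE_URL = "https://example.invalid/execute"
API_KEY = "YOUR_API_KEY"


def build_request(oil, usd_hrk, url=SERVICE_URL, api_key=API_KEY):
    """Build a POST request carrying one row of input values."""
    # Assumed payload shape; check your service's API help page.
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["Oil", "USD/HRK"],
                "Values": [[str(oil), str(usd_hrk)]],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
    )


request = build_request(100.0, 5.5)

# To actually call the service (requires a real URL and key):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```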
Through this simple machine learning tutorial we have shown how to create a fully functional prediction web service. Azure Machine Learning Studio, integrated into the Azure platform, can be a very powerful tool for creating data experiments. Besides Machine Learning Studio, there are other machine learning solutions such as Orange and Tiberius. Regardless of the development environment you prefer, I encourage you to explore machine learning and find your inner data scientist.