
BigML Spring 2017 Release Webinar Video is Here!

We are happy to share that Time Series, BigML’s latest resource, is now fully implemented on the BigML platform and available from the BigML Dashboard and API, as well as from WhizzML for automation. Special thanks to all webinar attendees who joined the BigML Team yesterday during the official launch. As usual, your feedback and questions are very much appreciated and help us keep improving BigML every day!

Time Series is a sequentially indexed representation of your historical data that helps you forecast future values of numerical properties. This method is commonly used for predicting stock prices, sales forecasting, website traffic, production and inventory analysis, and weather forecasting, among many other use cases.

Don’t fret if you missed the live webinar. It is now available on the BigML YouTube channel so you can watch it as many times as you wish.

Please visit our dedicated Spring 2017 Release page for further reading. The learning resources available include:

  • The slides used during the webinar.
  • The Time Series documentation to learn how to create and evaluate your Time Series, and interpret the results before making forecasts from the BigML Dashboard and the BigML API.
  • The series of six blog posts that explain Time Series starting with the basics and progressively diving deeper into the technical and practical aspects of this new resource, with an emphasis on Time Series models for forecasting.

Thanks for your positive comments after the webinar. And remember that you can always reach out to us at support@bigml.com for any suggestions or questions you may have.

Behind the Scenes of BigML’s Time Series Forecasting

BigML’s Time Series Forecasting model uses Exponential Smoothing under the hood. This blog post, the last one of our series of six about Time Series, explores the technical details of Exponential Smoothing models to help you gain insights about your forecasting results.

Exponential Smoothing Explained

To understand Exponential Smoothing, let’s first focus on the smoothing part of that term. Consider the following series, depicting the closing share price of EBAY over a 400-day period.

[Figure: daily closing share price of EBAY]

There is definitely some shape here, which can help us tell the story of this particular stock symbol. However, there are also quite a few transient fluctuations which are not necessarily of interest. One way to address this is to run a moving average filter over the data.

[Figure: EBAY closing price with a moving average overlay]

The output of the moving average (MA) filter is shown as the blue line. At each time index, we compute the filtered data point as the arithmetic mean of the unfiltered data points located within a window of fixed width m about that time index. Given time series data y, a (symmetric) moving average filter produces the filtered series:

\hat{y}_t = \frac{1}{m} \sum_{j=-k}^{k} y_{t+j}, \quad \textrm{where } m = 2k + 1

As seen in the figure, the resulting filtered time series contains only the large scale movements in the stock price, and so we have successfully smoothed the noise away from the signal.

When we apply Exponential Smoothing to a time series, we are performing an operation that is somewhat similar to the moving average filter. The exponential smoothing filter produces the following series:

\ell_t = \alpha y_t + (1 - \alpha)\ell_{t-1}

Where 0 < \alpha < 1 is the smoothing coefficient. In other words, the smoothed value \ell_t is the \alpha-weighted average between the current data point and the previous smoothed value. If we recursively substitute the value for \ell_{t-1}, we can rewrite the exponential smoothing expression like so:

\ell_t = \alpha \sum_{j=0}^{t-1} (1 - \alpha)^j\, y_{t-j} + (1 - \alpha)^t\, \ell_0

Where \ell_0 is the initial smoothed state value. Here, we see that the exponentially smoothed value is a weighted sum of the original data points, just as with the MA filter. However, whereas the MA filter computes a uniformly-weighted sum over a window of constant width, the exponential smoother computes the sum going all the way back to the beginning of the series. Also, the weights are highest for the points closest to the current time index, and they decrease exponentially going back in time. To verify that this produces a smoothing effect, we can apply it to our EBAY data and look at the results.

[Figure: EBAY closing price with an exponential smoothing overlay]

Why would we choose to smooth a time series using an exponential window instead of a moving average? Conceptually, the exponential window is attractive because it allows the filter to emphasize a point’s immediate neighborhood without completely discarding the time series’ history. The fact that the parameter \alpha is continuously valued means that there is more freedom to fine-tune the smoother’s fit to the data, compared to the moving average filter’s integer-valued parameter.

Now, the other half of time series modeling is creating forecasts. Both the moving average and exponential smoother have a flat forecast function. That is, for any horizon h beyond the final data point, the forecast is just the last smoothed value computed by the filter.

\hat{y}_{t+h|t} = \ell_t

This is admittedly quite a simplistic result, but for stationary time series, these forecast values can be usable for reasonably short horizons. In order to forecast time series which exhibit more interesting movement, we need to incorporate trend into our model.

Trend models

In the previous section we smoothed a time series using a single pass of an exponential window filter, resulting in a “level-only” model which produces flat forecasts. To introduce some motion into our exponential smoothing forecasts we can add a trend component to our model.  We will define trend as the change between two consecutive level values \ell_{t-1} and \ell_t, and then interpret this purposefully vague definition in two ways:

  1. The difference between consecutive level values (additive trend): r_t = \ell_t - \ell_{t-1}
  2. The ratio between consecutive level values (multiplicative trend): r_t = \ell_t / \ell_{t-1}

We can then perform exponential smoothing on this trend value, in an identical fashion to the level value:

b_t=\beta r_t + (1-\beta)b_{t-1}

Where  0 < \beta < 1 is the trend smoothing coefficient.  This combination of exponential smoothing for level and trend is frequently referred to as Holt’s linear or exponential trend method, after the author who first described it in 1957. The forecast for a given horizon h from an exponential smoothing model with trend is simply the most recent level value, with the smoothed trend applied h times. That is,

\hat{y}_{t+h|t} = \ell_t + h b_t \quad \textrm{or} \quad \hat{y}_{t+h|t} = \ell_t b_t^h

Hence, for additive trend models the forecast is a straight line, and for multiplicative trend models it is an exponential curve. In some cases, it may be undesirable for the trend to continue at a constant rate as the forecast horizon grows. We can introduce a damping coefficient 0 < \phi < 1 and reformulate the smoothing equations. The forecast, level, and trend equations for a damped additive trend model are:

\hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \cdots + \phi^h)\, b_t
\ell_t = \alpha y_t + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})
b_t = \beta(\ell_t - \ell_{t-1}) + (1 - \beta)\phi b_{t-1}

and for multiplicative trend:
\hat{y}_{t+h|t} = \ell_t\, b_t^{(\phi + \phi^2 + \cdots + \phi^h)}
\ell_t = \alpha y_t + (1 - \alpha)\, \ell_{t-1} b_{t-1}^{\phi}
b_t = \beta(\ell_t / \ell_{t-1}) + (1 - \beta)\, b_{t-1}^{\phi}

Seasonal models

Many time series exhibit seasonality, that is, a pattern of variation that repeats over consecutive periods of fixed length. For example, alcohol sales may be higher during the summer than the winter, year after year, so a time series containing monthly sales figures of beer could exhibit a seasonal pattern with a period of m=12. Once again, seasonality can be modeled additively or multiplicatively. In the former case, the seasonal variation is independent of the level of the series, whereas in the latter, the variation is modeled as a proportion of the current level.
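In the standard Holt-Winters formulation that this family of models follows (BigML’s exact parameterization is described in the documentation), the seasonal component s_t has its own smoothing coefficient 0 < \gamma < 1 and is updated against the value observed one period of length m earlier:

s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m} \quad \textrm{(additive)}

s_t = \gamma \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1 - \gamma) s_{t-m} \quad \textrm{(multiplicative)}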

To bring it all together, the following is an example of a time series which exhibits both trend and seasonality.

[Figure: a time series with fitted level, slope, and seasonal components]

Note how the level is a smoothed version of the observed data, and the trend (labeled “slope”) is more or less the rate of change in the level.

Learning exponential smoothing models

Exponential smoothing models are fully specified by their smoothing coefficients \alpha, \beta, \gamma, and \phi, along with the initial state values \ell_0, b_0, and s_0 (the remaining state values are obtained by running the smoothing equations forward). To evaluate how well an exponential smoothing model fits the data, we compute what is called the “within-sample one-step-ahead forecast error”. Put plainly, for each time step t, we compute the forecast for one step ahead, and calculate the error between that forecast and the actual data from the next time step.

e_t = y_t - \hat{y}_{t|t-1}

We compute these errors for each time step where observed data is available, and the sum of squared errors is our metric for model fit. This metric is then used to perform numeric optimization in order to obtain the best values for the smoothing coefficients and initial state values. BigML uses the Nelder-Mead simplex algorithm as its optimizer.
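Spelled out, the optimization problem is to find the coefficient and initial state values that minimize the sum of squared one-step-ahead errors:

\min_{\alpha, \beta, \gamma, \phi, \ell_0, b_0, s_0} \sum_{t=1}^{n} \left( y_t - \hat{y}_{t|t-1} \right)^2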

Model Selection

Considering all the different combinations of trend and seasonality types for exponential smoothing means that we must choose among more than a dozen different model configurations for a time series modeling task. Therefore, we need some way to rank the models against each other. Naturally, the ranking should incorporate a measure of how well the model fits the training time series, but it should also help us avoid models which overfit the data. The tool that fits these requirements is the Akaike Information Criterion (AIC). Let \hat{L} be the maximum likelihood value of the model, computed from the sum of squared errors between the model fit and the true training values. Let k be the total number of parameters required by the model type. For example, an A,Ad,A model with a seasonal period of 4 uses 10 parameters: 4 smoothing coefficients and 6 initial state values (one level, one trend, and 4 seasonal). The AIC is defined by the following difference.

\textrm{AIC} = 2k - 2\ln\hat{L}

Models which produce lower AIC values are considered better choices, so the best model is the one which maximizes the likelihood \hat{L} while minimizing the number of parameters k. Along with the AIC, BigML also computes two additional metrics for each model: the bias-corrected AIC (AICc) and the Bayesian Information Criterion (BIC). These quantities are also log-likelihood values penalized by model complexity; however, the degree to which they punish extra parameters varies, with the BIC being the most sensitive and the AIC the least.

Want to know more about Time Series?

If you have any questions or you’d like to learn more about how Time Series work, please visit the dedicated release page. It includes a series of six blog posts about Time Series, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Automating Time Series with WhizzML


Since the beginning of our civilization, humans have worried about the future. In particular, we worry about predicting the future. It’s widely known that in ancient Greece the most famous oracle was at Delphi. Greeks went there to find out about their future and to decide what they should do to turn their fortunes around. Three thousand years later, the worry about how to act in the future remains; however, we’ve learned to base our decisions on the algorithms and science inherited from Pythagoras, Euclid, Thales, or Archimedes rather than on Pythia’s words.


To continue with our series of posts about Time Series, this fifth blog post focuses on WhizzML users. WhizzML is our Domain Specific Language for Machine Learning workflow automation, which provides programmatic support for all your BigML resources and executes entirely in the BigML back-end.

Every resource in BigML can be managed through WhizzML. It follows naturally that you can now use WhizzML scripts to create Time Series models and make forecasts with them.

For the Time Series resource, we begin by explaining how to split a dataset based on the range parameter, since it’s important to keep the data in the same order both when creating the Time Series and when evaluating it. As explained in a previous blog post, when testing other supervised models we can use any randomly sampled subset of instances as test data. However, evaluating a Time Series requires the holdout to be a range of our data, as the training-test splits must be sequential. For an 80-20 split, the test set is the final 20% of rows in the dataset. WhizzML calculates the split ranges from a given percentage of rows. In the script snippet below, we set aside 80% of our rows.

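Here is a minimal sketch of what such a script can look like (an illustration assuming the standard fetch and create-dataset calls with the API’s origin_dataset and range arguments, not necessarily the exact original script):

(define (linear-split ds-id pct)
  (let (rows (get (fetch ds-id) "rows")           ;; total number of rows
        split-row (floor (* rows pct))            ;; last row of the training range
        train-ds (create-dataset {"origin_dataset" ds-id
                                  "range" [1 split-row]})
        test-ds (create-dataset {"origin_dataset" ds-id
                                 "range" [(+ split-row 1) rows]}))
    [train-ds test-ds]))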

The linear-split function receives a dataset and a percentage, and creates two complementary datasets, one for testing (test-ds) and one for training (train-ds), by splitting the existing rows into two ranges. This is the largest script in this post, so it only gets easier from here on.

Now let’s see how to create a Time Series that models our data. Because we would like to evaluate our Time Series later on, we will use the train-ds dataset output by our script. In fact, the only mandatory parameter to create a Time Series is the ID of the dataset used for training. You can also specify which ETS models you would like to generate; by default, BigML will explore all of them. So the simplest code to create a Time Series is as easy as the example that follows.

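A sketch of that call (assuming create-timeseries, the standard creation primitive for the timeseries resource type):

;; train-ds as returned by linear-split above; all ETS models
;; are explored by default
(define ts-id (create-timeseries {"dataset" train-ds}))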

In case you have more information about your data, you might want to use other Time Series creation parameters. The full list of parameters can be found in the Time Series section of the API documentation.

For monthly data with a yearly seasonal pattern, for instance, you can model an additive seasonal component by setting the seasonality to additive (with value 1) and the period to 12. You can then fill in these properties in the function that creates the Time Series, as in the example below.

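A sketch with those properties filled in (the seasonality and period keys follow the API parameter names discussed above):

(define ts-id
  (create-timeseries {"dataset" train-ds
                      "seasonality" 1   ;; 1 = additive
                      "period" 12}))    ;; 12 data points per seasonal cycle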

This is a good point to remind ourselves that most WhizzML requests are asynchronous, so it’s quite possible that you will need to wait for the resource to finish before referring to it in other scripts or accessing its properties. For the previous example, the code would look like this:

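A sketch using the create-and-wait variant that WhizzML offers for each resource type:

;; returns only when the Time Series has finished building
(define ts-id
  (create-and-wait-timeseries {"dataset" train-ds
                               "seasonality" 1
                               "period" 12}))
;; equivalently, one can wrap the create call: (wait (create-timeseries {...}))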

Once the Time Series has been created, we can evaluate how good its fit is. Remember that our original dataset was chronologically split into two parts. Now we will use the remaining 20% of the dataset to check the Time Series model’s performance. The test-ds parameter in the code below represents the second part of the dataset. Following WhizzML’s less-is-more philosophy, creating an evaluation requires a simple code snippet with only two mandatory parameters: a Time Series to be evaluated and a dataset to use as test data.

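A sketch of that call, assuming the timeseries key mirrors the API’s naming:

(define ev-id
  (create-evaluation {"timeseries" ts-id    ;; the Time Series to evaluate
                      "dataset" test-ds}))  ;; the held-out 20% of rows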

In the evaluation object, there are some measures for each one of the ETS models in the Time Series. For more on this, see section 5 of our previous post.

After evaluating your Time Series, what’s next is calling on the aforementioned “Modern-day Oracle”. Once you build a Time Series with the entire original dataset, that is, including the hold-out rows, you can forecast the future values of one or many fields in your data domain. In the code below, we demonstrate the simplest case, where the forecast is made for only one of the fields in your dataset.

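A sketch of such a forecast call (full-ts-id stands for a Time Series built on the entire dataset, and the horizon value is arbitrary):

(define forecast-id
  (create-forecast {"timeseries" full-ts-id
                    "horizon" 30}))  ;; by default, covers the model's objective field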

For developers, the cool part of WhizzML is that it allows you to create a complete script with all the steps you need and execute it in the cloud or in an on-premises deployment with a single call. This takes advantage of the service’s built-in scalability and parallelization capabilities, and it minimizes latency and possible network brittleness while exchanging information with the cloud. You can do this by creating an execution of your script either directly through the BigML API or by using any of the existing BigML Bindings.

BigML supports bindings in different programming languages that allow you to create not only the resources available in the platform, such as Time Series and Evaluations, but also Scripts and Executions. Everything can be managed from your favorite programming language, such as Python or Node.js, among many others. You can see the complete list of our bindings and the related documentation on our dedicated Tools page.

Want to know more about Time Series?

If you have any questions or you’d like to learn more about how Time Series work, please visit the dedicated release page for further learning. It includes a series of six blog posts about Time Series, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Programming Time Series with BigML’s API

In this blog post, the fourth one of our series of six, we provide a brief summary of all the necessary steps to create a Time Series using the BigML API. As stated in our previous post, Time Series is often used to forecast the future values of a numeric field that is sequentially distributed over time, such as stock prices, sales volume, or industrial data, among many other use cases.

The API workflow to create a Time Series includes five main steps: first, upload your data to BigML, then create a dataset, create your Time Series, evaluate your Time Series and finally make forecasts. Note that any resource created with the API will automatically be created in your Dashboard too so you can take advantage of BigML’s intuitive visualizations at any time.

If you have never used the BigML API before, note that all requests to manage your resources must use HTTPS and be authenticated with your username and API key to verify your identity. Below is a base URL example to manage Time Series:

https://bigml.io/timeseries?username=$BIGML_USERNAME;api_key=$BIGML_API_KEY

1. Upload your Data

You can upload your data, in your preferred format, from a local file, a remote file (using a URL) or from your cloud repository, e.g., AWS, Azure etc. This will automatically create a source in your BigML account.

First, you need to open up a terminal with curl or any other command-line tool that implements standard HTTPS methods. In the example below, we are creating a source from a local CSV file containing the monthly gasoline demand in Ontario from 1960 until 1975 that we previously downloaded from DataMarket.

curl "https://bigml.io/source?$BIGML_AUTH"
      -F file=@monthly-gasoline-demand-ontario-.csv

Remember that Time Series need to be trained with time-based data. BigML assumes the instances in your source data are chronologically ordered, i.e., the first instance in your dataset will be taken as the first data point in the series, the second instance is taken as the second data point and so on.

2. Create a Dataset

After the source is created, you need to build a dataset, which serializes your data and transforms it into a suitable input for the Machine Learning algorithm.

curl "https://bigml.io/dataset?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"source":"source/595f76362f1dfe13c4002737"}'

3. Create your Time Series

You only need your dataset ID to train your Time Series and BigML will set the default values for the rest of the configurable parameters. By default, BigML takes the last valid numeric field in your dataset as the objective field. You can also configure all Time Series parameters at creation time. You can find an explanation of each parameter in the previous post.

You can evaluate your Time Series performance with new data. Since the data in a Time Series is sequentially distributed, a quick way to train and test your model with different subsets of your dataset is the “range” parameter, which allows you to specify the subset of instances to use when creating and evaluating your model. For example, if we have 192 instances and we want to use 80% for training the model and 20% for testing it, we can set a range of 1 to 154 so the Time Series only uses those instances. Then we will be able to evaluate the model using the rest of the instances (from 155 to 192).

curl "https://bigml.io/timeseries?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"dataset":"dataset/98b5527c3c1920386a000467", 
            "range":[1,154]}'

When your Time Series is created, you will get not one but several models in the JSON response. These models are the result of the combinations of the Time Series components (error, trend, and seasonality) and their variations (additive, multiplicative, damped/not damped) explained in this blog post. Each of these models is identified by a unique name that indicates the error, trend and seasonality components of that particular model. For example, the name M,Ad,A indicates a model with Multiplicative errors, Additive damped trend and Additive seasonality. You can perform evaluations and make forecasts for all models or you can select one or more specific models.

4. Evaluate your Time Series

When your Time Series has been created, you can evaluate its predictive performance. You just need to use the Time Series ID and the dataset containing the instances that you want to evaluate. In our example, we are using the same dataset that we used to create the Time Series. For the evaluation, we use the range from 155 to 192, which contains the last instances in the dataset that weren’t used to train the model.

curl "https://bigml.io/evaluation?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"dataset":"dataset/98b5527c3c1920386a000467", 
            "timeseries":"timeseries/98b5527c3c1920386a000467"
            "range":[155,192]}'

Evaluations for Time Series generate some well-known performance metrics such as MAE (Mean Absolute Error), MSE (Mean Squared Error) or R squared. You will also get other not-so-common ones like sMAPE (symmetric Mean Absolute Percentage Error), which is similar to MAE except the model errors are measured in percentage terms, MASE (Mean Absolute Scaled Error) and MDA (Mean Directional Accuracy), which compares the forecast direction (upward or downward) to the actual data direction. You can read more about these metrics in this article.
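For reference, common definitions of two of the less familiar metrics are given below (BigML’s exact formulas are in the documentation); here \hat{y}_t is the forecast, y_t the actual value, e_t the forecast error, n the number of forecast steps, and the MASE denominator is the in-sample error of a naive one-step forecast over the T training points:

\textrm{sMAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \frac{|\hat{y}_t - y_t|}{(|y_t| + |\hat{y}_t|)/2} \qquad \textrm{MASE} = \frac{\frac{1}{n} \sum_{t=1}^{n} |e_t|}{\frac{1}{T-1} \sum_{t=2}^{T} |y_t - y_{t-1}|}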

5. Make Forecasts

When you create a Time Series, BigML automatically forecasts the next 50 data points for each model per objective field. You can find the forecast along with the confidence interval (an upper and lower bound, where the forecast is located with 95% confidence) in the JSON response of the Time Series model.

If you want to perform a forecast for a longer time horizon, you can do it by using your Time Series ID as in the example below.

curl "https://bigml.io/forecast?$BIGML_AUTH"
       -X POST
       -H 'content-type: application/json'
       -d '{"timeseries":"timeseries/98b5527c3c1920386a000467"
            "horizon":100}'

Want to know more about Time Series?

Please visit the dedicated release page for further learning. It includes a series of six blog posts about Time Series, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Creating your First Time Series Model with BigML’s Dashboard

BigML is bringing Time Series to the Dashboard to help you forecast future values based on your historical data. Time Series is widely used in forecasting stock prices, sales, website traffic, and production and inventory levels, among other use cases. This type of time-based data shares the common attribute of sequential distribution over time.

In this post, we will cover the six fundamental steps it takes to make forecasts by using Time Series in BigML: upload your data, create a dataset, train a Time Series model, analyze the results, evaluate your model, and make forecasts.

[Figure: the six-step Time Series workflow]

To illustrate each of these steps, we will use a dataset from DataMarket which contains the monthly gasoline demand in Ontario from 1960 until 1975. By looking at the chart below, we can observe two main patterns in the data: the seasonality  (more demand during summer vs. winter months) and an increasing trend over the years.

[Figure: monthly gasoline demand in Ontario, 1960-1975]

1. Upload your Data

Upload your data to your BigML account. BigML provides many options to do so; in this case, we drag and drop into the Dashboard the dataset we previously downloaded from DataMarket.

When you upload your data, the data type of each field in your dataset will be automatically detected by BigML. Time Series models will only use the numeric fields in your dataset, e.g., the monthly gas demand field expressed in millions of gallons.

[Screenshot: the source view with automatically detected field types]

Important Note!

BigML indexes your instances in the same order they are arranged in the original source, i.e., the first instance (or row) is taken as the first data point in the series, the second instance is taken as the second data point and so on. Therefore you need to ensure that your instances are chronologically ordered in your source data.

2. Create a Dataset

From your Source view, in the 1-click action menu, use the 1-click Dataset option to create a dataset. This is a structured version of your data ready to be consumed by a Time Series.

[Screenshot: the dataset view]

Since Time Series is considered a supervised model, you can evaluate it. You can use the 1-click split option to set aside some test data to later evaluate your model against. Since Time Series data is sequentially distributed, the split of the dataset needs to be linear instead of random. Using the configuration option shown in the image below, the first 80% of the instances in your dataset will be set aside for training and the last 20% for testing.

[Screenshot: the linear 80%-20% split configuration]

3. Create your Time Series

To train your Time Series you can either use the 1-click Time Series option or you can configure the parameters provided by BigML. BigML allows you to configure the following parameters:

  • Objective fields: these are the fields you want to predict. You can select one or more objective fields and BigML will learn a separate Time Series model for each field, which further streamlines the training process.
  • Default Numeric Value: if your objective fields contain missing values, you can easily replace them by the field mean, median, maximum, minimum or zero. By default, they are replaced by using spline interpolation.
  • Forecast horizon: BigML presents a forecast along with your model creation so you can visualize it in the chart. The horizon is the number of data points that you want to forecast. You can always make a forecast for longer horizons once your model has been created.
  • Model components: BigML models your data by exploring different variations of the error, trend and seasonality components. The combinations of these components result in the multiple models returned (see the introductory blog post of this series for a more detailed explanation):
    • Error: represents the unpredictable variations in the Time Series data, and how they influence observed values. It can be additive or multiplicative. Multiplicative error is only suitable for strictly positive data. By default, BigML explores all error variations.
    • Trend and damped: the trend component can be additive, generating a linear growth of the Time Series, or multiplicative, generating an exponential growth of the Time Series. Moreover, if a damped parameter is included, the trend of the Time Series will become a flat line at some point in the future. By default, BigML explores all trend variations.
    • Seasonality: if your data contains fixed periods or fluctuations that occur at regular intervals, you need to include the seasonality component in your models. It can be additive or multiplicative; the latter makes the seasonal variations proportional to the level of the series. By default, BigML explores both methods.
    • Period length: the number of data points per period in the seasonal data. The period needs to be set taking into account the time interval of your instances and the seasonal frequency. For example, for quarterly data and annual seasonality, the period should be 4, for daily data and weekly seasonality, the period should be 7.
  • Dates: you can set dates for your data to visualize them afterward in the x-axis of the Time Series chart. BigML will calculate the dates for each instance by referencing the initial date associated with the first instance in your data and the row interval.
  • Range: you may want to use a subset of instances to create your Time Series. This option is also handy if you haven’t yet split your dataset into training and test sets.

In our example, we configure the seasonality component by selecting “All”, so BigML explores all possible seasonal combinations (additive and multiplicative), with a period length of 12 since we have monthly data and annual seasonality. We also select the initial date of our dataset with a row interval of 1 month, because each instance represents a single month of data. At that point, we can simply click the Create button to build our Time Series.

[Screenshot: the Time Series configuration panel]

4. Analyze your Results

When your Time Series has been created, you will see your field values and the best Time Series model plotted in a chart. As we mentioned before, BigML learns multiple models as a result of the different component combinations. The best model is selected taking into account the AIC (Akaike Information Criterion), but you can use any of the other metrics offered, such as the AICc (Corrected Akaike Information Criterion), the BIC (Schwarz Bayesian Information Criterion), or the R squared. The preferred metric to select the best model is usually the AIC (or its variations, the AICc or the BIC) rather than the R squared, since it takes into account the trade-off between the model’s goodness of fit and its complexity (to avoid overfitting), while the R squared only measures the degree of adjustment of the model to the data. The AICc is a variation of the AIC for small datasets, and the BIC introduces a heavier penalization of model complexity. To learn more about these metrics, read this article.

[Screenshot: the Time Series chart with the best model plotted]

Below the chart, if you display the panel you will find a table containing all the different models learned from your data. You can visualize them by plotting them on the chart. Each model has a unique name which identifies its different components: Error, Trend, Seasonality. In our example, the model A,A,A is a model with Additive error, Additive trend, and Additive seasonality.

[Screenshot: the table of all Time Series models]

5. Evaluate your Time Series

You can evaluate a Time Series model using data that the model has not seen before. Just click on the Evaluate option in the 1-click menu and BigML will automatically select the remaining 20% of the dataset that you set aside for testing.

[Screenshot: the 1-click Evaluate option]

When the evaluation has been created, you will be able to see your model plotted along with the test data and the model forecasts. By default, BigML selects the best model by the R squared measure, which quantifies the goodness of fit of the model to the test data and can take values up to 1. You will also get different performance metrics for each of your models, such as the MAE (Mean Absolute Error) and the MSE (Mean Squared Error). The lower the MAE and the MSE, and the higher the R squared, the better. You can see below that our model performs very well on the test data, with an R squared of 0.9833.

[Screenshot: the Time Series evaluation view]

The table within the panel below displays all the related models and other metrics such as sMAPE (symmetric Mean Absolute Percentage Error), which is similar to the MAE except the model errors are measured in percentage terms, the MASE (Mean Absolute Scaled Error) and the MDA (Mean Directional Accuracy), which compares the forecast direction (upward or downward) to the actual data direction. See this article for detailed explanations.

6. Make Forecasts

From your model view, you will be able to see the forecasts of your selected models with up to 50 future data points. If you want to predict a longer horizon, you can click on the option to extend it. You can also compare your model forecast with three other benchmark models: a model that always predicts the mean, a naive model that always predicts the last value of the series and a drift model that draws a straight line between the first and last observation of the series and extrapolates the future values.
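Assuming the usual formulations of these benchmarks, their forecasts for a horizon h from a series of t observations are:

\hat{y}_{t+h|t} = \frac{1}{t} \sum_{i=1}^{t} y_i \quad \textrm{(mean)} \qquad \hat{y}_{t+h|t} = y_t \quad \textrm{(naive)} \qquad \hat{y}_{t+h|t} = y_t + h \frac{y_t - y_1}{t - 1} \quad \textrm{(drift)}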

[Screenshot: the Time Series forecast view]

Want to know more about Time Series?

Please visit the dedicated release page for further learning. It includes a series of six blog posts about Time Series, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Welcoming Enrique Dans to the Valencian Summer School in Machine Learning

As the dates near, and given the initial wave of response, we are getting excited about the upcoming Summer School in Machine Learning 2017. This edition will be held at the Veles e Vents building located close to Valencia’s scenic waterfront on September 14-15.

The VSSML17 is a two-day course for advanced undergraduates as well as graduate students and industry practitioners seeking a quick, practical, hands-on introduction to Machine Learning. The last edition was completed by over 140 participants from 19 countries representing 53 companies and 21 academic organizations. This year we have room for over 200 attendees and 26 countries are represented among applicants so far!

We are happy to share that BigML’s Strategic Advisor, prolific Spanish blogger, and IE Business School Professor, Enrique Dans, will be giving a special talk on September 14, at 06:00 PM CEST, at the end of the first day of the Summer School. Enrique Dans will explain the impact that Machine Learning is having in the real world context of business organizations as they go through their digital transformation.

Professor Dans has a Ph.D. in Management from the University of California, Los Angeles and an MBA from Instituto de Empresa (IE). He completed his post-doc studies at Harvard Business School. He is also the author of the best-seller “Everything is going to change.” Among his other qualifications, he holds the Information Systems and Information Technology Chair at IE Business School. He was also one of the distinguished speakers at the 2015 conference: Technical and Business Perspectives on the Current and Future Impact of Machine Learning.

In the past, the skill set required to develop real-life Machine Learning applications has mostly remained the playground of a privileged few academics and scientists. Times have changed, and many businesses have come to the realization that their workforce can’t afford to stay behind the curve on this key enabler. So we urgently need to produce a much larger group of ML-literate professionals in an inclusive manner that appeals to developers, analysts, managers, and subject matter experts alike. Professor Dans’ speech will go into detail on how to best incorporate Machine Learning in your future strategies while launching new types of products and services nobody even dreamt of until recently.

Don’t miss this groundbreaking, hands-on Machine Learning event. Get your ticket before long, as there are few spaces left. We are looking forward to seeing you in Valencia!

Investigating Real-World Data with Time Series

In this blog post, the second one in our six-post series on Time Series, we will bring the power of Time Series to a specific example. As we have previously posted, a BigML Time Series is a sequence of time-ordered data that has been processed using exponential smoothing. This includes smoothing filters that dampen high-frequency noise to reveal the underlying trend of the data. With BigML’s simple and beautiful Dashboard visualizations, we’ll investigate the number of houses sold in the United States.

The Data

We will be examining the number of houses sold (in millions) in the United States by month and year from January 1963 to December 2016. 

Just looking at a scatterplot of the data, we see the number of houses sold goes generally up and down until early 1991, after which the trend is mostly upward. It reaches a peak in early 2005, then goes generally downward again until 2011, when it once more begins to climb. Within each of these years, there is a noticeable seasonal trend, with more houses sold in the summer months and fewer in the winter. But these are all subjective impressions. Can we create a quantifiable model to predict house volume?

The Chart

First, let’s create a Time Series model from the 1-click action menu by using our raw dataset.

[Screenshot: the Time Series chart for the housing data]

We can see in the chart that our Time Series data is represented by the black line and the plot of our best fit model is represented by the purple line. The model with the lowest AIC (one measure of fit) is labeled “M,A,N”. By clicking on the Select more models: dropdown, we can see this means this model is using Holt’s linear method with multiplicative errors, additive trend and no seasonality. If we wished, we could select some other model, perhaps optimizing for some other measure of fit.

By sliding the Forecast slider, we can see what the model predicts for dates in the future. This model predicts that the volume of houses sold will continue to rise linearly. Because this model does not use seasonality, it doesn’t display the up-and-down pattern we would expect. Let’s create another Time Series, this time configuring the parameters so we can add seasonality.

[Screenshot: the Time Series configured with seasonality]

This time the model with the lowest AIC is labeled “M,N,M” for multiplicative error, no trend, and multiplicative seasonality. It captures the ebb and flow of the seasonal sales, but no longer indicates that volume will continue to go up. Since 1963, housing volume has indeed been overall relatively flat.

[Screenshot: the seasonal model’s forecast]

Another Look at the Data

Perhaps we aren’t interested in what behavior housing volume has shown since 1963, but rather what it has been doing recently. We may use our domain knowledge to reason that the housing bubble and following crash was a very unusual event justifying our decision to focus on data from 2011 onwards. How has housing sales volume been changing during these years?

We start by filtering our data to include only the months between January 2011 and December 2016. We want to capture seasonality, so we choose Configure Time Series from the configuration menu and, in the advanced options, set Seasonality to All and Seasonal Periods to 12 (twelve months in a year). Now we can see both the upward trend and the cyclic seasonality that we expect. One interesting and unexpected thing our model has discovered is that the cyclic trend is not completely smooth. It seems that there is a little uptick in housing volume in October of each year. Perhaps this can be explained by people wanting to buy before the busy holiday season!

[Screenshot: the 2011-2016 Time Series with seasonal forecast]

This has been our second blog post on the new Time Series resource. We’ve quickly put Time Series through its paces and used it to better understand sequential trends in our data. Please join us again next time for the third blog post in this series, which will cover a detailed Dashboard tutorial for Time Series.

Want to know more about Time Series?

Please visit the dedicated release page for further learning. It includes a series of six blog posts about Time Series, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Introduction to Time Series

We are proud to present Time Series as a new resource brought to the BigML platform. On July 20, it will be available via the BigML Dashboard, API and WhizzML. Time Series is a sequentially indexed representation of your historical data that can be used to solve classification and segmentation problems, in addition to forecasting future values of numerical properties, e.g., air pollution level in Madrid for the last two days. This is a very versatile method often used for predicting stock prices, sales forecasting, website traffic, production and inventory analysis, or weather forecasting, among many other use cases.

Following our mission of democratizing Machine Learning and making it easy for everyone, we will provide new learning material for you to start with Time Series from scratch and become a power user over time. We start by publishing a series of six blog posts that will progressively dive deeper into the technical and practical aspects of Time Series with an emphasis on Time Series models for forecasting. Today’s post sets the tone by explaining the basic Time Series concepts. We will follow with an example use case. Then there will be several posts focused on how to use and interpret Time Series through the BigML Dashboard, API, WhizzML to make forecasts for new time horizons. Finally, we will complete this series with a technical view of how Time Series models work behind the scenes.

Let’s get started!

Why Bring Time Series to BigML?

There are times when historical data can inform behavior in the near or more distant future. However, unlike general classification or regression problems, Time Series needs your data to be organized as a sequence of snapshots of your input fields at various points in time. For example, the chart below depicts the variation of sales during a given month. Can we forecast the sales for future days or even months based on this data?

The answer is a resounding “Yes”, since BigML has implemented the exponential smoothing algorithm to train Time Series models. In this type of model, the data is modeled as a combination of exponentially-weighted moving averages. Exponential smoothing is not new; it was proposed in the late 1950s. Some of the most relevant early work was Robert Goodell Brown’s in 1956, and the field was later expanded by Charles C. Holt in 1957 and Peter Winters in 1960. Contrary to other methods, in forecasts produced using exponential smoothing, past instances are not equally weighted: recent instances are given more weight than older ones. In other words, the more recent the observation, the higher the associated weight.
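Concretely, for a smoothing coefficient 0 < \alpha < 1, the exponentially-weighted average underlying these models follows the recursion:

\ell_t = \alpha y_t + (1 - \alpha) \ell_{t-1} = \alpha \sum_{j=0}^{t-1} (1 - \alpha)^j y_{t-j} + (1 - \alpha)^t \ell_0

so the weight on each observation decays geometrically with its age, which is exactly the emphasis on recent instances described above.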

From Zero to Time Series Forecasts

 

In BigML, a regular Time Series workflow is composed of training your data, evaluating it and forecasting what will happen in the future. In that way, it is very much like other modeling methods available in BigML. But what makes Time Series different?

1. The training data structure: The instances in the training data need to be sequentially arranged. That is, the first instance in your dataset will be the first data point in time and the last one will be the most recent one. In addition, the interval between consecutive instances must be constant.

2. The objective fields: Time Series models can only be applied to numeric fields, but a single Time Series model can produce forecasts for all the numeric fields in the dataset at once (as opposed to classification or regression models, which only allow one objective field per model). In other words, a Time Series model accepts multiple objective fields, and in fact you can use all numeric fields in the input dataset as objectives at once.

3. The Time Series models. BigML automatically trains multiple models for you behind the scenes and lists them according to different criteria, which is a big boost in productivity as compared to hand tuning all the different combinations of the underlying configuration options. BigML’s exponential smoothing methodology models Time Series data as a combination of different components: level, trend, seasonality, and error (see Understanding Time Series Models section for more details).

When creating a Time Series, we have several options regarding whether to model each component additively or multiplicatively, or whether to include a component at all. To alleviate this burden, BigML computes in parallel a model for each applicable combination, allowing you to explore how your Time Series data fits within the entire family of exponential smoothing models. Naturally, we need some way to compare the individual models. BigML computes several different performance measures for each model, allowing you to select the model that best fits your data and gives the most accurate forecasts. Their parameters and corresponding formulas are described in depth in the Dashboard and API documentation.

4. Forecast: BigML lets you forecast the future values of multiple objective fields in short or long time horizons with a Time Series model. You will be able to separately train a different Time Series model for each objective field in just a few clicks. Your Time Series Forecasts come with forecast intervals: a range of values within which the forecast is contained with a 95% probability. Generally, these intervals grow as the forecast period increases, since there is more uncertainty when the predicted time horizon is further away.

5. Evaluation: Evaluations for Time Series models differ from supervised learning evaluations in that the training-test splits must be sequential. For an 80-20 split, the test set is the final 20% of rows in the dataset. Forecasts are then generated from the Time Series model with a horizon equal to the length of the test set. BigML computes the error between the test dataset values and forecast values and represents the evaluation performance in the form of several metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), R Squared, Symmetric Mean Absolute Percentage Error (sMAPE), Mean Absolute Scaled Error (MASE), and Mean Directional Accuracy (MDA). These metrics are fully covered in the Dashboard documentation.  Every exponential smoothing model type contained by a BigML Time Series model is automatically evaluated in parallel, so the end result is a comprehensive overview of all models’ performance.

Understanding Time Series Models

As mentioned in our description above, Time Series models are characterized by these four components:

  • Level: Exponential smoothing, as the name suggests, reduces noisy variation in a Time Series’ value, resulting in a gentler, smoother curve. To understand how it works, let us first consider a simple moving average filter: for a given order m, the filtered value at time t is simply the arithmetic mean of the m preceding values, or in other words, an equally-weighted sum of the past m values. In exponential smoothing, the filtered value is a weighted sum of all the preceding values, with the weights being highest for the most recent values and decreasing exponentially as we move towards the past. The rate of this decrease is controlled by a single smoothing coefficient α, where higher values mean the filter is more responsive to localized changes. In the figure accompanying the original post, we compare the results of filtering stock price data with moving average and exponential smoothing. Both smoothing techniques attenuate the sharp peaks found in the underlying Time Series data. The filtered values from exponential smoothing are what we call the level component of the Time Series model. Because the behavior of exponential smoothing is governed by a continuous parameter α, rather than an integer m, the number of possible solutions is infinitely greater than with the moving average filter, and it is possible to achieve a superior fit to the data by performing parameter optimization. Moreover, the exponential smoothing procedure for the level component may be analogously applied to the remaining Time Series components: trend and seasonality.
  • Trend: While the level component represents the localized average value of a Time Series, the trend component represents the long-term trajectory of its value. We represent trend as either the difference between consecutive level values (additive trend, linear trajectory), or the ratio between them (multiplicative trend, exponential trajectory). As with the level component, the sequence of local trend values is considered to be a noisy series which we can again smooth in an exponential fashion.
  • Seasonality: The seasonality component of a Time Series represents any variation in its value which follows a consistent pattern over consecutive, fixed-length intervals. For example, sales of alcohol may be consistently higher during summer months and lower during winter months year after year. This variation may be modeled as a relatively constant amount independent of the Time Series’ level (additive seasonality), or as a relatively constant proportion of the level (multiplicative seasonality). In a series with multiplicative seasonality on a yearly cycle, the magnitude of variation is larger when the level of the series is higher.
  • Error: After accounting for the level, trend, and seasonality components, there remains some variation not yet captured by the model. Like seasonality, error may be modeled as an additive process (independent of the series level) or a multiplicative process (proportional to the series level). Parameterizing the error component is important for computing confidence bounds for Time Series Forecasts.

In Summary

To wrap up, BigML’s Time Series models:

  • Help solve use cases such as stock price prediction, sales forecasting, website traffic analysis, production and inventory analysis, as well as weather forecasting, among many others.
  • Are used to characterize the properties of ordered sequences, and to forecast their future behavior.
  • Implement exponential smoothing, where the data is modeled as a combination of exponentially-weighted moving averages. That is, the recent instances are given more weight than older instances.
  • Train the data with a different split compared to other modeling methods. The split needs to be sequential rather than random, so you can test your model against the most recent period in your dataset.
  • Always include a level component and can also include trend (optionally damped), seasonality, and error components.
  • Let you forecast one or multiple objective fields. For multiple objectives, you can train a separate set of Time Series models for each objective field.

You can create Time Series models, interpret and evaluate them, as well as forecast short and longer horizons with them via the BigML Dashboard, our API and bindings, plus WhizzML (to automate your Time Series workflows). All of these options will be showcased in future blog posts.

Want to know more about Time Series?

At this point, you may be wondering how to apply Time Series to solve real-world problems. Rest assured, we’ll cover specific examples in the coming days. For starters, in our next post, we will show a use case where we will be examining a dataset with the number of houses sold in the United States since January 1963 to see if we can predict general or even seasonal trends. Stay tuned!

We hope this post whetted your appetite to learn more about Time Series. Please visit the dedicated release page for further learning. It includes a series of six blog posts about Time Series, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

BigML Spring 2017 Release and Webinar: Time Series!

BigML’s Spring 2017 Release is here! Join us on Thursday, July 20, 2017, at 10:00 AM US PDT (Portland, Oregon. GMT -07:00) / 07:00 PM CEST (Valencia, Spain. GMT +02:00) for a FREE live webinar to discover the updated version of BigML’s platform. We’ll be showcasing Time Series, the latest supervised learning method added to our toolset for analyzing time-based data where historical patterns can explain future behavior.

 

Our new capability brought to the BigML Dashboard, API and WhizzML is Time Series, a well-known supervised learning method commonly used for predicting stock prices, sales forecasting, website traffic, production and inventory analysis as well as weather forecasting, among many other use cases. In BigML, a Time Series model is trained with Time Series data, that is, a field that contains a sequence of equally distributed data points in time. BigML implements exponential smoothing to train Time Series models, where the data is modeled as a combination of exponentially-weighted moving averages.

 

Time Series is a supervised learning model; as such, you can evaluate its performance. As usual, prior to training your model you will need to split your dataset into two subsets: one for training and the other for testing. However, the split for Time Series has to be sequential rather than random, which means that you will test your model against the most recent instances in your dataset, representing the latter period. BigML offers a special option (via API or Dashboard) for this type of sequential split. You can then easily interpret the results of your model by visually comparing them against the corresponding test data in a chart view.

 

As with every BigML resource, you can make predictions with your model. With Time Series Forecasts you can easily forecast events over short or longer time horizons. You can also employ a Time Series model to forecast the future values of multiple objective fields. Additionally, BigML lets you generate your forecasts in real time on your preferred local device at no cost, which makes for faster predictions.

Are you ready to discover this new BigML resource? Please visit the dedicated release page for further learning. It includes a series of six blog posts about Time Series, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

 

Joint Webinar Video: BigML Machine Learning meets Trifacta Data Wrangling

Yesterday we hosted a joint webinar with Trifacta to showcase how seamlessly both platforms fit together in turning raw data into real-life predictive use cases. What makes these tools special is their emphasis on ease of use, making Machine Learning viable for significantly more professionals than ever before. These developers, analysts, and business experts routinely work with critical business data sources, yet they lack the deep data engineering and/or Machine Learning technical skills that have been darn hard to acquire and retain for organizations not named Google or Facebook.

To solidify these benefits, Poul Petersen, BigML’s Chief Infrastructure Officer, and Victor Coustenoble, Technical Regional Manager EMEA at Trifacta, presented a live demo on how to solve a loan risk analysis use case. Special thanks to the hundreds of curious minds that registered, attended, and asked questions during the webinar. We know some of you couldn’t make it due to conflicts and others found out after the deadline. Don’t fret, you can now watch the full webinar recording on the BigML YouTube channel.

The accompanying presentations are also accessible on the BigML SlideShare page. As you will also find out in the recording, it doesn’t take much to leave behind the inertia and make a dash for sharpening your data wrangling and Machine Learning skills since both Trifacta and BigML offer FREE versions.

Stay tuned for future webinars with concrete examples of how to transform your data to actionable business insights. As always, let us know if there is a specific topic or technique you’d like to see covered next.
