In this blog post, the fourth of our series of six, we provide a brief summary of the steps needed to create a Time Series using the BigML API. As stated in our previous post, a Time Series is often used to forecast the future values of a numeric field that is sequentially distributed over time, such as stock prices, sales volume, or industrial data, among many other use cases.
The API workflow to create a Time Series includes five main steps: first, upload your data to BigML, then create a dataset, create your Time Series, evaluate it, and finally make forecasts. Note that any resource created with the API is automatically available in your Dashboard too, so you can take advantage of BigML’s intuitive visualizations at any time.
If you have never used the BigML API before, note that all requests to manage your resources must use HTTPS and be authenticated with your username and API key to verify your identity. Below is a base URL example to manage Time Series:
https://bigml.io/timeseries?username=$BIGML_USERNAME;api_key=$BIGML_API_KEY
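As a quick sketch, the same authenticated URL can be assembled programmatically. The environment variable names follow the convention above; the helper function name is ours:

```python
import os

def timeseries_url(username=None, api_key=None):
    """Build the authenticated Time Series endpoint URL.

    Falls back to the BIGML_USERNAME / BIGML_API_KEY environment
    variables when explicit credentials are not given.
    """
    username = username or os.environ["BIGML_USERNAME"]
    api_key = api_key or os.environ["BIGML_API_KEY"]
    return ("https://bigml.io/timeseries"
            f"?username={username};api_key={api_key}")

print(timeseries_url("alice", "secret"))
# https://bigml.io/timeseries?username=alice;api_key=secret
```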
1. Upload your Data
You can upload your data, in your preferred format, from a local file, a remote file (using a URL), or from your cloud repository, e.g., AWS, Azure, etc. This will automatically create a source in your BigML account.
First, you need to open up a terminal with curl or any other command-line tool that implements standard HTTPS methods. In the example below, we are creating a source from a local CSV file containing the monthly gasoline demand in Ontario from 1960 until 1975 that we previously downloaded from DataMarket.
curl "https://bigml.io/source?$BIGML_AUTH" -F file=@monthly-gasoline-demand-ontario-.csv
Remember that a Time Series needs to be trained with time-based data. BigML assumes the instances in your source data are chronologically ordered, i.e., the first instance in your dataset is taken as the first data point in the series, the second instance as the second data point, and so on.
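Because BigML takes row order as time order, it is worth verifying that your CSV is already sorted before uploading it. A minimal sketch (the “Month” column name is an assumption about the DataMarket file; ISO-style dates such as 1960-01 sort correctly as strings):

```python
import csv
import io

def is_chronological(csv_text, date_column):
    """Return True if the rows of csv_text are sorted by date_column.

    Assumes ISO-style date strings (e.g. '1960-01'), for which
    lexicographic order matches chronological order.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    dates = [row[date_column] for row in rows]
    return dates == sorted(dates)

sample = "Month,Demand\n1960-01,87695\n1960-02,86890\n1960-03,96442\n"
print(is_chronological(sample, "Month"))  # True
```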
2. Create a Dataset
After the source is created, you need to build a dataset, which serializes your data and transforms it into a suitable input for the Machine Learning algorithm.
curl "https://bigml.io/dataset?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"source":"source/595f76362f1dfe13c4002737"}'
3. Create your Time Series
You only need your dataset ID to train your Time Series; BigML will set default values for the rest of the configurable parameters. By default, BigML takes the last valid numeric field in your dataset as the objective field. You can also configure all Time Series parameters at creation time. You can find an explanation of each parameter in the previous post.
You can evaluate your Time Series performance with new data. Since the data in a Time Series is sequentially distributed, a quick way to train and test your model with different subsets of your dataset is the “range” parameter, which lets you specify the subset of instances to use when creating and evaluating your model. For example, if we have 192 instances and we want to use 80% for training the model and 20% for testing it, we can set a range of 1 to 154 so the Time Series only uses those instances. Then we will be able to evaluate the model using the rest of the instances (from 155 to 192).
curl "https://bigml.io/timeseries?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"dataset":"dataset/98b5527c3c1920386a000467",
          "range":[1,154]}'
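The 80/20 split described above can be computed programmatically; a minimal sketch (the helper name is ours):

```python
def train_test_ranges(n_instances, train_fraction=0.8):
    """Split 1-based instance indices into contiguous train/test
    ranges, in the shape expected by BigML's "range" parameter."""
    split = round(n_instances * train_fraction)
    return [1, split], [split + 1, n_instances]

train, test = train_test_ranges(192)
print(train, test)  # [1, 154] [155, 192]
```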
When your Time Series is created, you will get not one but several models in the JSON response. These models are the result of the combinations of the Time Series components (error, trend, and seasonality) and their variations (additive, multiplicative, damped/not damped) explained in this blog post. Each of these models is identified by a unique name that indicates the error, trend and seasonality components of that particular model. For example, the name M,Ad,A indicates a model with Multiplicative errors, Additive damped trend and Additive seasonality. You can perform evaluations and make forecasts for all models or you can select one or more specific models.
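The short model names can be decoded mechanically. Here is a sketch based on the M,Ad,A example above; the full label set is our assumption, drawn from standard ETS notation rather than a documented BigML API field:

```python
# Component labels in standard ETS notation (assumed set).
COMPONENT = {
    "N": "none",
    "A": "additive",
    "Ad": "additive damped",
    "M": "multiplicative",
    "Md": "multiplicative damped",
}

def describe_model(name):
    """Expand an ETS-style name like 'M,Ad,A' into its
    (error, trend, seasonality) component descriptions."""
    error, trend, seasonality = name.split(",")
    return (COMPONENT[error], COMPONENT[trend], COMPONENT[seasonality])

print(describe_model("M,Ad,A"))
# ('multiplicative', 'additive damped', 'additive')
```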
4. Evaluate your Time Series
When your Time Series has been created, you can evaluate its predictive performance. You just need to use the Time Series ID and the dataset containing the instances that you want to evaluate. In our example, we are using the same dataset that we used to create the Time Series. For the evaluation, we use the range from 155 to 192, which contains the last instances in the dataset that weren’t used to train the model.
curl "https://bigml.io/evaluation?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"dataset":"dataset/98b5527c3c1920386a000467",
          "timeseries":"timeseries/98b5527c3c1920386a000467",
          "range":[155,192]}'
Evaluations for Time Series yield some well-known performance metrics such as MAE (Mean Absolute Error), MSE (Mean Squared Error), and R squared. You will also get some less common ones: sMAPE (symmetric Mean Absolute Percentage Error), which is similar to MAE except that the errors are measured in percentage terms; MASE (Mean Absolute Scaled Error); and MDA (Mean Directional Accuracy), which compares the forecast direction (upward or downward) to the actual direction of the data. You can read more about these metrics in this article.
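For reference, the less common metrics can be written out directly. A sketch of sMAPE and MDA over plain Python lists; the formulas follow the common textbook definitions, and BigML's exact implementation may differ in edge cases:

```python
def smape(actual, forecast):
    """Symmetric MAPE: mean of |F - A| / ((|A| + |F|) / 2), in percent."""
    terms = [abs(f - a) / ((abs(a) + abs(f)) / 2)
             for a, f in zip(actual, forecast)]
    return 100 * sum(terms) / len(terms)

def mda(actual, forecast):
    """Mean Directional Accuracy: fraction of steps where the forecast
    moves in the same direction (up or down) as the actual series."""
    hits = sum(
        (a1 - a0 > 0) == (f1 - a0 > 0)
        for a0, a1, f1 in zip(actual, actual[1:], forecast[1:])
    )
    return hits / (len(actual) - 1)

actual = [100, 110, 105, 120]
forecast = [100, 108, 112, 118]
print(round(smape(actual, forecast), 3))
print(round(mda(actual, forecast), 3))
```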
5. Make Forecasts
When you create a Time Series, BigML automatically forecasts the next 50 data points for each model per objective field. You can find the forecast, along with a confidence interval (an upper and lower bound between which the forecast falls with 95% confidence), in the JSON response of the Time Series model.
If you want to perform a forecast over a longer time horizon, you can do so by using your Time Series ID, as in the example below.
curl "https://bigml.io/forecast?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"timeseries":"timeseries/98b5527c3c1920386a000467",
          "horizon":100}'
Want to know more about Time Series?
Please visit the dedicated release page for further learning. It includes a series of six blog posts about Time Series, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.