Programming Image Processing

So far, to showcase BigML’s upcoming Image Processing release, we have demonstrated how composite sources work, shown how you can label your images on the platform, covered an example use case in manufacturing, and explained how to use the newly available features from the BigML Dashboard. In contrast, this installment demonstrates how to perform image classification by calling the BigML REST API. As mentioned before, image classification is a supervised learning technique that assigns each image to one of a set of predefined classes, and it has a wide range of applications. Let’s jump in and see how we can put it to use programmatically.

Authentication

Before using the API, you must set up your environment variables. You should set BIGML_USERNAME, BIGML_API_KEY, and BIGML_AUTH in your .bash_profile. BIGML_USERNAME is just your username. Your BIGML_API_KEY can be found on the Dashboard by clicking on your username to pull up the account page, and then clicking on ‘API Key’. Finally, BIGML_AUTH is simply the combination of these elements.

export BIGML_USERNAME=my_name
export BIGML_API_KEY=123456789
export BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY;"
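
An empty or malformed BIGML_AUTH usually shows up only as an authentication error on the first request, so it can help to assemble the string with a small guard. The following is a minimal sketch of that idea: bigml_auth is a hypothetical helper name, not part of any BigML tooling, and it is called here with the placeholder credentials from above.

```shell
# Hypothetical helper (an assumption, not BigML tooling): build the auth
# query string from a username and an API key, failing if either is empty.
bigml_auth() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: bigml_auth USERNAME API_KEY" >&2
    return 1
  fi
  printf 'username=%s;api_key=%s;' "$1" "$2"
}

# Example with the placeholder credentials used in this post.
bigml_auth my_name 123456789
echo
```

In a .bash_profile, the same helper could feed the export, for example export BIGML_AUTH="$(bigml_auth "$BIGML_USERNAME" "$BIGML_API_KEY")".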

Upload Your Data

For this tutorial, we’re using the same MNIST dataset used in our previous tutorial demonstrating the new BigML Dashboard capabilities. Once again, a zip file containing a large number of images is organized into multiple folders, each one representing a handwritten digit.

Data sources can be uploaded to BigML in many different ways, so this step should be appropriately adapted to your data with the help of the API documentation. Here, we will create our data source using a local zip file.

curl "https://bigml.io/source?$BIGML_AUTH" -F file=@mnist-training.zip

You could also upload the images included in the MNIST dataset to BigML individually and later build a new composite source to group them, like this:

curl "https://bigml.io/source?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"sources": ["603de3a8edc1580a58029253", "603de3a8edc1580a58029254", ...]}'

Once you have a composite source, you can add, remove or replace the images inside the source by updating the source with arguments: add_sources, remove_sources, or sources. For example, you can use the following command to remove an image from the composite source.

curl "https://bigml.io/source/602e12b74e72556f5100000a?$BIGML_AUTH" \
  -X PUT \
  -H 'content-type: application/json' \
  -d '{"remove_sources": ["603de3a8edc1580a58029253"]}'

Preparing Your Data

In our previous tutorial on the BigML Dashboard, we explained that image classification needs labels to build models and showed you how to create them in the Dashboard. Now, we explain how to do the same using the BigML API.

First off, BigML creates a label field automatically if it’s able to detect a folder structure in the zip file. If you need to create a label field yourself via the API, you can use the following command, which specifies the required information for the new field: a name and the optype.

curl "https://bigml.io/source/602e12b74e72556f5100000a?$BIGML_AUTH" \
  -X PUT \
  -H 'content-type: application/json' \
  -d '{
        "new_fields": [
          {"name": "Label", "optype": "categorical"}
        ]
      }'

You can define as many label fields as you need.

Now, you have to assign each image in the composite source a value for the label field, namely the class that best describes the image. You can do this with the following command, which specifies the list of components (the images in the composite source) to update, the ID of the label field, and the value to assign.

curl "https://bigml.io/source/602e12b74e72556f5100000a?$BIGML_AUTH" \
  -X PUT \
  -H 'content-type: application/json' \
  -d '{
        "row_values": [
          {
            "components": [
              "603de3a8edc1580a58029253",
              "603de3a8edc1580a58029254"
            ],
            "field": "100002",
            "value": 6
          }
        ]
      }'
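
When many images share a label, writing the JSON body by hand gets tedious. The sketch below shows one way to generate the same row_values payload from a list of component IDs. The helper name row_values_payload is hypothetical, and the value is inserted as-is, so a string value would need its own quotes; it only prints the body and does not call the API.

```shell
# Hypothetical helper (an assumption, not BigML tooling): print a
# "row_values" JSON body for one field/value pair and any number of
# component IDs.
# Usage: row_values_payload FIELD_ID VALUE COMPONENT_ID...
row_values_payload() {
  field=$1
  value=$2
  shift 2
  components=""
  for id in "$@"; do
    # Append each ID as a quoted JSON string, comma-separated.
    components="${components:+$components, }\"$id\""
  done
  printf '{"row_values": [{"components": [%s], "field": "%s", "value": %s}]}\n' \
    "$components" "$field" "$value"
}

# Same field, value, and components as in the command above.
row_values_payload 100002 6 603de3a8edc1580a58029253 603de3a8edc1580a58029254
```

The printed body could then be passed to the PUT request with -d "$(row_values_payload ...)".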

Remember that the full list of arguments can be found in our API documentation.

Creating a Dataset

A BigML dataset is a separate resource that holds a serialized form of your data. In the Dashboard, it’s displayed with some simple summary statistics, and it is the resource consumed by Machine Learning algorithms. To create a dataset from your uploaded data via the API, you can use the following command, which specifies the source used to generate the dataset.

curl "https://bigml.io/dataset?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"source": "source/603de3344e72554f7d0000e0"}'
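
Each step in this workflow consumes the ID produced by the previous one, and the JSON returned by a creation request includes that ID in its resource field. The sketch below shows one way to capture it in a shell variable. The helper extract_resource is hypothetical, it relies on simple text matching rather than a real JSON parser, and the sample response is a made-up, trimmed-down stand-in used only to run the helper offline.

```shell
# Hypothetical helper (an assumption, not BigML tooling): read a JSON
# response on standard input and print the value of its "resource" key.
# Simplified text matching; a JSON-aware tool would be more robust.
extract_resource() {
  sed -n 's/.*"resource"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Made-up, trimmed-down response used only to exercise the helper offline.
sample='{"code": 201, "resource": "dataset/603de3344e72554f7d0000a5", "name": "mnist"}'
dataset_id=$(printf '%s\n' "$sample" | extract_resource)
echo "$dataset_id"
```

In a real script, the output of the curl command above would be piped into extract_resource instead of the sample string.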

Creating a Deepnet

To create the Deepnet, you just need your dataset ID. In this case, BigML will train a particular class of deep neural networks: Convolutional Neural Networks (CNNs).

curl "https://bigml.io/deepnet?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"dataset": "dataset/602e12b74e72556f5100000a"}'

You guessed it! The full list of Deepnet arguments can also be found in our API documentation.
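
Resources are built asynchronously, so a script needs to wait until the Deepnet is ready before using it. The sketch below polls a resource and inspects the code inside its status object, where 5 means finished and -1 means faulty. The helper names are hypothetical, the parsing is simple text matching rather than a real JSON parser, and the sample response is a made-up, trimmed-down stand-in so that the block runs without network access.

```shell
# Hypothetical helper (an assumption, not BigML tooling): read a JSON
# response on standard input and print the numeric code found inside its
# "status" object. Simplified text matching; a JSON-aware tool is more robust.
status_code() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*{[^}]*"code"[[:space:]]*:[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Translate a status code into a short message (5 = finished, -1 = faulty).
describe_status() {
  case "$1" in
    5) echo "finished" ;;
    -1) echo "faulty" ;;
    *) echo "still processing (code: $1)" ;;
  esac
}

# Poll a resource such as deepnet/<id> every 10 seconds until it finishes or
# fails. Defined but not called here, because it needs real credentials in
# $BIGML_AUTH and network access.
wait_until_finished() {
  while :; do
    code=$(curl -s "https://bigml.io/$1?$BIGML_AUTH" | status_code)
    [ -n "$code" ] || { echo "no status found" >&2; return 1; }
    describe_status "$code"
    case "$code" in
      5) return 0 ;;
      -1) return 1 ;;
    esac
    sleep 10
  done
}

# Offline check against a made-up, trimmed-down response.
sample='{"code": 200, "resource": "deepnet/603534b44e72550e1e000000", "status": {"code": 5, "elapsed": 1234}}'
describe_status "$(printf '%s\n' "$sample" | status_code)"
```

With credentials set, wait_until_finished deepnet/603534b44e72550e1e000000 would block until the Deepnet can be evaluated.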

Evaluating the Deepnet

Once it is ready, you can evaluate the Deepnet’s predictive performance. You just need the Deepnet ID and the dataset containing the instances that you want to evaluate against. By default, BigML will provide multiple performance metrics, some of which may be more relevant than others for your use case.

curl "https://bigml.io/evaluation?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"dataset": "dataset/602ca0994e72552267000021",
       "deepnet": "deepnet/603534b44e72550e1e000000"}'

Once you have your Evaluation, you may decide that you want to change some of your Deepnet’s parameters to improve its performance. If so, just create a new Deepnet with different parameters and evaluate it again.
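
An evaluation is most informative when the evaluation dataset was not used for training. One common way to get such a dataset is to split the original one into two complementary samples. The sketch below prints the two request bodies for an 80/20 split. It assumes the dataset creation arguments origin_dataset, sample_rate, seed, and out_of_bag as we understand them from the API documentation, so please check them there before relying on this. The helper name split_payload, the seed, and the dataset ID are placeholders.

```shell
# Hypothetical helper (an assumption, not BigML tooling): print a JSON body
# that samples an existing dataset.
# Usage: split_payload ORIGIN_DATASET SAMPLE_RATE SEED OUT_OF_BAG
# The argument names in the body are assumed from the API documentation.
split_payload() {
  printf '{"origin_dataset": "%s", "sample_rate": %s, "seed": "%s", "out_of_bag": %s}\n' \
    "$1" "$2" "$3" "$4"
}

# Training part: a deterministic 80% sample of the placeholder dataset.
split_payload dataset/602e12b74e72556f5100000a 0.8 my-seed false

# Held-out part: same rate and seed, but keeping the instances left out above.
split_payload dataset/602e12b74e72556f5100000a 0.8 my-seed true
```

Each body would be sent with a POST to the dataset endpoint, in the same way as the dataset creation command shown earlier. The first resulting dataset would be used to train the Deepnet and the second to evaluate it.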

Classifying Images: Making Predictions

When you are satisfied with the performance of your Deepnet, you can begin to use it to classify new images.

To classify an image, which is also referred to as making a single prediction, you must have previously uploaded it to the BigML platform, so that a source already exists for it. You can select any individual image among your existing sources, whether it is a single-image source or an image included in a composite source. You just need to pass the source ID as the value for the image field in the input data.

curl "https://bigml.io/prediction?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{
        "deepnet": "deepnet/603534b44e72550e1e000000",
        "input_data": {
          "000002": "source/602fd3c48411750dac7bbd3a"
        }
      }'

You can also classify many images at the same time by using batch predictions. In this case, you need a dataset containing the images you want to classify.

curl "https://bigml.io/batchprediction?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"deepnet": "deepnet/603534b44e72550e1e000000",
       "dataset": "dataset/603de3344e72554f7d0000a5"}'

You can also configure the output of the batch prediction; the full list of arguments can be found in our API documentation. Once the batch prediction is finished, you can download the result as a CSV file or generate an output dataset, whichever suits your needs.
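
As a rough sketch of the download step, the snippet below builds the URL for fetching the CSV. It assumes that a finished batch prediction exposes its output under a download path appended to the resource ID, which is our reading of the API documentation and should be verified there. The helper name download_url and the batch prediction ID are placeholders.

```shell
# Hypothetical helper (an assumption, not BigML tooling): build the download
# URL for a resource. The "/download" path is assumed from the API
# documentation.
# Usage: download_url RESOURCE_ID AUTH_STRING
download_url() {
  printf 'https://bigml.io/%s/download?%s\n' "$1" "$2"
}

# Placeholder ID and the placeholder credentials used earlier in this post.
download_url batchprediction/000000000000000000000000 'username=my_name;api_key=123456789;'

# With real values, the file could then be fetched along these lines:
#   curl -s "$(download_url "$batch_prediction_id" "$BIGML_AUTH")" -o predictions.csv
```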

Want to know more about Image Processing?

If you have any questions or would like to learn more about how image classification works, please visit the release page. It includes a series of blog posts that gently introduce Image Processing from scratch. And remember to register for our free live webinar, which will take place on December 15 at 8:30 AM PST / 10:30 AM CST / 5:30 PM CET.