Inside BigMLKit: A Sample Predictive App for Apple ResearchKit
Following up our post announcing the availability of BigMLKit, we are now going to introduce the BigMLKit API and present a sample app that can be used as a playground to experiment with BigMLKit.
As already mentioned in the previous post, BigMLKit brings the capability of “one-click-to-predict” to iOS and OS X developers. This is accomplished through the notion of task, which is basically a sequence of steps. Each step has traditionally required a certain amount of work such as preparing the data, calling BigML’s REST API, waiting for the operation to complete, collecting the right data to prepare the next step and so on. BigMLKit takes care of all of this “glue logic” for you in a streamlined manner, while also providing an abstracted way to interact with BigML and build complex tasks on top of our platform.
BigMLKit’s classes can be grouped into three groups:
Everything in BigML is associated to resources, such as datasets, clusters, sources, etc. A resource’s identity in BigMLKit is defined through a name and a UUID (universally unique identifier), which are encapsulated in the BMLResourceProtocol protocol. A concrete implementation of BMLResourceProtocol will additionally provide more properties and/or methods according to the specific application that is being built. If you want to be able to filter available resources locally, you will possibly need to use Core Data and define a model whose entities contain the attributes you want to filter on. On the other hand, if you only want to support remote behavior for your entities, then their UUID is enough information for the REST API to handle them.
BigMLKit defines three basic types to build resource UUIDs:
The three types are typedef’ed NSStrings. According to how BigML REST API identifies resources, a BMLResourceFullUuid is made up of a BMLResourceType and a BMLResourceUuid joined through a slash, e.g. “model/de305d54-75b4-431b-adb2-eb6b9e546014”. The class BMLResourceUtils defines convenience methods to extract a BMLResourceType or BMLResourceUuid from a BMLResourceFullUuid.
Tasks and Workflows
Tasks and Workflows are what makes BigMLKit useful.
A workflow is a collection of BigML operations. It can be as simple as a single call to BigML’s REST API or it can include multiple steps, e.g., when creating a sequence of BigML resources starting with a dataset and ending with a prediction.
BigMLKit provides several classes to define and use tasks and workflows, as detailed below.
BMLWorkflow is an abstract base class that is used to build composite workflows combining lower-level workflows together. The simplest form of BMLWorkflow is a BMLWorkflowTask, which corresponds to a single step workflow. BigMLKit provides several BMLWorkflowTask-derived classes that represent basic operations that BigML REST API allows to execute:
BMLWorkflowTaskSequence is a higher-level workflow that is able to execute a sequence of workflows.
BMLWorkflowTaskContext provides the context for task execution, where input, output, and intermediate results can be stored. The context also acts as a monitor for remote operations: it will poll BigML API to check a resource state progress and handle it according to its semantics. The storage mechanism is exposed through an NSMutableDictionary. The association key/value is an implementation detail of the workflows that use the context to carry through their operation. A context also hosts a connector object, which is responsible for handling the communication with BigML through its API interface. Currently, the connector object is an instance of ML4iOS.
BigML’s REST API offers a lot of options to configure the available machine learning algorithms. For each resource type, BigMLKit provides a plist file that describes which options are available and what their type is, so a program can easily handle them, e.g., to display a list of available options or allowing users to set values for them. There are three main classes at play here:
- BMLWorkflowTaskConfiguration, which allows for collecting all options in a common place and accessing them in an organized way; e.g., by getting all option definitions, or their values, etc.
- BMLWorkflowTaskConfigurationOption, which is the atomic option. This basically provides a way to set whether the options should be effectively used in a given execution of the workflow, and to retrieve the current option value.
- BMLWorkflowConfigurator, which is a container for all the BMLWorkflowTaskConfiguration instances associated with a user session. A configurator can be shared across multiple executions of the same workflow, or even different workflows.
In many cases, it is enough to use BigML’s default values for configuration options so there is no need to tweak them. The topic of BigMLKit configuration will be explored in a further post.
Running a Workflow
Running a workflow requires two preliminary steps:
- creating the workflow
- creating and setting up the context for its execution.
As mentioned above, at the moment BigMLKit provides single-operation workflows and a task sequence workflow, but you can easily implement any kind of specific workflow that you might need. Creating and setting up a context is workflow specific. You can see an example in the sample app introduced below.
Once the preliminary steps are done, you can run a workflow by calling the runInContext:completionBlock: method. The completion block will be called at the end of the workflow execution with an NSError argument in case an error occurred.
BigMLKitSampleApp is a simple iOS app that shows how you can integrate BigMLKit into your apps. The sample app is available on GitHub and allows to create a prediction from a data file, which is used to train a model. All the steps from datasource creation up to the model creation are executed remotely on BigML servers, while the prediction step is executed locally based on the calculated model and does NOT require access to BigML services. Three sample source files are provided: iris.csv, diab.csv, and wines.csv.
To keep the sample app code simple enough, it defaults to creating a decision tree, although BigMLKit also provides support for training clusters, and, in the near future, anomaly detectors and other Machine Learning algorithms provided by BigML. Furthermore, the app uses static data files to train the models, but in a real application you could as easily read the data to train your models from iOS HealthKit and/or ResearchKit, or you could use HealthKit/ResearchKit data to make predictions based on an existing reference model.
To understand how BigMLKit is integrated into the app, you can inspect the
BMLPredictionViewController class, and in particular its two methods called
setupFromModel: method is called whenever the app delegate detects that the user tapped on any of the three available source files. On the other hand, startWorkflow is responsible for enabling the UI to provide the user with some visual feedback about the workflow being executed. It also handles the display of workflow results. In greater detail:
- When a tap on a resource file is detected, the app delegate stores the current source file in the shared view model, then calls
setupFromModel:creates a new workflow, properly initializes a context, and finally calls
startWorkflowwill update the UI and then it will start the workflow and provide a callback.
- The callback, if no errors are found, will use the workflow results (available in the workflow context) to build a prediction form, so the user can try different combinations of input arguments to make new predictions.
This is all that is required! As you can see, BigMLKit makes it really straightforward to run a simple workflow and use the power of machine learning in your apps, and we hope that you will find great applications for our technology and create extensions to BigMLKit that will make it even more convenient.
If you have any questions on how to get started with BigMLKit, feel free to contact us at firstname.lastname@example.org.