Data is the New Gold, but You’ll Have to Dig for It
Neelie Kroes is Vice-President of the European Commission, responsible for the “Digital Agenda” for the European Union. When she announced the EU’s Open Data Strategy, she opened with “Data is the New Gold”. We wish it were that simple.
Simply accumulating data does not change the financial outlook of your company. Finding insights and discovering actionable information are the keys to unlocking the true value of data. If there’s value in that data, you’ll have to dig. Here’s how the BigML site is set up to help you do this:
Start by creating a Source: simply drag and drop your data to upload it to BigML’s site (streaming and cloud storage coming soon). Besides Source, there are three other types of resources: Dataset, Model and Prediction. Here’s a comprehensive picture of how this works.
“Source” represents raw data in BigML. As such, there’s not much useful information at this stage. However, from here you can create a Dataset, which interprets the raw data as appropriate categorical or numeric information. We analyze how the data is structured, and offer a preview of how the Dataset will look.
There are many operations available here: you can indicate which characters represent missing values, or set the locale for date and number parsing.
You can change each field’s type (for instance, turn a zip code from numeric to categorical) or even ignore the field, so that it is not taken into account when creating the Dataset and Model. This way you can create multiple Datasets from the same Source and see how the results differ.
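To make the idea concrete, here is a minimal Python sketch of how raw rows might be interpreted as typed fields. It is not BigML’s implementation or API: the function names, the set of missing-value tokens and the sample rows are all assumptions made up for illustration.

```python
# Hypothetical sketch, not BigML's code: interpret raw strings as typed fields.
MISSING_TOKENS = {"", "NA", "N/A", "?", "-"}  # assumed tokens, for illustration


def parse_value(raw, missing_tokens=MISSING_TOKENS):
    """Return None for a missing-value token, otherwise the stripped string."""
    text = raw.strip()
    return None if text in missing_tokens else text


def infer_type(values):
    """Guess 'numeric' if every non-missing value parses as a float."""
    present = [v for v in values if v is not None]
    if not present:
        return "categorical"
    try:
        for v in present:
            float(v)
    except ValueError:
        return "categorical"
    return "numeric"


def build_dataset(rows, header, overrides=None, ignore=()):
    """Turn raw rows into typed columns.

    overrides maps a field name to a forced type; ignore lists fields to drop.
    """
    overrides = overrides or {}
    dataset = {}
    for index, name in enumerate(header):
        if name in ignore:
            continue
        values = [parse_value(row[index]) for row in rows]
        field_type = overrides.get(name, infer_type(values))
        if field_type == "numeric":
            values = [None if v is None else float(v) for v in values]
        dataset[name] = {"type": field_type, "values": values}
    return dataset


header = ["zip", "bedrooms", "price"]
rows = [
    ["97330", "3", "250000"],
    ["97333", "?", "310000"],
    ["97330", "2", "NA"],
]

# A zip code looks numeric, so we force it to categorical.
dataset = build_dataset(rows, header, overrides={"zip": "categorical"})
# The same raw rows can yield a second Dataset that ignores a field.
without_zip = build_dataset(rows, header, ignore=("zip",))
```

The same raw rows produce two different Datasets here, one with the zip code treated as a category and one without it at all, which is the kind of comparison described above.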
While we are creating your Dataset, you can take a closer look at it. We’ll give you a full overview of all the fields in the Dataset: minimum and maximum values, counts, and a histogram of how your data is distributed.
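If you are curious what such a per-field overview involves, the following Python sketch computes one by hand. It is only an illustration under our own assumptions (equal-width bins, missing values stored as None, made-up function names), not a description of how BigML builds its summaries.

```python
from collections import Counter


def summarize_numeric(values, bins=4):
    """Count, missing count, min, max and an equal-width histogram."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("field has no non-missing values")
    low, high = min(present), max(present)
    # Fall back to a width of 1.0 when all values are identical.
    width = (high - low) / bins or 1.0
    histogram = [0] * bins
    for v in present:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((v - low) / width), bins - 1)
        histogram[index] += 1
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": low,
        "max": high,
        "histogram": histogram,
    }


def summarize_categorical(values):
    """Count how often each non-missing category occurs."""
    return dict(Counter(v for v in values if v is not None))


prices = [100.0, 200.0, None, 300.0, 500.0]
summary = summarize_numeric(prices, bins=4)
zip_counts = summarize_categorical(["97330", "97333", "97330", None])
```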
You can configure your Dataset before creating the Model, in much the same way as the Source. Maybe you want to deselect some more fields for a specific analysis. Or you want to select a different objective field and see how the Model predicts that field. Instead of predicting a price for a house, given a neighborhood and some features, maybe you want to know in which neighborhood to look for a house, given your budget and some features. There are often many different things worth predicting in a Dataset.
We love decision trees. They are simple to understand, great for predictions and nice to visualize. In this post we showed off some of the features of BigML’s decision trees.
Now you have your Model and you are ready to have some questions answered. You can choose a “full form” prediction or a “dynamic” one. The full form lets you input a specific case: just enter all the values for the various fields. You can see which result applies to this case, give the prediction a name (for example, “Price prediction property 129449”) and save it. Or you can opt for the dynamic prediction, which asks only for the inputs that are relevant as defined by the Model. At every step of this prediction we’ll show you the most likely outcome.
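The difference between the two prediction modes is easiest to see on a toy decision tree. The Python sketch below is our own illustration: the nested-dict tree, its field names, thresholds and prices are invented, and it does not reflect BigML’s internal model format. Each node stores its most likely outcome, so a dynamic prediction can report an answer after every question and only asks about fields on the path it actually follows.

```python
# Hypothetical toy tree: every node carries the most likely outcome so far.
tree = {
    "prediction": 300000,
    "field": "bedrooms",
    "threshold": 3,
    "low": {
        "prediction": 220000,
        "field": "neighborhood_score",
        "threshold": 5,
        "low": {"prediction": 180000},
        "high": {"prediction": 250000},
    },
    "high": {"prediction": 400000},
}


def dynamic_predict(node, ask):
    """Walk the tree, calling ask(field) only for fields on the chosen path.

    Returns the most likely outcome after each step, the last being final.
    """
    steps = [node["prediction"]]
    while "field" in node:
        value = ask(node["field"])
        node = node["low"] if value < node["threshold"] else node["high"]
        steps.append(node["prediction"])
    return steps


def full_form_predict(node, inputs):
    """All values are supplied up front; only the final outcome is returned."""
    return dynamic_predict(node, inputs.__getitem__)[-1]


answers = {"bedrooms": 4, "neighborhood_score": 9}
asked = []


def ask(field):
    """Stand-in for prompting the user; records which fields were requested."""
    asked.append(field)
    return answers[field]


steps = dynamic_predict(tree, ask)
final = full_form_predict(tree, {"bedrooms": 2, "neighborhood_score": 7})
```

In this run the dynamic prediction only ever asks for the number of bedrooms, because a house with four bedrooms reaches a leaf without the tree needing the neighborhood score.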
One Click Models
If you have a well-structured CSV file or a Dataset you are happy with, it’s possible to jump straight to the Model creation phase. Just push the (1-click model) button and you are good to go. A Model (and/or Dataset) will be created automatically.
This is all just the beginning. We have a list of features we’d like to add, but you have to start somewhere. Which feature would you value most next? Let us know! Leave a comment below, send an email, or connect on Twitter or Facebook.
Want to get invited? The invitation page is this way.