Fresh off the news on the opening of our new European headquarters, we are excited to make public that BigML has completed the acquisition of the groundbreaking Association Discovery software Magnum Opus. First released fifteen years ago, and progressively refined since, Magnum Opus has delivered reliable and actionable insights for retailers, financial institutions and numerous scientific applications and embodies the state-of-the-art in the field of association discovery. Consequently, this acquisition is a significant step forward in BigML’s vision to build the world’s premier cloud-based Machine Learning platform including carefully curated, most effective algorithms and data mining techniques that have already proven their mettle on complex real-world predictive analytics problems.
As part of the acquisition, world-renowned expert on Association Discovery and this year’s ACM SIGKDD Sydney Conference program co-chair Geoff Webb has joined BigML as Technical Advisor. Dr. Webb is a Professor of Information Technology Research in the Faculty of Information Technology at Monash University of Melbourne, where he heads the Centre for Data Science. He was editor in chief of the premier data mining journal, Data Mining and Knowledge Discovery, for ten years. He is co-editor of the Springer Encyclopedia of Machine Learning, a member of the advisory board of the Statistical Analysis and Data Mining journal, a member of the editorial board of the Machine Learning journal, and was a foundation member of the editorial board of ACM Transactions on Knowledge Discovery from Data. Dr. Webb is an IEEE Fellow and has received the 2013 IEEE ICDM Service Award and a 2014 Australian Research Council Discovery Outstanding Researcher Award.
Association discovery is one of the most studied tasks in the field of data mining. Stated simply, association mining identifies items that are associated with one another in data. Historically, far more attention has been paid to how to discover associations than to what associations should be discovered. Having observed the shortcomings of the dominant frequent pattern paradigm, Dr. Webb developed the alternative top-k associations approach. Magnum Opus employs the unique k-most-interesting association discovery technique as it allows the user to specify what makes an association interesting and how many associations s/he would like. The available criteria for measuring interest include lift, leverage, strength (also known as confidence), support and coverage. This approach effectively reveals the statistically sound, new and unanticipated core associations in the data whereas most other association discovery tools find so many spurious associations that it is next to impossible to find useful associations amongst the dross. Association mining complements other statistical data mining techniques in a number of ways as it:
- Avoids the problems due to model selection. Most data mining techniques produce a single global model of the data. A problem with such a strategy is that there will often be many such models, all of which describe the available data equally well. Association mining can find all local models rather than a single global model. This empowers the user to select between alternative models on grounds that may be difficult to quantify for a typical statistical system to take into account.
- Scales very effectively to high-dimensional data. The standard statistical approach to categorical association analysis (i.e. log-linear analysis) has complexity that is exponential with respect to the number of variables. In contrast, association mining techniques can typically handle many thousands of variables.
- Concentrates on discovering relationships between values rather than variables. This is a non-trivial distinction. If someone is told that there is an association between gender and some medical condition, they are likely to immediately wish to know which gender is positively associated with the condition and which is not. Association mining goes directly to this question of interest. Further, association between values, rather than variables, can be more powerful (discover weaker relationships) when variables have more than two values.
- Strictly controls the risk of making false discoveries. A serious issue inherent in any attempt to identify associations with classical methods is an extreme risk of false discoveries. These are apparent associations that are in fact only artifacts of the specific sample of data that has been collected. Magnum Opus is the only commercial association discovery software to provide strict statistical control over the risk of making any such errors.
The BigML product team has already started charting the path to a seamless integration of Magnum Opus capabilities into our platform in 2015. This means effective immediately, we will NOT be offering new Magnum Opus licenses or downloads. Existing Magnum Opus licensees will be supported as usual. Additional blog posts, a lecture series by Dr. Webb and more information on the integration timeline will be provided in the coming weeks so please stay tuned.