Gregory Piatetsky-Shapiro is the founder of KDnuggets, a leading resource on business analytics, big data, data mining, and data science. The KD in KDnuggets stands for “Knowledge Discovery,” and Gregory is a foremost expert on the subject: he founded the KDD (Knowledge Discovery in Databases) conferences and has authored over 60 publications with over 10,000 citations, including 2 best-selling books and several edited collections on topics related to data mining and knowledge discovery.
As the fields of knowledge discovery, data mining and machine learning continue to evolve and overlap, we thought it would be interesting to connect with Gregory to get his thoughts and perspective on key trends in the marketplace through a brief Q&A:
BigML: The “KD” in KDnuggets stands for “Knowledge Discovery” – how has knowledge discovery changed since your launch of KDnuggets in 1997? How has hype around big data impacted knowledge discovery?
Gregory Piatetsky-Shapiro (GPS): The term Knowledge Discovery in Data, or KDD, was adopted in the scientific community and is now part of the names of several conferences and journals, including KDD (the leading research conference in this area, US-based), ECML/PKDD (a European-based conference), PAKDD (a Pacific/Asia-based conference), the ACM Transactions on Knowledge Discovery in Data (TKDD), and others. However, the term “Knowledge Discovery” did not catch on in the business world, where “Data Mining” was much more popular (1996-2005). It was then supplanted by “Analytics” starting in 2006, and now the hottest term is “Big Data” in the popular press and “Data Science” among researchers. At the last KDD conference in Chicago, most attendees, including me, referred to themselves as data scientists, not “data miners” or “knowledge discoverers”.
BigML: In the past there’s been some acrimony between the data mining and machine learning communities. Knowledge discovery seems to be a common goal of both approaches – how do you see the two approaches contributing to knowledge discovery?
GPS: I see that the KDD / data mining / data science community is now more interested in working with the big data / HPC, statistics, and optimization communities, as they have additional tools and methods to contribute. I don’t really see any hostility between machine learning and data mining. A big part of machine learning is interactive (learning for robots, cars, etc. that make dynamic decisions), and those researchers deal with a separate class of problems. Machine learning on static data deals with essentially the same problems as data mining, and the latest breakthroughs, like deep learning, are used in both the ML and KDD communities.
BigML: In your experience, what tools and skills are best suited for knowledge discovery? How do they differ from the tools and skills of data scientists?
GPS: “Data Science” is the latest name for the field of “Knowledge Discovery”, so the tools and skills are the same.
Data Scientists need to have a combination of 3 skills:
1) Math/Statistics,
2) Coding/Hacking, and
3) Business knowledge – understanding of the domain
See also Drew Conway’s Data Science Venn Diagram. For analysis of tools and skills on LinkedIn, see also slide 41 of my presentation “Analytics Education in the Era of Big Data”.
BigML: Do you see cloud-based platforms such as BigML being helpful for knowledge discovery?
GPS: Yes, cloud analytics platforms like BigML will enable data scientists to work on much larger data.
BigML: What does the future hold for knowledge discovery?
GPS: The future for knowledge discovery and data science is very bright. “Data Scientist” has been proclaimed as the sexiest job of the 21st century. The amount of data is rising exponentially, and who but data scientists will help businesses, governments, and organizations to make good use of big data?