KDD continues to be one of the most interesting data science conferences, and this year’s event in San Francisco did not disappoint. With 163 talks and over 2,700 attendees, it is one of the largest and most important academic conferences in knowledge discovery and data science. Most attendees we spoke with agreed that the presenter lineup is second to none. The incredible effort the speaker committee puts into paper and speaker selection shows throughout the conference, and as one of the oldest and most respected data science conferences, it continues to attract some of the world’s top minds in the field. In fact, we were able to pose a few questions to Nando de Freitas of Google DeepMind, and you can read his replies later in this post.

One of the unique aspects of KDD is its healthy mix of academic research talks and applied data science talks. I especially enjoyed the talk by Danny Shapiro on how conventional computer vision is approaching its limits for advanced driver assistance systems (ADAS), and why developing a truly autonomous car will require the adoption of deep learning and artificial intelligence.

The evening poster sessions continue to be a favorite. Attendees gather around posters on particular KDD topics, which span both research and applied data science. It’s a great way for people to network around a particular area of interest. In the networking sessions it was obvious that many attendees were from outside the United States (74% according to KDD), which is a testament to the lure of KDD and the openness of the conference.


Even if you didn’t attend the conference, I’d highly recommend reviewing the KDD accepted papers list. Many papers come with links to both the experiments and open source projects on GitHub, and they are certainly worth a browse. One paper I found particularly intriguing was “Smart Reply: Automated Response Suggestion for Email.” The paper describes the use of a state-of-the-art deep LSTM model, a kind of recurrent neural network, to predict full responses to an incoming email message.


Another interesting paper, “FRAUDAR: Bounding Graph Fraud in the Face of Camouflage,” ranked among the award winners. Because graphs and graph databases map the connections between data points and highlight relatedness, they are especially useful for uncovering patterns and fraud. However, online review fraud can be particularly challenging to detect when fraudsters use a technique called camouflage, which involves, among other methods, hijacking a legitimate reviewer’s account. The authors evaluated their metrics against a series of intuitive “axioms” to show that the metrics have several advantages for identifying suspiciousness.
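
To make the graph idea a little more concrete, here is a minimal sketch of one generic way to look for a suspiciously dense block of reviewers and products: greedily peel off the least-connected node and remember the densest group seen along the way. This is an illustration only, not the paper's metric or algorithm. The function name `densest_block`, the edges-per-node density score, and the toy data (including the single "camouflage" review from `f1`) are all assumptions made for this example.

```python
from collections import defaultdict


def densest_block(edges):
    """Greedy peeling (hypothetical helper, not from the paper).

    Repeatedly removes the lowest-degree node and returns the node set
    with the highest edges-per-node density seen during the process.
    Assumes an undirected graph with no self-loops.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    nodes = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_nodes = 0.0, set()

    while nodes:
        density = n_edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        # Pick the least-connected node; break ties by name for determinism.
        victim = min(nodes, key=lambda n: (len(adj[n]), str(n)))
        for nbr in adj[victim]:
            adj[nbr].discard(victim)
        n_edges -= len(adj[victim])
        nodes.remove(victim)
        del adj[victim]

    return best_nodes, best_density


# Toy data (made up): three fake accounts review the same three products.
edges = [(f, p) for f in ("f1", "f2", "f3") for p in ("p1", "p2", "p3")]
# A few honest reviews, plus one "camouflage" review by f1 on product q1.
edges += [("h1", "p1"), ("h2", "q1"), ("h3", "q2"), ("f1", "q1")]

block, density = densest_block(edges)
print(sorted(block), density)
```

On this toy data the peeling strips away the honest reviewers and the camouflage edge first, leaving the tightly connected block of fake accounts and their target products. Real review graphs are far larger and noisier, so treat this purely as intuition for why density is a useful signal.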


Finally, a special thanks to the folks at KDD for facilitating some questions we posed to Nando de Freitas, KDD keynote speaker and senior staff research scientist at Google DeepMind. By all accounts, his was a very insightful talk on deep learning. You can find the link here.


1.    What makes the KDD conference so important?

KDD attacks two of the most important problems in science and society at large: Knowledge Discovery, and Data science. It brings together scientists, engineers, VCs, companies, government organizations and other folks, while creating ample opportunity for interaction, energizing creative efforts, allowing for the transfer of ideas, and creating many opportunities for business. It also addresses societal issues of great relevance including sustainability, healthcare, medicine, and improved public policies on a wide range of topics where data can guide us to make better decisions. – Nando de Freitas

2.    What is your main message to the many young data scientists who are attending KDD and are about to start their careers?

Be optimistic, learn as much as you can, and be prepared to adapt and lead the world in a responsible way. – Nando de Freitas

3.    What is one of the biggest challenges facing data science that concerns you?

There are many technical issues as well as ethical issues ahead. Data science has proved to be very powerful, and with power comes a need for transparency and responsibility. We have witnessed vast improvements in data management, computing and artificial intelligence in recent years. This technology offers the hope of reducing the number of deaths on the road, cure diseases that we have been failing to cure for decades, advance healthcare, and improve water, food and energy management. But the technology can also be poorly used and thereby create problems such as unemployment and inequality. I feel our biggest challenge is to ensure this technology is used to benefit all of us and to ensure our environment is looked after responsibly. I was delighted to see so many workshops and sessions at KDD attacking this issue. A more immediate challenge is to change our culture so that the fields of data and computer science improve their diversity with regard to gender and race. This will require education at all levels and a commitment by tech companies, government organizations and the media. – Nando de Freitas

4.    What direction do you see the field of deep learning taking in the next five years?

The trends indicate major advances in control and industrial automation, dialog systems and evidence-based reasoning using vast databases and world measurements, generative models for images and other media, agents with memory, and multi-agents that learn to communicate and use language so as to solve problems in a distributed manner. There will be many applications without any doubt, but I am also hoping for some breakthroughs in our understanding of intelligence and our brains. – Nando de Freitas


©ODSC 2016