ODSC was started with a desire to build a community around data science and to make the field more accessible. Sheamus McGovern, the founder of ODSC, said that “in this attempt, we owe a debt of gratitude [to the open-source data science community].” Open-source work is the backbone of collaboration and advancement within the community, so ODSC is giving credit where credit is due by hosting the ODSC West 2019 Data Science Award ceremony. This year’s awards recognize projects and organizations that have contributed significantly to the staples of data science and that represent some of the most important aspects of the community.
Here is some information on each of the ODSC West 2019 Data Science Award winners:
Open Source Deep Learning – TensorFlow
Martin Wicke, a software engineer on Google’s TensorFlow team, received the award. “I was gonna do a little dance, but the mic is messing up so I’ve gotta be still,” he laughed. Already a commonly used tool for data scientists, TensorFlow is a leading end-to-end open-source platform for machine learning. Its comprehensive and flexible ecosystem of tools, libraries, and community resources allows data scientists to take machine learning-powered applications to the next level. TensorFlow’s history is rich: it was initially developed by researchers and engineers working on the Google Brain team within Google’s Machine Intelligence Research organization, with the goal of conducting machine learning and deep neural network research. On the present and future of the project, Wicke elaborated, “TensorFlow 2.0 is now available, which is a big step for us because TensorFlow was early to the game so we’ve learned a lot since it launched, so 2.0 is a huge step forward for us.”
Open Source Machine Learning – mlr
mlr is a machine learning package for R that provides a unified interface to many other packages. The framework offers supervised methods such as classification, regression, and survival analysis, along with their corresponding evaluation and optimization methods, as well as unsupervised methods such as clustering. The package also integrates with the OpenML R package and its online platform to encourage online collaboration on machine learning projects.
Pictured left to right: Sheamus McGovern (Founder of ODSC), Caitlin Augustin (DataKind), Lars Kotthoff (mlr), and Scott Lundberg (SHAP). Not pictured: Martin Wicke (TensorFlow).
Lars Kotthoff explained just how impressive mlr’s progress has been. “We are all volunteers working on this in our spare time, we have no funding. But all investors, if you’re interested in funding an award-winning,” he said, picking up the ODSC award, “award-winning project, we’re happy to take your money.” He also described the launch of mlr3, the next generation of mlr, which is completely redesigned and re-integrated. It will include “almost everything you loved in mlr, support for spatial and temporal data, data backends, integration with Bayesian optimization, Hyperband, racing, OpenML, pipelines for preprocessing, feature selection, ensembles, stacking, graphs that they themselves can be trained, you can build your own automated data analysis, data cleaning solutions, AutoML systems, etc.” The team is also planning to offer visualizations, probabilistic learning, functional data analysis, and deep learning, among a few other things.
Open Source Data Science Project – SHAP
SHAP (SHapley Additive exPlanations) is a unified approach to explaining the output of any machine learning model. SHAP connects game theory with local explanations, representing the only possible consistent and locally accurate additive feature attribution method based on expectations. In an article on the library, Gabriel Tseng explains how useful SHAP is for interpreting models thanks to its accuracy and speed: “the SHAP library [is a] powerful tool [for] uncovering the patterns a machine learning algorithm has identified.” Scott Lundberg, who accepted the award for SHAP, said, “Interpretability is important because you might not want to trust something an algorithm is saying, but you don’t want to just throw it out entirely and rely on a person. So the algorithm needs to be interpretable, and that’s where SHAP comes in.”
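To make the game-theory idea a little more concrete, here is a minimal sketch of exact Shapley values for a single prediction, written in plain Python. It is not the SHAP library’s API or its optimized algorithms: the `shapley_values` helper and the `toy_model` function are hypothetical names invented for illustration, and “missing” features are simply replaced by a fixed baseline value, which is a simplifying stand-in for the expectation-based formulation.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction of a small model.

    Assumption: features outside a coalition are replaced by their
    baseline value (a simplification of averaging over the data).
    """
    n = len(x)

    def value(coalition):
        # Model output when only features in `coalition` keep their real values.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Classic Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                # Marginal contribution of feature i when added to coalition S.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    # Hypothetical model with an interaction between features 0 and 1.
    a, b, c = features
    return 2.0 * a + a * b + 3.0 * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Local accuracy: the attributions add up to the prediction minus the
# baseline prediction.
gap = toy_model(x) - toy_model(baseline)
print("attributions:", phi)
print("sum of attributions:", sum(phi), "prediction gap:", gap)
```

The attributions are additive: they sum to the difference between the model’s output for `x` and its output for the baseline, and the interaction term is shared equally between the two features involved. This brute-force version enumerates every coalition, so it is only practical for a handful of features; the speed Tseng mentions comes from the library’s far more efficient estimators.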
Data Science for Good – DataKind
DataKind brings top data scientists and leading social change organizations together to collaborate on cutting-edge analytics and advanced algorithms that can maximize social impact. Its projects include everything from identifying food bank dependency, to using predictive modeling to boost college success, to using mobile surveys to give rural women a voice. If that’s not enough to get you excited, DataKind’s Director, Caitlin Augustin, who received the award, said the organization is launching DataKind 2025: “[We’re] moving away from just dabbling in projects across the board.” She added that DataKind now has the funding to focus on major issues within the same space and to go deep to understand them. The team would love for people to reach out and get involved with the mission.
It’s exciting to see the field of data science evolve in real time and to learn more about where these projects and groups will go in the future. We’re confident that the winners will continue to pave the way for further developments that contribute to the open-source nature of data science.