Hackathon Results, Gates Foundation & ODSC
Data Science for Good | ODSC London 2016
Posted by Gordon Fleetwood, October 20, 2016
Health. It’s perhaps the thickest thread that runs throughout the human race. What could we relate to more than the continuous, futile war against our mortality? In certain areas of the world this war is fought with the most advanced weapons known to man – the best medicines, the best doctors, the best technology. In others, battles are constantly fought while in full retreat. These are the battles that the eyes of the Bill and Melinda Gates Foundation are focused on. These are the soldiers that they want to give a fighting chance.
At the recent ODSC UK conference, ODSC joined forces with the foundation to hold a hackathon to explore the impact Data Science could have in this arena. The hackathon encompassed data in two areas concerning childhood development in Earth’s poorest areas. The first challenge, Project Myelination, sought to see if the trajectory of brain development could be inferred from measurements of somatic growth. The second, Project Ultrasound, took a step back temporally. It involved predicting fetal weight from ultrasound measurements. The seven teams started the day with a speech from Sofia Trommlerova, a development economist with a special interest in child health and human development, and then set to their task. Five of the seven chose the second challenge. Teams 1 and 6 attacked the first, the one filled with the most missing data. About seven hours later, it was time to display the fruits of their labor.
Team 4 won the day with some impressive feature engineering (and the best team name: “Weight for it”). This intensive work led to an absolute error of 6.5% after cross-validation, and less than 20% error in most regions of the test set. Team 3 followed with a full Data Science workflow capped by 7.4% absolute error after going through the model-building process and log-transforming variables. Team 7 came in third after focusing on how prediction quality varies with the timing of the ultrasound – early or late. Team 1’s approach to dealing with Project Myelination’s mountains of missing data was to cleverly devise their own imputation method based on the mean. They also removed outliers before modeling with Linear and Lasso Regression. Among the other teams, some unsupervised learning was applied through the use of clustering and Principal Component Analysis. There is definitely more room for exploration on that horizon.
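To make two of the ideas above concrete, here is a minimal sketch of plain mean imputation and of a percentage absolute error. It is not any team's actual code: Team 1 devised their own mean-based method, which is not described here, and the article does not say exactly how the error figures were computed. The function names and the toy numbers are assumptions made for illustration only.

```python
from statistics import mean


def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries.

    Assumption: this is ordinary column-mean imputation, used as a stand-in
    for the custom mean-based method mentioned in the article.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent.

    Assumption: the "absolute error" percentages quoted above are read here
    as a mean absolute percentage error; the article does not define them.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Toy, made-up numbers purely to show the calls.
column = [2.0, None, 4.0]
filled = impute_with_mean(column)

actual_weights = [1000.0, 2000.0]
predicted_weights = [900.0, 2100.0]
error_pct = mean_absolute_percentage_error(actual_weights, predicted_weights)
```

With these toy values the missing entry is filled with the mean of 2.0 and 4.0, and the error comes out at roughly 7.5%, since the two predictions are off by 10% and 5% respectively.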
The analyses produced in such a brief time interval were a tribute to hard work and ingenuity. These two characteristics will be essential components of future Data Science efforts to help arm the less fortunate with the knowledge necessary for good health.
The Teams. And some notable comments.
Team 1 – Blue Waters “Correlation of somatic and brain development with cognitive development in children”
Team 2 – The Corner Table “Ultrasound”
WINNER: Team 4 – Ebury Labs “Weight for it… Weighting Gestation” – created interesting features, 10 extra features, created smart ratios between the different indicators, followed data science protocol. Summary: created the most interesting features (from the point of view of medicine), followed a nice protocol (from the point of view of data science).
1st Runner up: Team 3 – Critical Care “Happy Moments” – managed to accomplish a whole data mining project, explored the data thoroughly, created features, created different log transformations of indicators, all data science standards were met, missing values were dealt with. Comment: very thorough work
Team 5 – Foetal Favourites “Challenge 2”
Team 6 – Team 6 “Predict Cognitive Score”
Team 7 – Pandas “Challenge 2”
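The ratio features and log transformations noted in the comments above can be sketched roughly as follows. The measurement names (head circumference, abdominal circumference, femur length) and the function name are hypothetical, chosen only for illustration; the article does not list the indicators the teams actually used or the ratios they built.

```python
import math


def add_ratio_and_log_features(record):
    """Return a copy of `record` with one ratio and one log feature added.

    All keys are hypothetical placeholders, not the hackathon's real columns.
    """
    out = dict(record)
    # A ratio between two indicators, in the spirit of the winning team's features.
    out["head_to_abdomen_ratio"] = (
        record["head_circumference_mm"] / record["abdominal_circumference_mm"]
    )
    # A log transform of an indicator, in the spirit of the runner-up's approach.
    out["log_femur_length"] = math.log(record["femur_length_mm"])
    return out


# Made-up values purely to show the call.
example = {
    "head_circumference_mm": 300.0,
    "abdominal_circumference_mm": 150.0,
    "femur_length_mm": 1.0,
}
features = add_ratio_and_log_features(example)
```

The original record is left unchanged, so engineered features can be compared against the raw indicators side by side.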
Chris Fregly is a Research Scientist at PipelineIO – a Streaming Analytics and Machine Learning Startup in San Francisco. He’s also an Apache Spark Contributor, Netflix Open Source Committer, Founder of the Global Advanced Spark and TensorFlow Meetup, and Author of an Upcoming Book and Video Series on Spark and TensorFlow.
Ajit’s work spans research, entrepreneurship and academia relating to IoT, predictive analytics and Mobility. His current research focus is on applying data science algorithms to IoT applications. This research underpins his teaching at Oxford University (Data Science for Internet of Things) and the ‘City sciences’ program at UPM (Madrid). Ajit is also the Director of the newly founded AI/Deep Learning labs for Future cities at UPM (University of Madrid).
Rafe is an independent Startup mentor working with various organisations such as Startup Grind. He has produced and run numerous accelerator programs for corporate clients such as John Lewis and DPD. His main focus is design thinking and promoting accelerators. Rafe graduated from University College London, U. of London.
Ankur Modi is the CEO and co-founder of StatusToday, which was recently recognized as one of the UK’s hottest Artificial Intelligence start-ups by Business Insider and TechWorld. Ankur left India at 17 to study Computer Science at Jacobs University in Germany and subsequently received further training in Psychology from Oxford University. Prior to StatusToday, Ankur was a project manager, data scientist and engineer at Microsoft in Denmark and Ireland.
Amanda Schierz is a Data Scientist from the UK and was one of the first users of Kaggle. She is a Kaggle Grandmaster and Prize Winner and has had several Top 10 finishes. After working in Computational Biology and on the Robot Scientist project, Amanda is now a data scientist at DataRobot.