Data Science for Good, Part 1

Introduction

This is the first of a three-article series about Data Science for Good. This article explains what the idea is about and how you can get involved in it. The second article will introduce people, organizations, and projects that use data science for good. The third and last article discusses resources and technological tools that serve that purpose.

What is Data for Good?

Data for Good is the use of data science and big data to solve social and humanitarian problems. In academia it is also known as data for humanity.

Motivation

Every organization, for-profit or nonprofit, needs to make decisions on a daily basis. We can agree that decisions supported by data are effective and repeatable, and that they improve the way organizations are managed. But even though all organizations make decisions, there is a gap in the use of technology between private and social organizations.

This gap has been observed in several studies, some of which are mentioned in Appendix 6 of the report “Data Evolution Project” by Data Orchard and DataKind UK. Moreover, the social sector is barely mentioned, if at all, in the main data science and big data surveys that we all read.

It is estimated that the social sector is around five years behind private companies in areas such as business intelligence, data warehousing, machine learning, and artificial intelligence.

In the words of Jeff Hammerbacher, co-founder of Cloudera and one of the first employees at Facebook: “The best minds of my generation are thinking about how to make people click ads.”

Jeff’s motives weren’t specifically about Data for Good, but they run parallel to it: how can we better spend our time working on problems that matter? Data science and big data represent great opportunities, so why not use them on problems that really matter? If we can get the data, there may be a solution.

The same algorithms that companies use to decide which ads to show you can help improve people’s lives.

In the 2012 paper Machine Learning that Matters, K. L. Wagstaff noted that “Many machine learning problems are phrased in terms of an objective function to be optimized…” and that “It is as if we have forgotten, or chosen to ignore, that each data set is more than just a matrix of numbers.”

What can you do?

The value organizations can get from data analysis comes from a combination of the right people, technology, and culture. People and technology together transform organizational culture, moving it toward a data-driven one. Organizational culture is largely shaped by the people working at the organization, and when technology changes the way people perform their tasks, it changes the culture as well.

Nowadays, technology is not the problem: there are many open source solutions available, and commercial software providers often offer free user accounts to NGOs. The problem is the lack of skilled data people reaching out to organizations and projects that matter to them.

This is the same problem that for-profit organizations face when they constantly search for data scientists, data engineers, and data analysts to hire. But the experience of working or volunteering at an NGO or social enterprise is usually different from working for the company that hired you.

There are several things you can do, from personal projects to joining teams for long-term projects. You could get hired by an organization working on problems that matter to you, or offer your services pro bono.

Recipe for help

  1. Find your interest(s).
  2. Start with a question.
  3. Find data and explore it.
  4. Show your results to domain experts and others.
  5. Consider their feedback and repeat.
  6. Publish your method/process, findings, and insights.

How do you set up your project? Find what you are interested in, something you would really like to work on. It’s important to match your interests because the project will require effort and dedication.

Once you find your interest, start with a question: something you don’t know, and something the people you are working with don’t know either.

Find data. Think about what kind of data you need to answer your question. There are open data portals you can draw on, but make sure the data you are using is valid; one way to do that is by checking the metadata.
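
As a rough illustration of what checking the metadata could look like, here is a minimal Python sketch. It assumes the portal exposes a metadata record as a simple dictionary; the field names (title, publisher, license, modified), the one-year freshness threshold, and the example record are all hypothetical, and real portals use their own schemas.

```python
from datetime import date

# Hypothetical field names; adapt them to the schema of the portal you use.
REQUIRED_FIELDS = ("title", "publisher", "license", "modified")


def check_metadata(metadata, today, max_age_days=365):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []

    # Flag fields that are absent or empty.
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append(f"missing field: {field}")

    # Flag datasets that have not been updated recently (assumed threshold).
    modified = metadata.get("modified")
    if modified:
        try:
            age_days = (today - date.fromisoformat(modified)).days
        except (TypeError, ValueError):
            problems.append(f"unparseable date: {modified}")
        else:
            if age_days > max_age_days:
                problems.append(f"last modified {age_days} days ago")

    return problems


# Made-up record, for illustration only.
example = {
    "title": "School enrolment by district",
    "publisher": "Example Open Data Portal",
    "license": "",
    "modified": "2016-01-15",
}

issues = check_metadata(example, today=date(2018, 1, 15))
for issue in issues:
    print(issue)
```

A check like this will not tell you whether the data is correct, but it quickly surfaces datasets with no license, no publisher, or no recent updates before you invest time in them.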

In a recent talk at ODSC West in Boston, MA, Barton Poulson suggested organizing your own event, where you ask for clear questions to answer and provide analysis templates and prepared data.

There are several ways to collaborate and help using data:

  • Work on individual projects. Data Citizen focuses on social problems.
  • Volunteer for an NGO, helping out with their database issues and trying to apply data science to their data.
  • Participate in data challenges related to social problems; join hackathons or datathons that present problems that matter to you.
  • Join an organization like DataKind or the UN Global Pulse.

There are several social enterprises working in the third sector; you could contact them and propose projects to work on together. Need ideas?

You can read about five principles for applying data science for social good, written by the founder of DataKind. In addition, check out this podcast with Emma Prest from DataKind UK about Data Science for Good.

Conclusions

In my experience, I’ve encountered many people interested in helping but lacking the opportunities to do so.

This post presents a definition of Data for Good and describes the gap that social enterprises and organizations currently face in the use of technology.

Finally, it offers advice to encourage data scientists and citizen data scientists to pursue the subjects that matter to them and to make a difference.

In the next post we will talk about organizations and ongoing projects, to give you some ideas for starting to collaborate in the world of Data Science for Good.

 

©ODSC2018

Diego Arenas, ODSC

I've worked in BI, DWH, and Data Mining. MSc in Data Science. Experience with multiple BI and Data Science tools, always thinking about how to solve information needs and add value to organisations from the data available. Experience with Business Objects, Pentaho, Informatica Power Center, SSAS, SSIS, SSRS, MS SQL Server from 2000 to 2017, and other DBMS, Tableau, Hadoop, Python, R, SQL. Predictive modelling. My interests are in Information Systems, Data Modeling, Predictive and Descriptive Analysis, Machine Learning, Data Visualization, Open Data. Specialties: Data modeling, data warehousing, data mining, performance management, business intelligence.
