Data Science for Good, Part 3

This is the third and final part of a three-article series about Data Science for Good. To recap, the first article explains what Data Science for Good is and how you can get involved. The second article discusses the institutions and projects that have come about in this field as well as the problems that they strive to solve. In this final installment, we will introduce some of the key people, research and resources in the field of Data Science for Good.

A compilation of the links shared in these articles is available in this GitHub repository. Feel free to suggest additional resources for the Data Science for Good repository.

Introduction

In many ways, it can be said that Data Science for Good was invented in the 18th century. How could this be, when the field of data science was not even “invented” until the 1960s and 1970s? To see why the roots of Data Science for Good reach back to the 1700s and the rise of the Enlightenment, one has to understand the field's philosophical context. The principles of the Enlightenment are foundational to Data Science for Good because people in both movements share a similar mindset. Principles such as reason, fact-based science, and humanism align well with the spirit of using data for societal good.

I came to this realization while reading Enlightenment Now by Steven Pinker. It is a worthwhile book that will update your knowledge of the world's current state through the lens of Enlightenment principles. The book summarizes and explains, in a very straightforward manner, many important findings on societal topics such as life expectancy, inequality, the environment, poverty, health, wealth, and human rights. Along the way, it corrects many misconceptions held by readers such as myself. Let me put it this way: if I were running a Data Science for Good company, this would be one of the books I would give to everyone in the company.

Free image from www.gapminder.org

People

While there are plenty of researchers, developers, scientists, businesspeople, and others to highlight in the Data Science for Good space, a few stand out to me. I highlight the professionals below for their leading mindsets and for their work illustrating and addressing some of the most prominent problems society faces, such as poverty, global health, and wealth distribution.

Hans Rosling

To me, Hans Rosling was one of the great data educators of our time. Sadly, we lost him in February 2017. In his passionate talks, he conveyed that the world we live in today is different from the world we learned about in school many years ago. He could change your mindset with his datasets, and his talks have millions of views worldwide. In this documentary, Hans shows how the first Global Goal, to end global poverty, is possible to achieve by 2030. Rosling strove to show audiences from all walks of life, through data, how the state of the world changes.

Check out more of his talks here, here, and here.

Thomas Piketty

Thomas Piketty is an economist whose work is among the leading analyses of global wealth distribution. His research on income, wealth distribution, and inequality is well explained in his best-selling book Capital in the Twenty-First Century. You can watch a summary talk here.

Angus Deaton

Professor (Sir) Angus Deaton has researched various aspects of poverty, inequality, health, wellbeing, and economic development over the decades of his career. In recognition of his impact on these fields, he was awarded the Nobel Prize in Economic Sciences in 2015. Deaton is also the author of The Great Escape, in which he analyzes health, wealth, and the origins of inequality quantitatively.

Research

There are hundreds of organizations and researchers throughout the world who make it their life's mission to leverage data science to solve some of the most pressing problems humanity faces. While each group or person has a special place in the data science ecosystem, a few stand out.

The Frankfurt Big Data Lab

The Frankfurt Big Data Lab bases its research on the principles stated in its open letter, Data for Humanity. The five guiding principles are:

  1. Do no harm.
  2. Use data to help create peaceful coexistence.
  3. Use data to help vulnerable people and people in need.
  4. Use data to preserve and improve natural environment.
  5. Use data to help create a fair world without discrimination.

If you agree or identify with Frankfurt Big Data Lab’s leading principles or their vision, you can sign the open letter as well.

Professor Dirk Helbing

Prof. Dirk Helbing of ETH Zurich has written several essays and papers on the implications of Big Data and digital technologies for our current and future society. The Automation of Society is Next and Thinking Ahead are both interesting reads on Big Data and society.

The FAT* Conference

The FAT* Conference is a conference on Fairness, Accountability, and Transparency. Between 2014 and 2017 it was called FATML (Fairness, Accountability, and Transparency in Machine Learning). It brings together researchers and practitioners working on these topics.

The Alan Turing Institute

The Alan Turing Institute was recently named the UK's national institute for data science. They “believe that data science will change the world” and are on a mission to train the next generation of data science leaders for the public good.

The Edinburgh Futures Institute

The Edinburgh Futures Institute is an example of using data to address societal issues. Its slogan, “Where Data meets Society,” captures the spirit of using science for the greater good of humanity.

Resources

A common starting point is open data. Assessing publicly available data is a good way to see how to extract insights or build a solution for a particular problem. One of the many ways data is becoming openly available is through the Open Government Data Principles, under which governments make specific datasets available to their citizens.

Ideally, all data would be open, but it is understandable that some datasets, such as those held by private companies or those containing sensitive information, remain restricted. These limitations, however, inspired the creation of platforms such as Data Collaboratives, which offer a way around barriers to accessing specific datasets. Data Collaboratives is a GovLab initiative that describes itself as a “new form of collaboration, beyond the public-private partnership model, in which participants from different sectors – in particular companies – exchange their data to create public value.”

Data.World is similar to Data Collaboratives in that it is a platform where you can analyze and share datasets and analyses with others. It works like a social network for data enthusiasts: you can query data with SQL or SPARQL, and the site integrates with many data tools. Read Linked Data and Data Science for more information.

Here are some interesting websites with data sources for your data science projects and collaborations:

  • Open Corporates, a website with information on more than 140 million companies, making it the largest open database of companies in the world. You can search the website or connect through its API.
  • OpenStreetMap, a project that maps the world and is maintained by volunteers around the globe. It holds valuable information that you can visualize in layers or download for your projects.
  • The Humanitarian Data Exchange (HDX), a data portal with more than 6,600 datasets maintained by OCHA, the UN Office for the Coordination of Humanitarian Affairs.
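To give a taste of programmatic access to open data portals, here is a minimal Python sketch for searching The Humanitarian Data Exchange. It assumes that HDX, which is built on CKAN, exposes the standard CKAN Action API endpoint `package_search` at `https://data.humdata.org/api/3/action/package_search`; the helper names `build_search_url`, `dataset_titles`, and `search_hdx` are hypothetical and our own. Check the HDX documentation before relying on the endpoint.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumption: HDX serves the standard CKAN Action API at this URL.
HDX_API = "https://data.humdata.org/api/3/action/package_search"


def build_search_url(query: str, rows: int = 5) -> str:
    """Build a CKAN package_search URL for a free-text query (hypothetical helper)."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{HDX_API}?{params}"


def dataset_titles(response: dict) -> list[str]:
    """Extract dataset titles from a CKAN package_search JSON response.

    CKAN wraps results as {"success": ..., "result": {"results": [...]}}.
    Falls back to the dataset's short "name" when no "title" is present.
    """
    if not response.get("success"):
        raise ValueError("CKAN API reported failure")
    return [pkg.get("title", pkg.get("name", "")) for pkg in response["result"]["results"]]


def search_hdx(query: str, rows: int = 5) -> list[str]:
    """Query HDX over the network and return matching dataset titles."""
    with urllib.request.urlopen(build_search_url(query, rows), timeout=30) as resp:
        return dataset_titles(json.load(resp))
```

For example, `search_hdx("food security")` would return the titles of up to five matching datasets. Keeping URL building and response parsing separate from the network call makes those parts easy to check offline.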

Conclusions

Data Science for Good is a wake-up call to reason about and make sense of the world we live in based on facts, data, and extracted insights. In this series of articles, we have seen many initiatives, organizations, and teams collaborating, researching, and innovating in data science to address global issues. One thing all these projects and initiatives have in common is that they seek to increase people's awareness and understanding of how our ways of life affect those around us on a social, economic, political, and cultural level.

As the Enlightenment did, Data Science for Good opens the door for us to start taking care of our future selves.

My expectation is that initiatives derived from data science will start challenging current systems, such as the economic, healthcare, and pension systems of different societies. Using simulations and collected data, data science projects can help determine how to improve wealth distribution, food supply systems, and resource allocation processes. In addition, data science can help political systems adapt and adopt better public policies, especially as more data is made public. There are countless ways data science can help societies remedy the inequalities that exist throughout the world and create sustainable solutions for future generations. The exciting thing to keep in mind is that this is only the beginning.


Diego Arenas, ODSC

I've worked in BI, DWH, and Data Mining, and I hold an MSc in Data Science. I have experience with multiple BI and Data Science tools, always thinking about how to solve information needs and add value to organisations from the data available. Experience with Business Objects, Pentaho, Informatica Power Center, SSAS, SSIS, SSRS, MS SQL Server from 2000 to 2017, and other DBMS, Tableau, Hadoop, Python, R, SQL. Predictive modelling. My interests are in Information Systems, Data Modeling, Predictive and Descriptive Analysis, Machine Learning, Data Visualization, and Open Data. Specialties: data modeling, data warehousing, data mining, performance management, business intelligence.

Open Data Science - Your News Source for AI, Machine Learning & more