Data Science for Good
Posted by Diego Arenas, ODSC, September 15, 2017
“Knowledge for knowledge’s sake is just as barbarous as hatred of knowledge. [Knowledge must be] tamed to fit with life, so that one may live what one has learned.” – Nietzsche.
Data for Good, or Data for Social Good, refers to the use of Big Data and Data Science to solve humanitarian and social problems. In academia, it is also referred to as Data for Humanity.
The same algorithms that companies use to decide which ads to show you can help to improve people’s lives.
After reading this article, you will know who is doing what, and where, in the world of data for social good. I will describe some organisations and projects you could join, some people whose work and research are worth following, and some advice for building your own projects or hosting your own events around Data for Good.
Data for Good
In my experience, I’ve encountered many people who are interested in helping but lack the opportunities to do so. This post aims to fill that information gap around Data for Good.
DataKind works with charities, social enterprises, NGOs, and the third sector in general. They provide support across a wide spectrum of projects. Their motto is “Harnessing the power of data science in the service of humanity.” They have several chapters around the world.
DataKind is in Bangalore, Singapore, San Francisco, New York, the UK, Dublin and Washington. These chapters run local meetups every month or every couple of months, gathering people for an evening of talks and data work. You can apply to work with them here.
If you are a data enthusiast, you can contact them to volunteer on a long-term project with a charity. You could also help them to organise DataDives, in which they partner with other organisations to work on their problems. The partner organisation provides datasets and states its challenges, and data scientists then work on those challenges over a weekend. You can also submit your own projects.
If you work in the third sector, you could contact them during their weekly open office hours or by filling in the form on their website.
DrivenData is a data science competitions website for good causes. They partner with institutions working on social issues; in their words, they “…find real-world questions where data science can have positive social impact.” You can take a look here at the active competitions running right now. Check out and participate in one of their latest challenges, on the early detection of lung cancer.
UN Global Pulse is a United Nations innovation initiative on Big Data. The idea is to use big data for sustainable development and humanitarian action. UN Global Pulse has three labs around the globe, in Jakarta, Kampala, and New York, and they work on several projects. Their work usually centres on, but is not limited to, the 17 Sustainable Development Goals for 2030 defined by the UN. Check out their projects and their resource library (the reports are very good).
Open Data for Development, or OD4D for short, is a global partnership funded by IDRC, the government of Canada, The World Bank, and the UK Department for International Development. “…OD4D is scaling Open Data approaches that work, improving transparency and accountability, service delivery and the well-being of the poorest and most marginalized”.
OD4D is a source of research around Open Data providing data sources and ideas to apply Data Science for a better world.
Data Science for Social Good, at the University of Chicago, is a summer program that trains data scientists to work on social problems. They have also organised a conference on Data Science and Social Good since 2016.
Projects & Problems
If you are running out of ideas or problems to solve, here I present to you the Sustainable Development Goals for 2030, defined by the UN in 2015. Sustainable Development means, in the words of Prof Charles Hopkins, “…the balance of Economic and Social development without compromising the Environment for us and future generations, also considering social justice, equity, human rights, transparency, intergenerational responsibility.”
The SDGs are 17 Global Goals with 169 targets. They are aimed at all countries, and anyone can work on them.
Inequality, in both health and wealth distribution, is one of the biggest problems nowadays. Climate change also urgently calls for solutions and efforts from citizens and all parties involved. Datasets and studies are available, but both remain open issues in many parts of the world today. Good luck with your projects.
Hans Rosling was, for me, a great educator on world data and facts. He passed away in February 2017. In his passionate talks, he communicated that our world is changing: the state of the world we live in is different from the state of the world we learned about at school many years ago. He could make you change your mindset with his datasets. His talks have millions of views worldwide. In this documentary, Hans shows how the first Global Goal, to end global poverty, is possible to achieve by 2030.
Check out more of his talks here, here, and here.
Thomas Piketty is an economist. His research on income, wealth distribution, and inequality is well explained in his book Capital in the Twenty-First Century. You can watch a summary talk here.
Professor Angus Deaton won the Nobel Prize in Economic Sciences in 2015. He is the author of the book The Great Escape, where he presents quantitative analyses of health, wealth, and the origins of inequality. Recommended reading.
The Frankfurt Big Data Lab does research based on the principles stated in their open letter, Data for Humanity, which you can sign. The five guiding principles are:
- Do no harm
- Use data to help create peaceful coexistence
- Use data to help vulnerable people and people in need
- Use data to preserve and improve natural environment
- Use data to help create a fair world without discrimination.
Prof. Dirk Helbing, ETH Zurich, has several essays and papers around the implications of Big Data and digital technologies on our current and future society. The Automation of Society is Next and Thinking Ahead are both interesting readings about Big Data and Society.
FATML stands for Fairness, Accountability, and Transparency in Machine Learning. FATML is a machine learning conference that brings together researchers and practitioners in these topics.
Open Data plays an important role in Data for Good. Governments have been releasing datasets for several years now. Open data by default is the first principle of the Principles of Open Government Data, a commitment made in 2015 by the countries of the G8, followed by the countries of the G20 and by many other countries that are publishing and opening their datasets. Use up-to-date versions of census data and economic surveys in your analysis.
The Open Data Institute and The Open Knowledge Foundation are working on Open Data. Check them out.
A list of data sources and websites with resources:
- Our World in Data
- Google Public Data Explorer
- World Bank Open Data
- Data and Social Good free ebook
- Center for Humanitarian Data
- IBM Science for Good
What you can do
There are several things you can do, from personal projects to joining data science teams for long-term projects. You could apply to work for an organization working on problems that matter to you. You could also offer your time pro bono (short for “pro bono publico”), which means “for the public good”.
If you want to start your own project(s), here is a list of things to consider before you start:
- Find your interest
- Start with a question
- Find data and explore it
- Show your results to domain experts and others
- Use their feedback and repeat
- Release/publish your findings and methodology
Your interest must be something you are keen to work on. You will put in effort and dedication, and quitting is a risk if you don’t have a strong commitment to the tasks.
The question to answer should be something the people you are working with don’t already know. Then think about what kind of data you need to answer that question. Search open data portals, and include “data source” in your searches. Make sure you use valid and up-to-date data.
Explore the data and do the analysis you think will work based on the problem and the type of data you have. Show the results to domain experts and use their feedback to improve your analysis. Finally, share your results (if you are allowed to), the code and the method.
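As a first pass at the "find data and explore it" step, a quick summary of one column often tells you whether a dataset is usable before you invest more time. The sketch below is only an illustration, using Python's standard library: the tiny CSV, its column names, and the helper names `load_rows` and `summarize_column` are all hypothetical, invented for this example, and the figures are not real. In practice you would replace the sample text with a file downloaded from an open data portal.

```python
import csv
import io
import statistics

# Hypothetical toy data, invented purely for illustration (not real figures).
SAMPLE_CSV = """region,year,households_surveyed,share_with_clean_water
North,2016,120,0.82
South,2016,95,0.64
East,2016,110,
West,2016,130,0.71
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize_column(rows, column):
    """Count rows and missing values, and compute mean and median
    of the non-missing values in a numeric column."""
    values = []
    missing = 0
    for row in rows:
        raw = (row.get(column) or "").strip()
        if raw == "":
            # Keep track of gaps instead of silently dropping them.
            missing += 1
            continue
        values.append(float(raw))
    return {
        "rows": len(values) + missing,
        "missing": missing,
        "mean": statistics.mean(values) if values else None,
        "median": statistics.median(values) if values else None,
    }


rows = load_rows(SAMPLE_CSV)
summary = summarize_column(rows, "share_with_clean_water")
print(summary)
```

Reporting the number of missing values next to the averages is a deliberate choice: gaps in the data are exactly the kind of thing worth raising with domain experts when you show them your results.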
In a recent talk at the ODSC West in Boston, MA, Barton Poulson made suggestions for organizing your own event: “Ask for clear questions to answer. Prepare analysis templates. If possible, prepare data. Follow up the results and insights from the event,” he said.
Another thing you can do is volunteer at an NGO, helping out with their database issues and trying to apply data science to their data.
Often, organisations in the third sector have access to fully licensed software but lack trained people to make the best use of it. Software companies grant NGOs licences for free or at a reduced price. Even with that kind of concession, it is estimated that they are about 5 to 10 years behind in the application of technologies.
Many non-profit organisations would accept your collaboration. Make sure to contact them and mention your skills and availability. Also, suggest projects to work on with them.
“Only about 3% of the world population gets to go to higher education, but that 3% with higher education are the people who probably will be responsible for more than 85% to 90% of the shapers of the world of the future” are words by Prof Charles Hopkins at Edinburgh University in 2015 about the UN SDGs. You can make a difference. There are options for you to collaborate and use your data skills (in high demand in the for-profit sector) to work on global and social problems that matter to our society.
Data Science works deeply with quantitative analysis; maths, statistics, and computer science are a given in the field. But to follow the path of Data Science for Good, we should include the Humanities and Social Sciences in the working group.
Personal ethics and Data Science carry a big responsibility, and have an impact on their surroundings and on the future world. We must become more responsible and accountable for our actions and efforts.
Diego Arenas, ODSC
I've worked in BI, DWH, and Data Mining. MSc in Data Science. Experience in multiple BI and Data Science tools, always thinking about how to solve information needs and add value to organisations from the data available. Experience with Business Objects, Pentaho, Informatica Power Center, SSAS, SSIS, SSRS, MS SQL Server from 2000 to 2017, and other DBMS, Tableau, Hadoop, Python, R, SQL. Predictive modelling. My interests are in Information Systems, Data Modeling, Predictive and Descriptive Analysis, Machine Learning, Data Visualization, Open Data. Specialties: Data modeling, data warehousing, data mining, performance management, business intelligence.