The Potential of Communities and Machine Learning for Good

For any technology to be successful, it needs to move from the early-adopter market segment to the majority, i.e., to cross the chasm. Until now, machine learning has been mostly in the hype phase, and adoption has been driven largely by innovators and early adopters. I envision that in the next few years ML (and AI in general) will move from the hype phase to the correction phase (Figure 1), provided the focus stays on user adoption and value creation.

However, to reach the majority phase, major challenges need to be overcome.

Challenges to Overcome

High Development Cost

The salaries of data scientists are exorbitantly high; for example, a data scientist in California can earn around $150,000 and up. Even so, some major product development challenges, such as data gathering and preparation, cannot be solved by a single person.

For machine learning to be adopted by a wide range of mission-driven organizations, the cost of development has to come down. To achieve that, we can borrow ideas from two areas: open source community development and decentralized product development.

With today’s technological advances and online tools, talent can be accessed worldwide. For example, a few months ago I started working on a machine learning project to improve rooftop solar adoption, in which we use a decentralized, open source approach to product development. We have around 50 highly engaged junior ML engineers contributing to and developing the products. Tasks are announced in the community, and students can take up those tasks and start working on them under the supervision of an expert. To keep the process smooth, we follow a product development protocol (Figure 2). This approach works very well because a lot of knowledge is already available online, so the expert’s work ends up being mostly mentoring. We have seen the product development cost fall by a factor of three.

Intellectual Property Concerns

One of the concerns about open source projects is IP: how can it be protected if the team is decentralized and somewhat open? A few years ago this would have been a valid argument, as IP was driven by ownership of code, but in the era of machine learning, IP is driven by ownership of data. So as long as one has the data, the IP is protected.

What we follow is a hybrid of open source and traditional (closed-team) software development. A community of data scientists and data engineers is selected, tasks with bounties are announced to that community, and its members can take up the tasks. The code is open sourced, and open source protocols are followed.

In this way, the data stays within a small community while the project still benefits from the strengths of open source development.
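To make the hybrid workflow above more concrete, here is a minimal sketch in Python of a bounty task board restricted to a selected community. It is only an illustration under stated assumptions, not the tooling used in the project: the names `TaskBoard`, `Task`, `Status`, `announce`, `claim`, and `approve`, the example member name, and the bounty units are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set


class Status(Enum):
    OPEN = "open"        # announced, nobody working on it yet
    CLAIMED = "claimed"  # taken up by a community member
    DONE = "done"        # approved by the supervising expert


@dataclass
class Task:
    title: str
    bounty: int  # hypothetical unit; the article does not say how bounties are paid
    status: Status = Status.OPEN
    assignee: Optional[str] = None


@dataclass
class TaskBoard:
    """Hypothetical task board: only selected community members may claim tasks."""
    members: Set[str]
    tasks: Dict[int, Task] = field(default_factory=dict)

    def announce(self, title: str, bounty: int) -> int:
        """Announce a new task to the community and return its id."""
        task_id = len(self.tasks) + 1
        self.tasks[task_id] = Task(title, bounty)
        return task_id

    def claim(self, task_id: int, member: str) -> None:
        """Let a selected member take up an open task."""
        if member not in self.members:
            raise PermissionError(f"{member} is not in the selected community")
        task = self.tasks[task_id]
        if task.status is not Status.OPEN:
            raise ValueError(f"task {task_id} is not open")
        task.status = Status.CLAIMED
        task.assignee = member

    def approve(self, task_id: int) -> int:
        """Expert review passed: mark the task done and return the bounty owed."""
        task = self.tasks[task_id]
        if task.status is not Status.CLAIMED:
            raise ValueError(f"task {task_id} has not been claimed")
        task.status = Status.DONE
        return task.bounty


if __name__ == "__main__":
    board = TaskBoard(members={"member_a"})
    task_id = board.announce("example task", bounty=10)
    board.claim(task_id, "member_a")
    print(board.approve(task_id), board.tasks[task_id].status)
```

The membership check in `claim` is the part that mirrors the closed-team side of the hybrid model: the code can be open, while the work and the data it touches stay with a selected group.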

Figure 3: A screenshot of the community Slack channel where tasks are announced and taken up by the students.

Data and Trust

A community behind a product can give access to a large amount of data. The wisdom of the crowd, with diversity coming from people of different backgrounds and locations, can lead to innovative ways of gathering and working with data. For specific projects, members can even contribute their own data, e.g., images, music, movie recommendations, or text. For the solar project mentioned above, we also use a community-driven approach to gather data: the engineers create tagged data for their assigned tasks, which gives the project access to a large amount of tagged data.

Figure 4. Masked images generated by the community.

A community can also help to build trust. Companies that emerge from communities share common values, beliefs, and often a bigger vision that serves the long-term interests of those communities. Intrinsic motivation can also play a more important role there than in traditional company settings.

“The company’s interests are for the short term. The community’s interests are for the long term.” — Seth Godin

Table 1 summarizes the three approaches to development.

I firmly believe that the future of AI and machine learning will be driven not by the ‘elites’ but by the community, through a grassroots movement. We need a global community across countries, bringing in different values and perspectives, to build great products that augment us and solve pressing problems in today’s and tomorrow’s world. The development of the solar project is a classic example of a grassroots movement. We are working with students from all over India, who are contributing to building the machine learning models. Most of the code is open sourced, and we regularly add more contributors to the project. Even the data is being generated by the community of enthusiasts. I believe this is the direction that future development of intelligent products should follow.

In the next post, I will show some more results and outputs from work done in the community-driven solar project.

Rudradeb recently published a book titled “Creating Value with Artificial Intelligence.” The Kindle and paperback versions of the book are available on Amazon. Connect with him via LinkedIn if you want to get a free copy of the book.

Rudradeb Mitra


Author of Creating Value with AI (https://amzn.to/2MuuEOh) | 6 startups | 10 yrs as an AI Engineer/Researcher
