Can You Make Your System Smarter Than a 4th Grader? Using AI to Answer Questions

You may be smarter than a fifth grader, but the Allen Institute for Artificial Intelligence is training its machines to be smarter than a fourth grader in science through the Aristo Project, an artificial intelligence-based question answering system. In his lecture at ODSC East 2017, Joel Grus, a research engineer at the institute, explained the AI project and various methods to train a machine to reason.

The Aristo Project aims to create a knowledgeable machine, or a system that can process a large corpus of information and use this for reasoning. To tackle this goal, the team has worked to train a system on fourth and eighth grade standardized science exams, a feat which has proven more complex than expected due to the knowledge of science, question comprehension, and syntax understanding required to reason through what seems to be simple logic.

Previously a software engineer at Google and a data scientist at several startups, Grus came to the institute with a strong grasp of statistics, analysis, and development. He was then tasked with drawing on these skill sets to experiment with developing the most accurate guesser.

“I can come up with ideas for science question solvers, but not good ideas,” he joked in his talk; however, his modesty underplayed the several relatively accurate solutions he presented, that is, if one could appreciate a computer receiving a failing mark as a good outcome. While Grus’ ideas succeeded on at most one of every two questions, the same principles are being researched and applied around the world with impressive results. Earlier this year, Microsoft reported that its own question answering system beat human scores on a reading comprehension benchmark.

Want to take a shot at outsmarting yourself? Keep reading for some of Grus’ solver ideas.


The Baseline: The Random Guesser

Most can remember a day in school when they entertained the idea of randomly selecting answers in hopes of scoring at least 25 percent, and Aristo's baseline solver does just that. As the name suggests, this solver picks answers at random.

Accuracy: 25 percent
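As a minimal sketch (the question text and function name here are illustrative, not from Grus's actual code), the baseline amounts to a one-liner:

```python
import random

def random_guesser(question, choices):
    """Baseline solver: ignore the question entirely and pick a choice at random."""
    return random.choice(choices)

answer = random_guesser(
    "Which part of a plant makes food?",
    ["roots", "stem", "leaves", "flowers"],
)
```

Over many four-choice questions, this converges to the expected 25 percent.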


Solver #1: Information Retrieval

This solver assumes that the correct answer will frequently appear alongside words from the question. Each question and answer pair is run as a query against a text corpus indexed with Elasticsearch to find which pairs appear together most often. The pair returning the most matches is chosen as the answer; however, the phrasing of the questions and answers can strongly affect accuracy.

Accuracy: 43 percent
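A toy sketch of the idea, assuming a tiny in-memory corpus and plain word overlap in place of a real Elasticsearch index (all names and sentences here are hypothetical):

```python
import re

def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def ir_solver(question, choices, corpus):
    """Score each choice by the best word overlap between the combined
    question + choice text and any single corpus sentence."""
    def score(choice):
        query = tokens(question) | tokens(choice)
        return max(len(query & tokens(sent)) for sent in corpus)
    return max(choices, key=score)

corpus = [
    "Green plants make food in their leaves using sunlight.",
    "Roots absorb water and nutrients from the soil.",
]
best = ir_solver("Which part of a plant makes food?",
                 ["roots", "stem", "leaves", "flowers"], corpus)
```

Here "leaves" wins because it co-occurs with "food" in the first sentence; the same mechanism also shows the solver's weakness, since a rephrased corpus sentence would shift the overlap counts.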


Solver #2: Word2Vec

Using a two-layer neural network, Word2Vec maps words into a vector space where the distance between words reflects how closely they are related. The solver parses the question and each answer into word vectors and compares them. A pre-trained model such as the Google News vectors produces an accuracy of 27 percent; training word vectors on science sentences leads to a better system.

Accuracy: 38 percent
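A minimal sketch of the comparison step, using made-up 2-d embeddings in place of real pretrained vectors (the vectors, words, and function names are all illustrative assumptions):

```python
import numpy as np

# Toy 2-d embeddings standing in for real pretrained word vectors.
VECTORS = {
    "plant": np.array([1.0, 0.1]), "food": np.array([0.9, 0.3]),
    "leaves": np.array([0.95, 0.2]), "roots": np.array([0.1, 0.9]),
}

def sentence_vector(text):
    """Average the embeddings of the words we have vectors for."""
    words = [w for w in text.lower().split() if w in VECTORS]
    return np.mean([VECTORS[w] for w in words], axis=0)

def word2vec_solver(question, choices):
    """Pick the choice whose vector is closest (by cosine) to the question's."""
    q = sentence_vector(question)
    def cosine(choice):
        c = sentence_vector(choice)
        return float(q @ c / (np.linalg.norm(q) * np.linalg.norm(c)))
    return max(choices, key=cosine)

best = word2vec_solver("plant food", ["roots", "leaves"])
```

"leaves" wins because its toy vector points in nearly the same direction as the averaged question vector; with vectors trained on science text, the geometry encodes real semantic relations.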

Solver #3: Pointwise Mutual Information (PMI)

The pointwise mutual information of two words can be estimated by comparing how often the words appear together with how often each appears on its own.

PHOTO CREDIT: Joel Grus, ODSC East 2017


This solver calculates the PMI between question and answer words, weighing the likelihood of finding the phrases together against seeing each word separately; words with high PMI are closely related, pointing to the likely correct answer.

Accuracy: 51 percent
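The PMI estimate itself can be sketched directly from the definition, PMI(x, y) = log( P(x, y) / (P(x) P(y)) ), here estimated over sentences treated as co-occurrence windows (the sentence data is made up for illustration):

```python
import math

def pmi(x, y, sentences):
    """PMI(x, y) = log( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated by counting sentences that contain the words."""
    n = len(sentences)
    px = sum(x in s for s in sentences) / n
    py = sum(y in s for s in sentences) / n
    pxy = sum(x in s and y in s for s in sentences) / n
    return math.log(pxy / (px * py))

sentences = [
    {"leaves", "food"}, {"leaves", "sunlight"},
    {"roots", "water"}, {"roots", "soil"},
]
score = pmi("leaves", "food", sentences)  # log(0.25 / (0.5 * 0.25)) = log 2
```

A positive score means the words co-occur more often than chance would predict; a solver can then favor the answer whose words have the highest PMI with the question's words.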


Solver #4: Abstract Concrete Mapping Engine (ACME)

In combination with PMI, ACME associates concepts with question/answer pairs to find which phrases in the pairs appear most frequently alongside defined concepts from the knowledge base. While this solution gave slightly fewer correct answers, it has the potential to be built upon: when following the machine's reasoning process, it connected relevant concepts that were not even mentioned in the question to find the answer.

Accuracy: 46 percent
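A hypothetical sketch of the concept-mapping idea, with a hand-built word-to-concept table standing in for a real knowledge base (every name and mapping here is invented for illustration):

```python
# Hypothetical concept table: word -> concepts from a knowledge base.
CONCEPTS = {
    "leaves": {"photosynthesis", "plant anatomy"},
    "food": {"photosynthesis", "energy"},
    "roots": {"absorption", "plant anatomy"},
    "water": {"absorption"},
}

def acme_solver(question_words, choices):
    """Pick the choice that shares the most knowledge-base concepts
    with the question words, even when no words match directly."""
    q_concepts = set().union(*(CONCEPTS.get(w, set()) for w in question_words))
    return max(choices, key=lambda c: len(CONCEPTS.get(c, set()) & q_concepts))

best = acme_solver(["food"], ["roots", "leaves"])
```

Note that "food" and "leaves" share no surface text, yet the solver links them through the concept "photosynthesis", which is the behavior the article describes.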


Solver #5: Character-Level Recurrent Neural Network

Grus’s char-level RNN uses long short-term memory (LSTM) cells to keep state while processing a question character by character. This solver did not pass the test of outperforming random guessing; without a larger and more relevant dataset to train on, the approach cannot compete with the simpler methods above.

Accuracy: 24.7 percent
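To show the character-level recurrence in miniature, here is a plain-numpy vanilla RNN step with random, untrained weights (the real solver used LSTM cells, which add gating on top of this recurrence; vocabulary, sizes, and names are illustrative):

```python
import numpy as np

VOCAB = "abcdefghijklmnopqrstuvwxyz ?"
HIDDEN = 16
rng = np.random.default_rng(0)
Wxh = rng.normal(scale=0.1, size=(HIDDEN, len(VOCAB)))  # input-to-hidden
Whh = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))      # hidden-to-hidden

def encode(text):
    """One-hot encode each character of the question."""
    out = np.zeros((len(text), len(VOCAB)))
    for i, ch in enumerate(text.lower()):
        out[i, VOCAB.index(ch)] = 1.0
    return out

def rnn_state(text):
    """Run h_t = tanh(Wxh x_t + Whh h_{t-1}) over the characters; the final
    state summarizes the whole question for a downstream classifier."""
    h = np.zeros(HIDDEN)
    for x in encode(text):
        h = np.tanh(Wxh @ x + Whh @ h)
    return h

state = rnn_state("which part of a plant makes food?")
```

With untrained weights the state carries no useful signal, which mirrors the article's point: without enough relevant training data, the recurrent approach barely beats random guessing.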


Solver #6: Deep Neural Net

In probably the most complex of the ideas presented, Grus briefly explained how one could use a deep neural network to train the system to associate words with the right answers; however, he didn’t touch much on the black box of the network's internals.


While none of these solvers yet challenges human wit, these ideas are jumping-off points for using artificial intelligence, data, and creativity to build machines that can reason for themselves.


How is it built?

Aristo and Aristo Mini share the same foundations; however, Aristo Mini is an open-source, minified version for others to test and use, while the full Aristo also tackles more complex, open-ended questions.

The systems depend on a knowledge base drawn from the Waterloo corpus, websites, and science sentences to provide context. A question is run through a solver of varying complexity, with some of the simpler approaches outlined above. The final stage, evaluation, measures the system's accuracy.
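The knowledge base, solver, and evaluation stages can be sketched as a tiny pipeline, assuming (hypothetically) that a solver is just a function from a question and its choices to an answer:

```python
def evaluate(solver, exam):
    """Score a solver against a list of (question, choices, answer) items,
    returning the fraction answered correctly."""
    correct = sum(
        solver(q, choices) == answer for q, choices, answer in exam
    )
    return correct / len(exam)

exam = [("Which part of a plant makes food?",
         ["roots", "stem", "leaves", "flowers"], "leaves")]
always_first = lambda q, choices: choices[0]
accuracy = evaluate(always_first, exam)  # "roots" is wrong here, so 0.0
```

Swapping in any of the solvers sketched above, from the random guesser to PMI, only changes the `solver` argument; the evaluation stage stays the same, which is what makes Aristo Mini convenient for experimenting.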


About the Institute


Based in Seattle, the Allen Institute for Artificial Intelligence was founded as a non-profit in 2014 by Microsoft co-founder Paul Allen and is now run by CEO Oren Etzioni. The institute conducts research in artificial intelligence to create systems that can reason, learn, and read. It is currently working on four main projects: Aristo, which was discussed in this piece; Semantic Scholar, an AI-based text search; Euclid, which uses a computer to answer math problems; and Plato, a computer vision project.

Key Takeaways:

  • The Aristo Mini is an open-source question answering system where one can develop and test different methods to train the machine.
  • Some solvers may answer a higher percentage of questions correctly; however, when measuring effectiveness, one should examine the machine's reasoning process for signs of “learning” rather than luck based on text matches.
  • The accuracy of question answering systems varies greatly depending on the words and phrasing in the questions and answers.
  • To a system, even elementary-level multiple choice science questions are difficult to answer.

Jacquelyn Elias, ODSC

Jacquelyn is a recent graduate of Southern Methodist University in Dallas with a triple major in journalism, creative computation, and computer science. You'll find her now making sense of unstructured data, finding stories in spreadsheets and downing espresso as a data intern for the Atlanta Journal-Constitution. She intends to pursue a career in data or computational journalism, tinkering with coding, storytelling, and data to report the truth. https://jacquelynrelias.wordpress.com/