Google’s TensorFlow framework spread like wildfire upon its release. The slew of tutorials and extensions made an already robust ecosystem even more so. Recently, Google released an extension of its own: SyntaxNet, a TensorFlow-based syntactic parser for Natural Language Understanding.
SyntaxNet uses neural networks to model the precise relationships between words in a sentence, a vital capability if machines are to navigate the ambiguity of everyday language. Its process is straightforward. The algorithm reads a sentence from left to right, extracting features as it moves and feeding them into a neural network. It first performs part-of-speech tagging, i.e., assigning a label (noun, verb, etc.) to each word. It then explores possible syntax-tree hypotheses for the relationships between sets of words and assigns each hypothesis a score. Comparing these scores determines the most probable parse.
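The tag-then-score pipeline above can be sketched in miniature. This is a hand-crafted illustration of the idea, not SyntaxNet's actual model or API: the tag table and scoring rules are toy stand-ins for what the neural network learns.

```python
# Toy sketch of the left-to-right, hypothesis-scoring idea. All tags and
# scores are hand-crafted stand-ins for a learned neural model.

# Step 1: part-of-speech tagging with a toy lookup table.
TOY_TAGS = {"Bob": "NOUN", "took": "VERB", "the": "DET", "book": "NOUN"}

def tag(words):
    """Assign a part-of-speech label to each word, left to right."""
    return [(w, TOY_TAGS.get(w, "X")) for w in words]

# Step 2: score competing syntax-tree hypotheses. A hypothesis maps each
# dependent word's index to its head's index (-1 means the sentence root).
def score(hypothesis, tagged):
    """Hand-crafted scorer standing in for the neural network's output."""
    s = 0.0
    for dep, head in hypothesis.items():
        dep_tag = tagged[dep][1]
        if head == -1 and dep_tag == "VERB":
            s += 2.0  # verbs make good roots
        elif head != -1 and dep_tag == "DET" and tagged[head][1] == "NOUN":
            s += 1.0  # determiners attach to nouns
        elif head != -1 and dep_tag == "NOUN" and tagged[head][1] == "VERB":
            s += 1.0  # nouns attach to verbs
    return s

def best_parse(words, hypotheses):
    """Step 3: pick the highest-scoring hypothesis as the parse."""
    tagged = tag(words)
    return max(hypotheses, key=lambda h: score(h, tagged))

words = ["Bob", "took", "the", "book"]
good = {0: 1, 1: -1, 2: 3, 3: 1}  # "took" is root; "the" modifies "book"
bad = {0: -1, 1: 0, 2: 1, 3: 2}   # implausible chain rooted at "Bob"
print(best_parse(words, [good, bad]) is good)  # True
```

In the real system, the score comes from the trained network rather than hand-written rules, and the hypotheses are built incrementally with beam search rather than enumerated up front.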
The model uses these precise relationship representations to make inferences. Consider the sentence, “Bob took the book and gave it to Shion.” SyntaxNet facilitates answers to questions like “Who did Bob give the book to?” and “What did Bob give to Shion?”
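To make this concrete, here is a toy sketch of answering such questions from a dependency parse. The arcs below are hand-built for the example sentence and use conventional relation labels (`nsubj`, `dobj`, `prep`, `pobj`); a parser like SyntaxNet would produce such relations automatically.

```python
# Hand-built (dependent, relation, head) arcs for
# "Bob took the book and gave it to Shion."
ARCS = [
    ("Bob", "nsubj", "gave"),   # subject of "gave"
    ("it", "dobj", "gave"),     # direct object of "gave"
    ("to", "prep", "gave"),     # preposition attached to "gave"
    ("Shion", "pobj", "to"),    # object of the preposition "to"
]

def find(relation, head):
    """Return the dependent attached to `head` by `relation`, if any."""
    for dep, rel, h in ARCS:
        if rel == relation and h == head:
            return dep
    return None

# "Who did Bob give the book to?" -> follow the preposition from "gave".
print(find("pobj", find("prep", "gave")))  # Shion
# "What did Bob give to Shion?" -> the direct object of "gave".
print(find("dobj", "gave"))                # it (referring back to the book)
```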
Interested parties can build their own SyntaxNet models from scratch. However, there is also a pre-trained English model called Parsey McParseface, which Google claims is the most accurate in the world, reporting roughly 94% accuracy on a standard benchmark.
You can start exploring SyntaxNet and Parsey McParseface here.
©ODSC 2016, Feel free to share + backlink!
Gordon studied Math before immersing himself in Data Science. Originally a die-hard Python user, he saw R's tidyverse ecosystem gradually subsume his workflow until only scikit-learn remained untouched. He is fascinated by the elegance of robust data-driven decision making in all areas of life, and is currently involved in applying these techniques to the EdTech space.