Top 10 Signs of the Textpocalypse – Steve Cohen ODSC Boston 2015
Human-generated text may be the next frontier for big data analysis, but we humans are complicated beasts and the text we generate is messy and complicated in ways that can confound analysis. We'll describe the top ten mistakes people make when...
Machine-in-the-loop for Knowledge Discovery – Max Kleiman-Weiner ODSC Boston 2015
I'll present the new knowledge discovery tools we are building at Diffeo. Unlike traditional search engines that use keywords, Diffeo provides an in-browser knowledge base that accelerates information gathering about people, companies, chemical compounds, cyber events, or other real-world entities. I'll describe...
Beyond Names – Gregor Stewart ODSC Boston 2015
Finding and classifying the mentions of the things named in text, often called Named Entity Recognition or NER, is a fundamental task in many search and analysis applications. Mature, robust NER technology is available for many languages and domains, from people, places, and products, to...
Domain Expertise and Unstructured Data – William Macmillan & Evan Schnidman ODSC Boston 2015
Data science allows us to turn a dark forest into a world of perpetual twilight by giving us the tools to better understand the data that surrounds us. Unfortunately, in this world of twilight we still need a flashlight to get a...
Intro to Text Mining Using tm, openNLP and topicmodels – Ted Kwartler ODSC Boston 2015
You will learn how modern customer service organizations use data to understand important customer attributes and how R is used for workforce optimization. Topics include real-world examples of how R is used in large-scale operations to...
Recurrent Neural Networks for Text Analysis – Alec Radford ODSC Boston 2015
Recurrent Neural Networks hold great promise as general sequence learning algorithms. As such, they are a very promising tool for text analysis. However, outside of very specific use cases such as handwriting recognition and, recently, machine translation, they have not seen...
Vector Space Word Representations – Rani Nelken ODSC Boston 2015
NLP has traditionally mapped words to discrete elements without underlying structure. Recent research replaces these models with vector-based representations, efficiently learned using neural networks. The resulting embeddings not only improve performance on a variety of tasks, but also show...