Text modeling with R, Python, and Spark
Text analysis with R, Python, and Spark, using the State of the Union Address and Congressional Hearings
Frank D. Evans, data scientist at Exaptive, provides a conceptual and technical look at text analysis on big data with open source tools. His source material is 70 years of State of the Union Addresses, from Truman to Obama, and 20 years of Congressional hearing transcripts. Using R, he performs text clustering to identify trends. Using Python, he goes a step further and builds topic models that identify commonalities and differences between presidencies. Then, combining Python with Spark, he builds topic models on a data set 2,500 times as large. He’ll explain the statistical methods behind each analysis and show how he implemented them in code. He’ll also explain how to produce plots, and even live data applications, that let others explore the modeled data.
Link to slides: http://www.slideshare.net/frankdevans
GitHub repo: https://github.com/frankdevans/odsc_meetup_text_processing
About the Speaker
Frank D. Evans (@frankdevans) is a data scientist with expertise in big data, especially text analysis. His work spans financial, behavioral, and political analytics. Frank recently gave a TEDx talk on how data science can fix gerrymandering.