Text modeling with R, Python, and Spark

Text analysis with R, Python, and Spark, using the State of the Union Address and Congressional Hearings

Frank D. Evans, data scientist at Exaptive, provides a conceptual and technical look at text analysis on big data with open source tools. His fodder is 70 years of State of the Union Addresses, from Truman to Obama, and 20 years of Congressional hearing transcripts. Using R, he performs text clustering to identify trends. Using Python, he goes a step further and builds topic models that identify commonalities and differences between presidencies. Then, combining Python with Spark, he topic-models a data set 2,500 times as big. He’ll explain the statistical methods behind each analysis and show how he implemented them in code. He’ll also explain how to produce plots, or even live data applications, that let others explore the modeled data.

Link to slides: http://www.slideshare.net/frankdevans
Github repo: https://github.com/frankdevans/odsc_meetup_text_processing


About the Speaker

Frank D. Evans (@frankdevans) is a data scientist with expertise in big data, especially text analysis. His work spans financial, behavioral, and political analytics. Frank recently gave a TEDx talk on how data science can fix gerrymandering.

Text analysis with R, Python and Spark

Wednesday, Feb 10, 2016, 6:00 PM

WeWork
(South Station) 745 Atlantic Ave Boston, MA

313 Data Miners Went

6 – 6:30 PM – Networking
6:30 – 8 PM – Presentation and Q&A
