Using Spark, Python, and Parquet for Loading Large Datasets – Douglas Eisenstein ODSC Boston 2015
Have you been in the situation where you’re about to start a new project and ask yourself, what’s the right tool for the job here? I’ve been in that situation many times and thought it might be useful to share with you a...
Intro to Text Mining Using tm, openNLP and topicmodels – Ted Kwartler ODSC Boston 2015
You will learn how modern customer service organizations use data to understand important customer attributes and how R is used for workforce optimization. Topics include real-world examples of how R is used in large-scale operations to...
Bridging the Gap Between Data and Insight using Open-Source Tools – Nicholas Arcolano ODSC Boston 2015
Despite the proliferation of open-source tools for analysis (such as Python and R) and for visualization (such as JavaScript/D3), there often exist significant gaps between these areas, and those of us trying to...
Vowpal Wabbit – Paul Mineiro ODSC Boston 2015
Vowpal Wabbit is both an open-source machine learning toolkit and an active research platform. In this talk I introduce Vowpal Wabbit and discuss some of its design decisions and the types of problems for which VW is (or is not) a good fit. The talk includes...
Monary: Really fast analysis with MongoDB and NumPy – Anna Herlihy ODSC Boston 2015
“MongoDB is a scalable, flexible and easy-to-use way of storing large data sets. Python and NumPy provide a comprehensive toolkit for data analysis. Unfortunately they don’t work together as well as they could: the official Python driver for MongoDB, PyMongo, is inefficient at loading...
Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of Data-Parallel Graph Analytics (Application to Bioinformatics) – Brad Bebee ODSC Boston 2015
From social networks to protein networks to financial transactions, graphs are everywhere. Graph analytics is a key tool for data science to take advantage of this type of network information. Many...
Data Workflows for Iteration, Collaboration, and Reproducibility – David Chudzicki ODSC Boston 2015
http://www.davidchudzicki.com/slides/odsc-2015-workflow/ For other data scientists to improve, build on, or even just trust your analysis, they need to be able to reproduce it. Even if you have shared code and data, reproducing your analysis may be difficult: which code was executed against which data in what order? And even...
Predictive Modeling Workshop – Max Kuhn ODSC Boston 2015
The workshop is an overview of creating predictive models using R. An example data set will be used to demonstrate a typical workflow: data splitting, pre-processing, model tuning and evaluation. Several R packages will be shown along with the caret package, which provides a...
Making R Go Faster and Bigger – Jared Lander ODSC Boston 2015
http://bit.ly/JaredLanderPresentation The features of R that make it easy to use (dynamic typing, in-memory analysis, the interpreter engine and REPL) can also slow it down. Fortunately, the R Core Team has made dramatic improvements in recent years with better memory management and faster interpretation of code. We look at some of...
Probabilistic Programming in Data Science – Thomas Wiecki ODSC Boston 2015
http://bit.ly/ThomasWieckiPresentation There exist a large number of metrics to evaluate the performance-risk trade-off of a portfolio. Although those metrics have proven to be useful tools in practice, most of them require a large amount of data and implicitly assume returns to be normally distributed. Bayesian modeling is a statistical...