Intro to Text Mining Using tm, openNLP and topicmodels – Ted Kwartler ODSC Boston 2015
You will learn how modern customer service organizations use data to understand important customer attributes and how R is used for workforce optimization. Topics include real-world examples of how R is used in large-scale operations to...
Making R Go Faster and Bigger – Jared Lander ODSC Boston 2015
The features of R that make it easy to use – dynamic typing, in-memory analysis, the interpreter engine and REPL – can also slow it down. Fortunately, the R Core Team has made dramatic improvements in recent years with better memory management and faster interpretation of code. We look at some of...
Probabilistic Programming in Data Science – Thomas Wiecki ODSC Boston 2015
There exist a large number of metrics to evaluate the performance-risk trade-off of a portfolio. Although those metrics have proven to be useful tools in practice, most of them require a large amount of data and implicitly assume returns to be normally distributed. Bayesian modeling is a statistical...
Recurrent Neural Networks for Text Analysis – Alec Radford ODSC Boston 2015
Recurrent Neural Networks hold great promise as general sequence learning algorithms. As such, they are a very promising tool for text analysis. However, outside of very specific use cases such as handwriting recognition and, recently, machine translation, they have not seen...
Machine Learning for Suits – Rahul Dave ODSC Boston 2015
You will learn the basic concepts of machine learning – such as Modeling, Model Selection, Loss or Profit, overfitting, and validation – in a non-mathematical way, so that you can ask for data analysis and interpret the results of a model in the...
On Demand Analytic and Learning Environments with Jupyter – Kyle Kelley and Andrew Odewahn ODSC Boston 2015
The Jupyter/IPython project has been building systems that enable collections of users to work on a shared system within their team or lab, or across a wide web audience. There is the multi-user server JupyterHub, the temporary notebook system (tmpnb), blossoming Google Drive integration (jupyter-drive), notebook spawning in...
Adventures in Using R to Teach Mathematics – Paul Bamberg ODSC Boston 2015
In 2014 I launched a new course, "Mathematical Foundations of Statistical Software," in the Harvard Extension School, aimed at students with a solid background in calculus. Lectures were a mixture of proofs and R scripts, all homework was done in...
A Hybrid Approach to Data Science Project Management – Elaine Lee ODSC Boston 2015
In recent years, Data Science has evolved into its own profession in response to the proliferation of data that needed to be analyzed and made actionable – a job that could not be adequately addressed by any single one...
Jumping to Conclusions – Richard Robehr Bijjani ODSC Boston 2015
Data Science is the study of the extraction of knowledge from data. What if we extract partial or inaccurate knowledge? This illusion of knowledge would lead us to make wrong decisions, with sometimes disastrous consequences such as in the case of medical diagnosis, security...
Machine Learning Based Personalization Using Uplift Analytics: Examples and Applications – Victor Lo ODSC Boston 2015
Traditional randomized experiments allow us to determine the overall causal impact of a treatment program (e.g. marketing, medical, social, education, political). Uplift modeling (also known as true lift, net lift, incremental lift) takes a further step to identify individuals who are truly positively influenced...