Intro to Text Mining Using tm, openNLP and topicmodels – Ted Kwartler ODSC Boston 2015
You will learn how modern customer service organizations use data to understand important customer attributes and how R is used for workforce optimization. Topics include real-world examples of how R is used in large-scale operations to...
Making R Go Faster and Bigger – Jared Lander ODSC Boston 2015
The features of R that make it easy to use (dynamic typing, in-memory analysis, the interpreter engine and REPL) can also slow it down. Fortunately the R Core Team has made dramatic improvements in recent years with better memory management and faster interpretation of code. We look at some of...
Probabilistic Programming in Data Science – Thomas Wiecki ODSC Boston 2015
There exist a large number of metrics to evaluate the performance-risk trade-off of a portfolio. Although those metrics have proven to be useful tools in practice, most of them require a large amount of data and implicitly assume returns to be normally distributed. Bayesian modeling is a statistical...
Recurrent Neural Networks for Text Analysis – Alec Radford ODSC Boston 2015
Recurrent Neural Networks hold great promise as general sequence learning algorithms. As such, they are a very promising tool for text analysis. However, outside of very specific use cases such as handwriting recognition and, recently, machine translation, they have not seen...
Machine Learning for Suits – Rahul Dave ODSC Boston 2015
You will learn the basic concepts of machine learning – such as modeling, model selection, loss or profit, overfitting, and validation – in a non-mathematical way, so that you can ask for data analysis and interpret the results of a model in the...
On Demand Analytic and Learning Environments with Jupyter – Kyle Kelley and Andrew Odewahn ODSC Boston 2015
The Jupyter/IPython project has been building systems that let groups of users work on a shared system within a team or lab, or serve a wide web audience. There is the multi-user server JupyterHub, the temporary notebook system (tmpnb), blossoming Google Drive integration (jupyter-drive), notebook spawning in...
Adventures in Using R to Teach Mathematics – Paul Bamberg ODSC Boston 2015
In 2014 I launched a new course, “Mathematical Foundations of Statistical Software,” in the Harvard Extension School, aimed at students with a solid background in calculus. Lectures were a mixture of proofs and R scripts; all homework was done in...
A Hybrid Approach to Data Science Project Management – Elaine Lee ODSC Boston 2015
In recent years, Data Science has evolved into its own profession as a response to the proliferation of data that needed to be analyzed and made actionable, a job that could not be adequately addressed by any single one...
Jumping to Conclusions – Richard Robehr Bijjani ODSC Boston 2015
Data Science is the study of the extraction of knowledge from data. What if we extract partial or inaccurate knowledge? This illusion of knowledge would lead us to make wrong decisions, with sometimes disastrous consequences such as in the case of medical diagnosis, security...
Machine Learning Based Personalization Using Uplift Analytics: Examples and Applications – Victor Lo ODSC Boston 2015
Traditional randomized experiments allow us to determine the overall causal impact of a treatment program (e.g. marketing, medical, social, education, political). Uplift modeling (also known as true lift, net lift, incremental lift) takes a further step to identify individuals who are truly positively influenced...