Intro to Text Mining Using tm, openNLP and topicmodels – Ted Kwartler ODSC Boston 2015
You will learn how modern customer service organizations use data to understand important customer attributes and how R is used for workforce optimization. Topics include real-world examples of how R is used in large-scale operations to...
Can We Automate Predictive Analytics – Thomas Dinsmore ODSC Boston 2015
Recent news about the pending shortage of data scientists prompts speculation about automation: will machines replace human analysts? We propose a model of automation, and briefly review progress in automated machine learning over the past twenty years. Summarizing the current state of...
Opening the Doors to Innovation in Developing Countries through the Democratization of Data – Ari Hamalian ODSC Boston 2015
Initiatives such as Wikipedia and the Human Genome Project have demonstrated the multiplicative positive impact that data can have when shared openly. Increasingly, countries and governments across the globe have begun to embrace and recognize the...
Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of Data-Parallel Graph Analytics (Application to Bioinformatics) – Brad Bebee ODSC Boston 2015
From social networks to protein networks to financial transactions, graphs are everywhere. Graph analytics is a key tool that lets data science take advantage of this kind of network information. Many...
Learning to Love Bayesian Statistics – Allen Downey ODSC Boston 2015
http://tinyurl.com/lovebayes Bayesian statistical methods provide powerful tools for answering questions and making decisions. For example, the result of Bayesian analysis is a set of values and probabilities that can be fed directly into a cost-benefit analysis, which is not possible with conventional statistics. But there are several barriers to...
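As a rough illustration of how posterior values and probabilities can feed a cost-benefit calculation, here is a minimal Python sketch. It uses a grid approximation with a uniform prior and a binomial likelihood. The counts (12 conversions in 50 trials) and the payoff function are hypothetical, chosen only for illustration, and the code is not taken from the talk.

```python
def grid_posterior(successes, trials, n_points=101):
    """Grid approximation of the posterior over a success rate p,
    assuming a uniform prior and a binomial likelihood."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    # With a uniform prior, the unnormalized posterior is proportional to the
    # likelihood; the binomial coefficient does not depend on p, so it is dropped.
    unnorm = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    total = sum(unnorm)
    return [(p, u / total) for p, u in zip(grid, unnorm)]


def expected_value(posterior, payoff):
    """Average a payoff over every candidate p, weighted by its posterior probability."""
    return sum(w * payoff(p) for p, w in posterior)


# Hypothetical data: 12 conversions observed in 50 trials
posterior = grid_posterior(successes=12, trials=50)


# Hypothetical payoff: reach 1000 customers, earn 5 per conversion, pay a fixed cost of 1000
def profit(p):
    return 1000 * p * 5 - 1000


ev = expected_value(posterior, profit)
prob_loss = sum(w for p, w in posterior if profit(p) < 0)
print(f"expected profit: {ev:.2f}, probability of a loss: {prob_loss:.3f}")
```

The decision uses the whole distribution rather than a single point estimate. Both the expected payoff and the probability of losing money come out of the same set of values and probabilities.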
Data Workflows for Iteration, Collaboration, and Reproducibility – David Chudzicki ODSC Boston 2015
http://www.davidchudzicki.com/slides/odsc-2015-workflow/ For other data scientists to improve, build on, or even just trust your analysis, they need to be able to reproduce it. Even if you have shared code and data, reproducing your analysis may be difficult: which code was executed against which data in what order? And even...
Predictive Modeling Workshop – Max Kuhn ODSC Boston 2015
The workshop is an overview of creating predictive models using R. An example data set will be used to demonstrate a typical workflow: data splitting, pre-processing, model tuning and evaluation. Several R packages will be shown along with the caret package, which provides a...
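The four-step workflow named in this abstract (data splitting, pre-processing, model tuning, evaluation) can be sketched roughly as follows. This is a plain-Python analogue built only on the standard library. It is not the workshop's R code and does not mirror the caret API. The synthetic two-class data set and the choice of a k-nearest-neighbours model are assumptions made for illustration.

```python
import random
import statistics


def make_data(n, seed=0):
    """Hypothetical synthetic data: two features, two classes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(1.5 * label, 1.0), rng.gauss(10.0 + 3.0 * label, 5.0)]
        rows.append((x, label))
    return rows


def split(rows, frac, seed=1):
    """Shuffle a copy of the rows and cut it into two parts."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * frac)
    return rows[:cut], rows[cut:]


def fit_scaler(rows):
    """Learn per-feature mean and standard deviation from the given rows."""
    cols = list(zip(*(x for x, _ in rows)))
    return [(statistics.mean(c), statistics.stdev(c)) for c in cols]


def apply_scaler(rows, params):
    """Centre and scale every feature with previously learned statistics."""
    return [([(v - m) / s for v, (m, s) in zip(x, params)], y) for x, y in rows]


def knn_predict(train, x, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
    votes = sum(y for _, y in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, test, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# 1. Data splitting: hold out a test set, then carve a validation set from the rest
data = make_data(300)
train_full, test = split(data, 0.8)
train, valid = split(train_full, 0.75, seed=2)

# 2. Pre-processing: scaling statistics come from the training rows only
params = fit_scaler(train)
train_s, valid_s = apply_scaler(train, params), apply_scaler(valid, params)

# 3. Tuning: choose k on the validation set (odd values avoid tied votes)
k_grid = [1, 3, 5, 7, 9, 15]
best_k = max(k_grid, key=lambda k: accuracy(train_s, valid_s, k))

# 4. Evaluation: refit scaling on all training rows and score the test set once
final_params = fit_scaler(train_full)
test_acc = accuracy(apply_scaler(train_full, final_params),
                    apply_scaler(test, final_params), best_k)
print(f"best k = {best_k}, test accuracy = {test_acc:.2f}")
```

Scaling statistics are learned only from training rows, so no information from held-out rows leaks into pre-processing. Without scaling, the second feature (standard deviation 5) would dominate the distance calculation.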
Making R Go Faster and Bigger – Jared Lander ODSC Boston 2015
http://bit.ly/JaredLanderPresentation The features of R that make it easy to use (dynamic typing, in-memory analysis, the interpreter engine and the REPL) can also slow it down. Fortunately, the R Core Team has made dramatic improvements in recent years with better memory management and faster interpretation of code. We look at some of...
Probabilistic Programming in Data Science – Thomas Wiecki ODSC Boston 2015
http://bit.ly/ThomasWieckiPresentation A large number of metrics exist to evaluate the performance-risk trade-off of a portfolio. Although those metrics have proven to be useful tools in practice, most of them require a large amount of data and implicitly assume returns to be normally distributed. Bayesian modeling is a statistical...
Recurrent Neural Networks for Text Analysis – Alec Radford ODSC Boston 2015
Recurrent Neural Networks hold great promise as general sequence learning algorithms. As such, they are a very promising tool for text analysis. However, outside of very specific use cases such as handwriting recognition and, more recently, machine translation, they have not seen...