The Art of Data Science from odsc Keynote Presenter Bio Josh Wills is Cloudera's Senior Director of Data Science, working with customers and engineers to develop Hadoop-based solutions across a wide range of industries. He is the founder and VP of the Apache Crunch project for creating optimized MapReduce pipelines...
Frontiers of Open Data Science Research from odsc Keynote Presenter Bio Ani loves writing about herself in the third person and has written this all-true bio. Ani is a Data Scientist for the Digital Platforms Group at McGraw-Hill Education. She has a diverse educational background (some say she...
Machine Learning for a Pet Insurance Company from odsc As an insurance company, we receive a monthly premium from policy holders and in return, we pay claims on veterinary bills. Insurance risk for pet health is relatively uncharted territory; identifying key patterns can affect the company in a big...
Feature Engineering from odsc One of the most important, yet often overlooked, aspects of predictive modeling is the transformation of data to create model inputs, better known as feature engineering (FE). This talk will go into the theoretical background behind FE, showing how it leverages existing data to produce...
A Hybrid Approach to Data Science Project Management from odsc In recent years, Data Science has evolved into its own profession as a response to the proliferation of data that needed to be analyzed and made actionable, a job that could not be adequately addressed by any single one...
Doing open data science on government financials is not easy. A lot of the info is not, well, open. The good news is that data on government spending, borrowing, pensions and the like exists, but often lies hidden in bulky PDFs that are difficult to work with. In my...
http://bit.ly/Odewahn_KelleyPresentation The Jupyter/IPython project has been building systems that let groups of users work on a shared system within their team or lab, or with a wide web audience. There is the multi-user server JupyterHub, the temporary notebook system (tmpnb), blossoming Google Drive integration (jupyter-drive), notebook spawning in...
Machine Learning for Suits from odsc You will learn the basic concepts of machine learning – such as modeling, model selection, loss or profit, overfitting, and validation – in a non-mathematical way, so that you can ask for data analysis and interpret the results of a model in the...
Recurrent Neural Networks for Text Analysis from odsc Recurrent Neural Networks hold great promise as general sequence learning algorithms. As such, they are a very promising tool for text analysis. However, outside of very specific use cases such as handwriting recognition and, more recently, machine translation, they have not seen...
http://bit.ly/ThomasWieckiPresentation There exist a large number of metrics to evaluate the performance-risk trade-off of a portfolio. Although those metrics have proven to be useful tools in practice, most of them require a large amount of data and implicitly assume returns to be normally distributed. Bayesian modeling is a statistical...