

Using Spark, Python, and Parquet for Loading Large Datasets – Douglas Eisenstein ODSC Boston 2015
Posted by Open Data Science, June 15, 2015

Have you ever been about to start a new project and asked yourself, what’s the right tool for the job here? I’ve been in that situation many times, and I thought it might be useful to share a recent project we did and why we selected Spark, Python, and Parquet. My plan is to take you through a use case that involves loading, transforming, aggregating, and persisting a dataset. We’ll use an open dataset consisting of full fund holdings, graciously provided by Morningstar. My goals in presenting this use case are to show the audience how these technologies can be applied to a real-world problem and to inspire audience members to start learning these technologies and applying them to their own projects.
Presenter Bio
Douglas Eisenstein is a founder of a company focused on financial analytics. Its vision is to make the lives of systematic investors (quants) easier by making their data preparation and aggregation process vendor agnostic, keeping writes and reads fast from exploratory analysis through to alpha generation, and making sure the underlying data tech stack is agile and reliable. He has worked directly with over 24 financial companies over the past 10 years, which has allowed him to see many of the tradeoffs made with off-the-shelf and homegrown solutions to data-related problems. His typical toolkit consists of Python, R, Spark, Hive, Cassandra, TitanDB, MongoDB, and Matplotlib. What can he say? He loves the challenge of finding solutions for data-driven analytic problems. Oh, and he’s a CrossFit addict, so don’t get him started there…