The Rise of Notebooks Extended
Posted by RAPIDS, July 5, 2019
I recently had the privilege of presenting a workshop at the AI + Education Curiosity Conference 2019. There, I demonstrated to educators, school district staff, researchers, and students how RAPIDS software enables students to learn and iteratively practice data science using full datasets, all within classroom time constraints. Compared to current methods and workarounds, which can take all night to execute or only use subset samples of data, RAPIDS empowers students with more accurate results, the ability to collect data, and hands-on experience trying out data science recipes or testing algorithm performance. For example, RAPIDS took two minutes to run UMAP on a single Titan RTX GPU over the 60,000-image Fashion MNIST dataset while I answered a few questions. By contrast, I had to extend our 15-minute break to finish the same UMAP run on a multi-core Xeon CPU.
However, all of this performance doesn’t matter if users can’t get started or find the necessary RAPIDS documentation and foundational tutorials. In the past, our developers have told us about the challenges our docs posed in their pursuit of mastering RAPIDS, especially with our aggressive release cycle as we journey to 1.0. We launched an entirely new documentation update (https://docs.rapids.ai/) with our 0.6 release. With our 0.7 release, and in support of our growing community, we’d like to share our new Notebooks Extended repo on GitHub.
The Birth and Growth of Notebooks Extended
After I joined, my job was to break things. I happen to be great at my job. Too great at times. I set my breaking skills loose on our notebooks, which are included in the standard containers we ship to our enterprise customers via NGC and DockerHub. These notebooks must provide examples, include some libraries, and stay small and light. The more notebooks we added, the more libraries we needed. This posed a challenge. Customers who require air-gapped machines need to download a complete, working container with all the example data and libraries inside. Our new notebooks required libraries and data sets that had to be installed from external sources, so these customers couldn’t use the containers. We had two choices:
- Add more libraries and data sets to our containers, which could lead to container bloat and maintenance issues.
- Remove all notebooks that needed external libraries and data, which was very restrictive for the future.
We picked choice two. And out of choice two was born our Notebooks Extended repo. It was a silent yet important birth, as it provided the foundation for greater open source freedoms. You can think of this as the “RAPIDS Community’s” notebooks. Today, Notebooks Extended adds user-centric restructuring, easy Docker integration, and support for meaningful community interactions.
User-Centric Restructuring: Built for You to Grow
The Notebooks Extended repository has been structured to provide data practitioners a place to grow their skills and teach others what they’ve learned. It is broken down into 3 skill levels:
– Beginner: teaches newcomers how RAPIDS works and what it can do. It contains “hello worlds” and easy-to-digest examples that aid in transitioning from a CPU workload to a GPU workload. It is also where we will be posting the “RAPIDS Machine Learning Cookbook” notebook series, spearheaded by Paul Hendricks. Consider it middle to high school level content.
– Advanced: helps practitioners piece together their own data pipelines and recipes by providing sample workflow ideas through examples, blog and conference content, and benchmarking tools with data-scientist-friendly data sets. This should satisfy everyone from the IT hobbyist or college student to the enterprise data scientist.
– Expert: where practitioners learn how to unleash the full performance of their RAPIDS workflows. Examples here will showcase best practices for taming real-world data, accelerating large-scale deployments, and tweaking kernels to boost output. This is enthusiast-level content: here be dragons. Go wild.
Easy Docker Integrations
We distribute base RAPIDS via Conda installs and Docker containers, which serve our enterprise users. These containers require everything to be frozen, tested, and always working. But we want Notebooks Extended to be a constantly growing and adapting repository that serves our community. Freezing code wouldn’t work well with the constant stream of community contributions, so Notebooks Extended exists outside of the container. We’ve made it easy to include Notebooks Extended in your preferred RAPIDS container, and to keep it up to date, in three easy steps:
Step 1: Download your RAPIDS container.
Step 2: Git pull Notebooks Extended into the folder of your choice (change “#/folder/of/your/choice” into wherever you desire Notebooks Extended to be).
Step 3: Run Docker mounting your Notebooks Extended folder as a volume, changing “/folder/of/your/choice/” to where you put Notebooks Extended.
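The commands for these three steps might look roughly like the sketch below. The image tag, repository URL, GPU flag, published port, and in-container mount path are assumptions rather than values taken from this post, so check them against the current RAPIDS install instructions before use. By default the script only prints each command; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Minimal sketch of Steps 1-3. Everything marked "assumed" is a placeholder,
# not a value confirmed by this post.
set -eu

RAPIDS_IMAGE="${RAPIDS_IMAGE:-rapidsai/rapidsai:latest}"    # assumed image name and tag
REPO_URL="${REPO_URL:-https://github.com/rapidsai/notebooks-extended.git}"  # assumed repo URL
HOST_DIR="${HOST_DIR:-/folder/of/your/choice}"              # where the repo lives on the host
MOUNT_POINT="${MOUNT_POINT:-/rapids/notebooks/extended}"    # assumed path inside the container

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

notebooks_extended_steps() {
  # Step 1: download the RAPIDS container.
  run docker pull "$RAPIDS_IMAGE"
  # Step 2: get Notebooks Extended into the folder of your choice.
  run git clone "$REPO_URL" "$HOST_DIR"
  # Step 3: run the container with that folder mounted as a volume.
  # --gpus all needs Docker 19.03 or later; older nvidia-docker2 setups
  # use --runtime=nvidia instead. Port 8888 is Jupyter's default (assumed here).
  run docker run --gpus all --rm -it -p 8888:8888 \
    -v "$HOST_DIR:$MOUNT_POINT" "$RAPIDS_IMAGE"
}

notebooks_extended_steps
```

Keeping the host folder and mount point in variables means the only line you usually need to change is HOST_DIR, matching the “/folder/of/your/choice” placeholder in the steps above.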
When you open Jupyter in the container, you will see a folder called “extended” with the Notebooks Extended repo assets there. Git updating works on the host machine via your favorite method. Updating Docker containers won’t interfere with the notebooks. This deployment of Notebooks Extended mirrors how we build and test notebooks against our nightlies (and we love it). You have the flexibility to be as bleeding edge as you want to be with this repo. To use a previous version of Notebooks Extended, do a git checkout to the relevant commit.
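As a rough illustration of updating and rolling back on the host, the two helpers below wrap the underlying git commands. The helper names and the HOST_DIR variable are hypothetical, introduced only for this sketch.

```shell
# Sketch only: HOST_DIR and both helper names are illustrative, not from the post.
HOST_DIR="${HOST_DIR:-/folder/of/your/choice}"

# Bring the host-side checkout up to date with the latest community notebooks.
update_notebooks() {
  git -C "$HOST_DIR" pull
}

# Switch the checkout to an earlier state; $1 is a commit hash, tag, or branch.
pin_notebooks() {
  git -C "$HOST_DIR" checkout "$1"
}
```

Running `git -C /folder/of/your/choice log --oneline` lists candidate commits, and `pin_notebooks` followed by one of those hashes switches to it. Because the folder is mounted as a volume, the change shows up in the running container without rebuilding anything.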
Built for Enhanced Community Interactions
What’s more, Notebooks Extended will be the place where we invite Community collaborators to show their stuff and contribute their ideas within this structure. At this point, about half the content in Notebooks Extended was built by the Community and not the RAPIDS team. We look forward to becoming the authorship minority. We expect that Notebooks Extended will be the go-to place where the latest tips and tricks are published and traded, and where budding RAPIDS practitioners grow and master RAPIDS. Not only have we heard of exciting notebook series being proposed, but we are also going to expand the ways we track community contributions, such as keeping a list of community videos in the multimedia notebook. We can’t wait to see what else you all creatively contribute!
For now, contribute your feedback! Are we moving in the right direction? Is there anything missing from our documentation? What would you want to see in Notebooks Extended? How would you use these resources? Are you interested in contributing? Let us know!
[Originally Posted here]