The COVID-19 pandemic has supercharged the trend toward remote work around the world. Rather than continuing a gradual transition toward remote-friendliness, many industries were thrown head-first into a remote-first experiment. While it would have been reasonable to expect this to end poorly from a productivity standpoint, studies suggest that the net effect has been neutral, or even positive in some cases. Combine these results with the fact that the post-pandemic world will send kids (read: distractions) back to schools and extracurriculars, and you can imagine that the productivity of the parental portion of the workforce will only increase.
But while the aggregate results look positive, many hurdles remain on the way to remote collaboration self-actualization for data science organizations. Tools for video conferencing (Zoom, Webex, etc.) and document editing (G Suite, Microsoft Office, etc.) have long enabled virtual collaboration, but solutions for seamless remote collaboration around data are still very much in development.
Enter Deepnote, with its Jupyter-backed, cloud-based shared notebook environments. These environments have several features that foster collaboration, from live editing to Docker customization to external integrations. Many of these features deserve recognition on their own, but some are a particularly good fit for the needs of a global collaborative workforce. Let’s look at those in turn, specifically:
– Teamwide Code Repository Integration
– Interactive, Reproducible Code Debugging
– Technical Document Collaboration
– Stakeholder Empowerment
Teamwide Code Repository Integration
What was your startup process the last time you saw a Jupyter notebook that you wanted to run through or reproduce? You probably did some combination of repository cloning, local execution, and debugging. What you really want in this scenario is a one-hop way to “make this happen in my browser now”. There are two ways in which we can accomplish this, depending on who owns the repository.
Notebooks Owned By Others
For launching somebody else’s notebook, you simply need to construct a URL as follows:
For example, say you just found @norvig’s 2018 Advent of Code work and want to work off of it. Since the URL for that notebook is the following:
your launch link would be the following:
Navigating to that URL will result in a Deepnote environment set up with that Jupyter notebook initialized. The environment will be on Deepnote’s cloud hardware and ready for immediate execution, without your laptop’s fan making as much noise as a commercial airplane.
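Since the launch link is built from the notebook’s own URL, it’s easy to generate programmatically, for example for a whole list of notebooks. Here’s a minimal Python sketch, assuming the launch endpoint is `https://deepnote.com/launch` and that it takes the notebook’s URL in a `url` query parameter (confirm the exact format in Deepnote’s docs). The repository URL in the example is a made-up placeholder.

```python
from urllib.parse import quote

# Assumption: Deepnote's launch endpoint accepts the notebook's public URL
# as a "url" query parameter. Verify against Deepnote's documentation.
LAUNCH_BASE = "https://deepnote.com/launch"


def launch_link(notebook_url: str, base: str = LAUNCH_BASE) -> str:
    """Build a one-click launch link for a publicly hosted notebook."""
    # Percent-encode the whole notebook URL (including ':' and '/') so it
    # travels as a single query-parameter value.
    return f"{base}?url={quote(notebook_url, safe='')}"


if __name__ == "__main__":
    # Hypothetical repository, used only for illustration.
    print(launch_link("https://github.com/example-user/example-repo/blob/main/analysis.ipynb"))
```

Percent-encoding the full notebook URL is the conservative choice: it keeps any `?` or `&` in the original link from being read as part of the launch link’s own query string.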
Notebooks Owned By You
On the flip side, you may be the owner of a repository or notebook and want to enable simple access for your teammates. To do this, you can invite by email, send the link directly, or add a launch button to your repository’s readme.
When inviting others by email, you control the permissions granted to each collaborator: View, Comment, Edit, or Execute. This granular control is great for inviting a large and diverse audience to the environment. Many people simply need to see it, while others want to run it themselves and suggest improvements. These permission levels cover most use cases, from single-owner sharing to full-team editing to teacher-to-classroom education.
For simpler use cases where all external stakeholders of the notebook need only a single level of access (e.g. View or Comment), you can also just create a share URL that grants that level of access to anyone who has the link. This approach works great when widely distributing access to an environment, so you don’t end up managing a massive email list in your sharing tab.
To add a launch button to a code repository, all you need is the link to your repository’s Deepnote environment and your choice of button to display. Several SVG button designs are available to choose from.
You may have noticed that we jumped from having a GitHub repo to suddenly having a Deepnote environment for it. To set one up, you could use the launch URL option described above. However, I prefer to integrate a new Deepnote environment directly with the GitHub repo of choice.
This integration comes right out of the box and lets you handle all of your repo management (pull, commit, push, etc.) directly from the environment’s terminal! This is really convenient, as it requires much less computation and disk space from your local machine. Your entire workflow (and that of your teammates) can stay in the browser.
Once your Deepnote environment is created, all you need to do is place the linked button in your readme for everyone to use. The image below shows an example of a button that links to a notebook detailing Microsoft’s Hummingbird library. Note that the button’s SVG is typically embedded as a linked image in a Markdown or reStructuredText readme.
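If you maintain several repositories, generating the button snippet beats hand-editing it each time. The Python sketch below produces both the Markdown and the reStructuredText forms of a linked image. Here `env_url` is your environment’s link and `button_svg_url` is whichever SVG button you picked; the example values are placeholders, not real Deepnote addresses.

```python
def markdown_badge(env_url: str, button_svg_url: str, alt: str = "Launch in Deepnote") -> str:
    """Markdown for a clickable image: the SVG links to the Deepnote environment."""
    return f"[![{alt}]({button_svg_url})]({env_url})"


def rst_badge(env_url: str, button_svg_url: str, alt: str = "Launch in Deepnote") -> str:
    """reStructuredText equivalent, using the image directive's :target: option."""
    return "\n".join([
        f".. image:: {button_svg_url}",
        f"   :target: {env_url}",
        f"   :alt: {alt}",
    ])


if __name__ == "__main__":
    # Placeholder URLs: substitute your environment link and chosen button.
    env = "https://example.com/your-deepnote-environment"
    svg = "https://example.com/launch-button.svg"
    print(markdown_badge(env, svg))
    print(rst_badge(env, svg))
```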
Don’t use GitHub, or just focused on sharing data? Check out the other integrations on offer, including AWS S3, Snowflake, and Google BigQuery.
Interactive, Reproducible Code Debugging
Reproducing code issues is hard. It’s no wonder so many bug reports end up closed as “Cannot Reproduce.” Many different factors can contribute to a bug: the OS, the dependency structure, the code itself, or something else entirely. To combat this, code maintainers put structure around bug reports, asking reporters to provide “minimum reproducible code.” The report template usually also asks for standard environment information, so that all bug reports follow the same format.
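Much of that standard environment information can be collected automatically instead of typed by hand, which also removes one source of copy-paste mistakes. Here’s a minimal sketch using only the Python standard library; the package names passed in are just examples.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect the environment details most bug-report templates ask for."""
    lines = [
        f"OS: {platform.system()} {platform.release()} ({platform.machine()})",
        f"Python: {sys.version.split()[0]} ({platform.python_implementation()})",
    ]
    for name in packages:
        # Look up the installed distribution version; flag anything missing.
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example package list; substitute the libraries your report involves.
    print(environment_report(["numpy", "pandas"]))
```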
While this is a great step forward, much is still left to be desired. For example, the infamous “copypasta” errors still occur, where the reporter fails to provide a snippet that accurately re-creates the bug (or works at all!). This, of course, frustrates code maintainers and casts doubt on the credibility of the bug claim. Additionally, many bug report templates fail to ask for all of the information required to faithfully reproduce the issue! A report is only as good as its template.
Debugging shared code can be much smoother if the minimum reproducible code is provided as a link to a notebook environment that is guaranteed to produce the issue. That way, the repository maintainer can just click the “Open in Deepnote” button in the report and see the issue immediately. As an added benefit of this workflow, the reporter can use the isolated notebook environment to confirm that the issue can indeed be reproduced. Naturally, some issues are so sporadic that they are difficult to reproduce consistently; we’ll treat those as the exception rather than the rule.
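Before filing, the reporter can also measure how reliably the issue reproduces in the isolated environment. The sketch below is a hypothetical helper, not anything built into Deepnote: it runs a repro function repeatedly and returns the fraction of runs that fail. `hypothetical_repro` stands in for real minimum reproducible code and happens to trip over floating-point rounding.

```python
from typing import Callable


def reproduction_rate(repro: Callable[[], None], trials: int = 20) -> float:
    """Run a minimal reproduction several times; return the fraction of failing runs.

    1.0 means the bug reproduces every time; a value between 0 and 1 points
    to a sporadic issue (timing, randomness, external state).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    failures = 0
    for _ in range(trials):
        try:
            repro()
        except Exception:  # any error raised by the repro counts as a reproduction
            failures += 1
    return failures / trials


def hypothetical_repro() -> None:
    # Stand-in for a reporter's minimum reproducible code.
    result = sum([0.1] * 10)
    if result != 1.0:
        raise AssertionError(f"expected 1.0, got {result!r}")


if __name__ == "__main__":
    print(reproduction_rate(hypothetical_repro))
```

A rate near 1.0 suggests a deterministic bug that is a good candidate for a shareable notebook link; a rate well below 1.0 hints at the sporadic kind mentioned above. The broad `except Exception` is deliberate, since any error the repro raises counts as reproducing the issue.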
Deepnote environments work great for this setup because they give you control over the OS, hardware specifications, system state, and code state:
– Dockerfile -> OS and system state
– Machine Choice -> Hardware specifications
– Notebook -> Code state
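The Dockerfile is where dependency versions get pinned, but a quick check inside the notebook confirms that the running environment actually matches the versions a bug report names. Here’s a minimal sketch, assuming the report lists exact version strings; the pins shown are hypothetical.

```python
from importlib import metadata


def check_pins(pins):
    """Compare installed package versions with the versions a report was made against.

    Returns {package: (expected, installed_or_None)} for every mismatch.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package absent from this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins a bug reporter might list (or bake into the Dockerfile).
    print(check_pins({"numpy": "1.21.0", "pandas": "1.3.0"}))
```

The comparison is an exact string match, so `1.21` and `1.21.0` count as different; a sturdier check could normalize versions with the `packaging` library first.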
Additionally, interactive code reviews, pair programming, and debugging sessions become seamless! Say goodbye to awkward video calls like these:
“Ok, now you need to call the function with
“No, not that method, the other one!”
“Wait…scroll back up, please.”
“Too far. A little bit down, now.”
In live shared environments, as shown below, you can have a much more direct and collaborative debugging session. Immediate visual feedback lets the work progress at the speed of the conversation, and the interactivity lets everyone involved contribute to the solution.
Technical Document Collaboration
I’ve been on numerous projects where I’ve given feedback such as “This is great work! For the final bit, consider plotting <X> instead of <Y>.” The presenter then takes that feedback as an action item, and a 24–48 hour delay follows before all stakeholders see the new plot. When technical documents and presentations are built in Deepnote environments, teammates can jump in and make small changes immediately.
Yes, this sounds a lot like the document collaboration tools made by Google or Microsoft. The critical difference is that Deepnote environments bring the technical product and the documentation/presentation layers under one roof. For example, many of my colleagues use separate tools for technical results (Jupyter/plotting libraries), presentations (Google Slides), and whitepapers (Google Docs/LaTeX). All three could live in a single Deepnote environment, with one notebook for each. A notebook for technical results is straightforward; typical data science workflows are chock-full of them. For presentations, many Jupyter-based solutions already exist, from native slideshow support to extensions like RISE. For paper drafting, simply follow the same process as for simultaneous code development! Markdown, LaTeX, and HTML are all supported out of the box.
On the subject of horizontal integration, it’s worth highlighting how useful it is to leave comments and opinions in place in someone’s work without changing its core content. I have frequently reviewed work delivered as a Jupyter notebook plus a link to a Google Doc for discussion and feedback. In those cases, the Google Doc was chosen for the iterative feedback loop that comes from multiple reviewers making comments and suggestions. With comments built into Deepnote notebooks, there’s no need to switch to a different tool to get these real-time feedback capabilities.
Stakeholder Empowerment
Beyond plots and charts with built-in interactivity like mouse-hover displays, we can put even more control in the project stakeholders’ hands. Share a link to your Deepnote environment that lets them tweak any high-level parameters as they see fit. Depending on each stakeholder’s technical comfort, you can make the interface as simple as you want, down to a few widget buttons and sliders. As noted earlier, one of the most frequent responses in technical presentations is “How would these results change if <X> were <Y> instead of <Z>?” By empowering stakeholders to satisfy their curiosity asynchronously (read: not on your time 🙂), you build stakeholder confidence while staying efficient.
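To make the idea concrete, here’s a minimal sketch of a parametrized what-if cell. The revenue model, numbers, and function names are all hypothetical; the point is that stakeholders only touch the high-level inputs (`monthly_growth`, `months`) while the underlying logic stays fixed. The commented-out lines show how ipywidgets’ `interact` could turn those inputs into sliders, assuming ipywidgets is installed in the environment.

```python
from typing import List

# Hypothetical starting point for the projection.
BASE_REVENUE = 100_000.0


def project_revenue(monthly_growth: float, months: int, base: float = BASE_REVENUE) -> List[float]:
    """Compound a starting monthly revenue forward; one value per month, rounded to cents."""
    return [round(base * (1 + monthly_growth) ** m, 2) for m in range(1, months + 1)]


def show_projection(monthly_growth: float = 0.05, months: int = 12) -> None:
    """The one cell a stakeholder re-runs (or drives with sliders)."""
    values = project_revenue(monthly_growth, months)
    print(f"After {months} months at {monthly_growth:.0%} monthly growth: {values[-1]:,.2f}")


# In a notebook with ipywidgets installed, the same function can be wired to sliders:
#   from ipywidgets import interact
#   interact(show_projection, monthly_growth=(0.0, 0.2, 0.01), months=(1, 36))

if __name__ == "__main__":
    show_projection()
```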
Data science teams are usually connected to several other teams within a company. In fact, many organizations place their data scientists at the center of the org structure (more on that in the resources at the end of this article). Beyond parametrized reports, dynamic dashboards give the whole company visibility into key performance indicators (KPIs). Powering your KPI dashboards from the same environment as your experiments and analyses makes them much easier to manage and makes the data available to your audience asynchronously. Easier management leads to consistent delivery, consistent delivery leads to happy stakeholders, and happy stakeholders lead to successful projects.
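One way to keep dashboards and analyses consistent is to define each KPI once, in a shared function, and call it from both the exploratory notebooks and the dashboard cells. The sketch below does this for weekly active users; the `(user_id, date)` event format and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event schema: (user_id, activity date).
Event = Tuple[str, date]


def weekly_active_users(events: Iterable[Event]) -> Dict[Tuple[int, int], int]:
    """Single, shared KPI definition: distinct users per ISO (year, week)."""
    users_by_week: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for user_id, day in events:
        iso_year, iso_week, _ = day.isocalendar()
        users_by_week[(iso_year, iso_week)].add(user_id)
    # Sort by week so dashboards and ad hoc analyses list weeks in the same order.
    return {week: len(users) for week, users in sorted(users_by_week.items())}


if __name__ == "__main__":
    sample = [
        ("a", date(2021, 1, 4)),
        ("a", date(2021, 1, 5)),
        ("b", date(2021, 1, 6)),
        ("a", date(2021, 1, 11)),
    ]
    print(weekly_active_users(sample))  # → {(2021, 1): 2, (2021, 2): 1}
```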
The “normal” that the global workforce returns to post-pandemic will be very different from before. Remote work will continue to grow at a faster rate as companies trade office space for location flexibility and a wider talent pool. Organizations that learn to excel at both technical and non-technical collaboration in virtual workspaces will have a massive leg up on the competition. To make sure you and your colleagues are on the right side of history, focus on achieving the outcomes described in this article as you sustain these elevated levels of remote collaboration.
– Launch my Deepnote notebook directly
– An article about different ways companies are structuring data teams in organizations
– A collection of awesome Deepnote applications and use cases
– More from yours truly at Life With Data, Twitter, and Medium
Reposted with permission: Source.