The data science world is split into two parts: the (i)Python and the R communities. Both groups offer a plethora of tools and libraries that enrich our working lives as data scientists.
Interestingly, many of these offerings are complementary, so professional data scientists should know both environments to pick the right tool for the job. In many cases, it even makes sense to use Python and R together in the same project.
Sadly, today these two worlds don’t integrate very well, so we need to switch back and forth between different tools and environments.
RIDE is a new development environment for data science. It aims to be the cockpit for professional data scientists working in multiple languages. By leveraging and extending the awesome JupyterLab, RIDE combines advanced tool support with the interactivity of Jupyter notebooks.
We are firm believers in instant feedback and quick development turnarounds. RIDE provides feedback to the user all the time. This starts with features like intellisense and diagnostics that update as you type and goes much further.
In code editors, you can send the code on the current line to the active session for evaluation. RIDE’s flexible layouts allow you to see editors and consoles side by side, as in the following example.
If you do not want to waste screen real estate on a console, you can enable “inline sourcing”, which renders the results right below the statement in the editor.
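The editor-to-session round trip can be illustrated with a small Python sketch. It rests on assumptions: the standard library’s `code.InteractiveInterpreter` stands in for the active kernel session, and the helper `send_current_line` is hypothetical, not part of RIDE. It evaluates one line of an editor buffer and captures whatever the session prints, which is the text a console or an inline-sourcing view would display.

```python
import code
import contextlib
import io


def send_current_line(buffer: str, line_no: int,
                      session: code.InteractiveInterpreter) -> str:
    """Evaluate one line of an editor buffer in a session (hypothetical helper).

    Returns the captured output so it could be shown in a console or inline.
    """
    line = buffer.splitlines()[line_no]  # zero-based line index
    output = io.StringIO()
    # Capture both echoed results and tracebacks produced by the session.
    with contextlib.redirect_stdout(output), contextlib.redirect_stderr(output):
        # symbol="single" mimics a console: expression values are echoed.
        needs_more_input = session.runsource(line, symbol="single")
    if needs_more_input:
        # e.g. the first line of a for-loop; a real tool would send the whole block.
        return "<incomplete statement>"
    return output.getvalue()


# A plain dictionary acts as the session's global scope.
namespace: dict = {}
session = code.InteractiveInterpreter(locals=namespace)
editor_buffer = "x = 21\nx * 2\nprint('hello')"

for cursor_line in range(3):
    print(repr(send_current_line(editor_buffer, cursor_line, session)))
```

Evaluating `x = 21` produces no output but updates the session’s namespace, while the next line, `x * 2`, yields `"42\n"` because interactive mode echoes expression values the way a console does.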
Furthermore, you can source the whole file explicitly or automatically on save.
Notebooks, RMarkdown, Shiny and More
Jupyter notebooks are natively supported because RIDE directly leverages the awesome new JupyterLab. Additionally, RIDE handles other presentation formats very well. For instance, RMarkdown can be auto-processed on save.
Of course, you can use the “inline sourcing” feature in *.rmd files as well.
Furthermore, users can start, stop, and refresh Shiny apps directly from the editor containing the app.
A data model is ultimately a piece of software written in one or more programming languages. Such projects can quickly grow and become complex, so traditional software engineering practices like testing and debugging become necessary. Why shouldn’t data scientists get the same kind of advanced language and tool support that regular software developers have today?
Language support in RIDE goes beyond what the usual data science tools, such as RStudio or StatET, offer today. We draw our inspiration from the best software development tools, like JetBrains’ IntelliJ IDEA or the Java support in Eclipse.
RIDE supports the new Language Server Protocol (LSP), through which you get many useful features in code editors, consoles, and notebooks. These include the intellisense and diagnostics that update as you type, and many more.
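Under the hood, LSP messages are JSON-RPC 2.0 payloads preceded by a `Content-Length` header. The following Python sketch frames and parses a `textDocument/completion` request the way an editor client would send it to a language server. The file URI and cursor position are made-up examples, and the helper names are hypothetical; how RIDE’s own client transports these messages is not covered here.

```python
import json


def completion_request(request_id: int, uri: str, line: int, character: int) -> dict:
    """Build a JSON-RPC 2.0 request for LSP's textDocument/completion method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/completion",
        "params": {
            "textDocument": {"uri": uri},
            # LSP positions are zero-based.
            "position": {"line": line, "character": character},
        },
    }


def frame_message(payload: dict) -> bytes:
    """Prefix the UTF-8 JSON body with the Content-Length header LSP requires."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def parse_message(data: bytes) -> dict:
    """Split the header from the body and decode exactly Content-Length bytes."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for field in header.split(b"\r\n"):
        name, _, value = field.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


request = completion_request(1, "file:///home/user/analysis.R", line=4, character=10)
wire = frame_message(request)
print(wire.decode("utf-8"))
assert parse_message(wire) == request  # the round trip preserves the request
```

Because the protocol is language-neutral, the same framing serves an R language server, a Python one, or any other, which is what lets one editor offer these features across languages.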
Supporting such features in a dynamically typed language such as Python or R is, of course, more challenging than in a statically typed language such as TypeScript, Scala, or Java. Traditionally, such tool support is provided by a compiler that parses the source code and collects information through type systems and static analysis. For R, we have implemented such a language server, and we additionally combine it with information from a running kernel.
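The idea of combining static analysis with a live kernel can be sketched in a few lines of Python. This is a hypothetical illustration, not RIDE’s R language server: Python’s `ast` module stands in for the static analysis, and a plain dictionary stands in for the names a running kernel would report. Completion candidates are the union of both sources, so names that only exist at runtime, such as objects created interactively in the console, still show up.

```python
import ast


def static_names(source: str) -> set:
    """Names bound anywhere in the source, found without executing it.

    Scoping is ignored for brevity; a real server would track scopes.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name != "*":
                    # "import a.b" binds "a"; "import a as b" binds "b".
                    names.add(alias.asname or alias.name.split(".")[0])
    return names


def complete(prefix: str, source: str, runtime_namespace: dict) -> list:
    """Merge static candidates with names reported by a (simulated) running kernel."""
    runtime_names = {n for n in runtime_namespace if not n.startswith("_")}
    candidates = static_names(source) | runtime_names
    return sorted(n for n in candidates if n.startswith(prefix))


source = "import pandas as pd\n\ndef load_data(path):\n    frame = pd.read_csv(path)\n    return frame\n"
runtime = {"loaded_df": None, "learning_rate": 0.1, "_hidden": 1}
print(complete("l", source, runtime))  # ['learning_rate', 'load_data', 'loaded_df']
```

Static analysis alone would miss `loaded_df` and `learning_rate`, while the kernel alone would miss `load_data` until the file has been run; merging both gives the editor the fuller picture.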
When working in a kernel session, the user wants to see what values are available. RIDE offers an environment view where you can navigate through the current scope and inspect any values.
Furthermore, you can debug your code by setting breakpoints and stepping through it. Again, RIDE’s debugging support is not tied to R; it is based on a generic kernel extension and will soon be available for Python and other languages, too.
Please see this article for a more detailed description and comparison of RIDE’s debug support.
Data science is all about data. So, of course, we need to be able to take a glimpse at it now and then. The challenge is that we often process large amounts of data and need to be careful not to inflate the memory footprint unnecessarily. Since RIDE is a cloud service, you can scale up your workspace if needed, but we still cannot send all the data over the network. Moreover, the available memory should be used in meaningful ways and not wasted carelessly.
Therefore, RIDE’s data viewer only fetches the data it needs to display. In fact, it even allows you to look at an infinite stream of data. Since the data viewer connects directly to the kernel through a protocol extension, no unnecessary copying of data happens.
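The on-demand fetching idea can be sketched as a paged viewer that asks the kernel only for the rows currently visible and keeps a small, bounded page cache. Everything here is hypothetical: `fetch_rows(start, count)` stands in for a round trip over a kernel protocol extension, and the page size and cache limit are arbitrary. Because the source computes rows by index, it behaves like an unbounded stream.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

Row = Tuple[int, int]
FetchRows = Callable[[int, int], List[Row]]


def infinite_source(start: int, count: int) -> List[Row]:
    """Stand-in for a kernel-side stream: any row index is computed on demand."""
    return [(i, i * i) for i in range(start, start + count)]


class PagedDataViewer:
    """Hypothetical viewer that requests only visible rows, in fixed-size pages."""

    def __init__(self, fetch_rows: FetchRows, page_size: int = 50, max_pages: int = 4):
        self._fetch_rows = fetch_rows
        self._page_size = page_size
        self._max_pages = max_pages
        self._pages: "OrderedDict[int, List[Row]]" = OrderedDict()  # LRU cache
        self.rows_transferred = 0

    def _page(self, index: int) -> List[Row]:
        if index in self._pages:
            self._pages.move_to_end(index)  # mark as most recently used
            return self._pages[index]
        rows = self._fetch_rows(index * self._page_size, self._page_size)
        self.rows_transferred += len(rows)
        self._pages[index] = rows
        if len(self._pages) > self._max_pages:
            self._pages.popitem(last=False)  # evict the least recently used page
        return rows

    def visible_rows(self, first_row: int, height: int) -> List[Row]:
        """Return rows [first_row, first_row + height), fetching only missing pages."""
        if height <= 0:
            return []
        first_page = first_row // self._page_size
        last_page = (first_row + height - 1) // self._page_size
        rows: List[Row] = []
        for index in range(first_page, last_page + 1):
            rows.extend(self._page(index))
        offset = first_row - first_page * self._page_size
        return rows[offset:offset + height]


viewer = PagedDataViewer(infinite_source, page_size=50, max_pages=4)
window = viewer.visible_rows(1_000_000, 20)
print(window[0], window[-1], viewer.rows_transferred)
```

Jumping to row 1,000,000 with a 20-row window transfers a single 50-row page; scrolling 40 rows further reuses that cached page and fetches only the next one. The LRU bound keeps the viewer’s memory constant however far the user scrolls, though a production viewer would also have to deal with sorting, filtering, and finite sources.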
Today, RIDE already provides all these features for R. For Python, some features, such as debugging, are still in development and will follow soon. In addition, we are going to implement a SQL kernel using the same protocols.
We will also keep working closely with the awesome JupyterLab team, who were not only very open to our (sometimes quite extensive) pull requests but also always super friendly and supportive in all kinds of ways. We are currently looking into open-sourcing even more of our work. For instance, it would make sense to open up the kernel extensions we have defined so that third parties can use them as well.
If you are interested in trying out RIDE, you can create a free account and start using it right away.
Originally posted by Sven Efftinge, VP Technology at r-brain (r-brain.io)