Though Data Science is still a young field, in many ways it is an amalgamation of roles that have previously existed. The range of backgrounds represented by Data Scientists clearly illustrates this point. (A snapshot of this can be seen in the speakers who will be at ODSC East.) Going even deeper, there is the issue of separating the Data Scientist from previous roles with similar job descriptions. One of these is the Business Analyst, a role that typically calls for a more even mix of technical and non-technical skills. One reason this divide persists is that the usual platform choice for analyses, Microsoft Excel, is a piece of software so ubiquitous that the Jupyter notebook, RMarkdown documents and other parts of the Data Science toolbox just can’t come close to matching its reach. On the other hand, Excel’s limitations in a world of ever-increasing data are well documented. What if there were a way to meld these two workflows?

That, perhaps, was the train of thought that brought about the creation of AlphaSheets, a new startup from the minds of several MIT graduates. The product’s tagline is simple: “The power of programming in a spreadsheet.” Savvy readers may be wondering if that isn’t already fulfilled by Excel VBA or an Excel plugin like xlwings. The programming referred to in this case, however, is Python and R, the two leading languages in Data Science, along with SQL. Furthermore, VBA and xlwings require the user to write code in a separate window and then run it in the spreadsheet. AlphaSheets allows the user to write code in the cells themselves, à la Excel formulas. There is also a global code editor where more verbose code can be encapsulated in a function and referenced from a cell.

The workflow is quite interesting. In essence, each cell is dedicated to a single language, though a whole block of cells will necessarily be assigned to one language if the output requires it. In one of the demos on the company’s website, one column contained a sample from a probability distribution generated with Python, further to the right was a cell with a sum computed using Excel’s SUM formula, and elsewhere there was an embedded histogram of the aforementioned sample produced with R.
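To make the demo concrete, here is a rough sketch of the same three steps written as ordinary Python. It does not use AlphaSheets itself, and it makes no claim about how the product is implemented. The function names `draw_sample` and `histogram_counts`, the choice of a normal distribution, and the sample size are all assumptions made for illustration. Bin counts stand in for the R histogram.

```python
import random


def draw_sample(n=1000, mu=0.0, sigma=1.0, seed=42):
    """Draw n values from a normal distribution (the 'Python column')."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(mu, sigma) for _ in range(n)]


def histogram_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical: put everything in the first bin.
        return [len(values)] + [0] * (bins - 1)
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


sample = draw_sample()              # stands in for the column of Python cells
total = sum(sample)                 # stands in for the Excel sum cell
counts = histogram_counts(sample)   # stands in for the R histogram

print(f"sum of sample: {total:.3f}")
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * (c // 10)} ({c})")
```

In a notebook, all three steps would live in one language and one script. The appeal of the spreadsheet version is that each step sits in its own cell, in whichever language suits it.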

Will AlphaSheets meet its goal of narrowing the gap between the technical and non-technical worlds? It’s too early to say. There are a lot of questions to ask, two of which concern latency and scalability. The issue of package management is another, but the Beaker notebook has already given a preview of how that can be seamlessly integrated. What is visible so far is intriguing enough to incite curiosity. Only time will tell if the project will develop enough to carve out a niche in a crowded field.

Gordon Fleetwood

Gordon studied Math before immersing himself in Data Science. Originally a die-hard Python user, he saw R's tidyverse ecosystem gradually subsume his workflow until only scikit-learn remained untouched. He is fascinated by the elegance of robust data-driven decision making in all areas of life, and is currently involved in applying these techniques to the EdTech space.