Tracey’s workshop: a story of a data science workshop speaker
Tracey is a data scientist and a content creator. She is preparing a workshop on the Python language for a global conference. She has all the content ready, but she still needs to draft a guided tutorial with the full list of prerequisites that attendees will need in order to set up their environment and replicate her demo on their own. She is frustrated by how long this list of prerequisites is and by how much time attendees will need just to open and run their first Python notebook. Because the workshop targets a beginner audience, attendees are likely to need up to 30 minutes before their machines are ready. She is also concerned about creating disparities between attendees who can and cannot afford high-performance machines capable of handling large training datasets or complex machine learning algorithms.
How does this story change if Tracey starts using GitHub Codespaces? Instead of asking her audience to install all the prerequisites for the demo on their local machines, Tracey can create a customized cloud-hosted environment by pushing to a GitHub repository her Python code and the configuration files that automate the set-up of the development environment. Working in a codespace means working in a Docker dev container hosted on a virtual machine: the requirements listed in the dev container configuration files are used to build the container, and this happens transparently for the user. The result is a repeatable codespace configuration for all the attendees of her workshops, which gives them a consistent experience. She could even reuse this customized environment for other workshops by editing the configuration files as needed.
What are the steps Tracey needs to follow to start using GitHub Codespaces? For the sake of simplicity, let’s suppose her workshop demo consists of a single Python notebook.
To open it in a codespace, she just needs to click the “<> Code” button on the main page of her GitHub repository and then choose to create a new codespace with options.
She can then specify:
- The branch of the repo (default is main).
- A dev container configuration file, which includes the Python version and the Visual Studio Code settings and extensions, and which refers to a Dockerfile with all the requirements needed to execute the notebook.
- The region in which the codespace will run.
- The machine type (number of cores, RAM size, and storage size).
By simply sharing the GitHub repo link with the workshop audience, Tracey can give her attendees the same environment she is using. The only requirement is a GitHub account.
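To make the dev container configuration more concrete, here is a minimal sketch of what such a file could contain. It is written as a short Python script that builds the configuration and prints it as JSON, so the structure is easy to inspect. The container name, the `PYTHON_VERSION` build argument, the interpreter path and the machine size are assumptions made for illustration, not values taken from Tracey's repository.

```python
import json


def build_devcontainer_config(python_version: str = "3.11") -> dict:
    """Return a dict mirroring a possible .devcontainer/devcontainer.json file."""
    return {
        # Hypothetical display name for the environment.
        "name": "python-workshop",
        "build": {
            # Dockerfile that installs everything the notebook needs.
            "dockerfile": "Dockerfile",
            # Hypothetical build argument; the Dockerfile would have to declare it.
            "args": {"PYTHON_VERSION": python_version},
        },
        "customizations": {
            "vscode": {
                # Extensions installed automatically when the codespace starts.
                "extensions": ["ms-python.python", "ms-toolsai.jupyter"],
                # Assumed interpreter location inside the container.
                "settings": {
                    "python.defaultInterpreterPath": "/usr/local/bin/python"
                },
            }
        },
        # Minimum machine size requested for the codespace (assumed value).
        "hostRequirements": {"cpus": 2},
    }


config = build_devcontainer_config()
print(json.dumps(config, indent=2))
```

The printed JSON is what would be saved as `.devcontainer/devcontainer.json` in the repository, next to the Dockerfile it points to. Reusing the environment for another workshop would then mostly mean changing these few values.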
Paul’s experience: a story of a data science beginner
Let me introduce another character in this story and shift the focus to Tracey’s audience. Paul is a data science student who is getting started with Python and who attended Tracey’s workshop to learn the essentials of the language. Back home, he would like to repeat Tracey’s demo on his own and then continue with the next steps recommended in the workshop. During the workshop, Tracey showed how to read a CSV dataset, how to get a quick overview of it, and how to visualize the data. As a next step, Paul would like to retrieve specific items from the dataset, for example by filtering on one of the items’ properties. As a Python beginner, he doesn’t know the right syntax for this task. What can Paul do? He has different options:
- Searching on the web: this will eventually help Paul complete the assignment, but without any guidance, finding his way through a labyrinth of millions of resources could be time-consuming.
- Asking for help from a peer who also attended the same workshop, if any.
- Reaching out directly to Tracey, who is probably busy with other work and might take a few days to reply.
GitHub Copilot might offer Paul a quicker and simpler fourth option. Powered by the OpenAI Codex generative model (trained on billions of lines of GitHub open source code, issues and PRs), Copilot is available in Visual Studio, Visual Studio Code and GitHub Codespaces, meaning that Paul can get the support he needs without even leaving the workshop environment.
To get started, he should install the GitHub Copilot Visual Studio Code extension and then create a new cell in the notebook with a comment describing what he wants to achieve. In this case, since he wishes to filter a dataset of wines by their region of origin, the comment could be similar to “#filtering dataset to extract only wines whose region contains France”. When he presses “Enter”, GitHub Copilot automatically generates code for the requested task. The underlying model takes into account not only the input prompt but also the neighboring cells in the notebook, which makes the result more coherent with the rest of the code.
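As an illustration, the kind of code Paul might end up with for that comment could look like the sketch below. It is not actual Copilot output. It assumes the workshop uses pandas and that the dataset has a text column named `region`; the inline sample data and the other column names are made up so that the cell runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the workshop's wine CSV file.
CSV_TEXT = """name,region,points
Chateau A,"Bordeaux, France",92
Tenuta B,"Tuscany, Italy",90
Domaine C,"Burgundy, France",94
Bodega D,"Rioja, Spain",88
"""

# Read the CSV data into a DataFrame, as in the workshop demo.
wines = pd.read_csv(io.StringIO(CSV_TEXT))

# filtering dataset to extract only wines whose region contains France
# na=False treats rows with a missing region as non-matching.
french_wines = wines[wines["region"].str.contains("France", na=False)]

print(french_wines)
```

With the real dataset, Paul would replace the inline text with the path to the workshop's CSV file and adjust the column name to match it.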
And what if, while reviewing Tracey’s code, Paul is unsure about what a piece of code is actually doing? In this case, he can use another GitHub Copilot-based extension called GitHub Copilot Labs and ask for an explanation of a highlighted piece of code in one click.
Write your own story!
1. Get started with GitHub Codespaces in 3 steps:
– Follow a full introductory video tutorial.
– Explore and build upon one of the GitHub Codespaces templates available for you.
Be aware that if you are a student you can sign up for a special offer, including free pro-level access to Codespaces (i.e. 180 core hours per month).
2. Meet GitHub Copilot on your favorite IDE:
– Sign up for a free trial of GitHub Copilot.
– Explore the documentation to learn more.
Warning: Copilot code is not perfect! Like any other code you find on the web, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted. As a developer, you are always in charge of the code you produce (with or without the help of an AI pair programmer).
Cover Image Source: github.blog