Get started with GitHub Codespaces

I’m on such a development-environment-management roll lately
github
workflow
Author

Lindsay Lee

Published

June 9, 2024

After getting started with conda, the next development environment management challenge on my list was GitHub Codespaces. A codespace is a development environment hosted in the cloud, which you can configure with the specific development tools needed for your project. You include the configuration file for the codespace in your repository, so anyone who uses your code or collaborates with you can work in an environment that mirrors yours, which prevents difficult-to-diagnose dependency errors.

There are various tutorials and templates out there to help you get started, but I still found it difficult to configure my first codespace the way I wanted. There are also things to consider regarding machine size and cost control that I didn’t spend time thinking about and won’t get into here. While I still don’t quite understand all the issues I encountered along the way, I landed on a configuration that installs R, Python, and packages for each, and that I think I can adapt and build on for future projects.

You can see your codespaces and initialize a new one in GitHub here. You can also initialize a codespace directly from a repository. I built my first codespace in the repository babys-first-codespace. I first created the repository, then from the repository page clicked Code -> Codespaces -> Create codespace on main. This opens Visual Studio Code in the browser. You can tell that your code is running on a virtual machine from the codespace name in the bottom-left corner of the window.

Initializing a codespace this way starts you off with a completely blank slate, and at first it wasn’t very clear to me what in the world you’re supposed to do next. After some googling, I saw you can search for templates by opening the Command Palette (Cmd+Shift+P on a Mac, Ctrl+Shift+P on Windows or Linux), choosing “Codespaces: Add Dev Container Configuration Files…”, and then “Create a new configuration”. One template mentions both R and Python: Data Science with Python and R. I had issues adding this template directly to the repository this way, so instead I manually copied the src/datascience-py-r folder from the template’s repository into my own. The key file is .devcontainer/devcontainer.json, which holds all of the configuration for the codespace.
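
For orientation, a devcontainer.json can be very small. The sketch below is not the template's actual contents: the name is a hypothetical placeholder, and the base image is simply one of the generic dev container images, assumed here for illustration.

```jsonc
// .devcontainer/devcontainer.json (illustrative sketch, not the template's file)
{
    // Display name for the environment; hypothetical placeholder.
    "name": "my-first-codespace",
    // Base container image the codespace is built from (assumed for this example).
    "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
    // Optional add-ons layered on top of the image; left empty here.
    "features": {}
}
```

The image and features entries are the parts that change the most when you start adding languages and packages.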

I could see from this file that there’s a "postCreateCommand" step that mentions installing Python packages from a requirements.txt file. I added this file to my repository, specified a few packages, and rebuilt the codespace (open the Command Palette and search for “Codespaces: Rebuild Container” if you aren’t prompted to rebuild automatically after making changes to devcontainer.json). This seemed to install the Python packages I wanted without an issue. One coding language down, one to go!

This devcontainer.json template uses the features r-apt to install R and apt-packages to install some R packages. I tried adding more R packages to the list for apt-packages but ran into some issues. After more googling I found this Stack Overflow response that indicated that not all CRAN packages are available via apt. I tried installing only packages I could find on the Ubuntu Packages Search, and that seemed to work.
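
To make the apt route concrete: CRAN packages that Debian and Ubuntu ship are named r-cran-<package> in lowercase, so the apt-packages feature expects names in that form. Below is a rough sketch of that part of the features block. The feature identifiers are assumed to match the template's, the options are trimmed down, and the two package names are only examples.

```jsonc
"features": {
    // Installs R itself via apt (identifier assumed to match the template).
    "ghcr.io/rocker-org/devcontainer-features/r-apt:latest": {},
    // Installs system packages, including prebuilt R packages.
    "ghcr.io/rocker-org/devcontainer-features/apt-packages:1": {
        // Comma-separated apt package names; these two are illustrative.
        "packages": "r-cran-data.table,r-cran-ggplot2"
    }
}
```

A CRAN package with no r-cran- counterpart in the distribution's repositories can't be installed this way, which is consistent with the errors described above.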

Next I tried adding the r-packages feature to install other CRAN packages, but I couldn’t get it to work. The error message that showed up in creation.log after a failed container rebuild was vague.

I found this blog post where someone shared the devcontainer.json they use for developing with Quarto. They use a rocker/tidyverse image as a base. I hoped that by using this image instead, I could install the R packages I need with the r-packages feature and still use the python feature. After updating the key parts of my devcontainer.json file to look like this…

...
    "image": "ghcr.io/rocker-org/devcontainer/tidyverse:4.3",
    // Features to add to the dev container. More info: https://containers.dev/features.
    "features": {
        "ghcr.io/devcontainers/features/python:1": {
            "version": "latest"
        },
        "ghcr.io/rocker-org/devcontainer-features/r-packages:1": {
            "packages": "blogdown",
            "installSystemRequirements": true
        }
    },
    // Use 'postCreateCommand' to run commands after the container is created.
    // Installs from requirements.txt if it exists; otherwise falls back to a default set of packages.
    "postCreateCommand": "pip install ipykernel ipywidgets && if [ -f requirements.txt ]; then pip install -r requirements.txt; else pip install pandas numpy matplotlib seaborn scikit-learn; fi",
...

…I still got an error. But a different one! With the help of GitHub Copilot, I figured out it was something about not being able to find the remoteUser of "vscode". I commented out the remoteUser specification (who knows what this does anyway) in the template and rebuilt the container, and the error went away!
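
For reference, remoteUser names the user account that the editor and terminals run as inside the container, so that account has to exist in the image. A sketch of the change, assuming the template's line was "remoteUser": "vscode" and that the rocker image doesn't provide a user by that name:

```jsonc
{
    // Commented out so the container falls back to the image's default user.
    // "remoteUser": "vscode",
    "image": "ghcr.io/rocker-org/devcontainer/tidyverse:4.3"
}
```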

We did it! A minimally useful GitHub Codespace! This codespace is not particularly sophisticated, but I think knowing how to set one of these up is going to become more and more important as more development is pushed to the cloud in the age of AI (don’t I sound smart?). For me, at least, I know I’m pushing my 7-year-old MacBook to the limit trying to run AI stuff locally (blog post forthcoming, hopefully). Running on a codespace is going to give my lil laptop some relief.