Python
Published

June 12, 2023

Jupyter notebooks in Git

Working with Jupyter notebooks in Git repositories has always caused me some struggles with the commit history. Jupyter notebooks are stored as JSON and include metadata such as execution counts, which change every time a cell is run. So far, I have just cleared the output, restarted the kernel, and run the notebook once to get somewhat clean metadata. This has not always worked and has led to some slightly noisy Git commits.

Now I have tried a solution that I remembered when I wanted to commit a Jupyter notebook and ran into noisy Git diffs again. nbdev_clean, a command that ships with nbdev, removes cell-execution-related metadata from a notebook.
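To make the idea concrete, here is a minimal sketch of what stripping execution-related metadata can look like, using only the standard library. It is not how nbdev_clean is implemented, and the function name strip_execution_metadata is made up for this illustration. It assumes the usual notebook layout: a top-level "cells" list in which code cells carry an "execution_count" and, in some frontends, an "execution" entry with timestamps in the cell metadata.

```python
import json


def strip_execution_metadata(notebook: dict) -> dict:
    """Return a copy of a notebook dict with execution counters reset.

    Hypothetical helper for illustration only; nbdev_clean does more
    than this and may behave differently.
    """
    # Deep copy via a JSON round trip so the input stays untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # The counter shown as "In [n]" next to a code cell.
        cell["execution_count"] = None
        # Outputs of type execute_result repeat the same counter.
        for output in cell.get("outputs", []):
            if "execution_count" in output:
                output["execution_count"] = None
        # Some frontends record run timestamps under this metadata key.
        cell.get("metadata", {}).pop("execution", None)
    return cleaned


example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {"execution": {"iopub.execute_input": "2023-06-12T10:00:00Z"}},
            "outputs": [
                {"output_type": "execute_result", "execution_count": 7,
                 "data": {"text/plain": "2"}, "metadata": {}}
            ],
            "source": "1 + 1",
        },
        {"cell_type": "markdown", "metadata": {}, "source": "Some text"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned_example = strip_execution_metadata(example)
print(json.dumps(cleaned_example["cells"][0], indent=1))
```

Rerunning a cell only changes the fields that this sketch resets, so two commits of an otherwise unchanged notebook would produce no diff for them.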

nbdev_clean --fname . cleans all Jupyter notebooks in a repository. I ran it on this learning diary Git repository to clean the two notebooks in the snippets folder. I also used it in the project that initially caused me to look into nbdev.
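Before running a cleaner over a whole repository, it can be useful to see which notebooks would be affected at all. The following is a rough sketch with the standard library only. The function name notebooks_with_execution_counts is hypothetical, and skipping .ipynb_checkpoints folders is my own assumption about what should be ignored, not something taken from nbdev.

```python
import json
from pathlib import Path


def notebooks_with_execution_counts(root: str) -> list[Path]:
    """List notebooks under root that still contain execution counts.

    Hypothetical helper for illustration; it only reads files.
    """
    found = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        # Assumption: autosave copies in checkpoint folders are not of interest.
        if ".ipynb_checkpoints" in path.parts:
            continue
        notebook = json.loads(path.read_text(encoding="utf-8"))
        cells = notebook.get("cells", [])
        # Markdown cells have no execution_count, so .get() returns None for them.
        if any(cell.get("execution_count") is not None for cell in cells):
            found.append(path)
    return found


if __name__ == "__main__":
    for notebook_path in notebooks_with_execution_counts("."):
        print(notebook_path)
```

Run from the repository root, this prints the notebooks that still carry execution counts; an empty result suggests there is nothing execution-related left to clean in the counters.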

Finally, I used Codespaces to quickly create minimal, reproducible examples of Git diffs with and without nbdev_clean. Again, Codespaces is really nice.