Zenodo API

Python
API
Published

October 31, 2023

This week, I worked with the Zenodo API. My goal was to add Zenodo metadata to the GitHub repository of the partypositions-wikitags project.

The code of the project is archived at Zenodo with the DOI 10.5281/zenodo.7043510.

import requests

headers = {"accept": "text/x-bibliography"}
r = requests.get("https://doi.org/10.5281/zenodo.7043510", headers=headers)

r.text
'Döring, H., &amp; Herrmann, M. (2024). <i>Party positions from Wikipedia tags</i> (Version 24.07) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.7043510'
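The returned citation contains an HTML entity (&amp;) and <i> tags. A minimal sketch for turning it into plain text, using only the standard library; the helper name clean_citation is hypothetical and not part of any API used in this post:

```python
import html
import re


def clean_citation(text: str) -> str:
    """Turn an HTML-flavoured citation string into plain text."""
    # Drop tags such as <i>...</i> first, then decode entities such as &amp;
    without_tags = re.sub(r"</?[a-zA-Z][^>]*>", "", text)
    return html.unescape(without_tags)


raw = (
    "Döring, H., &amp; Herrmann, M. (2024). <i>Party positions from "
    "Wikipedia tags</i> (Version 24.07) [Computer software]. Zenodo. "
    "https://doi.org/10.5281/ZENODO.7043510"
)
print(clean_citation(raw))
```

Tags are removed before entities are decoded so that an escaped "&lt;" in a title is not mistaken for markup.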

DOI API

The DOI resolver can also return structured metadata. Let's start with the DOI API and request CSL JSON.

import requests

doi = "10.5281/zenodo.7043510"

headers = {"accept": "application/vnd.citationstyles.csl+json"}
r = requests.get(f"https://doi.org/{doi}", headers=headers)

r.json()
{'type': 'book',
 'id': 'https://doi.org/10.5281/zenodo.7043510',
 'language': 'en',
 'author': [{'family': 'Döring', 'given': 'Holger'},
  {'family': 'Herrmann', 'given': 'Michael'}],
 'issued': {'date-parts': [[2024, 7, 31]]},
 'abstract': 'Estimation of party positions from Wikipedia tags with Stan',
 'DOI': '10.5281/ZENODO.7043510',
 'publisher': 'Zenodo',
 'title': 'Party positions from Wikipedia tags',
 'URL': 'https://zenodo.org/doi/10.5281/zenodo.7043510',
 'copyright': 'MIT License',
 'version': '24.07'}
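The CSL JSON stores authors as separate family and given names, while the cleaned metadata later in this post uses "Family, Given" strings. A small sketch of that conversion; the function name csl_authors_to_creators is hypothetical and the input is trimmed to the author key from the output above:

```python
def csl_authors_to_creators(csl: dict) -> list:
    """Build 'Family, Given' creator entries from a CSL JSON author list."""
    creators = []
    for author in csl.get("author", []):
        family = author.get("family", "")
        given = author.get("given", "")
        # Fall back to whichever part exists if one of them is missing
        name = f"{family}, {given}" if family and given else family or given
        creators.append({"name": name})
    return creators


csl = {
    "author": [
        {"family": "Döring", "given": "Holger"},
        {"family": "Herrmann", "given": "Michael"},
    ]
}
print(csl_authors_to_creators(csl))
```

Affiliations are not part of the CSL JSON shown above, so they would still have to be added by hand.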

Zenodo record

We can also access a record through the Zenodo API. This does not require a Zenodo access token.

r = requests.get("https://zenodo.org/api/records/8275697")
record = r.json()

record.keys()
dict_keys(['created', 'modified', 'id', 'conceptrecid', 'doi', 'conceptdoi', 'doi_url', 'metadata', 'title', 'links', 'updated', 'recid', 'revision', 'files', 'swh', 'owners', 'status', 'stats', 'state', 'submitted'])

I was interested in the Zenodo metadata.

record['metadata']
{'title': 'Party positions from Wikipedia tags (July 2023)',
 'doi': '10.5281/zenodo.8275697',
 'publication_date': '2023-08-23',
 'description': '<p>Estimation of party positions from Wikipedia tags with Stan (July 2023)</p>',
 'access_right': 'open',
 'creators': [{'name': 'Holger Döring',
   'affiliation': 'GESIS – Leibniz Institute for the Social Sciences'},
  {'name': 'Michael Herrmann', 'affiliation': 'University of Konstanz'}],
 'related_identifiers': [{'identifier': 'https://github.com/hdigital/partypositions-wikitags/tree/23.07',
   'relation': 'isSupplementTo',
   'scheme': 'url'}],
 'version': '23.07',
 'resource_type': {'title': 'Software', 'type': 'software'},
 'license': {'id': 'other-open'},
 'relations': {'version': [{'index': 1,
    'is_last': False,
    'parent': {'pid_type': 'recid', 'pid_value': '7043510'}}]}}
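The relations entry links this version (8275697) to its parent record (7043510), which matches the project DOI from the beginning of the post. A sketch for deriving that DOI from the metadata; the function name concept_doi is hypothetical, and the prefix "10.5281/zenodo." is an assumption inferred from the two DOIs shown here. The record keys above also list conceptdoi, which may hold the same value directly.

```python
from typing import Optional


def concept_doi(metadata: dict, prefix: str = "10.5281/zenodo.") -> Optional[str]:
    """Return the DOI of the parent record, or None if no parent is listed."""
    for entry in metadata.get("relations", {}).get("version", []):
        parent = entry.get("parent", {})
        if parent.get("pid_type") == "recid" and "pid_value" in parent:
            # Assumption: Zenodo DOIs are the prefix followed by the record id
            return prefix + parent["pid_value"]
    return None


metadata = {
    "doi": "10.5281/zenodo.8275697",
    "relations": {
        "version": [
            {
                "index": 1,
                "is_last": False,
                "parent": {"pid_type": "recid", "pid_value": "7043510"},
            }
        ]
    },
}
print(concept_doi(metadata))
```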

GitHub metadata

Zenodo imports some of the metadata from GitHub, and some needs to be added or updated manually.

You can specify some of the additional metadata in a .zenodo.json file in your GitHub repository.

https://developers.zenodo.org/#github

I used the archived Zenodo record as a starting point and dumped its metadata as JSON.

import json

print(json.dumps(record['metadata'], indent=2))
{
  "title": "Party positions from Wikipedia tags (July 2023)",
  "doi": "10.5281/zenodo.8275697",
  "publication_date": "2023-08-23",
  "description": "<p>Estimation of party positions from Wikipedia tags with Stan (July 2023)</p>",
  "access_right": "open",
  "creators": [
    {
      "name": "Holger D\u00f6ring",
      "affiliation": "GESIS \u2013 Leibniz Institute for the Social Sciences"
    },
    {
      "name": "Michael Herrmann",
      "affiliation": "University of Konstanz"
    }
  ],
  "related_identifiers": [
    {
      "identifier": "https://github.com/hdigital/partypositions-wikitags/tree/23.07",
      "relation": "isSupplementTo",
      "scheme": "url"
    }
  ],
  "version": "23.07",
  "resource_type": {
    "title": "Software",
    "type": "software"
  },
  "license": {
    "id": "other-open"
  },
  "relations": {
    "version": [
      {
        "index": 1,
        "is_last": false,
        "parent": {
          "pid_type": "recid",
          "pid_value": "7043510"
        }
      }
    ]
  }
}
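The dump shows "D\u00f6ring" because json.dumps escapes non-ASCII characters by default. A short sketch of the ensure_ascii=False option, which keeps the characters readable; the creator dict is copied from the record above:

```python
import json

creator = {
    "name": "Holger Döring",
    "affiliation": "GESIS – Leibniz Institute for the Social Sciences",
}

escaped = json.dumps(creator, indent=2)  # default: non-ASCII becomes \uXXXX
readable = json.dumps(creator, indent=2, ensure_ascii=False)  # keeps ö and –

print(readable)
```

Both strings parse back to the same dictionary, so the choice only affects how the file reads in an editor or a diff.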

Then, I manually cleaned up the metadata:

  • kept only entries that are not imported automatically
  • used Unicode characters for umlauts
  • specified license id
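
The first step could also be scripted. A rough sketch, assuming the keys to keep are title, description and creators (taken from the cleaned result in this post, not from a Zenodo rule); the function name clean_metadata is hypothetical and the sample input is an abbreviated copy of the record metadata:

```python
import html
import re


def clean_metadata(metadata, keep=("title", "description", "creators")):
    """Return a copy with only the keys in `keep` and a plain-text description."""
    # Shallow copy: nested values such as the creators list are shared
    cleaned = {key: metadata[key] for key in keep if key in metadata}
    if "description" in cleaned:
        text = re.sub(r"</?[a-zA-Z][^>]*>", "", cleaned["description"])
        cleaned["description"] = html.unescape(text).strip()
    return cleaned


record_metadata = {
    "title": "Party positions from Wikipedia tags (July 2023)",
    "doi": "10.5281/zenodo.8275697",
    "description": "<p>Estimation of party positions from Wikipedia tags with Stan (July 2023)</p>",
    "creators": [{"name": "Holger Döring"}, {"name": "Michael Herrmann"}],
    "version": "23.07",
}
print(clean_metadata(record_metadata))
```

The sketch does not remove the "(July 2023)" suffix or reorder the creator names; those edits remain manual. The hand-cleaned version follows.
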
metadata_json = """
{
  "title": "Party positions from Wikipedia tags",
  "description": "Estimation of party positions from Wikipedia tags with Stan",
  "creators": [
    {
      "name": "Döring, Holger",
      "affiliation": "GESIS – Leibniz Institute for the Social Sciences"
    },
    {
      "name": "Herrmann, Michael",
      "affiliation": "University of Konstanz"
    }
  ]
}
"""

Here, I validate the JSON by parsing it. Running the cell should not raise an error.

# Parse into a new name so the cell can be re-run; invalid JSON raises json.JSONDecodeError
metadata = json.loads(metadata_json)
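
Parsing and saving can be combined in one step. A minimal sketch that parses the text, runs a small sanity check and writes the file; the function name write_zenodo_json is hypothetical, and the check for a name key is only a basic safeguard, not Zenodo's full schema. The example writes to a temporary directory rather than a real repository.

```python
import json
import tempfile
from pathlib import Path


def write_zenodo_json(metadata_text, path=".zenodo.json"):
    """Parse JSON text, check creator entries, and write it to `path`."""
    metadata = json.loads(metadata_text)  # raises json.JSONDecodeError if invalid
    for creator in metadata.get("creators", []):
        if "name" not in creator:
            raise ValueError(f"creator entry without a name: {creator!r}")
    # ensure_ascii=False keeps umlauts readable in the written file
    text = json.dumps(metadata, indent=2, ensure_ascii=False) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return metadata


sample = '{"title": "Party positions from Wikipedia tags", "creators": [{"name": "Döring, Holger"}]}'

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".zenodo.json"
    write_zenodo_json(sample, target)
    print(target.read_text(encoding="utf-8"))
```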

Finally, I added it to the GitHub repository – commit ac7c462 😊