One component of the broader Throughput project is the ability to link to resources on the web that indicate ways in which individuals have linked records, data resources or objects to provide scientific insight.
To help establish a baseline of data integration, this project uses Python to search GitHub (with other implementations to come) for code that invokes commands such as `import packageName` (Python) or `library(packageName)` (R), and adds the matches to a graph database, described elsewhere, using the W3C annotation model.
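As a rough illustration of the search step, the sketch below builds GitHub code-search queries for the two import patterns mentioned above. The function names `build_search_queries` and `search_url` are hypothetical and are not taken from this repository. The sketch assumes the GitHub REST code-search endpoint (`/search/code`) with a quoted phrase and a `language:` qualifier; the project's actual queries may differ.

```python
from urllib.parse import urlencode

# Assumption: the GitHub REST API code-search endpoint.
SEARCH_ENDPOINT = "https://api.github.com/search/code"


def build_search_queries(package_name):
    """Return (language, query) pairs for the two import patterns.

    Hypothetical helper: one query for Python's `import packageName`
    and one for R's `library(packageName)`.
    """
    return [
        ("Python", f'"import {package_name}" language:Python'),
        ("R", f'"library({package_name})" language:R'),
    ]


def search_url(query, page=1, per_page=100):
    """Build the request URL for one page of search results."""
    params = urlencode({"q": query, "page": page, "per_page": per_page})
    return f"{SEARCH_ENDPOINT}?{params}"


if __name__ == "__main__":
    # lipd is one of the packages named in this README.
    for language, query in build_search_queries("lipd"):
        print(language, search_url(query))
```

This only constructs the URLs. Sending the requests requires an authenticated session, and code-search results are rate limited, so a real crawl would need to page slowly.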
- Chris Heiser - Northern Arizona University
- Nick McKay - Northern Arizona University
- Simon Goring - University of Wisconsin-Madison
We welcome contributions from all individuals, but expect contributors to follow the Code of Conduct for this repository.
The list of packages to be searched includes packages from the rOpenSci registry, as well as Python packages, including lipd and packages in the SciTools repository.
To scrape the GitHub API you must have a valid user token. Both the `.gitignore` and the current R script expect the token to be stored in a file named `gh.token`. You can generate a token from your developer settings in GitHub.
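A minimal sketch of how a Python script might read the token from `gh.token` and prepare request headers. The helper names `load_token` and `auth_headers` are hypothetical, and the sketch assumes the file contains only the token text. It is not the repository's own loading code.

```python
from pathlib import Path


def load_token(path="gh.token"):
    """Read the GitHub token from a local file, dropping surrounding whitespace."""
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"No token found in {path}")
    return token


def auth_headers(token):
    """Build HTTP headers for an authenticated GitHub API request."""
    return {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    }
```

Keeping the token in a file that `.gitignore` excludes means it is never committed alongside the code.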
This work is funded by the National Science Foundation's EarthCube Program through awards 1740699 and 1740667.