Throughput Cross-Resource Workflow Scraper

One component of the broader Throughput project is linking to web resources that show how individuals have combined records, data resources, or objects to provide scientific insight.

To help establish a baseline of data integration, this project uses Python to search GitHub (with other implementations to come) for code that invokes commands such as import packageName (Python) or library(packageName) (R). It then adds the matching repositories to a graph database, described elsewhere, using the W3C annotation model.
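As a rough illustration of the search step described above, the sketch below builds a GitHub code-search URL for a given package and language. It is not the repository's actual implementation: the names IMPORT_PATTERNS and build_search_url are hypothetical, and the quoted-phrase query form is an assumption about how one might phrase the search against the GitHub REST search endpoint.

```python
from urllib.parse import urlencode

# Hypothetical templates (not from the repository): how a package is
# typically loaded in each language.
IMPORT_PATTERNS = {
    "python": "import {package}",
    "r": "library({package})",
}

# GitHub REST endpoint for code search; requests to it must be authenticated.
SEARCH_URL = "https://api.github.com/search/code"


def build_search_url(package: str, language: str) -> str:
    """Return a code-search URL for files that appear to load `package`."""
    phrase = IMPORT_PATTERNS[language].format(package=package)
    # Quote the phrase so it is searched as a unit, then restrict by language.
    query = f'"{phrase}" language:{language}'
    return f"{SEARCH_URL}?{urlencode({'q': query})}"
```

Calling build_search_url("lipd", "python") produces a URL whose q parameter encodes "import lipd" language:python. A text match of this kind can also catch commented-out or unrelated lines, so results would likely need filtering before being added to the graph database.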

Contributions

  • Chris Heiser - Northern Arizona University
  • Nick McKay - Northern Arizona University
  • Simon Goring - University of Wisconsin-Madison

We welcome contributions from all individuals, but expect contributors to follow the Code of Conduct for this repository.

Current Packages of Interest

The list of packages to be searched includes packages from the rOpenSci registry, as well as Python packages such as lipd and those in the SciTools repository.

Using this repository

To scrape the GitHub API you must have a valid user token. The current R script looks for that token in a file named gh.token, and the .gitignore excludes that file so the token is not committed. You can generate a personal access token from the developer settings of your GitHub account.
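A minimal Python sketch of the token handling described above, assuming gh.token is a plain-text file that contains only the token. The helper names load_token and auth_headers are hypothetical and do not come from this repository; the header values follow the general GitHub REST API convention.

```python
from pathlib import Path


def load_token(path: str = "gh.token") -> str:
    """Read the token from `path`, dropping surrounding whitespace."""
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"{path} is empty")
    return token


def auth_headers(token: str) -> dict:
    """Build request headers that authenticate against the GitHub REST API."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

The resulting dictionary can be passed as the headers of any HTTP request to api.github.com. Keeping the token in its own ignored file, rather than in the script, avoids committing it by accident.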

Support

This work is funded by the National Science Foundation's EarthCube Program through awards 1740699 and 1740667.