Throughput Cross-Resource Workflow Scraper

One component of the broader Throughput project is the ability to link to resources on the web that indicate ways in which individuals have linked records, data resources or objects to provide scientific insight.

To help establish a baseline of data integration, this project uses Python to search GitHub for code (with other implementations to come) that invokes commands such as import packageName (Python) or library(packageName) (R). Each match is added to a graph database, described elsewhere, using the W3C annotation model.
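As a minimal sketch of the detection step, the snippet below finds Python import statements and R library()/require() calls in a source string and wraps each hit as a W3C Web Annotation. The regular expressions, the function names find_packages and to_annotation, and the choice of a TextualBody holding the package name are all illustrative assumptions; the project's actual graph schema is described elsewhere and may differ.

```python
import re

# Hypothetical patterns (simplified): "import pkg" / "from pkg import ..." in Python,
# and library(pkg) / require(pkg) in R. Only the top-level package name is captured,
# and "import a, b" yields just "a".
PY_IMPORT = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)
R_LIBRARY = re.compile(r"\b(?:library|require)\(\s*['\"]?([A-Za-z][\w.]*)['\"]?\s*\)")
PATTERNS = {"python": PY_IMPORT, "r": R_LIBRARY}


def find_packages(source: str, language: str) -> set[str]:
    """Return the package names loaded in a Python or R source string."""
    pattern = PATTERNS[language.lower()]  # raises KeyError for unsupported languages
    return set(pattern.findall(source))


def to_annotation(code_url: str, package: str) -> dict:
    """Wrap one detected package use as a W3C Web Annotation (JSON-LD as a dict)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "linking",
        "body": {"type": "TextualBody", "value": package},
        "target": code_url,  # the file on GitHub where the package is used
    }
```

For example, find_packages("import lipd\nfrom iris.cube import Cube\n", "python") returns {"lipd", "iris"}. Regular expressions are a deliberately lightweight choice here; parsing the code properly (for example with Python's ast module) would be more robust but would not carry over to R.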

Contributions

Chris Heiser - Northern Arizona University
Nick McKay - Northern Arizona University
Simon Goring - University of Wisconsin-Madison

We welcome contributions from all individuals, but expect contributors to follow the Code of Conduct for this repository.

Current Packages of Interest

The list of packages to be searched includes packages from the rOpenSci registry as well as Python packages, including lipd and the packages in the SciTools repository.
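One way to turn such a package list into GitHub code-search queries is sketched below. The example package names other than lipd (iris and cartopy from SciTools, rgbif from rOpenSci) and the build_queries helper are illustrative assumptions, not the project's actual list or code. GitHub's code search may ignore punctuation such as parentheses, so matches would still need to be checked locally against the source.

```python
# Illustrative package lists (assumption): only lipd is named in this README.
PACKAGES = {
    "Python": ["lipd", "iris", "cartopy"],
    "R": ["rgbif"],
}


def build_queries(packages: dict[str, list[str]]) -> list[str]:
    """Build one GitHub code-search query string per (language, package) pair."""
    queries = []
    for language, names in packages.items():
        for name in names:
            # Python code loads packages with "import", R code with library().
            call = f"import {name}" if language == "Python" else f"library({name})"
            # Quote the call as a phrase and restrict results with the language: qualifier.
            queries.append(f'"{call}" language:{language}')
    return queries
```

Calling build_queries(PACKAGES) produces one query per package, starting with '"import lipd" language:Python'.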

Using this repository

To scrape the GitHub API you must have a valid user token. The current R script reads the token from a file named gh.token, which is listed in .gitignore so that it is never committed. You can generate a token under Developer settings in your GitHub account.
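The repository's script for this step is written in R; as a rough Python equivalent, the sketch below reads the token from gh.token and builds an authenticated request against GitHub's REST code-search endpoint. The function names, the per_page value, and the header choices are assumptions for illustration. Only run_search touches the network.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

SEARCH_URL = "https://api.github.com/search/code"


def read_token(path: str = "gh.token") -> str:
    """Read a GitHub personal access token from a local, git-ignored file."""
    return Path(path).read_text(encoding="utf-8").strip()


def code_search_request(query: str, token: str, page: int = 1) -> urllib.request.Request:
    """Build (but do not send) an authenticated GitHub code-search request."""
    params = urllib.parse.urlencode({"q": query, "per_page": 100, "page": page})
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def run_search(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical call would be run_search(code_search_request('"import lipd" language:Python', read_token())). Keeping request construction separate from sending makes the authentication logic testable without hitting GitHub's rate limits.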

Support

This work is funded by the National Science Foundation's EarthCube Program through awards 1740699 and 1740667.
