Throughput Cross-Resource Workflow Scraper

One component of the broader Throughput project is the ability to link to web resources that show how individuals have linked records, data resources, or objects to provide scientific insight.

To help establish a baseline of data integration, this project uses Python to search for code on GitHub (with other implementations to come) that invokes commands such as import packageName (Python) or library(packageName) (R), and adds the matches to a graph database, described elsewhere, using the W3C annotation model.
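As an illustration of the kind of match described above, the following is a minimal sketch that checks a block of source text for lines loading a given package. It is not the repository's actual scraper: the function name `find_package_calls` and the exact patterns are assumptions made for this example, and it works on local text rather than GitHub search results.

```python
import re


def find_package_calls(source, package):
    """Return the lines of `source` that load `package`.

    Hypothetical helper for illustration only. It recognises:
      - Python: import packageName / from packageName import ...
      - R:      library(packageName)
    """
    name = re.escape(package)
    patterns = [
        rf"^\s*import\s+{name}\b",                   # Python: import packageName
        rf"^\s*from\s+{name}\b[\w.]*\s+import\b",    # Python: from packageName import ...
        rf"\blibrary\(\s*['\"]?{name}['\"]?\s*\)",   # R: library(packageName)
    ]
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    # Each line is tested on its own, so ^ anchors to the start of that line.
    return [line.strip() for line in source.splitlines() if combined.search(line)]


if __name__ == "__main__":
    sample = "import lipd\nimport lipdtools\nlibrary(neotoma)\n"
    print(find_package_calls(sample, "lipd"))
    print(find_package_calls(sample, "neotoma"))
```

The word boundary after the package name keeps `import lipdtools` from counting as a use of `lipd`; commented-out imports are skipped only when the comment marker precedes the `import` keyword.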

Contributions

Chris Heiser - University of Northern Arizona
Nick McKay - University of Northern Arizona
Simon Goring - University of Wisconsin -- Madison

We welcome contributions from all individuals, but expect contributors to follow the Code of Conduct for this repository.

Current Packages of Interest

The list of packages to be searched includes packages from the rOpenSci registry, as well as Python packages, including lipd and packages in the SciTools repository.

Using this repository

To scrape the GitHub API you must have a valid user token. The .gitignore and the current R script expect that token in a file named gh.token. You can generate a token using your developer settings in GitHub.
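A minimal Python sketch of how a token stored in gh.token might be used is shown here. The helper names `load_token` and `build_code_search_request` are hypothetical and are not taken from the repository's scripts; the sketch only builds an authenticated request for GitHub's code search endpoint and does not send it.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request


def load_token(path="gh.token"):
    """Read the GitHub token from a local file, dropping surrounding whitespace."""
    return Path(path).read_text(encoding="utf-8").strip()


def build_code_search_request(query, token):
    """Build (but do not send) an authenticated GitHub code search request.

    Hypothetical helper: the query string format is an assumption for illustration.
    """
    url = "https://api.github.com/search/code?" + urlencode({"q": query})
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    }
    return Request(url, headers=headers)


if __name__ == "__main__":
    token = load_token()  # expects gh.token in the working directory
    request = build_code_search_request("import lipd language:python", token)
    print(request.full_url)
```

Keeping the token in a file that .gitignore excludes means it is never committed alongside the code.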

Support

This work is funded through the National Science Foundation's EarthCube Program through awards 1740699 and 1740667.
