Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: docker-check-R-sysdeps.yml should only check sys deps of dependencies #79

Open
jdhoffa opened this issue Apr 24, 2024 · 1 comment

Comments

@jdhoffa
Copy link
Member

jdhoffa commented Apr 24, 2024

confirmed that going with this is probably more ideal (and doesn't have the same problem)

req_r_pkgs <- sort(unique(pak::pkg_deps(pkg = ".")$package))
pak::sysreqs_check_installed(packages = req_r_pkgs)

Originally posted by @cjyetman in RMI-PACTA/workflow.data.preparation#221 (comment)

@AlexAxthelm
Copy link
Contributor

Hmm. Even upon first glance, this looked like a non-trivial problem to solve, but as I'm looking at it more closely, it's getting trickier.

I think a reasonable approach would be to only check sysdeps for packages in IMPORTS (not SUGGESTS) of some vector (possibly length 1) of packages (such as just the workflow.* package in those repos), and on down through the IMPORTS tree. Generating that vector procedurally (possibly with some manual override) seems finicky, but doable.

The harder part here is in determining what sysreqs are actually relevant to include in an image. For example, the DESCRIPTION for arrow (proximal cause for this issue) has:

SystemRequirements: C++17; for AWS S3 support on Linux, libcurl and openssl (optional);
    cmake >= 3.16 (build-time only, and only for full source build)

which means that although there's no actual Requirements (and some are only needed at build-time), they're all getting picked up by pak as requirements.

The best solution would be if R had some way to distinguish between these, but I don't think that's coming any time soon, so maybe a better option would be to have an ignorelist, along the lines of

"sysreqs_ignore": [
  "arrow": ["C++17", "cmake"]
]

But overall, I'm wondering if the better option is to drop this check, and have practical (in situ) testing check for missing dependencies instead. That would be tougher to set up on the front end, but we'd only get true positives. For example, if we're using RPostgres without libpq installed, we'll get errors. The tricky part of testing anything with an acual sysdep, is that a lot of the time, those are there for communication to an external system, and that means we would need to have integration testing (against an actual PSQL instance, in this eaxmple) that would probably need to run in sequence/parallel with the "unit" tests. For example, the unit tests for workflow.factset use an in-memory SQLite, rather than a Postgres instance for unit tests, so we would need to set up a set of checks that actually test against a Postgres instance in addition to the SQLite to feel confident we've captured the necessary sysreqs.

@jdhoffa @cjyetman

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants