Proposal: Handling missing license / package information. #165

Open
alex-torok opened this issue Dec 9, 2024 · 1 comment

alex-torok commented Dec 9, 2024

Problem

The current implementation of rules_license silently ignores targets that have no license information, and its collection providers expose nothing that would let a consumer assert on missing licenses. This is a problem: people generating SBOM license reports need to know when one of their dependencies does not properly specify the software license it is under.

From the Bazel Slack, it sounds like people have been custom-building this logic outside of rules_license.

Until rules_license adoption is widespread, there will be cases where people want to track license information but their dependencies are missing the necessary metadata targets. Someone may be on an old version of a language ruleset that doesn't populate license info, but should still be able to use rules_license without upgrading that ruleset.

Proposal

I'd like to update gather_metadata_info_common in licenses_core.bzl to add a MissingInfo provider that will be used to denote when a target is missing one of the requested metadata types:

MissingInfo = provider(
    doc = """Denotes that a target is missing requested information""",
    fields = {
        "target_missing_info": "Label: The target label.",
        "missing_providers": "tuple(Provider): The requested providers that were not found in the metadata for this target."
    },
)

This will be stored in the gathering providers in a new struct field:

def licenses_info():
    return provider(
        doc = """The transitive set of licenses used by a target.""",
        fields = {
            "target_under_license": "Label: The top level target label.",
            ...
            "missing": "depset(MissingInfo)",
        },
    )

# This provider is used by the aspect that is used by manifest() rules.
TransitiveLicensesInfo = licenses_info()


TransitiveMetadataInfo = provider(
    doc = """The transitive set of licenses used by a target.""",
    fields = {
        "target_under_license": "Label: The top level target label.",
        ...
        "missing": "depset(MissingInfo)",
    },
)

Any rules that consume the TransitiveLicensesInfo or the TransitiveMetadataInfo providers can still use the licenses field to get the populated licenses without knowing anything about missing license handling.
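As a rough illustration of how gathering might populate both fields, here is a plain-Python model (not Starlark rule code). The function names `find_missing` and `gather`, the use of provider names as strings, and the use of lists in place of depsets are all assumptions made for this sketch.

```python
from collections import namedtuple

# Plain-Python stand-in for the proposed MissingInfo provider.
MissingInfo = namedtuple("MissingInfo", ["target_missing_info", "missing_providers"])

def find_missing(label, found, requested):
    """Return a MissingInfo if any requested provider name is absent, else None."""
    absent = tuple(p for p in requested if p not in found)
    return MissingInfo(label, absent) if absent else None

def gather(label, found, requested, dep_results):
    """Merge one target's metadata with what was gathered from its deps.

    found:       dict of provider name -> metadata value for this target
    requested:   tuple of provider names the caller asked for
    dep_results: list of (licenses, missing) pairs returned for each dep
    """
    licenses = [found["LicenseInfo"]] if "LicenseInfo" in found else []
    missing = []
    own = find_missing(label, found, requested)
    if own:
        missing.append(own)
    for dep_licenses, dep_missing in dep_results:
        licenses.extend(dep_licenses)
        missing.extend(dep_missing)
    return licenses, missing

# A dep with no metadata, and a top-level target that has a license.
leaf = gather("@foo//:lib", {}, ("LicenseInfo",), [])
top = gather("//app:bin", {"LicenseInfo": "//:LICENSE"}, ("LicenseInfo",), [leaf])
```

In this model, a consumer that only reads `top[0]` (the licenses) behaves exactly as before; the missing entries travel alongside in `top[1]`.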

Rules that wish to know about targets that are missing LicenseInfo or PackageInfo can look through the missing depset and check whether any of the MissingInfo entries there list LicenseInfo or PackageInfo in their missing_providers tuple. Because the MissingInfo provider has a field that holds provider types, it can be shared across the License and Package gathering functionality.
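The lookup a consuming rule would do can be modeled in plain Python as follows. This is only a sketch: `targets_missing` is a hypothetical helper, providers are represented by their names as strings, and a list stands in for the depset.

```python
from collections import namedtuple

# Plain-Python stand-in for the proposed MissingInfo provider.
MissingInfo = namedtuple("MissingInfo", ["target_missing_info", "missing_providers"])

def targets_missing(missing, provider):
    """Return the labels of all targets that lack the given provider."""
    return [m.target_missing_info for m in missing if provider in m.missing_providers]

missing = [
    MissingInfo("@foo//:lib", ("LicenseInfo", "PackageInfo")),
    MissingInfo("@bar//:lib", ("PackageInfo",)),
]
no_license = targets_missing(missing, "LicenseInfo")
no_package = targets_missing(missing, "PackageInfo")
```

A test rule asserting that nothing is unlicensed would then only need to check that the `LicenseInfo` result is empty.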

Populating missing license or metadata information could be done with a rule that runs the gather aspect and provides an attr for giving replacement licenses via a string_keyed_label_dict:

gather_license_metadata(
    name = "license_info",
    deps = [":foo"],
    licenses = {
        "@foo": "//third_party/licenses:foo_license",
        "@bar": "//third_party/licenses:bar_license",
    },
)

This rule could look through the missing metadata information and update the TransitiveLicensesInfo when it finds a target that matches the given licenses lookup. Any target missing a license under @foo//... would use the given foo_license and any target under @bar//... would use bar_license. Any target that doesn't match the given mappings would remain in the missing data. Additional test rules could be written to assert that there are no targets that are missing the LicenseInfo provider.
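The matching step described above might look roughly like this plain-Python model. The helpers `repo_of` and `apply_replacements` are hypothetical, labels are handled as plain strings, and the repo is taken to be everything before the first `//`; a real rule would work with Label objects instead.

```python
def repo_of(label):
    """'@foo//pkg:lib' -> '@foo'; main-repo labels ('//pkg:lib') -> ''."""
    if label.startswith("@"):
        return label.split("//", 1)[0]
    return ""

def apply_replacements(missing_labels, licenses):
    """Split missing targets into those covered by the repo -> license
    mapping and those that remain without license information."""
    resolved = {}
    still_missing = []
    for label in missing_labels:
        repo = repo_of(label)
        if repo in licenses:
            resolved[label] = licenses[repo]
        else:
            still_missing.append(label)
    return resolved, still_missing

licenses = {
    "@foo": "//third_party/licenses:foo_license",
    "@bar": "//third_party/licenses:bar_license",
}
resolved, still_missing = apply_replacements(
    ["@foo//a:b", "@baz//c:d", "//local:x"],
    licenses,
)
```

Here `@foo//a:b` picks up `foo_license`, while `@baz//c:d` and `//local:x` stay in the missing data because no mapping covers them.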

People who are using upstream language rules that do not yet populate license information could generate a bzl file that contains a repo->label mapping so that they could still leverage rules_license on their existing codebase.
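One way such a file could be generated is sketched below in Python. The function name `render_repo_license_bzl` and the constant name `REPO_LICENSES` are made up for illustration; nothing in rules_license defines them.

```python
def render_repo_license_bzl(mapping, name="REPO_LICENSES"):
    """Render a repo -> license-label mapping as the text of a .bzl file."""
    lines = [
        "# Generated file: repo -> license target mapping.",
        "%s = {" % name,
    ]
    # Sort keys so the generated file is stable across runs.
    for repo in sorted(mapping):
        lines.append('    "%s": "%s",' % (repo, mapping[repo]))
    lines.append("}")
    return "\n".join(lines) + "\n"

bzl_text = render_repo_license_bzl({
    "@foo": "//third_party/licenses:foo_license",
    "@bar": "//third_party/licenses:bar_license",
})
```

The resulting constant could then be loaded in a BUILD file and passed as the `licenses` attribute.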

I have a rough cut of this described functionality working inside of a private repo, and it is working well for the initial experiments that I'm doing with pulling in rules_license to our codebase.

Open Questions

  • I don't have experience with Bzlmod. Would any of the above (particularly around handling external repos) be difficult or impossible there?
  • Should the missing license info handling be done in a rule inside of rules_license, or should it be put in the gather_licenses_info aspect (similar to the _trace attr)? It seems like the repo currently expects end-users of the ruleset to write their own custom rules that use the gathering aspects. If we put it in the aspect, it would pin rules_license to a minimum Bazel version of 7.4, since that is the earliest version with string_keyed_label_dict patched in.
  • Have there been alternative ideas for handling this kicking around in people's minds?

fmeum commented Dec 9, 2024

Regarding Bzlmod: string-typed attributes that hold labels are problematic because they don't go through repo mapping, which makes label-to-label dicts awkward to express as rule attributes. This can be solved by making the rule a macro (or by using a rule initializer on Bazel 7 and up).
