Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate --pkgid and --hdrid commandline switches #2633

Closed
pmatilai opened this issue Aug 28, 2023 · 17 comments · Fixed by #3335
Closed

Eliminate --pkgid and --hdrid commandline switches #2633

pmatilai opened this issue Aug 28, 2023 · 17 comments · Fixed by #3335
Labels
API API related v6 Related to rpm v6 (readiness)
Milestone

Comments

@pmatilai
Copy link
Member

pmatilai commented Aug 28, 2023

As pointed out by @dralley in #2374 (comment): we support indexed db queries for "pkgid" and "hdrid", which are actually backed by SIGMD5 and SHA1HEADER tags. These aren't really security sensitive because they're just an identifier in this context, but both will be gone in the v6 format, so we need to eliminate these indexes based on obsolete crypto, and if pkgid/hdrid concepts are to be preserved (I don't know anybody using them so dunno), figure out something that doesn't have an expiration date like the crypto functionality nowadays.

AC:

  • --pkgid and --hdrid cli switches and corresponding documentation is removed
  • RPMDBI_SIGMD5 and RPMDBI_SHA1HEADER are documented as deprecated
@pmatilai pmatilai added this to the 4.20.0 milestone Aug 28, 2023
@pmatilai pmatilai added this to RPM Aug 28, 2023
@github-project-automation github-project-automation bot moved this to Backlog in RPM Aug 28, 2023
@pmatilai pmatilai moved this to Todo in RPM v6 format Aug 28, 2023
@pmatilai pmatilai added the API API related label Aug 28, 2023
@pmatilai pmatilai moved this from Backlog to Todo in RPM Aug 28, 2023
@dralley
Copy link
Contributor

dralley commented Sep 2, 2023

(I don't know anybody using them so dunno)

@Conan-Kudo I believe you may have mentioned in the past that the MD5 headers (pkgid) are used in some build systems?

@Conan-Kudo
Copy link
Member

Yes. I think Koji did at one point, and OBS does last I checked.

@pmatilai
Copy link
Member Author

pmatilai commented Sep 4, 2023

Yes various places show it because ... it's there. And course, for the longest time that was the only checksum kind of thing that we had on the payload.

@pmatilai pmatilai added the v6 Related to rpm v6 (readiness) label Sep 27, 2023
@pmatilai pmatilai moved this from Todo to Priority in RPM Nov 2, 2023
@pmatilai pmatilai moved this from Priority to Todo in RPM Nov 2, 2023
@pmatilai pmatilai moved this from Todo to Priority in RPM Feb 21, 2024
@pmatilai pmatilai moved this from Priority to Todo in RPM Apr 9, 2024
@pmatilai pmatilai modified the milestones: 4.20.0, 6.0.0 Apr 11, 2024
@pmatilai pmatilai removed this from RPM v6 format Sep 18, 2024
@pmatilai pmatilai modified the milestones: 6.0.0, 6.0.0 alpha Sep 18, 2024
@pmatilai
Copy link
Member Author

pmatilai commented Sep 26, 2024

This is basically three different but related parts:

  • the db indexes themselves
  • the --hdrid and --pkgid query flags
  • RPMTAG_PKGID and RPMTAG_HDRID tag aliases

Then there's also RPMTAG_SOURCEPKGID that's still md5 but that is a separate topic (#3329 )
And, tools like koji will need to be updated, so there are a whole bunch of external dependencies. Koji actually does a low-level read of the rpm package by itself, instead of using bindings. I guess it needs this to handle stuff that system rpm doesn't, but ouch. Filed https://pagure.io/koji/issue/4200

@pmatilai
Copy link
Member Author

Split the pkgid/hdrid tag aliases to a separate ticket too, they're not actually used for the database operation: #3330

@pmatilai
Copy link
Member Author

pmatilai commented Sep 26, 2024

And, from there we get to something resembling an AC:

  • RPMDBI_SIGMD5 and RPMDBI_SHA1HEADER indexes are not generated for new databases. In existing databases, stale references must not be left behind.
  • --hdrid and --pkgid cli switches are removed - in my >= 20 years of rpm, I've yet to see anybody actually use these

Thoughts?

@dmnks
Copy link
Contributor

dmnks commented Sep 26, 2024

  • --hdrid and --pkgid cli switches are removed - in my >= 20 years of rpm, I've yet to see anybody actually use these

Is the removal of these two switches settled then? Also, isn't the new #3330 related to this in a way? I.e. if we end up keeping the aliases, shouldn't we also keep the CLI switches?

@dmnks
Copy link
Contributor

dmnks commented Sep 26, 2024

Well, OK, that's why you were asking for feedback in the first place (while you also justified the removal) 😄

@dmnks
Copy link
Contributor

dmnks commented Sep 26, 2024

I also think that removing obscure functionality makes sense unless we know of some users (and we don't here). It can always be added back in the future if there's enough interest...

@pmatilai
Copy link
Member Author

The PKGID and HDRID tags are not used for the index generation, or query. The db index uses RPMDBI_SIGMD5 and RPMDBI_SHA1HEADER, which "happen to be" equivalent to RPMTAG_SIGMD5 and RPMTAG_SHA1HEADER.
so the PKGID and HDRID tags could be removed and I think rpm would still compile and work, something external may be using those though.

So while they appear related, they're much less so than you'd initially think - I too thought it was more closely bound.

@dmnks
Copy link
Contributor

dmnks commented Sep 26, 2024

Oh, looking at the code here too, you're right.

@pmatilai
Copy link
Member Author

So in other words, its two practically unused tags, practically unused cli-switches, backed by two practically unused db indexes 🤦 😆

We could keep --pkgid and --hdrid without the indexes and just do a brute-force search, but there seems little point in preserving this functionalty. Who wants to query the db based on an header hash they likely don't know even exists in the first place?

@dmnks
Copy link
Contributor

dmnks commented Sep 26, 2024

Yup, makes sense now, thanks 😄 As for the first AC point:

In existing databases, stale references must not be left behind.

This part still sounds quite vague (to me)...

@pmatilai
Copy link
Member Author

pmatilai commented Sep 26, 2024

In existing databases, stale references must not be left behind.

This part still sounds quite vague (to me)...

Basically that means in existing databases, we need to keep updating the indexes. Minimally, we need to update the index on package removal. If we just drop the code, we could leave orphans in the index, and that cannot be allowed, so it's a hard requirement. If we actually cared about this functionality we'd require maintining the index for new items on existing dbs too, but since we dont, just taking care of orphans seems enough to me. Whether it actually makes sense to implement it that way or preserve both, I don't know, I haven't really looked (trying to play brave 😄 )

@pmatilai
Copy link
Member Author

pmatilai commented Sep 26, 2024

Hmm. On a second thought: we could simply just remove the query switches for now, and leave the db indexes to die of old age, to be buried in rpm 7.0. That preserves them for existing databases which in theory matters because we'll be dealing with v4 databases for years to come. In new ones they'll eventually just grow empty because the data is not there. These indexes are not expensive to maintain in any way, and actually trying to remove them is far more tricky than just letting them be.

@pmatilai
Copy link
Member Author

So... that would make an AC something like:

  • --pkgid and --hdrid cli switches and corresponding documentation is removed
  • RPMDBI_SIGMD5 and RPMDBI_SHA1HEADER are documented as deprecated

That would let us off the hook with minimal effort, this isn't a very interesting item from v6 perspective.

@dmnks
Copy link
Contributor

dmnks commented Sep 26, 2024

Ack, this looks like the right amount of effort for such a tiny thing, indeed.

@pmatilai pmatilai changed the title Eliminate RPMDBI_SIGMD5 and RPMDBI_SHA1HEADER rpmdb indexes Eliminate --pkgid and --hdrid commandline switches Sep 26, 2024
pmatilai added a commit to pmatilai/rpm that referenced this issue Sep 26, 2024
These obscure switches are backed up by obsolete crypto that wont be,
cannot be, present in v6 packages. It would be possible to keep them
alive by making them point to non-obsolete data, but for that there
would need to be a use-case and users. These have neither as far as
I've learned in the last ~20 years.

Fixes: rpm-software-management#2633
pmatilai added a commit to pmatilai/rpm that referenced this issue Sep 26, 2024
Nothing and nobody should be using these for anything at all.
We'de rather remove these indexes entirely but doing so while keeping
compatibility with existing databases and so on is far more trouble
than letting these die of old age is. V6 packages will not have such
tags so in time these indexes will be all empty and then we can just
axe them without having to work out transitional details.

Related: rpm-software-management#2633
pmatilai added a commit to pmatilai/rpm that referenced this issue Sep 26, 2024
Nothing and nobody should be using these for anything at all.
We'de rather remove these indexes entirely but doing so while keeping
compatibility with existing databases and so on is far more trouble
than letting these die of old age is. V6 packages will not have such
tags so in time these indexes will be all empty and then we can just
axe them without having to work out transitional details.

Related: rpm-software-management#2633
pmatilai added a commit to pmatilai/rpm that referenced this issue Sep 26, 2024
Nothing and nobody should be using these for anything at all.
We'd rather remove these indexes entirely but doing so while keeping
compatibility with existing databases and so on is far more trouble
than letting these die of old age is. V6 packages will not have such
tags so in time these indexes will be all empty and then we can just
axe them without having to work out transitional details.

Related: rpm-software-management#2633
@ffesti ffesti closed this as completed in 7b8ff81 Sep 30, 2024
ffesti pushed a commit that referenced this issue Sep 30, 2024
Nothing and nobody should be using these for anything at all.
We'd rather remove these indexes entirely but doing so while keeping
compatibility with existing databases and so on is far more trouble
than letting these die of old age is. V6 packages will not have such
tags so in time these indexes will be all empty and then we can just
axe them without having to work out transitional details.

Related: #2633
@github-project-automation github-project-automation bot moved this from Todo to Done in RPM Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API API related v6 Related to rpm v6 (readiness)
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants