Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing Javascript files may extract incorrect (and very long) pkg name/version #723

Open
marob opened this issue Nov 17, 2023 · 10 comments
Labels
Consider Funding enhancement New feature or request

Comments

@marob
Copy link
Contributor

marob commented Nov 17, 2023

In my use case, I have a projekktor-1.3.09.min.js file (N.B.: this project looks dead, its website links to dead site/advert spam) that starts with the following line (yes, the version doesn't match, but that's not the issue here):

/*! Projekktor v1.2.37 -jarisflash | http://www.projekktor.com | Copyright 2010, 2011, Sascha Kluger, Spinning Airwhale Media, http://www.spinningairwhale.com | GNU General Public License - http://www.projekktor.com/license/
 */

This first line is generated via Grunt.

When parsed through cdxgen parseMinJs function, delimiter is identified as - because of the one at the end of the line (...License - http://...), which leads to a mis-identification of the name and version.

In the end, cdxgen generates:

    {
      "group": "",
      "name": "projekktor-v1.2.37--jarisflash-|-http://www.projekktor.com-|-copyright-2010,-2011,-sascha-kluger,-spinning-airwhale-media,-http://www.spinningairwhale.com-|-gnu-general-public-license",
      "version": "http://www.projekktor.com/license/",
      "scope": "optional",
      "purl": "pkg:npm/projekktor-v1.2.37--jarisflash-%257C-http:%2F%2Fwww.projekktor.com-%257C-copyright-2010%252C-2011%252C-sascha-kluger%252C-spinning-airwhale-media%252C-http:%2F%2Fwww.spinningairwhale.com-%257C-gnu-general-public-license@http:%2F%2Fwww.projekktor.com%2Flicense%2F",
      "type": "library",
      "bom-ref": "pkg:npm/projekktor-v1.2.37--jarisflash-%7C-http://www.projekktor.com-%7C-copyright-2010%2C-2011%2C-sascha-kluger%2C-spinning-airwhale-media%2C-http://www.spinningairwhale.com-%7C-gnu-general-public-license@http://www.projekktor.com/license/",
      "evidence": {... },
      "properties": [...]
    },

This incorrect name/version leads to generating an error in Dependency Track when feeding it that SBOM because of the purl exceeding 255 chars (which is the current size limit):

ERROR [BomUploadProcessingTask] Error while processing bom
javax.jdo.JDOFatalUserException: Attempt to store value "pkg:npm/projekktor-v1.2.37--jarisflash-%257C-http%3A%2F%2Fwww.projekktor.com-%257C-copyright-2010%252C-2011%252C-sascha-kluger%252C-spinning-airwhale-media%252C-http%3A%2F%2Fwww.spinningairwhale.com-%257C-gnu-general-public-license@http%3A%2F%2Fwww.projekktor.com%2Flicense%2F" in column ""PURLCOORDINATES"" that has maximum length of 255. Please correct your data!
@prabhu
Copy link
Collaborator

prabhu commented Nov 17, 2023

@marob, good investigation.

cdxgen/utils.js

Line 1365 in b397bae

const tmpPV = l.split(delimiter);

Could we change this line to below to see if it works? You can send a PR including this and the version fix.

const tmpPV = l.split(delimiter).split("--")[0];

@marob
Copy link
Contributor Author

marob commented Nov 17, 2023

@prabhu That would'n work for 2 reasons:

  • l.split(delimiter) returns an array that doesn't have a split function
  • the -- String would only appear after executing .replace(/ /g, "-") several lines below

I was wondering on which specification you based your code to try to parse the package name and version? I didn't find any specification on how those comments should be written in JavaScript files (minified or not).
Whatever change we'll add in the code, I think we should have several example files to add as unit tests to make sure the code is parsing the data as expected.

@prabhu
Copy link
Collaborator

prabhu commented Nov 17, 2023

@marob There are no specifications like you figured. So this will be identifying things based on some patterns so there is some visibility into the min files that are being stored or bundled. I am open to improving the logic.

@prabhu
Copy link
Collaborator

prabhu commented Nov 20, 2023

@marob, are you planning to send a pull request for this?

@marob
Copy link
Contributor Author

marob commented Nov 20, 2023

@prabhu Maybe, but probably not in the near future. If you have time and an idea on how to improve the parsing, feel free to do it.
In my mind, we should try to grab several (tens or hundreds of) JS minified files examples to guess the usual format of the comment header and then write a test to check that our (new) code can parse all the identified formats.

@prabhu
Copy link
Collaborator

prabhu commented Nov 20, 2023

@prabhu Maybe, but probably not in the near future. If you have time and an idea on how to improve the parsing, feel free to do it.

In my mind, we should try to grab several (tens or hundreds of) JS minified files examples to guess the usual format of the comment header and then write a test to check that our (new) code can parse all the identified formats.

Same. Let's keep this open and see if anyone else contributes.

@marob
Copy link
Contributor Author

marob commented Sep 2, 2024

The size has been increased from 255 to 786 chars in Dependency Track 4.11.0 (DependencyTrack/dependency-track#3560) which should decrease BOM upload errors for cases where the purl length was > 255 but <= 786, but the issue is still here for length > 786.

I still don't have any idea on how to better parse name and version from JS comments as there is no convention/spec. Should we truncate? At which size? How to balance the sizes of name and version if both are long?

Anyway, as @prabhu asked here, it looks like there is no length limitation on purl specification, so we'll always have cases where a valid purl is longer than the arbitrary 786 chars limits defined in Dependency Track.

@marob
Copy link
Contributor Author

marob commented Sep 2, 2024

There is a (currently open) issue on Dependency Track about the 786-chars limit: DependencyTrack/dependency-track#3876

@nscuro
Copy link
Member

nscuro commented Sep 2, 2024

I still don't have any idea on how to better parse name and version from JS comments as there is no convention/spec. Should we truncate? At which size? How to balance the sizes of name and version if both are long?

What are the chances that a valid package name and version combined would exceed 786 characters?

Depending on package manager, one could probably pin packages to a digest. Worst case... SHA-512 maybe? That's 128 characters just for the version. That still leaves ~600 characters for the name and some qualifiers maybe.

This could probably be verified in a study of public packages, but my gut feeling tells me that if you're dealing with such lengths, chances are quite high that something is wrong.

@marob
Copy link
Contributor Author

marob commented Sep 2, 2024

@nscuro I agree that PURL of more that 786 characters shouldn't exist.
But in a real world, and because the spec says there's not limit in length, we should expect encountering PURL with more than 786 characters and conform to the spec by allowing unlimited length PURL (or change the PURL spec to set a length limit, ideally under 786 chars).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Consider Funding enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants