Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable shadowrealm testing for url api #41985

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

rwaldron
Copy link
Contributor

Starting with tests that don't have additional dependencies.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the HTML PR for this is still outstanding, right? And there was also talk of defining some kind of IDL construct which I don't think has happened. What is the standardization status of this feature?

url/historical.any.js Show resolved Hide resolved
@rwaldron
Copy link
Contributor Author

So the HTML PR for this is still outstanding, right?

That's correct. We have updates to complete that work forthcoming.

And there was also talk of defining some kind of IDL construct which I don't think has happened.

Yes, that matches my understanding as well.

What is the standardization status of this feature?

As of now the feature is Stage 3, with requests from Mozilla for more feature testing such this—or be downgraded to Stage 2. For now my goal is to enable as much testing as possible, but limited to APIs that are [Exposed=*].

@rwaldron rwaldron force-pushed the rwaldron/shadowrealm-url-tests branch from 57a0ec2 to c816d1f Compare September 15, 2023 13:07
@rwaldron
Copy link
Contributor Author

@annevk thanks for the review! Changes made as requested

@annevk
Copy link
Member

annevk commented Sep 15, 2023

So the problem from my perspective is that we typically land test changes alongside the specification PR change. Landing test changes ahead of the specification being formalized is not something we have precedent for, apart from when tests have a .tentative suffix.

@rwaldron
Copy link
Contributor Author

@annevk I understand, and it should be fine to leave the PR open until the spec change work is complete, if that's what you'd prefer. I will follow up here when the time is right.

@ptomato
Copy link
Contributor

ptomato commented Jan 18, 2024

Rebased and added some additional changes.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you carefully reviewed the before and after results this seems okay. However, I'm not sure what the impact of this is on Interop 2024. This late in the year it might require someone else to double check this doesn't end up impacting the Interop scores in some weird way.

So not approving until we're fully clear on that.

url/urlencoded-parser.any.js Outdated Show resolved Hide resolved
url/urlencoded-parser-request-response.any.js Outdated Show resolved Hide resolved
url/url-setters.any.js Show resolved Hide resolved
rwaldron and others added 3 commits October 25, 2024 09:31
Starting with tests that don't have additional dependencies.
Some tests required interfaces such as Request, Response, or FormData that
are not available in ShadowRealm. These should be split into separate test
files that are not run in ShadowRealm scopes.
In url/failure.html there were tests passing invalid URLs to the second
parameter of the URL constructor. Split these into their own file so they
can be run in workers and ShadowRealms, separately from the other tests in
url/failure.html that use the current document's URL.
@ptomato ptomato force-pushed the rwaldron/shadowrealm-url-tests branch from b4e0e1e to f382c0c Compare October 25, 2024 16:31
@ptomato
Copy link
Contributor

ptomato commented Oct 25, 2024

Assuming you carefully reviewed the before and after results this seems okay. However, I'm not sure what the impact of this is on Interop 2024. This late in the year it might require someone else to double check this doesn't end up impacting the Interop scores in some weird way.

I spent some time digging into this. I've been using wpt.fyi for before and after results.

Here are the relevant splits from this PR:

  • urlencoded-parser.any.js → split out urlencoded-parser-request-response.any.js
  • failure.html → split out url-constructor-base-failure.any.js
  • IdnaTestV2.window.html → renamed to IdnaTestV2.any.html (and so is additionally executed in Worker global scope)

There were also splits of tests that weren't included in the interop label, according to wpt.fyi, and so should have no relevance:

  • urlsearchparams-constructor.any.js → split out urlsearchparams-constructor-formdata.any.js
  • toascii.window.js → toascii.any.js, toascii-elements.window.js

and, of course, the new .shadowrealm.html tests are not labeled.

So I'd propose to include the following new tests in the interop stats:

  • IdnaTestV2.any.html
  • IdnaTestV2.worker.html
  • urlencoded-parser-request-response.any.html
  • urlencoded-parser-request-response.worker.html
  • url-constructor-base-failure.any.html
  • url-constructor-base-failure.worker.html

Here are the before and after stats for the tests touched by this PR that are included in the label. Note, I don't know exactly how the interop scores are calculated, but I'm assuming 'percentage of subtests passed' is probably a good measure to see if something's up.

Test Chrome before after Firefox before after Safari before after
failure.html 1146/1211 742/785 1151/1211 749/785 1211/1211 785/785
url-constructor-base-fai… D.N.E. 405/427 D.N.E. 403/427 D.N.E. 427/427
" .worker.html D.N.E. 405/427 D.N.E. 403/427 D.N.E. 427/427
IdnaTestV2.window.html 1320/2032 D.N.E. 2032/2032 D.N.E. 2032/2032 D.N.E.
IdnaTestV2.any.html D.N.E. 1320/2032 D.N.E. 2032/2032 D.N.E. 2032/2032
IdnaTestV2.worker.html D.N.E. 1320/2032 D.N.E. 2032/2032 D.N.E. 2032/2032
historical.any.html 8/8 8/8 8/8 8/8 8/8 8/8
" .worker.html 7/7 7/7 7/7 7/7 7/7 7/7
url-constructor…exclude=… 664/704 664/704 701/704 701/704 704/704 704/704
" .worker.html 664/704 664/704 701/704 701/704 704/704 704/704
url-origin.any.html 376/385 376/385 383/385 383/385 385/385 385/385
" .worker.html 376/385 376/385 383/385 383/385 385/385 385/385
url-searchparams.any.html 4/4 4/4 4/4 4/4 4/4 4/4
" .worker.html 4/4 4/4 4/4 4/4 4/4 4/4
url-setters-stripping.an… 248/260 248/260 260/260 260/260 260/260 260/260
" .worker.html 248/260 248/260 260/260 260/260 260/260 260/260
url-setters.any.html?exc… 206/226 206/226 219/226 219/226 222/226 222/226
" .worker.html 206/226 206/226 219/226 219/226 222/226 222/226
url-tojson.any.html 1/1 1/1 1/1 1/1 1/1 1/1
" .worker.html 1/1 1/1 1/1 1/1 1/1 1/1
urlencoded-parser.any.ht… 105/105 36/36 105/105 36/36 105/105 36/36
" .worker.html 105/105 36/36 105/105 36/36 105/105 36/36
urlencoded-parser-reques… D.N.E. 71/71 D.N.E. 71/71 D.N.E. 71/71
" .worker.html D.N.E. 71/71 D.N.E. 71/71 D.N.E. 71/71
Subtest Total 5689/6628 3827/4032 6544/6628 3972/4032 6620/6628 4024/4032
" in percentage 86% 95% 99% 99% 100% 100%
With proposed stats update 7419/9092 8984/9092 9084/9092
" in percentage 81% 99% 100%

(D.N.E. = did not exist)

I'm not sure why there are no results for Edge on the PR stats, but I've omitted Edge from the above table for that reason.

As you can see, only Chrome is affected in a significant way. This is primarily because of the 2000+ subtests in IdnaTestV2, they only pass a bit over half while Firefox and Safari pass all of them. Without adjusting what tests are covered by the Interop label, Chrome would get an artificially inflated score because IdnaTestV2.window.html no longer exists. With my proposed adjustment, Chrome would get an artificial drop in score without any actual regression causing it, because IdnaTestV2 is now executed in two environments.

Here are some solutions I've thought of to avoid this artificial drop:

  • Adopt my proposal above and do nothing else; no idea if this drop is significant in the actual final score!
  • Adopt my proposal above, except omit IdnaTestV2.worker.html from the stats; this should put Chrome approximately back at the score they previously had.
  • In f382c0c#diff-eb593412b052468f9080ba5210283245bce8697c5b966f6f05f408e048f5d067, omit dedicatedworker from the meta comment, or drop that hunk of the diff altogether; open a followup PR to be merged after Interop 2024 is over; adopt the rest of my proposal above for the files other than IdnaTestV2.

@annevk Let me know how you'd prefer to proceed!

@annevk
Copy link
Member

annevk commented Nov 5, 2024

Thanks @ptomato.

I think for split tests we should only include the any.html and not the any.worker.html. That way we don't increase or decrease the overall number of tests in Interop. Assuming the labeling is done well that should then amount to this being a no-op, which is what we'd want at this point. Which would reduce your list down to:

  • IdnaTestV2.any.html
  • urlencoded-parser-request-response.any.html
  • url-constructor-base-failure.any.html

And we'd continue to include what we include now.

@gsnedders @jgraham @foolip please let me know if you have any concerns.

@ptomato
Copy link
Contributor

ptomato commented Nov 7, 2024

Sounds good to me.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but please give @jgraham @gsnedders @foolip one week to weigh in before merging this. And if this does turn out to not be a no-op for Interop 2024 it'll have to be reverted.

@jgraham
Copy link
Contributor

jgraham commented Nov 8, 2024

The scoring system is sum over all tests of fraction of subtests passing divided by the total number of tests e.g. if you have two tests with scores 2/2 and 9/18 then the Interop score will be (1 + 0.5)/2 = 0.75, not 10/20 = 0.5.

So I think to land this we also need a PR against wpt-metadata to add the labels to whatever new tests require them, so that it can land right afterwards (I think it will be blocked until this PR is landed due to the missing tests). And it needs to be clear that the interop scores are not going to change once both PRs are landed.

@ptomato
Copy link
Contributor

ptomato commented Nov 8, 2024

Ah, that does change things somewhat, as test failures in a file with few tests have a much larger impact than test failures in a file with many tests.

I've done a quick spreadsheet calculating the scores that way and given Anne's proposed list of wpt-metadata changes from #41985 (comment). I'll spare you the details but here is the result.

Chrome Firefox Safari
Before 95.70% 99.33% 99.79%
After 95.82% 99.19% 99.80%
Difference +0.12% −0.15% +0.02%

Not sure if these changes are small enough to be considered just noise, or this would be a problem?

An alternative would be to only replace IdnaTestV2.window.html with IdnaTestV2.any.html in wpt-metadata. That would give only +0.01% to Firefox and leave the other two scores unchanged to two decimal places. (i.e., hundredths of a percent.) It'd be less accurate IMO because it would actually move existing tests out from under the interop umbrella, but it'd be the most no-op possible given the constraints of the scoring system.

Copy link
Member

@gsnedders gsnedders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if these changes are small enough to be considered just noise, or this would be a problem?

Given it's not a no-op change to Interop, this needs an Interop Test Change Proposal issue opened before it (and the as-of-yet-unwritten wpt-metadata PR) can land together.

This view exaggerates the differences this PR makes (through the inclusion of is:different), but does show there are meaningful changes. (This shows the proposed Interop results, I believe.)

(FWIW: I'm not opposed to this change being made, it just needs to go through the right process given the Interop impact.)

ptomato added a commit to ptomato/wpt-metadata that referenced this pull request Nov 12, 2024
In web-platform-tests/wpt#41985 we are proposing
to rename and split some tests, in order to increase test coverage for
URL in ShadowRealm scopes. ShadowRealm is irrelevant to Interop, but we
do have to split some files that previously contained a mix of tests
both suitable and unsuitable for executing in ShadowRealm scopes.

This PR adjusts the `interop-2023-url` label so that the splits in the
abovementioned PR have as little effect as possible on the Interop
scores.
ptomato added a commit to ptomato/wpt-metadata that referenced this pull request Nov 12, 2024
In web-platform-tests/wpt#41985 we are proposing
to rename and split some tests, in order to increase test coverage for
URL in ShadowRealm scopes. ShadowRealm is irrelevant to Interop, but we
do have to split some files that previously contained a mix of tests
both suitable and unsuitable for executing in ShadowRealm scopes.

This PR adjusts the `interop-2023-url` label so that the splits in the
abovementioned PR have as little effect as possible on the Interop
scores.

It also adjusts the browser bug metadata so that the tests referred to
are in the correct files.
@ptomato
Copy link
Contributor

ptomato commented Nov 12, 2024

Thanks for the pointers and for the nice link showing the results; much better than doing it in a spreadsheet.

wpt-metadata PR: web-platform-tests/wpt-metadata#6930
Test change proposal: web-platform-tests/interop#904

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants