Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for parse URI for hostnames #1569

Merged

Conversation

hyperbolic2346
Copy link
Collaborator

@hyperbolic2346 hyperbolic2346 commented Nov 15, 2023

This PR is branched off of #1554 and a vast majority of the code is from that PR. Once that merges, this should reduce down to very little change.

This PR adds support for hostname parsing in parse uri. This adds the tests and exposes the functionality.

closes #1500

@hyperbolic2346 hyperbolic2346 self-assigned this Nov 15, 2023
@hyperbolic2346 hyperbolic2346 changed the base branch from branch-23.12 to branch-24.02 November 21, 2023 21:02
@thirtiseven
Copy link
Collaborator

Few mismatches from integration test cases:

url cpu gpu
http://[email protected]/path?query=1#Ref spark.apache.org null
https://use%20r:pas%[email protected]/dir%20/pa%20th.HTML?query=x%20y&q2=2#Ref%20two example.com example
http://userid:[email protected]:8080 example.com null
http://userid:[email protected]:8080/ example.com null
http://userid:[email protected] example.com exampl
http://userid:[email protected]/ example.com exampl
http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com example.com null
http://. null .
http://.. null ..
http://../ null ..
http://.www.foo.bar/ null .www.foo.bar
http://.www.foo.bar./ null .www.foo.bar.
  • URLs containing USERINFO are not handled correctly.
  • hostnames have their own validation logic, they're generated by following syntax:
// hostname      = domainlabel [ "." ] | 1*( domainlabel "." ) toplabel [ "." ]
// domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
// toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

@hyperbolic2346 hyperbolic2346 marked this pull request as ready for review December 5, 2023 02:19
@hyperbolic2346
Copy link
Collaborator Author

build

@thirtiseven
Copy link
Collaborator

All tests from plugin side passed (IT and the kaggle dataset), I think it's functional correct.

Signed-off-by: Mike Wilson <[email protected]>
ttnghia
ttnghia previously approved these changes Dec 5, 2023
Signed-off-by: Mike Wilson <[email protected]>
@hyperbolic2346
Copy link
Collaborator Author

build

@hyperbolic2346 hyperbolic2346 merged commit 554645d into NVIDIA:branch-24.02 Dec 5, 2023
3 checks passed
@hyperbolic2346 hyperbolic2346 deleted the mwilson/parse_uri_host branch December 5, 2023 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement HOST url parsing as part of parse_url support
3 participants