Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make sure to use only the RegExpProxy interface #1419

Closed
wants to merge 2 commits into from

Conversation

rbri
Copy link
Collaborator

@rbri rbri commented Nov 12, 2023

No description provided.

f.getFunctionName());
if (args.length > 0) {
RegExpProxy reProxy = ScriptRuntime.getRegExpProxy(cx);
if (reProxy != null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in assuming that args[0] can never be something considered a regex if ScriptRuntime.getRegExpProxy(cx) returns null? Otherwise wouldn't these changes modify the behavior slightly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @p-bakker,

there are already some places in the code that checking that the regexpproxy is not null - without having one the whole regex support is not working and crashing. See org.htmlunit.corejs.javascript.ScriptRuntime.checkRegExpProxy(Context) and the caller of this method.

org.htmlunit.corejs.javascript.ScriptRuntime.setRegExpProxy(Context, RegExpProxy) - the only way to set the proxy does also an null check and throws if not null.

The only way to get null here is to remove the default from the classpath/classolader - see the lazy initialization in org.htmlunit.corejs.javascript.Context.getRegExpProxy()

I think we can assume here that the proxy is configured...

Hope that is enough text to get this merged

@p-bakker
Copy link
Collaborator

Could you add some description on this PR to explain the reasoning behind it?

@rbri
Copy link
Collaborator Author

rbri commented Aug 10, 2024

Could you add some description on this PR to explain the reasoning behind it?

At least i can try...

If you like to customize (as an project that uses Rhino) the current regex implementation you have to replace to things - the RegExpProxy implementation and the NativeRegExp impl. Both replacements can be done by subclassing the existing rhino classes or by creating your own. HtmlUnit does exactly by following the subclassing path.

Now i have started to switch to a more advanced replacement, and soon found out, that it might be more reasonable to replace both implementations. The api makes that possible - but we still have this instanceof check in our source code. Therefore this PR fixes this by using the reProxy.isRegExp() instead (and using this method in this scenario is exactly what the method is for.

Ok, and there is another (major from my point of view) improvement in this PR - i have added ArchUnit to our test suite as starting point to be able to enforce some design rules for the whole code base. I use ArchUnit since some years for HtmlUnit and it is a great help....

@p-bakker
Copy link
Collaborator

Will respond more indepth next week (sorta off this week), but given your understanding of the proxy pattern used here can you comment on the seemingly improved performance after implementing the same pattern for quasi call sites, as suggested here

@gbrail
Copy link
Collaborator

gbrail commented Aug 18, 2024

I will look at this too. In the meantime... I'm interested in what you did with HtmlUnit. My basic and very naive question is this -- does Rhino still need its own custom regexp implementation or could we use the java.util.regexp package instead?

@gbrail
Copy link
Collaborator

gbrail commented Aug 18, 2024

Also, the design rule check is interesting and it'd be interesting to see if we should use it in other places.

However, we have modules now. The org.mozilla.javascript.regexp package is currently not exported, so no one outside the "rhino" project can depend on it now. And if we wanted to be even more pedantic, we could consider moving this package to a separate module, so that we can guarantee that the "rhino" module builds without it. That would complicate people's builds, though, so it's debatable whether it's a good idea.

@p-bakker
Copy link
Collaborator

or could we use the java.util.regexp package instead?

To my knowledge they never were 100% compatible and on the other side, a lot of new features have recently been added to Regex in EcmaScript and more enhancements are on the way. So when looking into this we need to also take into account new Regex features being added to Regex in EcmaScript that Java may not support

Having said that, would be great if we could replace the engine to gain performance and be more ES-compatible!

@rbri
Copy link
Collaborator Author

rbri commented Aug 19, 2024

I'm interested in what you did with HtmlUnit. My basic and very naive question is this -- does Rhino still need its own custom regexp implementation or could we use the java.util.regexp package instead?

As far as i rember there are some fundamental differences between js regexp and java regexp - js has supports more things and therefore there is no chance to map them all to java.

if you like you can start here https://github.com/HtmlUnit/htmlunit/tree/master/src/main/java/org/htmlunit/javascript/regexp, but for various reasons this is full of hacks

@rbri
Copy link
Collaborator Author

rbri commented Aug 19, 2024

However, we have modules now. The org.mozilla.javascript.regexp package is currently not exported, so no one outside the "rhino" project can depend on it now. And if we wanted to be even more pedantic, we could consider moving this package to a separate module, so that we can guarantee that the "rhino" module builds without it. That would complicate people's builds, though, so it's debatable whether it's a good idea.

The idea is not so much about hoping others replace the regexp stuff in there impl, my idea was more to be able to offer a more mordern and complete impl for ES6 and use the existing one for the old stuff and backward compatibility.

@gbrail
Copy link
Collaborator

gbrail commented Aug 22, 2024

I am sorry that this one ended up at the bottom of my queue, and I don't know why not. I was able to merge it and resolve the conflicts, but I see some failing tests, so I'm going to ask for help to get it rebased on the latest or something close to it. Any chance that you have time for that?

@rbri
Copy link
Collaborator Author

rbri commented Aug 22, 2024

Will have a look

@rbri
Copy link
Collaborator Author

rbri commented Aug 22, 2024

@gbrail a new pr is on the way not sure why this one has passed all the test but i have a new one that again is hopefully green

@rbri
Copy link
Collaborator Author

rbri commented Aug 22, 2024

PR #1580 is ready

@gbrail
Copy link
Collaborator

gbrail commented Aug 23, 2024

Thanks for the new PR!

@gbrail gbrail closed this Aug 23, 2024
@rbri
Copy link
Collaborator Author

rbri commented Aug 23, 2024

@gbrail thanks for taking care of this - will try to make this alternative regexp impl after my vacation

@rbri rbri deleted the regexp_proxy branch November 26, 2024 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants