Serialize tests during coverage calculation #91

Open · wants to merge 1 commit into master from serialize-tests-during-coverage-calculation
Conversation

@Vampire commented May 12, 2023

Depends on #90
Fixes #73
Fixes hcoles/pitest#760

This generically fixes hcoles/pitest#760 and #73 for all platform engines,
removing the Jupiter-specific work-around from #74 and serializing test execution
during coverage calculation using locks. This way, the tests can also properly run
in parallel later on during mutant hunting.

I tested it extensively with the MCVE from hcoles/pitest#760, and also with the same MCVE ported to Spock with parallel test execution,
and consistently got the correct result of 6 killed mutants across several hundred runs. (The 7th mutant is immortal.)
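
To picture the mechanism, here is a minimal sketch of the core idea (illustrative only, not the PR's actual code): a JUnit Platform TestExecutionListener that holds a single global lock while a test runs. As discussed further down, a single lock is not sufficient for engines like Spock whose data-driven features are CONTAINER_AND_TEST nodes, which is why the PR adds lazily created per-parent child locks on top of the global one.

```java
import java.util.concurrent.locks.ReentrantLock;

import org.junit.platform.engine.TestExecutionResult;
import org.junit.platform.launcher.TestExecutionListener;
import org.junit.platform.launcher.TestIdentifier;

// Minimal sketch: block in executionStarted until the previously started
// test has finished, so coverage is recorded for one test at a time.
class SerializingListener implements TestExecutionListener {
    private final ReentrantLock globalLock = new ReentrantLock();

    @Override
    public void executionStarted(TestIdentifier id) {
        if (id.isTest()) {
            globalLock.lock(); // at most one test runs during coverage calculation
        }
    }

    @Override
    public void executionFinished(TestIdentifier id, TestExecutionResult result) {
        if (id.isTest()) {
            globalLock.unlock();
        }
    }
}
```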


```java
public TestIdentifierListener(Class<?> testClass, TestUnitExecutionListener l) {
    this.testClass = testClass;
    this.l = l;
    // only the coverage calculation supplies a real execution listener;
    // with a NullExecutionListener nothing is recorded, so the tests do
    // not need to be serialized
    serializeExecution = !(l instanceof NullExecutionListener);
}
```
Contributor:

It's not immediately clear why serialized execution is dependent on the type of the execution listener. I think some comments explaining what is going on are needed.

Contributor Author:

Added

```diff
@@ -97,10 +102,15 @@ private class TestIdentifierListener implements TestExecutionListener {
     private final Class<?> testClass;
     private final TestUnitExecutionListener l;
     private final List<TestIdentifier> identifiers = synchronizedList(new ArrayList<>());
     private final boolean serializeExecution;
     private final Map<UniqueId, ReentrantLock> parentCoverageSerializers = new ConcurrentHashMap<>();
```
Contributor:

We have a lot of locks here. I'd have imagined that we could force things to run serially using just a single lock. I'm sure I'm wrong, but to understand what all these locks are doing I think we need a big comment block explaining what JUnit does and the rationale behind this.

Contributor Author:

I also imagined a single lock; unfortunately, at least for Spock, it didn't work.
I added a wall of text explaining the situation and logic that hopefully makes it clearer.
I really was "a bit" silent with regard to comments here. :-D

Contributor Author:

I also made the child locks lazily initialized atomic references now, so that the locks for children are only created if actually needed, e.g. for data-driven Spock features.
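
A sketch of that lazy initialization (field and method names here are illustrative, not the exact PR code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

import org.junit.platform.engine.UniqueId;

class ChildLockRegistry {
    // one (initially empty) lock reference per started container that may have test children
    private final Map<UniqueId, AtomicReference<ReentrantLock>> parentCoverageSerializers =
            new ConcurrentHashMap<>();

    void containerStarted(UniqueId containerId) {
        // only the reference is created up front; the lock itself stays null
        parentCoverageSerializers.put(containerId, new AtomicReference<>());
    }

    ReentrantLock childLockFor(UniqueId parentId) {
        AtomicReference<ReentrantLock> ref = parentCoverageSerializers.get(parentId);
        if (ref == null) {
            return null; // unknown parent: the caller falls back to the global lock
        }
        // materialize the lock on first use, e.g. for data-driven Spock features
        ref.compareAndSet(null, new ReentrantLock());
        return ref.get();
    }
}
```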

```diff
@@ -134,6 +158,14 @@ public void executionFinished(TestIdentifier testIdentifier, TestExecutionResult
         } else if (testIdentifier.isTest()) {
             l.executionFinished(new Description(testIdentifier.getUniqueId(), testClass), true);
         }

         if (serializeExecution) {
             parentCoverageSerializers.remove(testIdentifier.getUniqueIdObject());
```
Contributor:

We remove the lock from a map but don't unlock it? When is the lock released?

Contributor Author:

The wall of text above should hopefully make it clear.
This is the lock for child tests, so it should not be locked at this point.

@Vampire force-pushed the serialize-tests-during-coverage-calculation branch from f960b26 to 325f0c5 on May 15, 2023 16:46
@Vampire requested a review from hcoles on May 15, 2023 16:50
@hcoles commented May 18, 2023

@Vampire Do you have the Spock version of the MCVE available anywhere?

@Vampire force-pushed the serialize-tests-during-coverage-calculation branch from 325f0c5 to d01afb0 on May 19, 2023 12:35
@Vampire commented May 19, 2023

Sure, I expanded the given MCVE to add Spock: pitest-issue-760-with-spock.zip
It currently points to snapshot versions of PIT and the junit5 plugin.

@Vampire force-pushed the serialize-tests-during-coverage-calculation branch 2 times, most recently from 7b7d28e to 310fd0b on May 19, 2023 14:01
@hcoles commented May 24, 2023

@Vampire Just been taking another look at this.

Unfortunately I think the idea of stopping parallel execution during coverage, but allowing it during mutation testing, is flawed. It risks the killing test being misidentified, which could give incorrect results when history recording is enabled.

I'm also still uncomfortable about the large number of locks required to prevent parallel running. I think the current solution of automatically disabling parallel running by setting a JUnit configuration value is preferable. Although it doesn't work for Spock, this could presumably be rectified by adding the correct setting. In theory we'd have to do this for an open-ended number of test runners, but in practice there are very few in use.
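
For reference, forcing such a configuration value through the Launcher API looks roughly like this sketch (not the exact code from #74; junit.jupiter.execution.parallel.enabled is Jupiter's real configuration key, but it is Jupiter-specific and other engines ignore it):

```java
import org.junit.platform.launcher.LauncherDiscoveryRequest;
import org.junit.platform.launcher.core.LauncherDiscoveryRequestBuilder;

import static org.junit.platform.engine.discovery.DiscoverySelectors.selectClass;

class CoverageRequestFactory {
    // Disable Jupiter's parallel execution for the coverage run; engines
    // other than Jupiter (e.g. Spock) do not honor this key.
    static LauncherDiscoveryRequest buildFor(Class<?> testClass) {
        return LauncherDiscoveryRequestBuilder.request()
                .selectors(selectClass(testClass))
                .configurationParameter("junit.jupiter.execution.parallel.enabled", "false")
                .build();
    }
}
```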

@Vampire commented May 24, 2023

> Unfortunately I think the idea of stopping parallel execution during coverage, but allowing it during mutation testing, is flawed. It risks the killing test being misidentified, which could give incorrect results when history recording is enabled.

Would you mind elaborating on where you see that risk?
For the coverage calculation, definitely, as coverage is collected in static state, so parallel execution is a killer.
But for identifying the killing test, you call TestUnit#execute with a ResultCollector that records the results of the tests.
This ResultCollector is called in the JUnit5TestUnit from a JUnit platform TestExecutionListener, so whether the tests run in parallel or sequentially should not make any difference.

You just have to make sure you handle the calls in the result collector in a thread-safe way.
Indeed, this is currently not the case, but from a cursory look I'd say changing the ArrayLists in the two Container implementations to a thread-safe collection will already be enough.
You should probably do this anyway, as there could be other PIT engine plugins out there that call the collector from varying threads and not always from the same one.
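
The suggested fix amounts to one line per collection, roughly like this (Container and its field are illustrative stand-ins, not the actual pitest sources):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Container {
    // wrapping the list makes result callbacks safe to invoke from any thread
    private final List<String> descriptions =
            Collections.synchronizedList(new ArrayList<>());

    void record(String description) {
        descriptions.add(description);
    }
}
```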

> I'm also still uncomfortable about the large number of locks required to prevent parallel running.

And I'm still not sure what you mean.

For Jupiter tests you will have exactly 1 lock, the global lock, as there are no nodes with type CONTAINER_AND_TEST, so the situation seen with Spock cannot happen.
So all tests use the same 1 global lock.

For Spock, there will be at most 2 locks held at the same time.
All features (i.e. test methods) are locked with the same 1 global lock.
For each data-driven feature (i.e. the ones with a where: clause) there will be one additional lock, but only one of them will be locked at any time, as the features are all serialized on the global lock.
So I could actually change the code so that all iterations of data-driven methods use the same lock; in the Spock case there would then be exactly 1 or 2 locks in existence.

It is just implemented the way it is to cover any platform engine someone could sanely come up with.
But the number of locks is actually minimal and never a "large number".

> I think the current solution of automatically disabling parallel running by setting a JUnit configuration value is preferable.

Well, our opinions greatly differ here and I still hope I can convince you. :-)
From what I have seen so far, there is no real reason to forbid parallel execution, except during coverage calculation.

> Although it doesn't work for Spock, this could presumably be rectified by adding the correct setting. In theory we'd have to do this for an open-ended number of test runners, but in practice there are very few in use.

There is no setting you can set that overrides it for Spock, as far as I remember.
There is the system property spock.parallel.enabled, but that is only used as a default if the Spock config file does not configure a value.
If the Spock config file sets a value, it wins.

There is also no setting on the JUnit platform level I'm aware of that could disable this, because it is the test engine that decides whether to use a ForkJoinPoolHierarchicalTestExecutorService instance or a SameThreadHierarchicalTestExecutorService instance, and with which parallelism settings. If the parallelism setting is calculated from the available CPU cores, and you are running on a JVM that has the right -XX:... switch available, you could restrict the available CPU cores. But the parallelism could just as well be set to some fixed value, and then restricting the available CPU cores would be pointless. And even if the JUnit platform had or introduced a platform-level switch to disable parallel running, that would also only help for those engines that actually use this facility. A test engine can practically do whatever it wants and use a completely custom implementation for whatever reason.

I really do think that using the locks during coverage calculation and accepting parallel execution later on is the proper way to go. Imho it is easier than always reacting to a new engine being brought up, first making it add an override setting, then using that setting, and so on, when the solution here should just work generally for practically any sane implementation someone can come up with, without the need to do anything further.

The non-thread-safe parts that could fail, like the mentioned ArrayList or the other spots you accepted my PRs for already, should be fixed anyway, as you never know what a test engine does, or whether it will run all tests on the same thread even when running sequentially. There is for example right now a [bug / strange behavior / unexpected behavior / depends on your PoV what you call it] that causes too many tests to run in parallel when parallel execution is enabled. The cause is in the platform code, so for example Jupiter and Spock are both affected the same way. And the work-around for both is the same: have an interceptor that does the actual test execution on a different thread / thread pool than the ForkJoinPool the test scheduling happens on.

This work-around will typically be used independently of whether parallel running is currently enabled. You put it in place because you enable parallel running, and even if for whatever reason you temporarily disable parallel running (or even permanently, because by then you forgot you made this work-around), the work-around will still be in place. So if you now disable parallel running using the settings switch of Jupiter, or one that might be introduced for Spock, and all test scheduling happens on the same thread, the test execution and thus the ResultCollector notification will still happen on various other threads.
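
Such an interceptor could look roughly like this Jupiter sketch (illustrative only; Spock has its own interceptor API, and real work-arounds differ in detail):

```java
import java.lang.reflect.Method;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.junit.jupiter.api.extension.ExtensionContext;
import org.junit.jupiter.api.extension.InvocationInterceptor;
import org.junit.jupiter.api.extension.ReflectiveInvocationContext;

// Sketch: run the actual test body on a dedicated pool instead of the
// ForkJoinPool the JUnit platform uses for test scheduling.
class OffloadingInterceptor implements InvocationInterceptor {
    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    @Override
    public void interceptTestMethod(Invocation<Void> invocation,
            ReflectiveInvocationContext<Method> invocationContext,
            ExtensionContext extensionContext) throws Throwable {
        Future<Void> result = POOL.submit(() -> {
            try {
                invocation.proceed(); // the actual test execution
            } catch (Throwable t) {
                throw new CompletionException(t);
            }
            return null;
        });
        try {
            result.get();
        } catch (ExecutionException e) {
            // unwrap so the original test failure is rethrown
            Throwable cause = e.getCause();
            throw cause instanceof CompletionException ? cause.getCause() : cause;
        }
    }
}
```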

So all code that somehow communicates with the test execution or discovery phase needs to be made thread-safe where that hasn't happened already, or you get nasty multi-threading problems that will be hard to find and fix.
This should most probably already be achieved by my past PRs, except for this one last location.

@Vampire commented May 25, 2023

> So I could actually change the code so that all iterations of data-driven methods use the same lock; in the Spock case there would then be exactly 1 or 2 locks in existence.

Nah, that's not as trivial as I thought while writing that sentence, because you need to recognize that a parent is locked and which layer you are on.

Having looked over the implementation and what I imagined as a replacement, I think the current code is fine already.
Again talking about the Spock case, there are actually at most 1 or 2 locks in existence at the same time already with this code.
All features are locked on the 1 global lock, so only one feature can actually start. For this started feature a potential serializer reference for children is registered, but it is only lazily initialized if really needed.

Now it could happen that executionStarted is called for sibling features or for child iterations.
To differentiate between these cases we recorded the potential serializer reference. If it is called for a sibling, there is no entry for its parent and thus the root serializer is used (or, talking about the generic case, it could also be the same one, with the same result: the siblings are serialized). If it is instead called for a child, which can only be the case for the one started parent / feature, then the potential child serializer reference is found, lazily initialized, and locked, so that at most one of the iterations can actually start.

This logic automatically works for any depth as long as you don't have a CONTAINER_AND_TEST with a CONTAINER child that has a TEST child. But this is what I meant by "sane implementations". I don't expect such a construct to appear, and if it does appear, it can be handled then by searching not only for the getParentIdObject serializer but recursively up the hierarchy until only the root serializer is left. I would not add this most probably unnecessary complexity now, but only on demand.
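
Putting the pieces together, the scheme described above condenses to roughly the following sketch (illustrative names; the unlocking in executionFinished is omitted, and the actual PR code is more involved):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

import org.junit.platform.engine.UniqueId;
import org.junit.platform.launcher.TestIdentifier;

class TwoLevelSerializer {
    private final ReentrantLock rootLock = new ReentrantLock();
    // potential child serializers, one registered per started CONTAINER_AND_TEST
    private final Map<UniqueId, AtomicReference<ReentrantLock>> childSerializers =
            new ConcurrentHashMap<>();

    void executionStarted(TestIdentifier id) {
        if (id.isContainer() && id.isTest()) {
            // a CONTAINER_AND_TEST node, e.g. a data-driven Spock feature:
            // register a potential serializer for its children
            childSerializers.put(id.getUniqueIdObject(), new AtomicReference<>());
        }
        if (!id.isTest()) {
            return; // plain containers are not serialized at all
        }
        ReentrantLock lock = id.getParentIdObject()
                .map(childSerializers::get)
                .map(ref -> {
                    // child of the one started CONTAINER_AND_TEST: lazily create
                    // its lock, so at most one iteration runs at a time
                    ref.compareAndSet(null, new ReentrantLock());
                    return ref.get();
                })
                // sibling feature: serialize on the one global root lock
                .orElse(rootLock);
        lock.lock();
    }
}
```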

@hcoles commented May 31, 2023

You are correct, there should no longer be a problem misreporting the killing test if they run in parallel. The minion used to send the name of a test back to the parent process before executing it, so the parent was guaranteed to receive it even if the test caused an infinite loop. This would not work if the tests ran in parallel, but I'd forgotten that this got ripped out because it proved too flaky. We now accept that we won't know which test triggered an infinite loop, and all data is sent after execution.

The "lots of locks" comment is because looking at the code statically the number of locks is unbounded (we have two maps of them). The fact that we have only 1 or 2 locks at runtime is not obvious without intimate knowledge of how spock, jupiter, cucumber and the other test runners behave at runtime.

Given that tests cannot be run in parallel during the coverage stage, and the benefit of them running in parallel during the mutation analysis stage is unclear (pitest will only run one at once, so the only possible parallelism is where a "unit" contains more than one test), the simplest, easiest-to-maintain solution would be to disable parallel running at the level of the test framework.

Unfortunately that doesn't seem to be an option.

So, the two possible ways forward are

  1. Merge this and allow parallel running
  2. Throw an error/log when it's detected and ask the user to disable it

My preference is always to keep things as simple as possible (i.e. 2), but if you're happy to maintain this going forward I'll go with 1 and merge it in.

@Vampire force-pushed the serialize-tests-during-coverage-calculation branch from 310fd0b to 6020822 on June 9, 2023 16:54
@Vampire commented Jun 9, 2023

> pitest will only run one at once, so the only possible parallelism is where a "unit" contains more than one test

Which I hope to get changed with hcoles/pitest#1223 once the JUnit platform launcher supports aborting a run gracefully. :-)

But for now, at least for the hunting minion's test discovery and for atomic units with multiple tests, it will speed things up a bit.

> Merge this and allow parallel running

❤️

> if you're happy to maintain this going forward I'll go with 1 and merge it in.

Sure, no problem.
To be honest, I will not follow the issues; I have enough on my plate to read already.
But feel free to ping me anytime an issue comes in regarding this, and I'll try to take care of it as soon as my time allows. :-)

@Vampire commented Jun 10, 2023

Maybe it would make sense, though, to first cut a new pitest release and depend on that from this one, so that the result collector is thread-safe, which should hopefully have been the last unsafe part.

@Vampire commented Jun 22, 2023

Any idea when you'll get a chance to cut a release of pitest, @hcoles?
I actually use a combination of this PR and #93.
Two of the test specs in the project get treated atomically and are hitting exactly the result collector problem.
30 - 40 of the 338 mutants where those tests are relevant are "killed" by RUN_ERROR due to a NullPointerException in the non-thread-safe result collector.
With the latest master code, everything works fine.

@hcoles commented Jun 23, 2023

@Vampire 1.14.2 should be available from maven central now.

@Vampire force-pushed the serialize-tests-during-coverage-calculation branch from 6020822 to b66b56d on June 23, 2023 13:04
@Vampire commented Jun 23, 2023

Perfect, thank you very much.
I also updated the PR here to depend on the new version.

@Vampire commented Jul 21, 2023

So, anything left to do here? :-)
