-
Notifications
You must be signed in to change notification settings - Fork 118
Integration testing on master #568
Comments
are you thinking of factoring the IT into a separate repo? |
I think that might be one way. If we just have the integration tests be built separately, as a project by itself, and then invoke it against master. It might not be the most expedient - open to other suggestions as well. |
One advantage of that would be that it would be convenient to make separate branches, having different IT subsets targeted to various upstreaming stages of functionality |
Would we then have to re-integrate them into the upstream so it has the correct IT? |
It might be desirable for upstream also to have the same decoupled integration testing, similar to YARN/Mesos. |
+1, that makes it a two-fer |
maybe even a 3-fer, it keeps more code out of the main upstream PRs |
cc @felixcheung |
Having it outside of the main repo is easier and faster to iterate, which is why we did it also for Mesos.. |
I added a tracking issue for jenkins-infra: https://github.com/ucbrise/jenkins-infra/issues/86 |
when you say decoupled testing, same as YARN/mesos, do you mean the test and source will be completely outside of the Apache Spark git repo? (because as far as I can see, there is no open source YARN integration tests for Spark, for example) |
But I’d think that not having OSS YARN integration tests is a bad thing. I
think having some would have been good. In the current state, it’s hard for
contributors that aren’t at a company that has their own private test suite
to be confident of their changes, especially significant ones. I think we
can have some for K8s, it’s a good thing.
As mentioned in the thread, it doesn’t have to be in the same repo. The
disadvantage of this, though, is that they are harder to keep in sync - if
we change Spark in some way that enables/requires an integration test
change, it becomes harder for every user of the integration tests.
Personally, I think the advantages of a separate repo outweigh the
disadvantage I listed above. I think we should go for a separate repo.
Sean
On November 30, 2017 at 10:29:25 PM, Felix Cheung ([email protected]) wrote:
when you say decoupled testing, same as YARN/mesos, do you mean the test
and source will be completely outside of the Apache Spark git repo?
(because as far as I can see, there is no open source YARN integration
tests for Spark, for example)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB2E24mGsg57hAvHIJkJR80AQlhGgm8Lks5s75zEgaJpZM4Qw9mJ>
.
|
I agree with the separate repo. However, this will require a group of separate group to maintain and monitor this repo as well. |
If it's about fast iteration, I'd say go for it. |
I agree - we shouldn't look at it as a separate thing long term. It's expedient for now to separate it out - to get some confidence in upstream by running tests against it. Eventually, it should live alongside the rest of the code upstream, but I do think it needs some hardening before that, and we can't hit that for 2.3 |
We probably want to use the testing version provided by #521. There's one more change required to make this pass on Amplab Jenkins. I'll patch this branch and bring it up to speed with our mainline, then we should think about how we're going to move the code over. Automation is something to consider as well. We probably want to run these tests in the separate repository on every commit to master that affects Kubernetes. We could run the suite on every commit to master in general as well. |
I think we can break the integration testing out further after #521 is merged. The immediate next steps I see are:
For now, it would still depend on minikube. In the future, we can do even better.
This would enable running on other k8s clusters, enabling for example, stress testing on GKE if we want to do that, and also let other users test the release on their own clusters using our integration tests. |
Are we still planning to run integration tests in the Jenkins PR builder on master? Or was the plan to run the tests in our own automated system? |
Not sure if we'll get that done in time. |
@foxish I was thinking about this a little more and am concerned that this setup isn't too intuitive. (edit: the proposed setup being #568 (comment)) What these semantics are basically saying is that the main repository depends on the tests. This communicates that the main repository is checking the correctness of the tests, not the other way around. To illustrate this point, consider the situation where a developer writes a new integration test for a feature they just merged into master. Consider the workflow:
This sequence of events makes it clear that the main repository is determining whether or not the integration tests are correct. But the more intuitive mental model is that the integration tests are validating the correctness of a given Spark artifact. Consider also the situation when the main repository is changed such that the assumptions made by the given integration test version are invalid. In this workflow, the developer is blocked from merging into master before the new integration test version is published. But in between the time that the new integration tests are published and the main change merges, that new integration test version is only applicable for a version of the main repository that doesn't exist yet. So I think we want the integration tests to pull down and test the repository under test, not the other way around. |
So I think we want the integration tests to pull down and test the
repository under test, not the other way around.
I definitely agree. I think this direction of dependency (repo under test
is unaware of what will test it, repo containing tests is aware of what it
will be testing) allows you to more easily make advanced changes to the
main repo and integration tests in synchronization.
Sean
On December 6, 2017 at 10:35:27 AM, mccheah ([email protected]) wrote:
I was thinking about this a little more and am concerned that this setup
isn't too intuitive.
What these semantics are basically saying is that the main repository
depends on the tests. This communicates that the main repository is
checking the correctness of the tests, not the other way around.
To illustrate this point, consider the situation where a developer writes a
new integration test for a feature they just merged into master. Consider
the workflow:
1. Developer writes unit tests and main feature in apache/master. They
don't update the integration test version because integration tests aren't
published yet. The change is merged.
2. Developer writes and publishes integration tests that are broken for
the feature.
3. Developer updates main repository's integration test dependency to
try the new tests.
4. Developer finds the tests are wrong, so they have to publish a new
integration test version.
This sequence of events makes it clear that the main repository is
determining whether or not the integration tests are correct.
Consider also the situation when the main repository is changed such that
the assumptions made by the given integration test version are invalid. In
this workflow, the developer is blocked from merging into master before the
new integration test version is published. But in between the time that the
new integration tests are published and the main change merges, that new
integration test version is only applicable for a version of the main
repository that doesn't exist yet.
So I think we want the integration tests to pull down and test the
repository under test, not the other way around.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB2E28lxPQj4SQIZe8Cw4SjqVlWyjqsXks5s9t5vgaJpZM4Qw9mJ>
.
|
I agree with that @mccheah. I wasn't proposing the opposite but I guess it got confusing since I went into a bit of the implementation as well in my comment. I don't think we want to "publish" the integration test jar as a separate entity. My point was that jenkins could take care of building the latest integration tests (from wherever they live, which may be for now - https://github.com/apache-spark-on-k8s/spark-integration), and the distro (from a particular PR), and then have the integration tests run on the distro. The point about jars was just to make the separation point and isn't really a necessity. I don't mind having the integration tests actually pull down the repo and build it for now if it helps save time. I think it's the same whether jenkins does, or if we have the test logic itself do it. |
Ah, sorry for the misunderstanding there. I think we want the integration tests to target a git hash on either a remote repository to clone or a local checkout for development iteration. But maybe we should support testing against Spark distribution tarballs also because we can save build time by using a pre-published Spark distribution instead of building it ourselves in the integration test CI. |
Elaborating on #545 (comment)
We need to separate our integration testing out - so we can make it run any arbitrary branch - including master.
cc @apache-spark-on-k8s/contributors @ifilonenko @erikerlandson @ssuchter @kimoonkim
The text was updated successfully, but these errors were encountered: