Provide option to enable pushing submodule commits to a branch of the same name as the destination meta commit #726
The architecture doc explains why we don't do this: you then have (as you note) the possibility of "shear" between the submodule branches and the meta branches. If some_subrepo has branch release_v1 set to commit X, but the meta repo's branch release_v1 is set to commit Y, who wins? The only possible answer is the meta repo, because that's the only one we can update atomically. The idea is that you never make submodule commits outside of the context of the meta repo. That's what git meta is for: to make it easy to make submodule commits from within the meta repo.
I read through the architecture doc in detail. Maybe I'm missing something, but how is this different from your local branch pointing to one commit while the remote branch points to a different commit? If the branches have diverged, then "git push" fails until you resolve the divergence through rebase or merge.
In this case, the potential shear is between the meta repository's branches and the submodules' branches. You can't push atomically to the meta repo and the submodules, or to multiple submodules (without weird custom server stuff, anyway), so it's possible for them to get out of sync. The question is: what are the semantics of this? We solve the problem by ignoring submodule branches and only considering meta branches. (Inside Two Sigma, we do have a cronjob that populates submodule branches from meta repo branches, just for ease of browsing, but it's kind of a hack.)
Yes, I see the race condition/atomicity problem across multiple repos if you actually try to push updates to branch heads. I'm thinking more along the lines of a synthetic meta branch head. On each meta commit, record the submodule commit hash plus the current branch_name. This hash may or may not match origin/branch_name; we don't really care. We allow divergence from origin, and maybe just print a warning. On "git meta open", do the equivalent of "git checkout -b branch_name hash". At this point, the branch may be in a divergent state with respect to origin, but this is true today. The only difference is that you know what the original branch was for your rebase or merge operation, instead of having to guess.
You don't need the original branch for your merge/rebase, because you can use git meta merge or rebase, which work on meta commits. But if this is really something that seems exciting, you might be able to do it with hooks. I think it would be confusing to have that sort of divergence, though.
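A minimal sketch of the recording half of this idea. It assumes, hypothetically (git-meta does not do this), that the branch name is stored in the meta repo's `.gitmodules` under git's standard `submodule.<name>.branch` key and that the helper runs just before the meta commit. The commit hash itself is already recorded by the submodule's gitlink, so only the name needs storing. `record_submodule_branch` is an illustrative name.

```shell
# record_submodule_branch META_DIR NAME SUBPATH
# Hypothetical helper: store the branch the submodule is currently on in the
# meta repo's .gitmodules (submodule.<name>.branch) and stage the file, so the
# next meta commit carries the branch name next to the pinned commit hash.
record_submodule_branch() {
  local meta=$1 name=$2 subpath=$3 branch
  # --quiet makes symbolic-ref fail silently when HEAD is detached.
  if ! branch=$(git -C "$meta/$subpath" symbolic-ref --quiet --short HEAD); then
    echo "warning: $subpath has a detached HEAD; no branch recorded" >&2
    return 1
  fi
  git -C "$meta" config -f .gitmodules "submodule.$name.branch" "$branch" &&
    git -C "$meta" add .gitmodules
}
```

For example, running `record_submodule_branch . some_subrepo some_subrepo` while some_subrepo is on release_v1 would leave `branch = release_v1` in the `[submodule "some_subrepo"]` section, staged for the meta commit.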
Could you say a bit more about why you need the original local branch names?
Typically, in git, the local branch names are considered ephemeral. They may
persist in a commit message like "Merge from feature_a", but once merged,
they're typically deleted, and then they age out of the local reflog, and
then those names are gone forever. We usually use local branch names as a
hint to the local developer, to help them switch between multiple
simultaneous in-progress efforts.
It sounds like you have a different use case for local branch names? And it
also sounds like you kind of care about remote branch names, but also kind
of don't care? If you could say more about what you use the branch names
for, maybe we could find a way to support your workflow with git-meta. (In
particular, if remote branch shear is not a problem for you....)
Perhaps one could use the submodule.<name>.branch config in .gitmodules for
this:
https://git-scm.com/docs/gitmodules#Documentation/gitmodules.txt-submoduleltnamegtbranch
I believe git-meta currently ignores this property (and we don't set it
inside Two Sigma), but it might be easy to make git-meta open do what
you're looking for when it's present.
On Sat, Sep 7, 2019, 10:02 Adam Bliss wrote: "Could you say a bit more about why you need the original local branchnames? …"
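If git-meta honored that property, "git meta open" might do something like the following hypothetical sketch (the function name and the fallback to a detached HEAD when no branch is configured are assumptions). It reads `submodule.<name>.branch` from `.gitmodules`, looks up the commit that the meta repo's `HEAD` pins for the submodule, and puts the submodule on that branch at exactly that commit, not at the remote branch tip.

```shell
# open_on_configured_branch META_DIR NAME SUBPATH
# Hypothetical sketch: check the submodule out on the branch named in
# .gitmodules, positioned at the commit the meta repo's HEAD pins for it.
open_on_configured_branch() {
  local meta=$1 name=$2 subpath=$3 branch sha
  # HEAD:<path> resolves to the gitlink, i.e. the pinned submodule commit.
  sha=$(git -C "$meta" rev-parse "HEAD:$subpath") || return 1
  if branch=$(git -C "$meta" config -f .gitmodules --get "submodule.$name.branch"); then
    # -B creates the branch, or resets an existing local branch of that name;
    # a real implementation should refuse if that would drop local commits.
    git -C "$meta/$subpath" checkout -q -B "$branch" "$sha"
  else
    # No branch recorded: fall back to a detached HEAD at the pinned commit.
    git -C "$meta/$subpath" checkout -q --detach "$sha"
  fi
}
```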
We have a fairly large organization with multiple business units and multiple product groups within each business unit. Each product group has its own set of git repositories. We want to move to a monorepo across our entire company, but the scaling issues associated with one giant git repo are not acceptable. The git-meta architecture doc pretty much sums up our own conclusions, and we are now exploring using git-meta.
As a pilot deployment, we would layer git-meta on top of our existing repos. This was one of the bonuses of git-meta: it could coexist with our current multi-repo workflows. However, deploying git-meta without disrupting current multi-repo workflows means that what we see at the git-meta level should be consistent with the multi-repo level, and vice versa.
We have a fairly standard release process. We create a permanent release branch (across all repos of interest), and hotfix those release branches as needed. If we hotfix at the meta level, we would need to make sure those hotfixes are reflected in the multi-repo release branch; they can't just live at the meta level. This means checking out the release branch for the subrepo, merging/rebasing the meta commits, then pushing. Relying on every developer to manually figure out which branch to check out when synchronizing the subrepo itself would be very error prone.
Ideally, when we check out a branch of the meta repo and do a "git meta open", it would automatically change your local branch appropriately. At the subrepo level, git operations would be very natural to the developer and "just work" on the correct subrepo branch:
vs
We don't need git-meta to do any automatic subrepo branch ref updating. The subrepo branch pushes can be left outside the scope of git-meta. The only additional functionality needed is that enough info be recorded in git meta commits so that "git meta open" can automatically do the correct "git checkout -b" and "git branch --set-upstream-to".
I think we would accept a patch to do this, as long as it was optional.
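The "git branch --set-upstream-to" half of that proposal could look like this hypothetical helper (the name `set_tracking` and the default remote `origin` are assumptions). It expects the submodule to already be on the right local branch. git itself refuses to set an upstream that does not exist; the explicit check just turns that case into a clear note instead of an error.

```shell
# set_tracking SUBMODULE_DIR BRANCH [REMOTE]
# Hypothetical helper: make local BRANCH track REMOTE/BRANCH (REMOTE defaults
# to "origin"), so a plain "git pull --rebase" or "git push" inside the
# submodule uses the matching remote branch.
set_tracking() {
  local dir=$1 branch=$2 remote=${3:-origin}
  # The remote-tracking ref only exists after a fetch that saw the branch.
  if git -C "$dir" show-ref --verify --quiet "refs/remotes/$remote/$branch"; then
    git -C "$dir" branch --quiet --set-upstream-to="$remote/$branch" "$branch"
  else
    echo "note: $remote/$branch not found in $dir; upstream left unset" >&2
    return 1
  fi
}
```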
We have a fairly standard release process. We create a permanent release
branch (across all repos of interest), and hotfix those release branches as
needed.
What happens if two people try to push a hotfix to the same branch at the
same time? The architecture doc describes a possible race here: if each of
two pushes succeeds in pushing to a different set of repos, they can become
permanently deadlocked. Do you force developers to take out a central lock
for the duration of the hotfix push?
If we hotfix at the meta level, we would need to make sure those hotfixes
are reflected at the multi-repo release branch; they can't just live at the
meta level. This means checking out the release branch for the subrepo,
merge/rebase the meta commits, then pushing.
Makes sense, but it doesn't sound like you actually have any dependency on
the *local* branches in users' submodules. It sounds like it would be
enough to ensure that, when pushing meta commit M:m={A:a, B:b} to branch
br1, each submodule commit a,b,... must also be at the tip of a branch br1
in its own remote. You could probably do this without any change to
client-side git-meta, by simply adding a remote pre-receive hook--something
like this:
1. Take the global lock on the name br1
2. For each submodule {$remote, $commit} in m, in parallel:
2a. assert that $remote contains $commit (this is what the current
pre-receive hook included in git-meta checks)
2b. assert that the current branch br1 in $remote is an ancestor of $commit
(else die and reject push)
3. For each submodule {$remote, $commit} in m, in parallel:
3a. fast-forward br1 to $commit in $remote
4. Release the lock on br1
There might be some extra complexity around submodules being
added/deleted/relocated, especially if you allow multiple commits to be
pushed at once.
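The four steps above might translate into a server-side helper like the sketch below. It rests on several assumptions not in the thread: each submodule remote is a repository the hook can reach by path, the global lock is a `mkdir`-based lock directory (mkdir is atomic on a POSIX filesystem), and the checks run one repo at a time rather than in parallel. The function name and the `repo_dir=commit` argument format are hypothetical.

```shell
# advance_branch_everywhere BRANCH LOCK_DIR REPO_DIR=COMMIT...
# Hypothetical server-side sketch of steps 1-4: validate every submodule
# first, then fast-forward BRANCH in each one. Checks run sequentially.
advance_branch_everywhere() {
  local branch=$1 lockdir=$2 ref pair repo commit old
  shift 2
  ref="refs/heads/$branch"
  # 1. Take the lock for this branch name; mkdir fails if it already exists.
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "reject: $branch is locked by another push" >&2
    return 1
  fi
  # 2. Validate everything before changing anything.
  for pair in "$@"; do
    repo=${pair%%=*} commit=${pair#*=}
    # 2a. The remote must already contain the commit.
    if ! git -C "$repo" cat-file -e "$commit^{commit}" 2>/dev/null; then
      echo "reject: $repo does not contain $commit" >&2
      rmdir "$lockdir"; return 1
    fi
    # 2b. The current tip of BRANCH (if any) must be an ancestor of the commit.
    if old=$(git -C "$repo" rev-parse --verify --quiet "$ref") &&
      ! git -C "$repo" merge-base --is-ancestor "$old" "$commit"; then
      echo "reject: $branch in $repo would not fast-forward" >&2
      rmdir "$lockdir"; return 1
    fi
  done
  # 3. Fast-forward BRANCH everywhere (assumes every writer takes this lock).
  for pair in "$@"; do
    repo=${pair%%=*} commit=${pair#*=}
    git -C "$repo" update-ref "$ref" "$commit"
  done
  # 4. Release the lock.
  rmdir "$lockdir"
}
```

A real hook would also need to clear a stale lock if the process dies while holding it (for example with a `trap`), and to handle the added, deleted, or relocated submodules mentioned above.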
Also, if not everyone is using git-meta, there's a risk that someone would
push a hotfix to the submodules' branches and neglect to update the meta
branch. If you already have some exogenous procedure for locking the
release branch to push hotfixes, you could probably patch it to ensure
integrity with the meta repo.
At the subrepo level, git operations would be very natural to the developer
and "just work" on the correct subrepo branch:
```shell
cd meta
git meta open subrepoA
cd subrepoA
# make changes to subrepoA
git commit
git pull --rebase  # Just works. We are on the correct subrepo branch associated with the meta branch
git push
git meta push
```
We discourage our users from doing manual pulls and pushes in the
submodules. It quickly causes the meta repo to get into inconsistent states
which are hard for the user to understand. But, as described above, there
should be no need for it. If you have taken care to ensure that the
meta-repo branch br1 always points each submodule s to the head of br1 in
s's remote, then user can simply `git meta pull --rebase` to atomically
bring them from one consistent state to the next, and a single `git meta
push` can atomically publish their work.
Currently, pushes happen manually one repo at a time (using the deprecated "gits" for some groups, manually by other groups), so pushes involving multiple repos can be interleaved between two people. At this point, both have to pull (whatever subset of repos that has been pushed), compile, run tests, then continue pushing. It's true that for a very short period of time, repos become out of sync, but this is quickly resolved by both parties. We've learned to live with this in our multirepo system. It's similar to a bad push causing compile or QA failures; when it happens, it's the highest priority to fix immediately. Manually pushing the subrepos, even with git-meta, would maintain the status quo. However, git-meta pushes would be atomic and record states before we get into manually resolving the "race condition", so this would be an improvement to our current system.
I agree that working completely in git-meta and not at the submodule would be ideal. However, the reality is that we will not be able to instantly change our entire company and internal processes to use git-meta with the flick of a switch. Not everyone is convinced that monorepo is the way to go. We will need to support both git-meta monorepo and our existing multirepo workflows for the transition period, and have a fallback plan if git-meta proves to be problematic for risk management. I believe that this would be true for any company with well established multi-repo workflows. As far as using hooks and a global lock, I'm hoping we can avoid having to do that. The meta repo would be the one "source of truth," and if any submodule activity causes divergence from the meta repo, we would resolve that at the meta level and then push the resolution back to the submodule.
I agree that working completely in git-meta and not at the submodule would
be ideal. However, the reality is that we will not be able to instantly
change our entire company and internal processes to use git-meta with the
flick of a switch. Not everyone is convinced that monorepo is the way to go.
Oh believe me, I understand how that can go :)
To be clear, I didn't mean to encourage the whole organization to use
git-meta instead of git. I meant that once a particular user has decided to
use a git-meta clone, it's best if that user sticks to `git meta push` and
`git meta pull` in that clone, rather than mixing in raw git submodule
push/pulls. Maybe your users will turn out to be more submodule-savvy than
ours, but most have found it terribly confusing.
As far as using hooks and a global lock, I'm hoping we can avoid having to
do that. The meta repo would be the one "source of truth," and if any
submodule activity causes divergence from the meta repo, we would resolve
that at the meta level and then push the resolution back to the submodule.
For release branches, if your push rate is low, you may be able to get away
with it. But I would like to opine that the biggest benefits of monorepo
development do not accrue until you start using git-meta to snapshot all
pushes to master. For example, bisecting across the history of the meta
repo to find a bug is extremely powerful. (To be fair, this is also when
the biggest pains accrue. A good discussion of the tradeoffs is at
https://trunkbaseddevelopment.com .)
The problem is when we have some people using git-meta and some not. How does a meta-user push his commits back to the subrepo so that the non-meta-user can see them? It's easy without branching, as there is only one branch (master). However, when there are branches, this branch selection becomes problematic. I'll play around with branching within meta and construct a usage example where branch info is stored.
Would it suffice if `git meta push origin HEAD:refs/heads/foo` would
attempt to push each changed submodule to `refs/heads/foo` in its own
remote? That should be pretty easy to implement, and wouldn't require any
local tracking of branch history.
It will have the consistency problems already discussed, and an additional
problem that, without some assistance from a remote hook of some kind, it's
impossible for the client to know exactly which submodules need pushing.
(Unless you have few enough submodules that you can expect users to open
all of them.)
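A rough sketch of that behavior: take every gitlink in the meta repo's `HEAD` and push its pinned commit to the same ref in the submodule's own remote. Assumptions: every submodule remote has the same name (passed as REMOTE), and the helper pushes all open submodules, even unchanged ones, because, as noted, the client cannot tell which ones need pushing. Submodules that are not open locally are reported and make the helper return nonzero, since their branch was not updated. `push_submodules_to_ref` is a hypothetical name.

```shell
# push_submodules_to_ref META_DIR REMOTE TARGET_REF
# Hypothetical sketch of "git meta push origin HEAD:refs/heads/foo": push each
# submodule's pinned commit to TARGET_REF in that submodule's own REMOTE.
push_submodules_to_ref() {
  local meta=$1 remote=$2 ref=$3 list mode otype sha subpath rc=0
  # Gitlinks (mode 160000) are how a meta commit pins each submodule.
  list=$(git -C "$meta" ls-tree -r HEAD) || return 1
  while read -r mode otype sha subpath; do
    [ "$mode" = 160000 ] || continue
    if [ ! -e "$meta/$subpath/.git" ]; then
      echo "warning: $subpath is not open; its $ref was not updated" >&2
      rc=1
      continue
    fi
    # Push the pinned commit (not the submodule's HEAD). </dev/null keeps
    # ssh-based pushes from consuming the loop's input.
    git -C "$meta/$subpath" push -q "$remote" "$sha:$ref" </dev/null || rc=1
  done <<EOF
$list
EOF
  return $rc
}
```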
Yes, I think that would do the trick. The local tracking of branch history is not needed. "git meta open" should also read this branch and set the subrepo to this local branch name. Any coordination with the subrepo's origin/branch_name would be left up to the user (along with all the pitfalls). We can tool around this part.
Ok, I updated the title to reflect the new goal. I propose the config be named
What do you want to do about the consistency problems? Warn the user when things get inconsistent? That seems fine, I guess.
Yeah, I guess the default should be something like:
1. (optional, but probably a good idea): fetch all repos that you plan to
push, and check that each push is a fast-forward. If any aren't, fail early
with a message like "Please pull and rebase/merge; if that doesn't help,
the remote server may be in an inconsistent state".
2. Start pushing repos; if any submodule push fails after the first one
succeeds, it should finish other submodules(?), abort the meta push, and
print a big scary warning that the server may now be inconsistent
3. Maybe add a --keep-going flag to complete the meta push and also print
the scary warning.
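The three steps above could be sketched as follows. Assumptions not in the thread: every repo's remote is named `origin`, each submodule's `HEAD` is the commit the meta commit pins, and the meta commit is pushed last, so that without `--keep-going` it is never published pointing at submodule commits that failed to push. `ff_ok` and `meta_push` are hypothetical names.

```shell
# ff_ok DIR BRANCH
# Succeeds if pushing DIR's HEAD to origin/BRANCH would be a fast-forward,
# or if BRANCH does not exist on origin yet.
ff_ok() {
  git -C "$1" fetch -q origin || return 1
  if git -C "$1" rev-parse --verify --quiet "refs/remotes/origin/$2" >/dev/null; then
    git -C "$1" merge-base --is-ancestor "refs/remotes/origin/$2" HEAD
  fi
}

# meta_push [--keep-going] META_DIR BRANCH SUBMODULE_PATH...
meta_push() {
  local keep_going=0 meta branch sub failed=0
  if [ "$1" = --keep-going ]; then keep_going=1; shift; fi
  meta=$1 branch=$2
  shift 2
  # Step 1: fetch and check every repo (the meta repo too) before pushing.
  for sub in "" "$@"; do
    if ! ff_ok "$meta/$sub" "$branch"; then
      echo "Please pull and rebase/merge${sub:+ in $sub}; if that doesn't help," \
        "the remote server may be in an inconsistent state." >&2
      return 1
    fi
  done
  # Step 2: push every submodule, carrying on past individual failures.
  for sub in "$@"; do
    git -C "$meta/$sub" push -q origin "HEAD:refs/heads/$branch" || failed=1
  done
  if [ "$failed" -eq 1 ]; then
    echo "WARNING: at least one submodule push failed; the server may now be" \
      "inconsistent." >&2
    # Step 3: without --keep-going, abort before publishing the meta commit.
    [ "$keep_going" -eq 1 ] || return 1
  fi
  # The meta commit goes last so it is not published ahead of its submodules.
  git -C "$meta" push -q origin "HEAD:refs/heads/$branch"
}
```

Pushing the meta commit last is the design choice that keeps the default safe: a partial failure leaves some submodule branches ahead, but the meta branch, which is the source of truth, never references commits that are missing from a server.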
For consistency problems, a warning would be sufficient and it's ok to leave it to the user to resolve. Right now, we use the deprecated "git-slave", and if a push fails on a repo, we know we are temporarily in an inconsistent state, but we just resolve it immediately. I like the --keep-going option. Thanks for implementing this. We are currently doing a pilot project with git-meta. If successful, we will roll it out to one product group, followed by one Business Unit, followed by the entire company.
Sorry, just to be clear: we'll happily take a patch on this, but I don't think we're likely to implement it ourselves.
I see. Guess I'll have to start looking at the source code. Reaching out to anyone else out there who is already familiar with the code and is willing to make the enhancement...
Our typical workflow very much depends on branches across multiple repos. The meta repo state recorded for every commit should include not only the hash of each subrepo but also the branch of each subrepo. This branch info is pretty important.
Envisioned workflow:
The next time we need to do a hotfix on the release branch, what I'd like to do:
At this point, if someone else has directly made updates to branch some_subrepo/release_v1, I can just do a "git pull --ff-only" to bring things up to date.
Without this branch info, we have to manually guess, or somehow record as part of the commit, what the "working branch" was at the time of the "git meta commit". A meta branch could potentially mix and match subrepo branches (i.e. meta:feature_a = subrepo1:master + subrepo2:feature_a + subrepo3:feature_b). On a new clone, we want to know which branch we should continue to work on for each subrepo.
The "git meta" can remain lightweight and not push branch names upstream, and leave this as a manual step for each subrepo.
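To illustrate the mix-and-match case: if the per-subrepo branch were stored in `.gitmodules` (an assumption, following the `submodule.<name>.branch` suggestion in the thread), a fresh clone could list the intended branch for every subrepo at any meta revision without opening a single submodule, because `git config --blob` reads the file straight from a commit. `list_submodule_branches` is a hypothetical name.

```shell
# list_submodule_branches META_DIR REV
# Hypothetical helper: print "name branch" for every submodule.<name>.branch
# entry in the .gitmodules of meta revision REV, read directly from the
# commit, so no submodule needs to be opened or even checked out.
list_submodule_branches() {
  git -C "$1" config --blob "$2:.gitmodules" --get-regexp '^submodule\..*\.branch$' |
    sed 's/^submodule\.\(.*\)\.branch /\1 /'
}
```

If the `.gitmodules` on meta branch feature_a recorded the example mapping above, `list_submodule_branches . feature_a` would print `subrepo1 master`, `subrepo2 feature_a`, and `subrepo3 feature_b`, one per line.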