Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3-way merge #181

Open
uiteoi opened this issue Jun 5, 2017 · 14 comments
Open

3-way merge #181

uiteoi opened this issue Jun 5, 2017 · 14 comments
Labels

Comments

@uiteoi
Copy link

uiteoi commented Jun 5, 2017

This is a feature request.

Given a common ancestor and 2 sets of patches, provide a merged document that may include conflicts to resolve manually.

This would be useful to allow two or more users collaborate on the edition of a document, implementing an equivalent of git merge.

It is assumed that the most recent common ancestor is know and provided as a parameter:

JsDiff.merge( commonAncestor, x_patches, y_patches [, options] )

Where x_patches and y_patches are Arrays of patches for the x and y branches from the common ancestor,

Options may be used to provide a manual merge patterns which could default to:

mergePattern: [ "<<<<<<< x", "=======", ">>>>>>> y" ]

For the algorithm, see https://www.quora.com/How-does-Git-merge-work.

@gonzofish
Copy link

This was implemented but not made part of the top-level API f94da90

@goodmind
Copy link

goodmind commented Feb 7, 2019

@gonzofish how to write it to file like git does?

@gonzofish
Copy link

sorry @goodmind, I don't remember, I haven't used this library in over a year...sorry I cannot be more help

@goodmind
Copy link

goodmind commented Feb 7, 2019

@kpdecker

@doug65536
Copy link

doug65536 commented Jan 25, 2022

I have done it, but I found merge misbehaving bug Basically what I did was hand-process the "+", "-" " " prefix of the lines that result from calling merge, and handle the conflict blocks that look like { conflict: true, mine: ['#yes'], theirs: ['#no']} interspersed into the string lines of the patch. You write >>>>>>>> (emit the 'theirs' lines) ======== (emit 'ours' lines) <<<<<<<<

@ExplodingCabbage
Copy link
Collaborator

ExplodingCabbage commented Jun 24, 2024

Hmm. Just started looking at the merge.js code and besides the already-reported bugs I see more to dislike; loadPatch seems to heuristically detect if the passed param is a patch or not by looking for substrings like @@, and change its behaviour accordingly, which seems obviously fragile/incorrect given that you could legitimately want to diff two files that contain those substrings. An API that depends upon that kind of heuristic to determine how to treat its inputs is fundamentally bad, IMO.

@ExplodingCabbage
Copy link
Collaborator

I just had another few minutes to look at this. I've been reading the key bit of logic here and trying to reason about it:

  while (mineIndex < mine.hunks.length || theirsIndex < theirs.hunks.length) {
    let mineCurrent = mine.hunks[mineIndex] || {oldStart: Infinity},
        theirsCurrent = theirs.hunks[theirsIndex] || {oldStart: Infinity};

    if (hunkBefore(mineCurrent, theirsCurrent)) {
      // This patch does not overlap with any of the others, yay.
      ret.hunks.push(cloneHunk(mineCurrent, mineOffset));
      mineIndex++;
      theirsOffset += mineCurrent.newLines - mineCurrent.oldLines;
    } else if (hunkBefore(theirsCurrent, mineCurrent)) {
      // This patch does not overlap with any of the others, yay.
      ret.hunks.push(cloneHunk(theirsCurrent, theirsOffset));
      theirsIndex++;
      mineOffset += theirsCurrent.newLines - theirsCurrent.oldLines;
    } else {
      // Overlap, merge as best we can
      let mergedHunk = {
        oldStart: Math.min(mineCurrent.oldStart, theirsCurrent.oldStart),
        oldLines: 0,
        newStart: Math.min(mineCurrent.newStart + mineOffset, theirsCurrent.oldStart + theirsOffset),
        newLines: 0,
        lines: []
      };
      mergeLines(mergedHunk, mineCurrent.oldStart, mineCurrent.lines, theirsCurrent.oldStart, theirsCurrent.lines);
      theirsIndex++;
      mineIndex++;

      ret.hunks.push(mergedHunk);
    }
  }

Maybe I'm missing something, but just by inspection, I think I immediately see a bug in that else block at the end - namely that it doesn't handle the case where one hunk from mine overlaps with two separate hunks from theirs, or vice versa. So it'll (I think) produce a patch with overlapping, perhaps conflicting, hunks. I should probably create a test case demonstrating this.

@ExplodingCabbage
Copy link
Collaborator

Yeah, the bug I spotted in review is real. Here's a demo script:

diff = require('.');

patch1 = `--- foo.txt   2024-06-26 20:47:03.936414424 +0100
+++ bar.txt 2024-06-26 20:47:04.892377928 +0100
@@ -1,10 +1,10 @@
 one
-two
-three
-four
-five
-six
-seven
-eight
-nine
-ten
+zwei
+drei
+vier
+fünf
+sechs
+sieben
+acht
+neun
+zehn
`

patch2 = `--- foo.txt   2024-06-26 20:47:03.936414424 +0100
+++ baz.txt 2024-06-26 20:47:06.544314866 +0100
@@ -1,4 +1,4 @@
-one
+un
 two
 three
 four
@@ -7,4 +7,4 @@
 seven
 eight
 nine
-ten
+dix
`

diff.merge(patch1, patch2)

There's a conflict between the two patches (the first one deletes ten and replaces it with zehn, while the second one deletes ten and replaces it with dix) and so we SHOULD get one hunk with conflict: true. But this gets missed by merge and we don't get conflict: true; instead we get a patch with these overlapping hunks:

[
  {
    oldStart: 1,
    oldLines: 10,
    newStart: 1,
    newLines: 10,
    lines: [
      '-one',   '+un',     '-two',
      '-three', '-four',   '-five',
      '-six',   '-seven',  '-eight',
      '-nine',  '-ten',    '+zwei',
      '+drei',  '+vier',   '+fünf',
      '+sechs', '+sieben', '+acht',
      '+neun',  '+zehn'
    ]
  },
  {
    oldStart: 7,
    oldLines: 4,
    newStart: 7,
    newLines: 4,
    lines: [ ' seven', ' eight', ' nine', '-ten', '+dix' ]
  }
]

@ExplodingCabbage
Copy link
Collaborator

Plan:

  • add test cases for the bug above and for Incorrect merge #341.
  • rewrite the core logic for merging hunks (in merge and mergeLines) from scratch, without paying too much attention to what's already there. See if I can get it right - i.e. end up with something that passes all tests

If I also screw up and create something buggy, then I'll see if I can learn something from what's already there and synthesize a working implementation.

@ExplodingCabbage
Copy link
Collaborator

Oh man - I just looked at tests and realised that the existing code intentionally tries to support the case where the two patches use a different base file (rather than just throwing an error in that situation). Ugh, why? I struggle to reason about the use case for that or figure out what correct behaviour is. I don't see how even attempting to do that can possibly be useful, because if you're allowing for the possibility that your patches are based on slightly different base files, then you've got to handle the possibility that two hunks that are really patching the "same" bit of the file start from different old line numbers. But that requires a completely different algorithm, that matches hunks from the two patches against each other based on (possibly fuzzy) matching of shared context (like when applying a patch), and for which "correct" results are often going to be outright different to an algorithm that doesn't attempt to support merging patches with different bases.

It just seems ill-conceived to me; IMO if you're two patches make different claims (in context or removal lines) about what the old content of line 52 was, you should immediately throw an error saying "Patches disagree about old content of line 52". Treating this as a conflict, like when one patch wants to delete line 52 and another patch wants to insert a line after line 52, isn't correct; it's a fundamentally different situation that calls the validity of the entire merge into question.

@ehoogeveen-medweb
Copy link

ehoogeveen-medweb commented Jun 29, 2024

Hmm - on the one hand, trying to apply a patch to a file that doesn't exactly match its expectations is pretty common; that's what conflicts are. But on the other hand, if you're trying to apply to merge two patches into a coherent whole, they need some sort of common ancestor to start from.

I can imagine a situation where you "zip" two branches together, merging the commits in chronological order - although that seems ill-advised in general - and then you might end up with two patches that both disagree about what the base should look like because neither takes into account the preceding partial merge. But at that point I think you just have to pick a patch to merge first, not try to merge both at the same time, so you're not really talking about a 3-way merge anymore.

@ExplodingCabbage
Copy link
Collaborator

ExplodingCabbage commented Jun 29, 2024

The more I think about this the more I think it's actually super-complicated.

Here are six different tasks that you might characterise as merging two patches, but which are different from each other:

  1. You have two patches to merge that are presumed to have the exact same base file...
    a. ... and you also have the full content of that base file
    b. ... but you don't have the content of the base file
  2. You have two patches to merge that may be patching different versions of the base file...
    a. ... and you have those two versions and their common ancestor
    b. ... and you have those two versions but NOT any common ancestor
    c. ... and you have a common ancestor, but not the versions the patches apply to
    d. ... and you don't have the source of any version whatsoever of the file you're patching

Each of these six different scenarios probably requires a slightly different algorithm to any of the other five, and they require different interfaces and ways of resolving conflicts, too, since in some cases you can present a file with merge conflicts in it like Git does, and in others you can't.

Note also that even in the absence of conflicts, the behaviours of these functions have to differ. Suppose you are merging the following two patches, without a copy of the base file...

Patch 1:

--- file.js
+++ file2.js
@@ -943,7 +943,6 @@
   if (!elit.sit) {
     return [];
   }
-  const sed = elit[0];
   return eiusmod.tempor(sed) ? sed : [sed];
 }
 function incididunt(ipsum, ut = 1) {

Patch 2:

--- file.js
+++ file2.js
@@ -917,7 +917,6 @@
   if (!elit.sit) {
     return [];
   }
-  const sed = elit[0];
   return eiusmod.tempor(sed) ? sed : [sed];
 }
 function incididunt(ipsum, ut = 1) {

If you are trying to perform task 1b - merging two patches that have the same base - then your code should "reason" along these lines:

Well, the old line numbers indicate that these hunks don't overlap. They happen to have the exact same context lines, but since the line numbers are absolutely trustworthy, that must be coincidence; the same lines must simply appear at two points in the base file. Therefore, to merge these patches, I should produce a patch that includes both hunks.

And you end up with something like this:

--- file.js
+++ file2.js
@@ -917,7 +917,6 @@
   if (!elit.sit) {
     return [];
   }
-  const sed = elit[0];
   return eiusmod.tempor(sed) ? sed : [sed];
 }
 function incididunt(ipsum, ut = 1) {
@@ -943,7 +942,6 @@
   if (!elit.sit) {
     return [];
   }
-  const sed = elit[0];
   return eiusmod.tempor(sed) ? sed : [sed];
 }
 function incididunt(ipsum, ut = 1) {

But if you are trying to merge the very same two patches in order to perform task 2d - merging two patches whose bases are presumed to be non-identical, albeit with some common ancestor - then your code should "reason" about them differently. It should instead think to itself:

Well, the context for these two hunks is identical. That probably means they're both deleting the same line. Yes, the line numbers don't match - they give start line numbers for the hunk in the base file 26 lines apart - but since we know the bases have diverged, that could easily because the first patch was generated using a base file where 26 lines of extra content had previously been inserted higher up in the file. We fundamentally can't know for sure, but the most reasonable guess we can make is that these two patches represent the same change, and we should therefore output a "merged" patch with a single hunk.

And then you get output something like this (albeit it's not clear what the line numbers should be - here I've averaged them!)

--- file.js
+++ file2.js
@@ -930,7 +930,6 @@
   if (!elit.sit) {
     return [];
   }
-  const sed = elit[0];
   return eiusmod.tempor(sed) ? sed : [sed];
 }
 function incididunt(ipsum, ut = 1) {

So two functions with similar looking interfaces - both take a pair of patches and no source file, and output a new patch - and which both are "merging" patches need different algorithms that will output fundamentally different results!

And maybe both of these are potentially useful functions that should exist. But the code that currently exists in merge.js seems to be half-heartedly trying to fulfil both use cases at once, and it fundamentally can't succeed at that, because they require different algorithms that will output different results given the same inputs.

@ExplodingCabbage
Copy link
Collaborator

Okay my new plan is to abandon this. :P

I'm not gonna close any of the issues. But I'm not gonna do more work on it either. I don't have a good grasp of the use cases in which people want to merge patches, and I now realise it's not actually one potential feature, but (at least) six potential features in a trenchcoat, none of which are simple either algorithmically or in terms of API design. I welcome others taking a stab at this, but I don't personally want to sink time into it; apologies to anyone who saw me suggest I would and got excited!

@doug65536
Copy link

doug65536 commented Jun 30, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants