Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Rebase strategy to pull/merge #7458

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

khanaffan
Copy link
Contributor

@khanaffan khanaffan commented Dec 6, 2024

Pull merge & conflict resolution

NOTE This document is work in progress and Rebase API are experimental.

What is change merging?

Change merging is when a briefcase pulls new changes from the iModel Hub and applies them locally. If the user has any local changes, the incoming changes need to be merged with the local changes. In this document, we are currently strictly talking about low-level SQLite changesets and high-level concepts defined in ECSchemas.

What is change conflicts

Git is used for text and text is two-dimensional data. Conflicts in two-dimensional data happen when two users change overlapping ranges of text. In a database, an overlapping change can be a change to the same row identified by a primary key. However, it could be more complex as that row might have database constraints like unique index, check, foreign key, triggers, etc., that make this row part of a larger chunk of information. SQLite tracks changes to individual tables/rows. It is up to the application to make changes consistently so as not to violate database constraints. If the database only contains a single table with no constraints, then it's basically the same as a text file as long as there is a primary key.

SQLite records changes by recording INSERT/UPDATE/DELETE operations with old and new values into a binary changeset. The order of changes in the changeset is arbitrary, but once the whole changeset is applied, it should bring the database to a new state that is consistent with all specified database constraints.

Things get interesting when there are local changes specifically to the same data as the incoming changeset. This operation is called pull-merge. Doing that can result in conflicts which from the SQLite perspective is as follows.

Operation Conflict description Type
INSERT 1. PRIMARY KEY already exists Conflict
2. Database constraint voliation e.g. UNIQUE, CHECK Constraint
DELETE 1. PRIMARY KEY does not exist NotFound
2. PRIMARY KEY does exist but other fields values does not match Data
3. Database constraint voliation e.g. UNIQUE, CHECK caused by update Constraint
UPDATE 1. PRIMARY KEY does not exist NotFound
2. PRIMARY KEY does not exist but data fields values does not match Data
3. Database constraint voliation e.g. UNIQUE, CHECK caused by update Constraint
- Forignkey voliations. Its is not for given row but for whole changeset ForeignKey

Above conflict can be resolved in on of the followed allowed ways. A REPLACE resolution may cause CONSTRAINT conflict afterword if db constrain are voilated by REPLACE action. If CONSTRAINT conflict is skipped then it can cause ForeignKey voliation at the end of changeset apply.

Operation Conflict Skip Replace Abort Default
INSERT Conflict Allowed Allowed but may cause constraint conflict Allowed ABORT
Constraint Allowed. Change is ignored. n/a Allowed SKIP
DELETE NotFound Allowed n/a Allowed ABORT
Data Allowed Allowed but may cause constraint conflict Allowed REPLACE
Constraint Allowed. Change is ignored. n/a Allowed SKIP
UPDATE NotFound Allowed n/a Allowed ABORT
Data Allowed Allowed but may cause constraint conflict Allowed REPLACE
Constraint Allowed. Change is ignored. n/a Allowed SKIP
- ForeignKey n/a n/a Allowed ABORT

Note even ForeignKey can be skipped but this will cause db integrity check to fail or any change to db in future may fail.

The foreign key one is the only one that is not associated with a specific row/table as it is a count of FK left unresolved and is a fatal error pointing to an error in application logic.

When SQLite applies a changeset and detects a conflict, it will call the conflict handler and request application feedback on how to resolve it. For the above, we can generally choose SKIP, REPLACE, or ABORT. If we choose REPLACE, in some cases, our conflict handler may be called again if replacing the local row causes another conflict due to database constraints. We are looking at one row at a time, but these rows do not exist in isolation. They are connected to the rest of the rows in the same table via constraints, to the table definition via CHECK constraints, or to other tables via foreign keys. This makes change merging in the database a bit more complex.

Types of change merging methods

Now that we have a somewhat understanding of key concepts about change merging and conflicts, we can talk about how to merge conflicting changes.

itwinjs supports two types of change merging:

  1. Merge: pull new changes & apply these changes on top of local changes (default)
  2. Rebase: pull new changes, rollback local changes, apply new changes, and then apply local changes. (new as in 5.x but not default, WIP/internal)

Note: Since a briefcase as of now use locks to serialize changes between two or more users. The two methods specified does not really make any difference. The locks ensure only one user can change a given element and thus we do not expect low level sqlite conflict for any directly changed data.

"Merge" method

Merge is simple and straightforward but not flexible and can easily cause the creation of a changeset that might look fine at the local file level but fail when applied to a new file.

It is done as follows:

  1. Briefcase pulls new changes.
  2. It applies new changes on top of local changes.
    • It depends on locks for changes to be non-conflicting.
    • This method does not record the effect of applying incoming changes. This locally recorded changeset is an unreliable source of truth without locking.
  3. If a conflict happens, there is not much we can do except for either skipping the incoming change or replacing the local change. There is no room for actually merging.
  4. Conflict resolution is recorded in a rebase buffer, so when pulling more changes, the same resolution will be applied.
    • This is also dangerous because let's say we choose REPLACE for a DATA conflict. And then change that row later in the session. When we push changes, the local change to the row will be ignored.
  5. Finally, when all changes are merged, we can push the changes, but without a lock, this could create a changeset that cannot be applied by others.

As we can see from the above, this merge method works fine if you lock individual rows in a table so only one user can change it. This workflow cannot really scale, and without a lock, it is completely useless.

"Rebase" method

This is a new method that has been implemented that is way more flexible. It is similar to git rebase. It is simple and easy to understand and will never lead to the push of a changeset that cannot be applied without locks.

  1. Briefcase pulls new changes.
  2. Briefcase local changes are rolled back, so now it's the same as a point in history on the master timeline.
  3. New changes from the master are applied and advance local head to the master. No conflicts are expected as we simply update the local briefcase to the master.
  4. Local changes from oldest to newest are now applied on top of the master. Note that the current disk file represents the master and the changeset been applied is the local change made by the user.
  5. This can cause conflicts as we replay local changes on top of the new master tip.
    • With rebase, we have flexibility that we can change data beside simple choices of SKIP, REPLACE, & ABORT.
    • If the application finds a conflict that requires creating new changes or merging incoming and local changes, then it can do so.
    • It also has a 3-way view of what changed. The local changeset been applied has old/new values while the current value in the db represents the master.
    • The whole merge activity is recorded in the changeset, and the local changeset is updated after merging. In other words, the local change is recomputed against the master.
    • Any tool that made those changes can also react to rebase event and update there change against what is new on master.
  6. If at any point the rebase fails, it can be resumed. The db is unable to push unstable changes until it resolves it locally first.
  7. Finally user can still undo his local changes. Or there undo stack is preserved. They can also push out there changes if needed.

As one of the most important take away in rebase is application control of merge. It has tools that made those changes and can correct redo them over master if needed. Where as "Merge" method is one way and does not give application any control over merge.

Rebase API workflow

At the moment only one set of changes does not require locks and we will use that as example. Its local properties of briefcase. Currently two user can change them and it can cause conflicts.

After downloading briefcase you can explicitly change merge method to rebase.

    b1.txns.changeMergeManager.setMergeMethod("Rebase");

you can also set it by default this will cause any new briefcase to automatcally use "Rebase" as default method.

  await IModelHost.startup({ pullMergeMethod: "Rebase" });

We download and open two briefcases for the same iModel, then set a property on b1 and push the changes. Note that the property being set did not exist, so it causes an INSERT operation at the database level.

  const b1 = ...
  const b2 = ...

  // save a property
  b1.saveFileProperty({ namespace: "test", name: "test" }, "foo");
  b1.saveChanges();
  await b1.pushChanges({ description: "set test to foo" });

Now b2 does the same and sets a different value, which also causes an INSERT operation being recorded.

  b2.saveFileProperty({ namespace: "test", name: "test" }, "goo");
  b2.saveChanges();

  // following will throw exception with error
  // `PRIMARY KEY INSERT CONFLICT - rejecting this changeset`
  await b2.pushChanges({ description: "set test to goo" });

Above threw exception as after undoing local changes then applying incoming changes we now have an entry for the key test in the db. While locally we recorded an insert operation for the test property. This causes a primary key violation. This action leaves the database in rebase mode.

Next, we will install a conflict handler to resolve this issue. This conflict handler concatenates the master value with the local.

  b2.txns.changeMergeManager.addConflictHandler({
    id: "my", handler: (args: RebaseChangesetConflictArgs) => {
      if (args.cause === DbConflictCause.Conflict) {
        if (args.tableName === "be_Prop") {
          if (args.opcode === DbOpcode.Insert) {
            const localChangedVal = args.getValueText(5, DbChangeStage.New);
            const tipValue = b2.queryFilePropertyString({ namespace: "test", name: "test" });
            b2.saveFileProperty({ namespace: "test", name: "test" }, `${tipValue} + ${localChangedVal}`);
            return DbConflictResolution.Skip; // skip incomming value and continue
          }
        }
      }
      return undefined;
    }
  });

We resume rebase and it will resolve the conflict, allowing us to push the changes.

  // resume rebase see if it resolve the conflict
  b2.txns.changeMergeManager.resume();
  await b2.pushChanges({ description: "set test to goo" });

The final value of test pushed will be foo + goo.

imodel-native: iTwin/imodel-native#933

Copy link
Member

@pmconne pmconne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big exciting new feature, but nobody can currently use it. What needs to happen before they can?

@kabentley
Copy link
Contributor

I don't think this should be a public API at all, because it doesn't make any sense to have to choose between them. If there's a conflict, the only way out is to rebase. There doesn't need to be an option - just always do merge-then-rebaseOnFailure. That will be the most efficient anyway. BTW, you mentioned that you think rebasing eliminates the need for a schemaDb, but I don't understand how they're related at all. Maybe we should talk?

/** @internal */
public readonly onRebaseTxnBegin = new BeEvent<(txn: TxnArgs) => void>();
/** @internal */
public readonly onRebaseLTxnEnd = new BeEvent<(txn: TxnArgs) => void>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public readonly onRebaseLTxnEnd = new BeEvent<(txn: TxnArgs) => void>();
public readonly onRebaseTxnEnd = new BeEvent<(txn: TxnArgs) => void>();

}

if (args.cause === DbConflictCause.Constraint) {
if (Logger.isEnabled(category, LogLevel.Info)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to wrap log statements to specific levels with checks if that level is enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its copied to produce same errors/warning as in native. To match logs. But in case of rebase we do not need to be similar as its new code. I will update this method to minimally log and use different category.

| | | 2. PRIMARY KEY does exist but other fields values does not match | `Data` |
| | | 3. Database constraint voliation e.g. UNIQUE, CHECK caused by update | `Constraint` |
| | `UPDATE` | 1. PRIMARY KEY does not exist | `NotFound` |
| | | 2. PRIMARY KEY does not exist but data fields values does not match | `Data` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this says PRIMARY KEY does not exist, did you mean does exist?

@khanaffan
Copy link
Contributor Author

I don't think this should be a public API at all, because it doesn't make any sense to have to choose between them. If there's a conflict, the only way out is to rebase. There doesn't need to be an option - just always do merge-then-rebaseOnFailure. That will be the most efficient anyway. BTW, you mentioned that you think rebasing eliminates the need for a schemaDb, but I don't understand how they're related at all. Maybe we should talk?

So I tested following locally that remove need to choose pullmerge method.

FastForward: In this method we apply changeset on top of local changes but abort on any directly change conflict. Indirect changes conflict does not cause failure they are REPLACED. Schema changes cannot be applied using FastForward as they bound to cause conflict.

  1. If there was local changes
    1. For each changeset attempt to do fast-forward. In fastforward we fail on any direct change conflict.
    2. Any remaining changesets that could not be applied using fastforward will be applied using rebase.
  2. If there was no local changes (do what we do now during merge)
    1. Our imodel history on hub has many direct change conflicts beside indirect. We know this from tons of conflict logging when v2 checkpoints are created. So we are not going to change anything about this workflow. The new method will in future push clean changesets with zero conflict when applied during v2 checkpoint creation.

Above simplify things. App does not need to decide the merge method. Its all internal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants