From 7b6f5199e167f256b9536cbdd19e275d50d7ecab Mon Sep 17 00:00:00 2001
From: "Jurgen J. Vinju"
Date: Thu, 12 Sep 2024 10:52:27 +0200
Subject: [PATCH 01/25] experimental HiFi tree diff algorithm for use with quick-fixes and refactoring commands in the IDE
---
 .../analysis/diff/edits/HiFiTreeDiff.rsc | 235 ++++++++++++++++++
 1 file changed, 235 insertions(+)
 create mode 100644 src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
new file mode 100644
index 00000000000..8708f61a192
--- /dev/null
+++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
@@ -0,0 +1,235 @@
+@license{
+Copyright (c) 2018-2023, NWO-I Centrum Wiskunde & Informatica
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+this list of conditions and the following disclaimer in the documentation
+and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+}
+@synopsis{Infer ((TextEdit)) from the differences between two parse ((ParseTree::Tree))s}
+@description{
+This module will move to the Rascal standard library.
+}
+module analysis::diff::edits::HiFiTreeDiff
+
+extend analysis::diff::edits::TextEdits;
+import ParseTree;
+import List;
+import String;
+
+@synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.}
+@description{
+This is a "diff" algorithm of two parse trees to generate a ((TextEdit)) script that applies the differences on
+the textual level, _with minimal collateral damage in whitespace_. This is why it is called "HiFi": minimal unnecessary
+noise introduction to the original file.
+
+The resulting ((TextEdit))s are an intermediate representation for making changes in source code text files. They can be executed independently via ((ExecuteTextEdits)), or interactively via ((IDEServices)), or LanguageServer features.
+
+This top-down diff algorithm takes two arguments:
+1. an _original_ parse tree for a text file,
+2. and a _derived_ parse tree that is mostly equal to the original but has pieces of it substituted or rewritten.
+
+From the tree node differences between these two trees, ((TextEdit))s are derived such that:
+* when the edited source text is parsed again, the resulting tree would match the derived tree.
+However, the parsed tree could be different from the derived tree in terms of whitespace, indentation and case-insensitive literals (see below).
+* when tree nodes (grammar rules) are equal, smaller edits are searched by pair-wise comparison of the children
+* differences between respective layout or (case-insensitive) literal nodes are always ignored
+* when lists have changed, careful editing of possible separators ensures syntactic correctness
+* when new sub-trees are inserted, the replacement will be at the same indentation level as the original. (((TODO this is a todo)))
+* when case-insensitive literals have been changed under a grammar rule that remained the same, no edits are produced.
+
+The function comes in handy when we use Rascal to rewrite parse trees, and then need to communicate the effect
+back to the IDE (for example using ((util::IDEServices)) or ((util::LanguageServer)) interfaces). We use
+((ExecuteTextEdits)) to _test_ the effect of ((TextEdits)) while developing a source-to-source transformation.
+}
+@benefits{
+* This function allows the language engineer to work in terms of abstract and concrete syntax trees while manipulating source text. The
+((TextEdit))s intermediate representation bridges the gap to the minute details of IDE interaction such as "undo" and "preview" features.
+* Text editing is fraught with details of whitespace, comments, list separators; all of which are handled here by
+the exactness of syntactic and semantic knowledge of the parse trees.
+* Where possible the algorithm also retains the capitalization of case-insensitive literals.
+* The algorithm retrieves and retains indentation levels from the original tree, even if sub-trees in the
+derived tree have mangled indentation. This allows us to ignore the indentation concern while thinking of rewrite
+rules for source-to-source transformation, and focus on the semantic effect.
+}
+@pitfalls{
+* If the first argument is not an original parse tree, then basic assumptions of the algorithm fail and it may produce erroneous text edits.
+* If the second argument is not derived from the original, then the algorithm will produce a single text edit to replace the entire source text.
+* If the parse tree of the original does not reflect the current state of the text in the file, then the generated text edits will do harm.
+* If the original tree is not annotated with source locations, the algorithm fails.
+* Both parse trees must be type correct, e.g. the number of symbols in a production rule, must be equal to the number of elements of the argument list of ((Tree::appl)).
+* This algorithm does not work with ambiguous (sub)trees.
+}
+@examples{
+If we rewrite parse trees, this can be done with concrete syntax matching.
+The following example swaps the if-branch with the else-branch in Pico:
+
+```rascal-shell
+import lang::pico::\syntax::Main;
+import IO;
+import analysis::diff::edits::ExecuteTextEdits;
+import analysis::diff::edits::TextEdits;
+import analysis::diff::edits::TreeDiff;
+// an example Pico program:
+writeFile(|tmp://example.pico|,
+ "begin
+ ' declare
+ ' a : natural,
+ ' b : natural;
+ ' if a then
+ ' a := b
+ ' else
+ ' b := a
+ ' fi
+ 'end");
+original = parse(#start[Program], |tmp://example.pico|);
+// match and replace all conditionals
+rewritten = visit(original) {
+ case (Statement) `if then <{Statement ";"}* ifBranch> else <{Statement ";"}* elseBranch> fi`
+ => (Statement) `if then
+ ' <{Statement ";"}* elseBranch>
+ 'else
+ ' <{Statement ";"}* ifBranch>
+ 'fi`
+}
+// Check the result as a string. It worked, but we see some collateral damage in whitespace (indentation).
+"<rewritten>"
+// Now derive text edits from the two parse trees:
+edits = treeDiff(original, rewritten);
+// Wrap them in a single document edit
+edit = changed(original@\loc.top, edits);
+// Apply the document edit on disk:
+executeDocumentEdit(edit);
+// and when we read the result back, we see the transformation succeeded, and indentation was not lost:
+readFile(|tmp://example.pico|);
+```
+}
+// equal trees generate empty diffs (note this already ignores whitespace differences)
+default list[TextEdit] treeDiff(Tree a, a) = [];
+
+// When the productions are different, we've found an edit, and there is no need to recurse deeper.
+list[TextEdit] treeDiff(
+ t:appl(Production p:prod(_,_,_), list[Tree] _),
+ r:appl(Production q:!p , list[Tree] _))
+ = t@\loc?
+ ? [replace(t@\loc, learnIndentation("<r>", "<t>")]
+ : /* literals and layout (without @\loc) are ignored */ [];
+
+// If a first element is removed and there are elements left, skip the separator too
+list[TextEdit] treeDiff(
+ t:appl(Production p:regular(Symbol reg), list[Tree] aElems),
+ appl(p, list[Tree] bElems))
+ = listDiff(t@\loc, prepareSeparators(aElems, seps(reg)), prepareSeparators(bElems, seps(reg)));
+
+// When the productions are equal, but the trees may be different, we dig deeper for differences
+list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB))
+ = [*treeDiff(a, b) | <a, b> <- zip2(argsA, argsB)];
+
+@synopsis{decide how many separators we have}
+int seps(\iter-seps(_,list[Symbol] s)) = size(s);
+int seps(\iter-star-seps(_,list[Symbol] s)) = size(s);
+default int seps(Symbol _) = 0;
+
+@synopsis{Finds minimal edits to list elements, taking extra care of removing separators when so required.}
+@description{
+To make this easy, we add source location information to each original separator first, and then
+reuse the rest of the algorithm which normally ignores separators.
+}
+list[TextEdit] listDiff(loc _span, [], []) = [];
+
+// equal length, we assume only specific elements have changed.
+list[TextEdit] listDiff(loc _span, list[Tree] elemsA, list[Tree] elemsB) = equalLengthDiff(elemsA, elemsB)
+ when size(elemsA) == size(elemsB);
+
+// additional elements, and possibly other elements have changed.
+list[TextEdit] listDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) = longerLengthDiff(span, elemsA, elemsB)
+ when size(elemsA) < size(elemsB);
+
+// fewer elements, and possibly other elements have changed.
+list[TextEdit] listDiff(list[Tree] elemsA, list[Tree] elemsB) = shorterLengthDiff(elemsA, elemsB)
+ when size(elemsA) > size(elemsB);
+
+// this works only because we annotated the separators.
+list[TextEdit] equalLengthDiff(list[Tree] elemsA, list[Tree] elemsB)
+ = [*treeDiff(a,b) | <a,b> <- zip2(elemsA, elemsB)];
+
+// added things to an empty list. this is also the final stage of a deep recursion
+list[TextEdit] longerLengthDiff(loc span, [], list[Tree] elemsB) = [replace(span, yield(elemsB))];
+
+// equal length lists can be forwarded (this happens when we already found the extra elements)
+list[TextEdit] longerLengthDiff(loc span, list[Tree] elemsA, list[Tree] elemsB)
+ = equalLengthDiff(elemsA, elemsB) when size(elemsA) == size(elemsB);
+
+// always ignore identical trees, and continue with the rest
+list[TextEdit] longerLengthDiff(loc span, [Tree a, *Tree elemsA], [a, *Tree elemsB])
+ = longerLengthDiff(span[offset=a@\loc.offset][length=span.length-a@\loc.length], elemsA, elemsB);
+
+// a single elem is different and also new by definition because ("longerLengthDiff")
+list[TextEdit] longerLengthDiff(loc span, [Tree a, *Tree elemsA], [Tree b:!a, *Tree elemsB])
+ = [replace(span[length=0], "<b>")] // we put b in front of a
+ + (size(elemsA) + 1 == size(elemsB) // and continue with the rest
+ ?
equalLengthDiff([a, *elemsA], elemsB) // this could have been the last additional element
+ : longerLengthDiff(span, [a, *elemsA], elemsB)) // or we still have more to add
+ ;
+
+// we have to remove the elements that are replaced by an empty list
+list[TextEdit] shorterLengthDiff(loc span, list[Tree] _, [])
+ = [replace(span, "")];
+
+// always ignore identical trees, and continue with the rest
+list[TextEdit] shorterLengthDiff(loc span, [Tree a, *Tree elemsA], [a, *Tree elemsB])
+ = shorterLengthDiff(span[offset=a@\loc.offset][length=span.length-a@\loc.length], elemsA, elemsB);
+
+// a single elem is different and also superfluous by definition because ("shorterLengthDiff")
+list[TextEdit] shorterLengthDiff(loc span, [Tree a, *Tree elemsA], [Tree b:!a, *Tree elemsB])
+ = [replace(a@\loc, "<b>")] // we replace a by b
+ + shorterLengthDiff(span, elemsA, elemsB) // and continue with the rest
+ // TODO: the lists could have become of equal length. Deal with that case.
+ ;
+
+private Production sepProd = prod(layouts("*separators*"),[],{});
+
+@synopsis{yield a consecutive list of trees}
+private str yield(list[Tree] elems) = "<}>";
+
+@synopsis{Separator literals need location annotations because they have to be edited.}
+private list[Tree] prepareSeparators([], int _) = [];
+
+private list[Tree] prepareSeparators([Tree t], int _) = [t];
+
+// we group the 3 separators into a single tree with accurate position information.
+private list[Tree] prepareSeparators([Tree head, Tree l1, Tree sep, Tree l2, *Tree rest], 3)
+ = [head, appl(sepProd, [l1, newSep, l2])[@\loc=span], *prepareSeparators(rest)]
+ when
+ span := head@\loc.top(end(head@\loc), size(""));
+
+// single separators get accurate position information (even if they are layout)
+private list[Tree] prepareSeparators([Tree head, Tree sep, *Tree rest], 1)
+ = [head, sep[\loc=span], *prepareSeparators(rest)]
+ when
+ span := head@\loc.top(end(head@\loc), size(""));
+
+// unseparated lists are ready
+private list[Tree] prepareSeparators(list[Tree] elems, 0) = elems;
+
+private int end(loc src) = src.offset + src.length;
+
+private str learnIndentation(str replacement, str original) = replacement; // TODO: learn minimal indentation from original
From 374a8a295362bc296741f81e11b2178aa6ec1d2b Mon Sep 17 00:00:00 2001
From: "Jurgen J. Vinju"
Date: Tue, 1 Oct 2024 11:32:03 +0200
Subject: [PATCH 02/25] developing the list diff algorithms with inspiration from the diff tool
---
 .../analysis/diff/edits/HiFiTreeDiff.rsc | 101 ++++++++++++++++--
 1 file changed, 93 insertions(+), 8 deletions(-)
diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
index 8708f61a192..88b4a8d3699 100644
--- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
+++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
@@ -34,6 +34,7 @@ extend analysis::diff::edits::TextEdits;
 import ParseTree;
 import List;
 import String;
+import Locations;
 @synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.}
 @description{
@@ -123,24 +124,63 @@ readFile(|tmp://example.pico|);
 ```
 }
 // equal trees generate empty diffs (note this already ignores whitespace differences)
-default list[TextEdit] treeDiff(Tree a, a) = [];
+list[TextEdit] treeDiff(Tree a, a) = [];
+
+// skip production labels of original rules when diffing
+list[TextEdit] treeDiff(
+ appl(prod(label(_, Symbol s), syms, attrs), list[Tree] args),
+ Tree u)
+ = treeDiff(appl(prod(s, syms, attrs), args), u);
+
+// skip production labels of replacement rules when diffing
+list[TextEdit] treeDiff(
+ Tree t,
+ appl(prod(label(_, Symbol s), syms, attrs), list[Tree] args))
+ = treeDiff(t, appl(prod(s, syms, attrs), args));
+
+// matched layout trees generate empty diffs such that the original is maintained
+list[TextEdit] treeDiff(
+ appl(prod(layouts(_), _, _), list[Tree] _),
+ appl(prod(layouts(_), _, _), list[Tree] _))
+ = [];
+
+// matched literal trees generate empty diffs
+list[TextEdit] treeDiff(
+ appl(prod(lit(str l), _, _), list[Tree] _),
+ appl(prod(lit(l) , _, _), list[Tree] _))
+ = [];
+
+// matched case-insensitive literal trees generate empty diffs such that the original is maintained
+list[TextEdit] treeDiff(
+ appl(prod(cilit(str l), _, _), list[Tree] _),
+ appl(prod(cilit(l) , _, _), list[Tree] _))
+ = [];
+
+// different lexicals generate small diffs even if the parent is equal
+list[TextEdit] treeDiff(
+ t:appl(prod(lex(str l), _, _), list[Tree] _),
+ r:appl(prod(lex(l) , _, _), list[Tree] _))
+ = [replace(t@\loc, learnIndentation("<r>", "<t>"))]
+ when t != r;
 // When the productions are different, we've found an edit, and there is no need to recurse deeper.
 list[TextEdit] treeDiff(
 t:appl(Production p:prod(_,_,_), list[Tree] _),
 r:appl(Production q:!p , list[Tree] _))
 = t@\loc?
- ? [replace(t@\loc, learnIndentation("<r>", "<t>")]
+ ? [replace(t@\loc, learnIndentation("<r>", "<t>"))]
 : /* literals and layout (without @\loc) are ignored */ [];
-// If a first element is removed and there are elements left, skip the separator too
+
+// If the list productions are the same, then the element lists can still be of different length
+// and we switch to listDiff which has different heuristics than normal trees.
 list[TextEdit] treeDiff(
- t:appl(Production p:regular(Symbol reg), list[Tree] aElems),
+ Tree t:appl(Production p:regular(Symbol reg), list[Tree] aElems),
 appl(p, list[Tree] bElems))
- = listDiff(t@\loc, prepareSeparators(aElems, seps(reg)), prepareSeparators(bElems, seps(reg)));
+ = listDiff(t@\loc, seps(reg), aElems, bElems);
-// When the productions are equal, but the trees may be different, we dig deeper for differences
-list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB))
+// When the productions are equal, but the children may be different, we dig deeper for differences
+default list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB))
 = [*treeDiff(a, b) | <a, b> <- zip2(argsA, argsB)];
 @synopsis{decide how many separators we have}
@@ -148,6 +188,51 @@ int seps(\iter-seps(_,list[Symbol] s)) = size(s);
 int seps(\iter-star-seps(_,list[Symbol] s)) = size(s);
 default int seps(Symbol _) = 0;
+@synopsis{List diff is like text diff on lines; complex and easy to make slow}
+list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] replacements) {
+ assert originals != replacements && originals == [];
+ <originals, replacements> = trimEqualElements(originals, replacements);
+ span = cover([orig@\loc | orig <- originals, orig@\loc?]);
+
+ assert originals != replacements && originals != [];
+ <edits, originals, replacements> = commonSpecialCases(span, seps, originals, replacements);
+
+ return [*edits, *genericListDiff(span, originals, replacements)];
+}
+
+@synopsis{trims equal elements from the front and the back of both lists, if any.}
+tuple[list[Tree], list[Tree]] trimEqualElements([Tree a, *Tree aTail], [ a, *Tree bTail])
+ = <aTail, bTail>;
+
+tuple[list[Tree], list[Tree]] trimEqualElements([*Tree aHead, Tree a], [*Tree bHead, a])
+ = <aHead, bHead>;
+
+default tuple[list[Tree], list[Tree]] trimEqualElements(list[Tree] a, list[Tree] b)
+ = <a, b>;
+
+// only one element removed in front, then we are done
+tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc
span, 0, [Tree a, *Tree tail], [*tail])
+ = <[replace(a@\loc, "", "")], [], []>;
+
+// only one element removed in front, plus 1 separator, then we are done because everything is the same
+tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 1,
+ [Tree a, Tree _sep, Tree tHead, *Tree tail], [tHead, *tail])
+ = <[replace(fromUntil(a, tHead), "", "")], [], []>;
+
+@synopsis{Compute location span that is common between an element and a succeeding element}
+@description{
+The resulting loc includes `from` but excludes `until`. It goes right
+up to `until`.
+```ascii-art
+ [from] gap [until]
+ <--------->
+````
+}
+private loc fromUntil(loc from, loc until) = from.top(from.offset, until.offset - from.offset);
+
+@synopsis{convenience overload for shorter code}
+private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc);
+
 @synopsis{Finds minimal edits to list elements, taking extra care of removing separators when so required.}
 @description{
 To make this easy, we add source location information to each original separator first, and then
@@ -164,7 +249,7 @@ list[TextEdit] listDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) = longer
 when size(elemsA) < size(elemsB);
 // fewer elements, and possibly other elements have changed.
-list[TextEdit] listDiff(list[Tree] elemsA, list[Tree] elemsB) = shorterLengthDiff(elemsA, elemsB)
+list[TextEdit] listDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) = shorterLengthDiff(span, elemsA, elemsB)
 when size(elemsA) > size(elemsB);
 // this works only because we annotated the separators.
From c623d2b37bfcf0cb1335e0eecdd3966ba662fbb4 Mon Sep 17 00:00:00 2001
From: "Jurgen J.
Vinju"
Date: Mon, 7 Oct 2024 16:14:33 +0200
Subject: [PATCH 03/25] made some progress with the list algorithm
---
 .../analysis/diff/edits/HiFiTreeDiff.rsc | 232 ++++++++++--------
 1 file changed, 129 insertions(+), 103 deletions(-)
diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
index 88b4a8d3699..0b77130ecec 100644
--- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
+++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
@@ -26,7 +26,59 @@ POSSIBILITY OF SUCH DAMAGE.
 }
 @synopsis{Infer ((TextEdit)) from the differences between two parse ((ParseTree::Tree))s}
 @description{
-This module will move to the Rascal standard library.
+This module provides an essential building block for creating high-fidelity source-to-source code transformations.
+It is common for industrial use cases of source-to-source transformation to extract
+a list of text edits programmatically using parse tree pattern matching. This way the
+changes are made on the textual level, with less introduction of noise and fewer removals
+of valuable layout (indentation) and source code comments.
+
+The construction of such high-fidelity edit lists can be rather involved because it tangles
+and scatters a number of concerns:
+1. syntax-directed pattern matching
+2. string substitution; construction of the rewritten text
+ * retention of layout and in particular indentation
+ * retention of source code comments
+ * retention of specific case-insensitive keyword style
+ * syntactic correctness of the result; especially in relation to list separators there are many corner-cases to think of
+
+On the other hand, ParseTree to ParseTree rewrites are much easier to write and get correct.
+They are "syntax directed" via the shape of the tree that follows the grammar of the language.
+Some if not all of the above aspects are tackled by the rewriting mechanism with concrete patterns.
+Especially the corner cases w.r.t. list separators are all handled by the rewriting mechanisms.
+Also the rules are in "concrete syntax", on both the matching and the substitution side. So they are
+readable for all who know the object language. The rules guarantee syntactic correctness of the
+rewritten source code. However, rewrite rules do quite some noisy damage to the layout, indentation
+and comments, of the result.
+
+With this module we bring these two modalities of source-to-source transformations together:
+1. The language engineer uses concrete syntax rewrite rules to derive a new ParseTree from the original;
+2. We run ((treeDiff)) to obtain a set of minimal text edits;
+3. We apply the text edits to the editor contents or the file system.
+}
+@benefits{
+* Because the derived text edits change fewer characters, the end result is more "hifi" than simply
+unparsing the rewritten ParseTree. More comments are retained and more indentation is kept the same. More
+case-insensitive keywords retain their original shape.
+* At the same time the rewrite rules are easier to maintain as they remain "syntax directed".
+* Changes to the grammar will be picked up when checking all source and target patterns.
+* The diff algorithm uses cross-cutting information from the parse tree (what is layout and what not,
+ what is case-insensitive, etc.) which would otherwise have to be managed by the language engineer in _every rewrite rule_.
+* The diff algorithm understands what indentation is and brings new sub-trees to the original level
+of indentation (same as the sub-trees they are replacing)
+* Typically the algorithm's run-time is linear in the size of the tree, or better. Same for memory usage.
+}
+@pitfalls{
+* ((treeDiff)) only works under the assumption that the second tree was derived from the first
+by applying concrete syntax rewrite rules in Rascal. If there is no origin relation between the two
+then its heuristics will not work.
The algorithm could degenerate to substituting the entire file,
+or worse it could degenerate to an exponential search for commonalities in long lists.
+* ((treeDiff))'s efficiency is predicated on the two trees being derived from each other in main memory of the currently running JVM.
+This way both trees will share pointers where they are the same, which leads to very efficient equality
+testing. If the trees are first independently serialized to disk and then deserialized again, and then ((treeDiff)) is called,
+this optimization is not present and the algorithm will perform (very) poorly.
+* Substitution patterns should be formatted as best as possible. The algorithm will not infer
+spacing or relative indentation inside of the substituted subtree. It will only infer indentation
+for the entire subtree.
 }
 module analysis::diff::edits::HiFiTreeDiff
@@ -34,7 +86,7 @@ extend analysis::diff::edits::TextEdits;
 import ParseTree;
 import List;
 import String;
-import Locations;
+import Location;
 @synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.}
 @description{
@@ -42,7 +94,8 @@ This is a "diff" algorithm of two parse trees to generate a ((TextEdit)) script
 the textual level, _with minimal collateral damage in whitespace_. This is why it is called "HiFi": minimal unnecessary
 noise introduction to the original file.
-The resulting ((TextEdit))s are an intermediate representation for making changes in source code text files. They can be executed independently via ((ExecuteTextEdits)), or interactively via ((IDEServices)), or LanguageServer features.
+The resulting ((TextEdit))s are an intermediate representation for making changes in source code text files.
+They can be executed independently via ((ExecuteTextEdits)), or interactively via ((IDEServices)), or LanguageServer features.
 This top-down diff algorithm takes two arguments:
 1.
an _original_ parse tree for a text file,
@@ -74,6 +127,8 @@ rules for source-to-source transformation, and focus on the semantic effect.
 @pitfalls{
 * If the first argument is not an original parse tree, then basic assumptions of the algorithm fail and it may produce erroneous text edits.
 * If the second argument is not derived from the original, then the algorithm will produce a single text edit to replace the entire source text.
+* If the second argument was not produced from the first in the same JVM memory, it will not share many pointers to equal sub-trees
+and the performance of the algorithm will degenerate quickly.
 * If the parse tree of the original does not reflect the current state of the text in the file, then the generated text edits will do harm.
 * If the original tree is not annotated with source locations, the algorithm fails.
 * Both parse trees must be type correct, e.g. the number of symbols in a production rule, must be equal to the number of elements of the argument list of ((Tree::appl)).
@@ -164,13 +219,10 @@ list[TextEdit] treeDiff(
 when t != r;
 // When the productions are different, we've found an edit, and there is no need to recurse deeper.
-list[TextEdit] treeDiff(
+default list[TextEdit] treeDiff(
 t:appl(Production p:prod(_,_,_), list[Tree] _),
 r:appl(Production q:!p , list[Tree] _))
- = t@\loc?
- ? [replace(t@\loc, learnIndentation("<r>", "<t>"))]
- : /* literals and layout (without @\loc) are ignored */ [];
-
+ = [replace(t@\loc, learnIndentation("<r>", "<t>"))];
 // If the list productions are the same, then the element lists can still be of different length
 // and we switch to listDiff which has different heuristics than normal trees.
@@ -191,33 +243,88 @@ default int seps(Symbol _) = 0;
 @synopsis{List diff is like text diff on lines; complex and easy to make slow}
 list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] replacements) {
 assert originals != replacements && originals == [];
- <originals, replacements> = trimEqualElements(originals, replacements);
- span = cover([orig@\loc | orig <- originals, orig@\loc?]);
-
- assert originals != replacements && originals != [];
- <edits, originals, replacements> = commonSpecialCases(span, seps, originals, replacements);
+ edits = [];
+
+ // this algorithm isolates commonalities between the two lists
+ // by handling different special cases. It continues always with
+ // what is left to be different. By maximizing commonalities,
+ // the edits are minimized. Note that we float on source location parameters
+ // not only for the edit locations but also for sub-tree identity.
+ solve (originals, replacements) {
+ <originals, replacements> = trimEqualElements(originals, replacements);
+ span = cover([orig@\loc | orig <- originals, orig@\loc?]);
+
+ <specialEdits, originals, replacements> = commonSpecialCases(span, seps, originals, replacements);
+ edits += specialEdits;
+
+ equalSubList = largestEqualSubList(originals, replacements);
+
+ if (equalSubList != [],
+ [*preO, *equalSubList, *postO] := originals,
+ [*preR, *equalSubList, *postR] := replacements) {
+ // TODO: what about the separators?
+ // we align the prefixes and the postfixes and
+ // continue recursively.
+ return edits
+ + listDiff(cover(preO), seps, preO, preR)
+ + listDiff(cover(postO), seps, postO, postR)
+ ;
+ }
+ }
+
+ return edits;
+}
- return [*edits, *genericListDiff(span, originals, replacements)];
+@synopsis{Finds the largest sublist that occurs in both lists}
+@description{
+Using list matching and backtracking, this algorithm detects which common
+sublist is the largest. It assumes ((trimEqualElements)) has happened already,
+and thus there are interesting differences left, even if we remove any equal
+sublist.
+}
+list[Tree] largestEqualSubList(list[Tree] originals, list[Tree] replacements) {
+ assert <originals, replacements> := trimEqualElements(originals, replacements) : "both lists begin and end with unique elements";
+
+ bool largerList(list[Tree] a, list[Tree] b) = size(a) > size(b);
+
+ equals = [eq |
+ [*_, pre, *eq, post, *_] := originals, size(eq) > 0,
+ [*_, !pre, *eq, !post, *_] := replacements
+ ];
+
+ return [largest, *_] := sort(equals, largerList)
+ ? largest
+ : [] // no equal sublists detected
+ ;
 }
 @synopsis{trims equal elements from the front and the back of both lists, if any.}
-tuple[list[Tree], list[Tree]] trimEqualElements([Tree a, *Tree aTail], [ a, *Tree bTail])
- = <aTail, bTail>;
+tuple[list[Tree], list[Tree]] trimEqualElements([Tree a, *Tree aPostfix], [ a, *Tree bPostfix])
+ = <aPostfix, bPostfix>;
-tuple[list[Tree], list[Tree]] trimEqualElements([*Tree aHead, Tree a], [*Tree bHead, a])
- = <aHead, bHead>;
+tuple[list[Tree], list[Tree]] trimEqualElements([*Tree aPrefix, Tree a], [*Tree bPrefix, a])
+ = <aPrefix, bPrefix>;
 default tuple[list[Tree], list[Tree]] trimEqualElements(list[Tree] a, list[Tree] b)
 = <a, b>;
 // only one element removed in front, then we are done
 tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 0, [Tree a, *Tree tail], [*tail])
- = <[replace(a@\loc, "", "")], [], []>;
+ = <[replace(a@\loc, "")], [], []>;
 // only one element removed in front, plus 1 separator, then we are done because everything is the same
 tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 1,
 [Tree a, Tree _sep, Tree tHead, *Tree tail], [tHead, *tail])
- = <[replace(fromUntil(a, tHead), "", "")], [], []>;
+ = <[replace(fromUntil(a, tHead), "")], [], []>;
+
+// only one element removed in front, plus 1 separator, then we are done because everything is the same
+tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 3,
+ [Tree a, Tree _l1, Tree _sep, Tree _l2, Tree tHead, *Tree tail], [tHead, *tail])
+ = <[replace(fromUntil(a, tHead), "")], [], []>;
+
+
+@synopsis{convenience
overload for shorter code}
+private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc);
 @synopsis{Compute location span that is common between an element and a succeeding element}
 @description{
@@ -229,92 +336,11 @@ up to `until`.
 ````
 }
 private loc fromUntil(loc from, loc until) = from.top(from.offset, until.offset - from.offset);
+private int end(loc src) = src.offset + src.length;
-@synopsis{convenience overload for shorter code}
-private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc);
-
-@synopsis{Finds minimal edits to list elements, taking extra care of removing separators when so required.}
-@description{
-To make this easy, we add source location information to each original separator first, and then
-reuse the rest of the algorithm which normally ignores separators.
-}
-list[TextEdit] listDiff(loc _span, [], []) = [];
-
-// equal length, we assume only specific elements have changed.
-list[TextEdit] listDiff(loc _span, list[Tree] elemsA, list[Tree] elemsB) = equalLengthDiff(elemsA, elemsB)
- when size(elemsA) == size(elemsB);
-
-// additional elements, and possibly other elements have changed.
-list[TextEdit] listDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) = longerLengthDiff(span, elemsA, elemsB)
- when size(elemsA) < size(elemsB);
-
-// fewer elements, and possibly other elements have changed.
-list[TextEdit] listDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) = shorterLengthDiff(span, elemsA, elemsB)
- when size(elemsA) > size(elemsB);
-
-// this works only because we annotated the separators.
-list[TextEdit] equalLengthDiff(list[Tree] elemsA, list[Tree] elemsB)
- = [*treeDiff(a,b) | <a,b> <- zip2(elemsA, elemsB)];
-
-// added things to an empty list.
this is also the final stage of a deep recursion -list[TextEdit] longerLengthDiff(loc span, [], list[Tree] elemsB) = [replace(span, yield(elemsB))]; - -// equal length lists can be forwarded (this happens when we already found the extra elements) -list[TextEdit] longerLengthDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) - = equalLengthDiff(elemsA, elemsB) when size(elemsA) == size(elemsB); - -// always ignore identical trees, and continue with the rest -list[TextEdit] longerLengthDiff(loc span, [Tree a, *Tree elemsA], [a, *Tree elemsB]) - = longerLengthDiff(span[offset=a@\loc.offset][length=span.length-a@\loc.length], elemsA, elemsB); - -// a single elem is different and also new by definition because ("longerLengthDiff") -list[TextEdit] longerLengthDiff(loc span, [Tree a, *Tree elemsA], [Tree b:!a, *Tree elemsB]) - = [replace(span[length=0], "")] // we put b in front of a - + (size(elemsA) + 1 == size(elemsB) // and continue with the rest - ? equalLengthDiff([a, *elemsA], elemsB) // this could have been the last additional element - : longerLengthDiff(span, [a, *elemsA], elemsB)) // or we still have more to add - ; - -// we have to remove the elements that are replaced by an empty list -list[TextEdit] shorterLengthDiff(loc span, list[Tree] _, []) - = [replace(span, "")]; - -// always ignore identical trees, and continue with the rest -list[TextEdit] shorterLengthDiff(loc span, [Tree a, *Tree elemsA], [a, *Tree elemsB]) - = shorterLengthDiff(span[offset=a@\loc.offset][length=span.length-a@\loc.length], elemsA, elemsB); - -// a single elem is different and also superfluous by definition because ("shorterLengthDiff") -list[TextEdit] shorterLengthDiff(loc span, [Tree a, *Tree elemsA], [Tree b:!a, *Tree elemsB]) - = [replace(a@\loc, "")] // we replace a by b - + shorterLengthDiff(span, elemsA, elemsB) // and continue with the rest - // TODO: the lists could have become of equal length. Deal with that case. 
- ; - -private Production sepProd = prod(layouts("*separators*"),[],{}); +private loc cover(list[Tree] elems) = cover([e@\loc | e <- elems, e@\loc?]); @synopsis{yield a consecutive list of trees} private str yield(list[Tree] elems) = "<}>"; -@synopsis{Separator literals need location annotations because they have to be edited.} -private list[Tree] prepareSeparators([], int _) = []; - -private list[Tree] prepareSeparators([Tree t], int _) = [t]; - -// we group the 3 separators into a single tree with accurate position information. -private list[Tree] prepareSeparators([Tree head, Tree l1, Tree sep, Tree l2, *Tree rest], 3) - = [head, appl(sepProd, [l1, newSep, l2])[@\loc=span], *prepareSeparators(rest)] - when - span := head@\loc.top(end(head@\loc), size("")); - -// single separators get accurate position informaiton (even if they are layout) -private list[Tree] prepareSeparators([Tree head, Tree sep, *Tree rest], 1) - = [head, sep[\loc=span], *prepareSeparators(rest)] - when - span := head@\loc.top(end(head@\loc), size("")); - -// unseparated lists are ready -private list[Tree] prepareSeparators(list[Tree] elems, 0) = elems; - -private int end(loc src) = src.offset + src.length; - private str learnIndentation(str replacement, str original) = replacement; // TODO: learn minimal indentaton from original From 1525e738c9bc5f915fa484dbe3e616d5bf5ca6d6 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 7 Oct 2024 16:43:25 +0200 Subject: [PATCH 04/25] minor improvements. 
this is not finished yet --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 45 ++++++++++--------- 1 file changed, 24 insertions(+), 21 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 0b77130ecec..0e3292cd618 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -250,29 +250,32 @@ list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] re // what is left to be different. By maximizing commonalities, // the edits are minimized. Note that we float on source location parameters // not only for the edit locations but also for sub-tree identity. - solve (originals, replacements) { - = trimEqualElements(originals, replacements); - span = cover([orig@\loc | orig <- originals, orig@\loc?]); + + = trimEqualElements(originals, replacements); + span = cover([orig@\loc | orig <- originals, orig@\loc?]); - = commonSpecialCases(span, seps, originals, replacements); - edits += specialEdits; + = commonSpecialCases(span, seps, originals, replacements); + edits += specialEdits; - equalSubList = largestEqualSubList(originals, replacements); - - if (equalSubList != [], - [*preO, *equalSubList, *postO] := originals, - [*preR, *equalSubList, *postR] := replacements) { - // TODO: what about the separators? - // we align the prefixes and the postfixes and - // continue recursively. - return edits - + listDiff(cover(preO), seps, preO, preR) - + listDiff(cover(postO), seps, postO, postR) - ; - } + equalSubList = largestEqualSubList(originals, replacements); + + // by using the (or "a") largest common sublist as a pivot to divide-and-conquer + // to the left and right of it, we minimize the number of necessary + // edit actions for the entire list. 
+ if (equalSubList != [], + [*preO, *equalSubList, *postO] := originals, + [*preR, *equalSubList, *postR] := replacements) { + // TODO: what about the separators? + // we align the prefixes and the postfixes and + // continue recursively. + return edits + + listDiff(cover(preO), seps, preO, preR) + + listDiff(cover(postO), seps, postO, postR) + ; + } + else { // nothing in common means we can replace the entire list + return edits + replace(span, learnIndentation(yield(replacements), yield(originals))); } - - return edits; } @synopsis{Finds the largest sublist that occurs in both lists} @@ -324,7 +327,7 @@ tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 3, @synopsis{convenience overload for shorter code} -private loc fromUntil(Tree from, Tree until) = fromUntil(fro@\loc, until@\loc); +private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc); @synopsis{Compute location span that is common between an element and a succeeding element} @description{ From 3196433fe647415695d302f41f8e48a27c54760d Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Thu, 10 Oct 2024 09:46:53 +0200 Subject: [PATCH 05/25] slow progress --- .../analysis/diff/edits/ExecuteTextEdits.rsc | 12 +++++-- .../analysis/diff/edits/HiFiTreeDiff.rsc | 34 +++++++++++++------ 2 files changed, 32 insertions(+), 14 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc b/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc index e3417aea87d..90dcc727937 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc @@ -24,16 +24,22 @@ void executeDocumentEdit(renamed(loc from, loc to)) { } void executeDocumentEdit(changed(loc file, list[TextEdit] edits)) { + str content = readFile(file); + + content = executeTextEdits(content, edits); + + writeFile(file.top, content); +} + +str executeTextEdits(str content, list[TextEdit] edits) { assert isSorted(edits, less=bool (TextEdit e1, TextEdit e2) { return e1.range.offset < e2.range.offset; }); - str content = readFile(file); - for (replace(loc range, str repl) <- reverse(edits)) { assert range.top == file.top; content = ""; } - writeFile(file.top, content); + return content; } diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 0e3292cd618..16a5b402926 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -242,7 +242,6 @@ default int seps(Symbol _) = 0; @synsopis{List diff is like text diff on lines; complex and easy to make slow} list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] replacements) { - assert originals != replacements && originals == []; edits = []; // this algorithm isolates commonalities between the two lists @@ -257,7 +256,7 @@ list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] re = 
commonSpecialCases(span, seps, originals, replacements); edits += specialEdits; - equalSubList = largestEqualSubList(originals, replacements); + equalSubList = largestEqualSubList(span, originals, replacements); // by using the (or "a") largest common sublist as a pivot to divide-and-conquer // to the left and right of it, we minimize the number of necessary @@ -284,21 +283,27 @@ Using list matching and backtracking, this algorithm detects which common sublist is the largest. It assumes ((trimEqualElements)) has happened already, and thus there are interesting differences left, even if we remove any equal sublist. + +Note that this is not a general algorithm for Largest Common Subsequence (LCS), since it +uses particular properties of the relation between the original and the replacement list. +* New elements are never equal to old elements (due to source locations) +* Equal prefixes and postfixes may be assumed to be maximal sublists as well (see above). +* Candidate equal sublists always have consecutive source locations from the origin. +* etc. } -list[Tree] largestEqualSubList(list[Tree] originals, list[Tree] replacements) { - assert := trimEqualElements(originals, replacements) : "both lists begin and end with unique elements"; +list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replacements) { + // assert := trimEqualElements(originals, replacements) : "both lists begin and end with unique elements"; bool largerList(list[Tree] a, list[Tree] b) = size(a) > size(b); + + bool fromOriginalFile(loc span, Tree last) = span.top == (last@\loc?|unknown:///|).top; - equals = [eq | - [*_, pre, *eq, post, *_] := originals, size(eq) > 0, - [*_, !pre, *eq, !post, *_] := replacements + equals = [[*eq,q] | + [*_, pre, *eq, q, post, *_] := replacements, fromOriginalFile(span, q), + [*_, !pre, *eq, q, !post, *_] := originals ]; - return [largest, *_] := sort(equals, largerList) - ? 
largest - : [] // no equal sublists detected - ; + return sort(equals, largerList)[0] ? []; } @synopsis{trips equal elements from the front and the back of both lists, if any.} @@ -325,6 +330,13 @@ tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 3, [Tree a, Tree _l1, Tree _sep, Tree _l2, Tree tHead, *Tree tail], [tHead, *tail]) = <[replace(fromUntil(a, tHead), "")], [], []>; +// singleton replacement +tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, int _, + [Tree a], [Tree b]) + = ; + +default tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, int _, list[Tree] a, list[Tree] b) + = <[], a, b>; @synopsis{convenience overload for shorter code} private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc); From 3f05df428847e9c9195558674d4f3bbfc3539dd7 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Thu, 10 Oct 2024 11:40:32 +0200 Subject: [PATCH 06/25] added demo --- .../analysis/diff/edits/ExecuteTextEdits.rsc | 1 - .../analysis/diff/edits/HiFiTreeDiff.rsc | 10 ++--- .../rascalmpl/library/lang/pico/HiFiDemo.rsc | 44 +++++++++++++++++++ .../library/lang/pico/examples/flip.pico | 14 ++++++ 4 files changed, 63 insertions(+), 6 deletions(-) create mode 100644 src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc create mode 100644 src/org/rascalmpl/library/lang/pico/examples/flip.pico diff --git a/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc b/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc index 90dcc727937..0d5388ce802 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc @@ -37,7 +37,6 @@ str executeTextEdits(str content, list[TextEdit] edits) { }); for (replace(loc range, str repl) <- reverse(edits)) { - assert range.top == file.top; content = ""; } diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc 
b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 16a5b402926..6e1e718fca7 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -298,12 +298,12 @@ list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replac bool fromOriginalFile(loc span, Tree last) = span.top == (last@\loc?|unknown:///|).top; - equals = [[*eq,q] | - [*_, pre, *eq, q, post, *_] := replacements, fromOriginalFile(span, q), - [*_, !pre, *eq, q, !post, *_] := originals - ]; + if ([*_, pre, *Tree eq, post, *_] := replacements, + [*_, !pre, *eq, !post, *_] := originals) { + return eq; + } - return sort(equals, largerList)[0] ? []; + return []; } @synopsis{trips equal elements from the front and the back of both lists, if any.} diff --git a/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc new file mode 100644 index 00000000000..0720c95ea1a --- /dev/null +++ b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc @@ -0,0 +1,44 @@ +@synopsis{Demonstrates HiFi source-to-source transformations through concrete syntax rewrites and text edits.} +module lang::pico::HiFiDemo + +import lang::pico::\syntax::Main; +import IO; +import ParseTree; +import analysis::diff::edits::HiFiTreeDiff; +import analysis::diff::edits::ExecuteTextEdits; + +@synopsis{Blindly swaps the branches of all the conditionals in a program} +@description{ +This rule is syntactically correct and has a clear semantics. The +layout of the resulting if-then-else-fi statement is also clear. 
+} +start[Program] flipConditionals(start[Program] program) = visit(program) { + case (Statement) `if then + ' <{Statement ";"}* ifBranch> + 'else + ' <{Statement ";"}* elseBranch> + 'fi` => + (Statement) `if then + ' <{Statement ";"}* elseBranch> + 'else + ' <{Statement ";"}* ifBranch> + 'fi` +}; + +void main() { + t = parse(#start[Program], |project://rascal/src/org/rascalmpl/library/lang/pico/examples/flip.pico|); + println("The original: + '"); + + u = flipConditionals(t); + println("Branches swapped, comments and indentation lost: + '"); + + edits = treeDiff(t, u); + println("Smaller text edits: + ' "); + + newContent = executeTextEdits("", edits); + println("Better output after executeTextEdits: + '"); +} diff --git a/src/org/rascalmpl/library/lang/pico/examples/flip.pico b/src/org/rascalmpl/library/lang/pico/examples/flip.pico new file mode 100644 index 00000000000..f235085ebcc --- /dev/null +++ b/src/org/rascalmpl/library/lang/pico/examples/flip.pico @@ -0,0 +1,14 @@ +begin + declare + a : natural, + b : natural; + a := 0; + b := 1; + if a then + % comment 1 % + b := a + else + % comment 2 % + a := b + fi +end \ No newline at end of file From ed091f7115db224724eca8232372610bf51b6b4c Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Sun, 13 Oct 2024 16:06:43 +0200 Subject: [PATCH 07/25] exposed IString.indent to String library module to allow users to reuse indentation in O(1) --- src/org/rascalmpl/library/Prelude.java | 4 ++++ src/org/rascalmpl/library/String.rsc | 16 ++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/src/org/rascalmpl/library/Prelude.java b/src/org/rascalmpl/library/Prelude.java index 61bcf1889c4..e13d0167704 100644 --- a/src/org/rascalmpl/library/Prelude.java +++ b/src/org/rascalmpl/library/Prelude.java @@ -3047,6 +3047,10 @@ public IValue stringChars(IList lst){ return values.string(chars); } + + public IString indent(IString indentation, IString content, IBool indentFirstLine) { + return content.indent(indentation, indentFirstLine.getValue()); + } public IValue charAt(IString s, IInteger i) throws IndexOutOfBoundsException //@doc{charAt -- return the character at position i in string s.} diff --git a/src/org/rascalmpl/library/String.rsc b/src/org/rascalmpl/library/String.rsc index de466de5272..8ce2acfedf8 100644 --- a/src/org/rascalmpl/library/String.rsc +++ b/src/org/rascalmpl/library/String.rsc @@ -627,3 +627,19 @@ str substitute(str src, map[loc,str] s) { order = sort([ k | k <- s ], bool(loc a, loc b) { return a.offset < b.offset; }); return ( src | subst1(it, x, s[x]) | x <- order ); } + +@synopsis{Indent a block of text} +@description{ +Every line in `content` will be indented using the characters +of `indentation`. +} +@benefits{ +* This operation executes in constant time, independent of the size of the content +or the indentation. +* Indent is the identity function if `indentation == ""` +} +@pitfalls{ +* This function works fine if `indentation` is not spaces or tabs; but it does not make much sense. +} +@javaClass{org.rascalmpl.library.Prelude} +java str indent(str indentation, str content, bool indentFirstLine=false); \ No newline at end of file From 2462eeb16a03a1d806fcf770a839895c3074d140 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Sun, 13 Oct 2024 16:06:57 +0200 Subject: [PATCH 08/25] slow progress on the diff algorithm --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 18 +++++++++++++----- .../rascalmpl/library/lang/pico/HiFiDemo.rsc | 4 ++-- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 6e1e718fca7..39bda42f896 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -272,8 +272,10 @@ list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] re + listDiff(cover(postO), seps, postO, postR) ; } - else { // nothing in common means we can replace the entire list - return edits + replace(span, learnIndentation(yield(replacements), yield(originals))); + else { + // covered all the cases + assert originals := replacements; + return edits; } } @@ -308,10 +310,10 @@ list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replac @synopsis{trips equal elements from the front and the back of both lists, if any.} tuple[list[Tree], list[Tree]] trimEqualElements([Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) - = ; + = trimEqualElements(aPostfix, bPostfix); tuple[list[Tree], list[Tree]] trimEqualElements([*Tree aPrefix, Tree a], [*Tree bPrefix, a]) - = ; + = trimEqualElements(aPrefix, bPrefix); default tuple[list[Tree], list[Tree]] trimEqualElements(list[Tree] a, list[Tree] b) = ; @@ -358,4 +360,10 @@ private loc cover(list[Tree] elems) = cover([e@\loc | e <- elems, e@\loc?]); @synopsis{yield a consecutive list of trees} private str yield(list[Tree] elems) = "<}>"; -private str learnIndentation(str replacement, str original) = replacement; // TODO: learn minimal indentaton from original +private str learnIndentation(str replacement, str original) { + list[str] indents(str text) = [indent | // <- split(text, "\n")]; + + 
str minIndent = sort(indents(original)[1..])[0]? ""; + + return indent(minIndent, replacement); +} diff --git a/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc index 0720c95ea1a..7be0812f81d 100644 --- a/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc +++ b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc @@ -35,8 +35,8 @@ void main() { '"); edits = treeDiff(t, u); - println("Smaller text edits: - ' "); + println("Smaller text edits:"); + iprintln(edits); newContent = executeTextEdits("", edits); println("Better output after executeTextEdits: From 8abdbd66dba0dcff47eef9d055178a3a77c88acc Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Sun, 13 Oct 2024 16:29:41 +0200 Subject: [PATCH 09/25] more complex example, and debug prints --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 19 ++++++++++++++----- .../library/lang/pico/examples/flip.pico | 11 +++++++++-- 2 files changed, 23 insertions(+), 7 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 39bda42f896..9a3d3239853 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -87,6 +87,7 @@ import ParseTree; import List; import String; import Location; +import IO; @synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.} @description{ @@ -241,7 +242,10 @@ int seps(\iter-star-seps(_,list[Symbol] s)) = size(s); default int seps(Symbol _) = 0; @synsopis{List diff is like text diff on lines; complex and easy to make slow} -list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] replacements) { +list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] replacements) { + println(" listDiff: + ' + ' "); edits = []; // this algorithm isolates commonalities between the two lists @@ 
-272,11 +276,12 @@ list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] re + listDiff(cover(postO), seps, postO, postR) ; } - else { - // covered all the cases - assert originals := replacements; + else if (originals := replacements) { return edits; } + else { + return edits + [replace(span, learnIndentation(yield(replacements), yield(originals)))]; + } } @synopsis{Finds the largest sublist that occurs in both lists} @@ -361,9 +366,13 @@ private loc cover(list[Tree] elems) = cover([e@\loc | e <- elems, e@\loc?]); private str yield(list[Tree] elems) = "<}>"; private str learnIndentation(str replacement, str original) { - list[str] indents(str text) = [indent | // <- split(text, "\n")]; + println("learning: + ' + ' "); + list[str] indents(str text) = [indent | /^[^\ \t]/ <- split(text, "\n")]; str minIndent = sort(indents(original)[1..])[0]? ""; + println("minIndent []"); return indent(minIndent, replacement); } diff --git a/src/org/rascalmpl/library/lang/pico/examples/flip.pico b/src/org/rascalmpl/library/lang/pico/examples/flip.pico index f235085ebcc..2bd7685a354 100644 --- a/src/org/rascalmpl/library/lang/pico/examples/flip.pico +++ b/src/org/rascalmpl/library/lang/pico/examples/flip.pico @@ -6,9 +6,16 @@ begin b := 1; if a then % comment 1 % - b := a + b := a; + x := 1 else % comment 2 % - a := b + a := b; + if b then + z := a + else + z := b + fi; + z := z fi end \ No newline at end of file From 4a55110f95f685b33db7cd0c4e39671fda10d61d Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Sun, 13 Oct 2024 17:48:42 +0200 Subject: [PATCH 10/25] finetunes stuff in indentation learner --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 9a3d3239853..555287f7225 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -369,10 +369,21 @@ private str learnIndentation(str replacement, str original) { println("learning: ' ' "); - list[str] indents(str text) = [indent | /^[^\ \t]/ <- split(text, "\n")]; + list[str] indents(str text) = [indent | /^[^\ \t]/ <- split("\n", text)]; - str minIndent = sort(indents(original)[1..])[0]? ""; + origIndents = indents(original); + replLines = split("\n", replacement); - println("minIndent []"); - return indent(minIndent, replacement); + if (replLines == []) { + return ""; + } + + minIndent = sort(origIndents[1..])[0]? ""; + + stripped = [ /^$/ := line ? rest : line | line <- replLines[1..]]; + + indented = [replLines[0], *[ indent(minIndent, line, indentFirstLine=true) | line <- stripped]]; + + return " + '<}>"[..-1]; } From 97eb3529a634c66d1203e2822a9dd01b48a10519 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Tue, 15 Oct 2024 09:47:30 +0200 Subject: [PATCH 11/25] testing --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 47 ++++++++++++------- .../rascalmpl/library/lang/pico/HiFiDemo.rsc | 4 ++ .../library/lang/pico/examples/flip.pico | 2 +- 3 files changed, 35 insertions(+), 18 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 555287f7225..badc4333c34 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -216,14 +216,14 @@ list[TextEdit] treeDiff( list[TextEdit] treeDiff( t:appl(prod(lex(str l), _, _), list[Tree] _), r:appl(prod(lex(l) , _, _), list[Tree] _)) - = [replace(t@\loc, learnIndentation("", ""))] + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))] when t != r; // When the productions are different, we've found an edit, and there is no need to recurse deeper. default list[TextEdit] treeDiff( t:appl(Production p:prod(_,_,_), list[Tree] _), r:appl(Production q:!p , list[Tree] _)) - = [replace(t@\loc, learnIndentation("", ""))]; + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; // If list production are the same, then the element lists can still be of different length // and we switch to listDiff which has different heuristics than normal trees. 
@@ -243,9 +243,9 @@ default int seps(Symbol _) = 0; @synsopis{List diff is like text diff on lines; complex and easy to make slow} list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] replacements) { - println(" listDiff: - ' - ' "); + // println(" listDiff: + // ' + // ' "); edits = []; // this algorithm isolates commonalities between the two lists @@ -280,7 +280,7 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep return edits; } else { - return edits + [replace(span, learnIndentation(yield(replacements), yield(originals)))]; + return edits + [replace(span, learnIndentation(span, yield(replacements), yield(originals)))]; } } @@ -365,10 +365,15 @@ private loc cover(list[Tree] elems) = cover([e@\loc | e <- elems, e@\loc?]); @synopsis{yield a consecutive list of trees} private str yield(list[Tree] elems) = "<}>"; -private str learnIndentation(str replacement, str original) { - println("learning: - ' - ' "); +@synopsis{Make sure the substitution is at least as far indented as the original} +@description{ +This algorithm ignores the first line, since the first line is always preceded by the layout of a parent node. + +Then it measures the depth of indentation of every line in the original, and takes the minimum. +That minimum indentation is stripped off every line that already has that much indentation in the replacement, +and then _all_ lines are re-indented with the discovered minimum. +} +private str learnIndentation(loc span, str replacement, str original) { list[str] indents(str text) = [indent | /^[^\ \t]/ <- split("\n", text)]; origIndents = indents(original); @@ -378,12 +383,20 @@ private str learnIndentation(str replacement, str original) { return ""; } - minIndent = sort(origIndents[1..])[0]? ""; - - stripped = [ /^$/ := line ?
rest : line | line <- replLines[1..]]; - - indented = [replLines[0], *[ indent(minIndent, line, indentFirstLine=true) | line <- stripped]]; + minIndent = ""; + if ([_] := origIndents) { + // only one line. have to invent indentation from span + minIndent = " <}>"; + } + else { + minIndent = sort(origIndents[1..])[0]? ""; + } + + println("min: []"); + stripped = [ /^$/ := line ? rest : line | line <- replLines]; - return " - '<}>"[..-1]; + println("stripped:"); + iprintln(stripped); + return indent(minIndent, " + '<}>"[..-1]); } diff --git a/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc index 7be0812f81d..3decb0f5c00 100644 --- a/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc +++ b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc @@ -41,4 +41,8 @@ void main() { newContent = executeTextEdits("", edits); println("Better output after executeTextEdits: '"); + + newU = parse(#start[Program], newContent); + + assert u := newU : "the rewritten tree matches the newly parsed"; } diff --git a/src/org/rascalmpl/library/lang/pico/examples/flip.pico b/src/org/rascalmpl/library/lang/pico/examples/flip.pico index 2bd7685a354..63f58c62b40 100644 --- a/src/org/rascalmpl/library/lang/pico/examples/flip.pico +++ b/src/org/rascalmpl/library/lang/pico/examples/flip.pico @@ -7,7 +7,7 @@ begin if a then % comment 1 % b := a; - x := 1 + z := z else % comment 2 % a := b; From fd6ccbb59278498d2fb0857653d8b8135c380bfa Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Thu, 9 Jan 2025 11:56:51 +0100 Subject: [PATCH 12/25] fixed nasty bug in Type.intersection w.r.t. 
parameter types --- src/org/rascalmpl/types/NonTerminalType.java | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/org/rascalmpl/types/NonTerminalType.java b/src/org/rascalmpl/types/NonTerminalType.java index 20094a487ef..8ff2293f257 100644 --- a/src/org/rascalmpl/types/NonTerminalType.java +++ b/src/org/rascalmpl/types/NonTerminalType.java @@ -346,6 +346,9 @@ public boolean intersects(Type other) { if (other == RascalValueFactory.Tree) { return true; } + else if (other.isParameter()) { + return other.intersects(this); + } else if (other instanceof NonTerminalType) { return ((NonTerminalType) other).intersectsWithNonTerminal(this); } From 1c0a81d263438faf387c0340d8a4d32fc7903239 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Thu, 9 Jan 2025 12:00:52 +0100 Subject: [PATCH 13/25] started on testing HiFiTreeDiff --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 3 - .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 56 +++++++++++++++++++ 2 files changed, 56 insertions(+), 3 deletions(-) create mode 100644 src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index badc4333c34..248514b6a90 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -392,11 +392,8 @@ private str learnIndentation(loc span, str replacement, str original) { minIndent = sort(origIndents[1..])[0]? ""; } - println("min: []"); stripped = [ /^$/ := line ? 
rest : line | line <- replLines]; - println("stripped:"); - iprintln(stripped); return indent(minIndent, " '<}>"[..-1]); } diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc new file mode 100644 index 00000000000..191fb877b9a --- /dev/null +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -0,0 +1,56 @@ +module lang::rascal::tests::library::analysis::diff::edits::HiFiTreeDiffTests + +extend analysis::diff::edits::ExecuteTextEdits; +extend analysis::diff::edits::HiFiTreeDiff; +extend lang::pico::\syntax::Main; + +import ParseTree; +import IO; + +public str simpleExample + = "begin + ' declare + ' a : natural, + ' b : natural; + ' a := a + b; + ' b := a - b; + ' a := a - b + 'end + '"; + +@synopsis{Specification of what it means for `treeDiff` to be syntactically correct} +@description{ +TreeDiff is syntactically correct if: +* The tree after rewriting _matches_ the tree after applying the edits to the source text and parsing that.
+* Note that _matching_ ignores case-insensitive literals and layout, indentation and comments +} +bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree)(&T<:Tree) transform) { + println("Transforming: + '"); + orig = parse(grammar, example); + transformed = transform(orig); + println("Transformed: + '"); + edits = treeDiff(orig, transformed); + println("Edits: + '"); + edited = executeTextEdits(example, edits); + println("Edited: + '"); + + // the edited text should produce a tree that matches the rewritten tree + return transformed := parse(grammar, edited); +} + +(&X<:Tree) identity(&X<:Tree x) = x; + +start[Program] swapAB(start[Program] p) = visit(p) { + case (Id) `a` => (Id) `b` + case (Id) `b` => (Id) `a` +}; + +test bool nulTestWithId() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); + +test bool simpleSwapper() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, swapAB); From 9c644589b66ba15fc9dfaac46153079d82b432a6 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Thu, 9 Jan 2025 12:13:22 +0100 Subject: [PATCH 14/25] minor improvements --- .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 47 +++++++++++++------ 1 file changed, 32 insertions(+), 15 deletions(-) diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 191fb877b9a..6ed4f8f9172 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -4,8 +4,9 @@ extend analysis::diff::edits::ExecuteTextEdits; extend analysis::diff::edits::HiFiTreeDiff; extend lang::pico::\syntax::Main; -import ParseTree; import IO; +import ParseTree; +import String; public str simpleExample = "begin @@ -25,23 +26,38 @@ TreeDiff is syntactically correct if: * Note that _matching_ ignores case-insensitive literals and layout, indentation and comments } bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree)(&T<:Tree) transform) { - println("Transforming: - '"); - orig = parse(grammar, example); + orig = parse(grammar, example); transformed = transform(orig); - println("Transformed: - '"); - edits = treeDiff(orig, transformed); - println("Edits: - '"); - edited = executeTextEdits(example, edits); - println("Edited: - '"); - - // the edited text should produce a tree that matches the rewritten tree + edits = treeDiff(orig, transformed); + edited = executeTextEdits(example, edits); + return transformed := parse(grammar, edited); } +@synopsis{Extract the leading spaces of each line of code} +list[str] indentationLevels(str example) + = [ i | /^[^\ ]*/ <- split("\n", example)]; + +@synopsis{In many cases, but not always, treeDiff maintains the indentation levels} +@description{ +Typically when a rewrite does not change the lines of code count, +and when the 
structure of the statements remains comparable, treeDiff +can guarantee that the indentation of a file remains unchanged, even if +significant changes to the code have been made. +} +@pitfalls{ +* This specification is not true for any transformation. Only apply it to +a test case if you can expect indentation-preservation for _the entire file_. +} +bool editsMaintainIndentationLevels(type[&T<:Tree] grammar, str example, (&T<:Tree)(&T<:Tree) transform) { + orig = parse(grammar, example); + transformed = transform(orig); + edits = treeDiff(orig, transformed); + edited = executeTextEdits(example, edits); + + return indentationLevels(example) == indentationLevels(edited); +} + (&X<:Tree) identity(&X<:Tree x) = x; start[Program] swapAB(start[Program] p) = visit(p) { @@ -53,4 +69,5 @@ test bool nulTestWithId() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); test bool simpleSwapper() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, swapAB); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, swapAB) + && editsMaintainIndentationLevels(#start[Program], simpleExample, swapAB); From 71a1c00338a69c5af97807c55fc0556c95191cc0 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Thu, 9 Jan 2025 15:37:30 +0100 Subject: [PATCH 15/25] fixed bug in list diff --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 32 ++++++++++------ .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 37 ++++++++++++++++++- 2 files changed, 56 insertions(+), 13 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 248514b6a90..22eca29d3e4 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -230,11 +230,11 @@ default list[TextEdit] treeDiff( list[TextEdit] treeDiff( Tree t:appl(Production p:regular(Symbol reg), list[Tree] aElems), appl(p, list[Tree] bElems)) - = listDiff(t@\loc, seps(reg), aElems, bElems); + = listDiff(t@\loc, seps(reg), aElems, bElems) when bprintln("diving into

"); // When the productions are equal, but the children may be different, we dig deeper for differences default list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) - = [*treeDiff(a, b) | <- zip2(argsA, argsB)]; + = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("diving into

"); @synopsis{decide how many separators we have} int seps(\iter-seps(_,list[Symbol] s)) = size(s); @@ -254,9 +254,9 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep // the edits are minimized. Note that we float on source location parameters // not only for the edit locations but also for sub-tree identity. - = trimEqualElements(originals, replacements); - span = cover([orig@\loc | orig <- originals, orig@\loc?]); - + println("span before trim: , size originals "); + = trimEqualElements(span, originals, replacements); + println("span after trim: , size originals "); = commonSpecialCases(span, seps, originals, replacements); edits += specialEdits; @@ -314,14 +314,14 @@ list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replac } @synopsis{trips equal elements from the front and the back of both lists, if any.} -tuple[list[Tree], list[Tree]] trimEqualElements([Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) - = trimEqualElements(aPostfix, bPostfix); +tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) + = trimEqualElements(endCover(span, aPostfix), aPostfix, bPostfix); -tuple[list[Tree], list[Tree]] trimEqualElements([*Tree aPrefix, Tree a], [*Tree bPrefix, a]) - = trimEqualElements(aPrefix, bPrefix); +tuple[loc, list[Tree], list[Tree]] trimEqualElements([*Tree aPrefix, Tree a], [*Tree bPrefix, a]) + = trimEqualElements(beginCover(span, aPrefix), aPrefix, bPrefix); -default tuple[list[Tree], list[Tree]] trimEqualElements(list[Tree] a, list[Tree] b) - = ; +default tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, list[Tree] a, list[Tree] b) + = ; // only one element removed in front, then we are done tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 0, [Tree a, *Tree tail], [*tail]) @@ -360,7 +360,15 @@ up to `until`. 
private loc fromUntil(loc from, loc until) = from.top(from.offset, until.offset - from.offset); private int end(loc src) = src.offset + src.length; -private loc cover(list[Tree] elems) = cover([e@\loc | e <- elems, e@\loc?]); +private loc endCover(loc span, []) = span(span.offset + span.length, 0); +private loc endCover(loc span, [Tree x]) = x@\loc; +private default loc endCover(loc span, list[Tree] l) = cover(l); + +private loc beginCover(loc span, []) = span(span.offset, 0); +private loc beginCover(loc span, [Tree x]) = x@\loc; +private default loc beginCover(loc span, list[Tree] l) = cover(l); + +private loc cover(list[Tree] elems:[_, *_]) = cover([e@\loc | Tree e <- elems, e@\loc?]); @synopsis{yield a consecutive list of trees} private str yield(list[Tree] elems) = "<}>"; diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 6ed4f8f9172..0a073e6dd62 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -29,9 +29,18 @@ bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree orig = parse(grammar, example); transformed = transform(orig); edits = treeDiff(orig, transformed); + println("derived edits:"); + iprintln(edits); edited = executeTextEdits(example, edits); - return transformed := parse(grammar, edited); + try { + return transformed := parse(grammar, edited); + } + catch ParseError(loc l): { + println("Parse error in:"); + println(edited); + return false; + } } @synopsis{Extract the leading spaces of each line of code} @@ -65,9 +74,35 @@ start[Program] swapAB(start[Program] p) = visit(p) { case (Id) `b` => (Id) `a` }; +start[Program] addDeclarationToEnd(start[Program] p) = visit(p) { + case (Program) `begin declare <{IdType 
","}* decls>; <{Statement ";"}* body> end` + => (Program) `begin + ' declare + ' <{IdType ","}* decls>, + ' c : natural; + ' <{Statement ";"}* body> + 'end` +}; + +start[Program] addDeclarationToStart(start[Program] p) = visit(p) { + case (Program) `begin declare <{IdType ","}* decls>; <{Statement ";"}* body> end` + => (Program) `begin + ' declare + ' c : natural, + ' <{IdType ","}* decls>; + ' <{Statement ";"}* body> + 'end` +}; + test bool nulTestWithId() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); test bool simpleSwapper() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, swapAB) && editsMaintainIndentationLevels(#start[Program], simpleExample, swapAB); + +test bool addDeclarationToEndTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToEnd); + +test bool addDeclarationToStartTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart); From 26777955b4652b8a585f0b1f285c5e4dc448692c Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Thu, 9 Jan 2025 15:40:45 +0100 Subject: [PATCH 16/25] oops --- src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 22eca29d3e4..8f423b4d073 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -317,7 +317,7 @@ list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replac tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) = trimEqualElements(endCover(span, aPostfix), aPostfix, bPostfix); -tuple[loc, list[Tree], list[Tree]] trimEqualElements([*Tree aPrefix, Tree a], [*Tree bPrefix, a]) +tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [*Tree aPrefix, Tree a], [*Tree bPrefix, a]) = trimEqualElements(beginCover(span, aPrefix), aPrefix, bPrefix); default tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, list[Tree] a, list[Tree] b) From 091b0b942c8f7e14dcd63c3383682205f5cff2e0 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Thu, 9 Jan 2025 19:39:43 +0100 Subject: [PATCH 17/25] simplified and repaired equal sublist detection --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 28 ++++++++----------- .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 14 ++++++++++ 2 files changed, 25 insertions(+), 17 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 8f423b4d073..e10ad351b27 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -259,8 +259,12 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep println("span after trim: , size originals "); = commonSpecialCases(span, seps, originals, replacements); edits += specialEdits; + println("special edits:"); + iprintln(edits); - equalSubList = largestEqualSubList(span, originals, replacements); + equalSubList = largestEqualSubList(originals, replacements); + println("equal sublist:"); + println(yield(equalSubList)); // by using the (or "a") largest common sublist as a pivot to divide-and-conquer // to the left and right of it, we minimize the number of necessary @@ -272,8 +276,8 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep // we align the prefixes and the postfixes and // continue recursively. return edits - + listDiff(cover(preO), seps, preO, preR) - + listDiff(cover(postO), seps, postO, postR) + + listDiff(beginCover(span, preO), seps, preO, preR) + + listDiff(endCover(span, postO), seps, postO, postR) ; } else if (originals := replacements) { @@ -298,20 +302,10 @@ uses particular properties of the relation between the original and the replacem * Candidate equal sublists always have consecutive source locations from the origin. * etc. 
} -list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replacements) { - // assert := trimEqualElements(originals, replacements) : "both lists begin and end with unique elements"; - - bool largerList(list[Tree] a, list[Tree] b) = size(a) > size(b); - - bool fromOriginalFile(loc span, Tree last) = span.top == (last@\loc?|unknown:///|).top; - - if ([*_, pre, *Tree eq, post, *_] := replacements, - [*_, !pre, *eq, !post, *_] := originals) { - return eq; - } - - return []; -} +list[Tree] largestEqualSubList([*Tree sub], [*_, *sub, *_]) = sub; +list[Tree] largestEqualSubList([*_, *sub, *_], [*Tree sub]) = sub; +list[Tree] largestEqualSubList([*_, *sub, *_], [*_, *Tree sub, *_]) = sub; +default list[Tree] largestEqualSubList(list[Tree] _orig, list[Tree] _repl) = []; @synopsis{trips equal elements from the front and the back of both lists, if any.} tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 0a073e6dd62..80060cf59ae 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -94,6 +94,17 @@ start[Program] addDeclarationToStart(start[Program] p) = visit(p) { 'end` }; +start[Program] addDeclarationToStartAndEnd(start[Program] p) = visit(p) { + case (Program) `begin declare <{IdType ","}* decls>; <{Statement ";"}* body> end` + => (Program) `begin + ' declare + ' x : natural, + ' <{IdType ","}* decls>, + ' y : natural; + ' <{Statement ";"}* body> + 'end` +}; + test bool nulTestWithId() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); @@ -106,3 +117,6 @@ test bool addDeclarationToEndTest() test bool 
addDeclarationToStartTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart); + +test bool addDeclarationToStartAndEndTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStartAndEnd); From ed1ad0335794659d9ece5ecceef121b5253a2c23 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Fri, 10 Jan 2025 16:29:32 +0100 Subject: [PATCH 18/25] finding more nested similarity under list elements --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 22 ++++++++++++++----- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index e10ad351b27..d882fbfb209 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -283,8 +283,15 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep else if (originals := replacements) { return edits; } - else { - return edits + [replace(span, learnIndentation(span, yield(replacements), yield(originals)))]; + else if (size(originals) == size(replacements)) { + return edits + + [*treeDiff(a, b) | <- zip2(originals, replacements)]; + ; + } else { + // TODO: make cases for shortering or lenghtening a list but + // mixing the common prefix with `treeDiff` to find more nested sharing + return edits + + [replace(span, learnIndentation(span, yield(replacements), yield(originals)))]; } } @@ -304,17 +311,20 @@ uses particular properties of the relation between the original and the replacem } list[Tree] largestEqualSubList([*Tree sub], [*_, *sub, *_]) = sub; list[Tree] largestEqualSubList([*_, *sub, *_], [*Tree sub]) = sub; -list[Tree] largestEqualSubList([*_, *sub, *_], [*_, *Tree sub, *_]) = sub; +list[Tree] largestEqualSubList([*_, p, *sub, q, *_], [*_, !p, *Tree sub, !q, *_]) = sub; default list[Tree] largestEqualSubList(list[Tree] 
_orig, list[Tree] _repl) = []; @synopsis{trips equal elements from the front and the back of both lists, if any.} -tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) +tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, + [Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) = trimEqualElements(endCover(span, aPostfix), aPostfix, bPostfix); -tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [*Tree aPrefix, Tree a], [*Tree bPrefix, a]) +tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, + [*Tree aPrefix, Tree a], [*Tree bPrefix, a]) = trimEqualElements(beginCover(span, aPrefix), aPrefix, bPrefix); -default tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, list[Tree] a, list[Tree] b) +default tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, + list[Tree] a, list[Tree] b) = ; // only one element removed in front, then we are done
From cf798ec8a0e5372e05b459d59c057a1105436f3a Mon Sep 17 00:00:00 2001
From: "Jurgen J. Vinju"
Date: Sat, 11 Jan 2025 11:40:08 +0100
Subject: [PATCH 19/25] Finishes HiFiTreeDiff algorithm

This finishes the complete algorithm for lists for the first time. The algorithm works in these steps:

* Trim equal elements from the head and the tail of both lists.
* Detect common edits to lists with fast list patterns; this is an optional optimization.
* Find the largest common sublist and split both lists in three parts: two different prefixes, two equal middle parts and two different postfixes. Recurse on the prefixes and the postfixes and concatenate their edit lists.
* Finally we end up with two empty lists or two lists without common elements; we collect the differences of each element position pairwise. Lists that became shorter get an additional edit to cut off the list, while lists that became longer get one additional edit to add the new elements. The new elements inherit indentation from the pre-existing elements.

For these changes additional tests still must be added later. --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index d882fbfb209..f9117ca720e 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -280,18 +280,21 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep + listDiff(endCover(span, postO), seps, postO, postR) ; } - else if (originals := replacements) { + else if (originals == [], replacements == []) { return edits; } - else if (size(originals) == size(replacements)) { + else { + // here we know there are no common elements anymore, only a common amount of different elements + common = min(size(originals), size(replacements)); + return edits - + [*treeDiff(a, b) | <- zip2(originals, replacements)]; + // first the minimal length pairwise replacements, essential for finding accidental commonalities + + [*treeDiff(a, b) | <- zip2(originals[..common], replacements[..common])]; + // then we either remove the tail that became shorter: + + [replace(cover(end(last), cover(originals[cover+1..])), "") | size(originals) > size(replacements), [*_, last] := originals[..common]] + // or we add new elements to the end, while inheriting indentation from the originals: + + [replace(end(last), learnIndentation(span, yield(replacements[common+1..]), yield(originals))) | size(originals) < size(replacements)] ; - } else { - // TODO: make cases for shortering or lenghtening a list but - // mixing the common prefix with `treeDiff` to find more nested sharing - return edits - + [replace(span, learnIndentation(span, yield(replacements), yield(originals)))]; } } From c84a9e2e62f93deaf98364e6fdba5f4d08a61424 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Sat, 11 Jan 2025 15:46:48 +0100 Subject: [PATCH 20/25] fixed omission in ResultFactory for ComposedFunctions --- src/org/rascalmpl/interpreter/result/ResultFactory.java | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/org/rascalmpl/interpreter/result/ResultFactory.java b/src/org/rascalmpl/interpreter/result/ResultFactory.java index 4607b274ce0..d30ecc94ef5 100644 --- a/src/org/rascalmpl/interpreter/result/ResultFactory.java +++ b/src/org/rascalmpl/interpreter/result/ResultFactory.java @@ -209,6 +209,9 @@ else if (value instanceof OverloadedFunction) { return (OverloadedFunction) value; } } + else if (value instanceof ComposedFunctionResult) { + return (Result) value; + } else { // otherwise this is an abstract ICalleableValue // for which no further operations are defined? From af081f597e811b78f178224cd96b417a9d85b910 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Sat, 11 Jan 2025 16:22:20 +0100 Subject: [PATCH 21/25] added more tests, fixed some issues --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 34 ++++++++----------- .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 26 ++++++-------- 2 files changed, 26 insertions(+), 34 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index f9117ca720e..742326421ee 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -88,6 +88,7 @@ import List; import String; import Location; import IO; +import util::Math; @synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.} @description{ @@ -230,11 +231,11 @@ default list[TextEdit] treeDiff( list[TextEdit] treeDiff( Tree t:appl(Production p:regular(Symbol reg), list[Tree] aElems), appl(p, list[Tree] bElems)) - = listDiff(t@\loc, seps(reg), aElems, bElems) when bprintln("diving into

"); + = listDiff(t@\loc, seps(reg), aElems, bElems); // When the productions are equal, but the children may be different, we dig deeper for differences default list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) - = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("diving into

"); + = [*treeDiff(a, b) | <- zip2(argsA, argsB)]; @synopsis{decide how many separators we have} int seps(\iter-seps(_,list[Symbol] s)) = size(s); @@ -243,9 +244,6 @@ default int seps(Symbol _) = 0; @synsopis{List diff is like text diff on lines; complex and easy to make slow} list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] replacements) { - // println(" listDiff: - // ' - // ' "); edits = []; // this algorithm isolates commonalities between the two lists @@ -254,18 +252,13 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep // the edits are minimized. Note that we float on source location parameters // not only for the edit locations but also for sub-tree identity. - println("span before trim: , size originals "); = trimEqualElements(span, originals, replacements); - println("span after trim: , size originals "); + = commonSpecialCases(span, seps, originals, replacements); edits += specialEdits; - println("special edits:"); - iprintln(edits); equalSubList = largestEqualSubList(originals, replacements); - println("equal sublist:"); - println(yield(equalSubList)); - + // by using the (or "a") largest common sublist as a pivot to divide-and-conquer // to the left and right of it, we minimize the number of necessary // edit actions for the entire list. @@ -275,6 +268,7 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep // TODO: what about the separators? // we align the prefixes and the postfixes and // continue recursively. 
+ return edits + listDiff(beginCover(span, preO), seps, preO, preR) + listDiff(endCover(span, postO), seps, postO, postR) @@ -285,15 +279,15 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep } else { // here we know there are no common elements anymore, only a common amount of different elements - common = min(size(originals), size(replacements)); - + common = min([size(originals), size(replacements)]); + return edits // first the minimal length pairwise replacements, essential for finding accidental commonalities - + [*treeDiff(a, b) | <- zip2(originals[..common], replacements[..common])]; + + [*treeDiff(a, b) | <- zip2(originals[..common], replacements[..common])] // then we either remove the tail that became shorter: - + [replace(cover(end(last), cover(originals[cover+1..])), "") | size(originals) > size(replacements), [*_, last] := originals[..common]] + + [replace(cover([after(last@\loc), cover(originals[common+1..])]), "") | size(originals) > size(replacements), [*_, last] := originals[..common]] // or we add new elements to the end, while inheriting indentation from the originals: - + [replace(end(last), learnIndentation(span, yield(replacements[common+1..]), yield(originals))) | size(originals) < size(replacements)] + + [replace(after(span), learnIndentation(span, yield(replacements[common..]), yield(originals))) | size(originals) < size(replacements)] ; } } @@ -313,8 +307,8 @@ uses particular properties of the relation between the original and the replacem * etc. 
} list[Tree] largestEqualSubList([*Tree sub], [*_, *sub, *_]) = sub; -list[Tree] largestEqualSubList([*_, *sub, *_], [*Tree sub]) = sub; -list[Tree] largestEqualSubList([*_, p, *sub, q, *_], [*_, !p, *Tree sub, !q, *_]) = sub; +list[Tree] largestEqualSubList([*_, *Tree sub, *_], [*sub]) = sub; +list[Tree] largestEqualSubList([*_, p, *Tree sub, q, *_], [*_, !p, *sub, !q, *_]) = sub; default list[Tree] largestEqualSubList(list[Tree] _orig, list[Tree] _repl) = []; @synopsis{trips equal elements from the front and the back of both lists, if any.} @@ -367,6 +361,8 @@ up to `until`. private loc fromUntil(loc from, loc until) = from.top(from.offset, until.offset - from.offset); private int end(loc src) = src.offset + src.length; +private loc after(loc src) = src(end(src), 0); + private loc endCover(loc span, []) = span(span.offset + span.length, 0); private loc endCover(loc span, [Tree x]) = x@\loc; private default loc endCover(loc span, list[Tree] l) = cover(l); diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 80060cf59ae..07475915d7a 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -29,15 +29,13 @@ bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree orig = parse(grammar, example); transformed = transform(orig); edits = treeDiff(orig, transformed); - println("derived edits:"); - iprintln(edits); edited = executeTextEdits(example, edits); try { return transformed := parse(grammar, edited); } catch ParseError(loc l): { - println("Parse error in:"); + println(" caused a parse error in:"); println(edited); return false; } @@ -94,17 +92,6 @@ start[Program] addDeclarationToStart(start[Program] p) = visit(p) { 'end` }; -start[Program] 
addDeclarationToStartAndEnd(start[Program] p) = visit(p) { - case (Program) `begin declare <{IdType ","}* decls>; <{Statement ";"}* body> end` - => (Program) `begin - ' declare - ' x : natural, - ' <{IdType ","}* decls>, - ' y : natural; - ' <{Statement ";"}* body> - 'end` -}; - test bool nulTestWithId() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); @@ -119,4 +106,13 @@ test bool addDeclarationToStartTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart); test bool addDeclarationToStartAndEndTest() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStartAndEnd); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd); + +test bool addDeclarationToEndAndSwapABTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToEnd o swapAB); + +test bool addDeclarationToStartAndSwapABTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o swapAB); + +test bool addDeclarationToStartAndEndAndSwapABTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd o swapAB); From 94c9adce1df6ebf27e46ed8d02f82494ce13c993 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Sat, 11 Jan 2025 16:34:19 +0100 Subject: [PATCH 22/25] added failing test --- .../rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc | 4 ++-- .../library/analysis/diff/edits/HiFiTreeDiffTests.rsc | 7 +++++++ 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 742326421ee..3697180ef0c 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -224,7 +224,7 @@ list[TextEdit] treeDiff( default list[TextEdit] treeDiff( t:appl(Production p:prod(_,_,_), list[Tree] _), r:appl(Production q:!p , list[Tree] _)) - = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))] when bprintln(t); // If list production are the same, then the element lists can still be of different length // and we switch to listDiff which has different heuristics than normal trees. @@ -235,7 +235,7 @@ list[TextEdit] treeDiff( // When the productions are equal, but the children may be different, we dig deeper for differences default list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) - = [*treeDiff(a, b) | <- zip2(argsA, argsB)]; + = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("into

on both sides"); @synopsis{decide how many separators we have} int seps(\iter-seps(_,list[Symbol] s)) = size(s); diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 07475915d7a..0c7dd7c826b 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -72,6 +72,10 @@ start[Program] swapAB(start[Program] p) = visit(p) { case (Id) `b` => (Id) `a` }; +start[Program] naturalToString(start[Program] p) = visit(p) { + case (Type) `natural` => (Type) `string` +}; + start[Program] addDeclarationToEnd(start[Program] p) = visit(p) { case (Program) `begin declare <{IdType ","}* decls>; <{Statement ";"}* body> end` => (Program) `begin @@ -116,3 +120,6 @@ test bool addDeclarationToStartAndSwapABTest() test bool addDeclarationToStartAndEndAndSwapABTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd o swapAB); + +test bool naturalToStringTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString); From 9856f1cf99a0f06fc357ec96cd77ac47e8fcc0e0 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 13 Jan 2025 09:35:02 +0100 Subject: [PATCH 23/25] debugging --- .../library/analysis/diff/edits/HiFiTreeDiff.rsc | 11 +++++++---- .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 12 +++++++++++- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 3697180ef0c..4666bf585a2 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -185,14 +185,14 @@ list[TextEdit] treeDiff(Tree a, a) = []; // skip production labels of original rules when diffing list[TextEdit] treeDiff( - appl(prod(label(_, Symbol s), syms, attrs), list[Tree] args), + appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args), Tree u) = treeDiff(appl(prod(s, syms, attrs), args), u); // skip production labels of replacement rules when diffing list[TextEdit] treeDiff( Tree t, - appl(prod(label(_, Symbol s), syms, attrs), list[Tree] args)) + appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args)) = treeDiff(t, appl(prod(s, syms, attrs), args)); // matched layout trees generate empty diffs such that the original is maintained @@ -224,7 +224,10 @@ list[TextEdit] treeDiff( default list[TextEdit] treeDiff( t:appl(Production p:prod(_,_,_), list[Tree] _), r:appl(Production q:!p , list[Tree] _)) - = [replace(t@\loc, learnIndentation(t@\loc, "", ""))] when bprintln(t); + { + rprintln(t); + return [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; + } // If list production are the same, then the element lists can still be of different length // and we switch to listDiff which has different heuristics than normal trees. 
@@ -234,7 +237,7 @@ list[TextEdit] treeDiff(
     = listDiff(t@\loc, seps(reg), aElems, bElems);

 // When the productions are equal, but the children may be different, we dig deeper for differences
-default list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB))
+default list[TextEdit] treeDiff(t:appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB))
     = [*treeDiff(a, b) | <a, b> <- zip2(argsA, argsB)] when bprintln("into <p> on both sides");

 @synopsis{decide how many separators we have}
diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc
index 0c7dd7c826b..9f8a4e4ca3b 100644
--- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc
+++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc
@@ -30,9 +30,19 @@ bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree
     transformed = transform(orig);
     edits = treeDiff(orig, transformed);
     edited = executeTextEdits(example, edits);
+    println(" leads to:");
+    iprintln(edits);

     try {
-        return transformed := parse(grammar, edited);
+        if (transformed := parse(grammar, edited)) {
+            return true;
+        }
+        else {
+            println("The edited result is not the same:");
+            println(edited);
+            println("As the transformed:");
+            println(transformed);
+        }
     }
     catch ParseError(loc l): {
         println(" caused a parse error in:");

From cbfaa68fbcc0de8690bed9b6dbac44ff81c21c40 Mon Sep 17 00:00:00 2001
From: "Jurgen J. Vinju"
Date: Mon, 13 Jan 2025 09:50:03 +0100
Subject: [PATCH 24/25] more debugging

---
 .../library/analysis/diff/edits/HiFiTreeDiff.rsc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
index 4666bf585a2..fff32bf5c0b 100644
--- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
+++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc
@@ -185,15 +185,15 @@ list[TextEdit] treeDiff(Tree a, a) = [];

 // skip production labels of original rules when diffing
 list[TextEdit] treeDiff(
-    appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args),
+    Tree t:appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args),
     Tree u)
-    = treeDiff(appl(prod(s, syms, attrs), args), u);
+    = treeDiff(appl(prod(s, syms, attrs), args)[@\loc=t@\loc?|bla:///|], u);

 // skip production labels of replacement rules when diffing
 list[TextEdit] treeDiff(
     Tree t,
-    appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args))
-    = treeDiff(t, appl(prod(s, syms, attrs), args));
+    Tree u:appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args))
+    = treeDiff(t, appl(prod(s, syms, attrs), args)[@\loc=u@\loc?|bla:///|]);

 // matched layout trees generate empty diffs such that the original is maintained
 list[TextEdit] treeDiff(

From ec718d1cb89d9f77f2fcd17299048ceae7ec4ac0 Mon Sep 17 00:00:00 2001
From: "Jurgen J. Vinju"
Date: Mon, 13 Jan 2025 09:51:19 +0100
Subject: [PATCH 25/25] one more test

---
 .../tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc
index 9f8a4e4ca3b..4812d45156d 100644
--- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc
+++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc
@@ -133,3 +133,6 @@ test bool addDeclarationToStartAndEndAndSwapABTest()

 test bool naturalToStringTest()
     = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString);
+
+test bool naturalToStringAndAtoBTest()
+    = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString o swapAB);