Recovery/recover all productions #2108

Draft: wants to merge 30 commits into base: feat/error-recovery

Commits
d856404
Improved line tracking in recovery tests
PieterOlivier Dec 1, 2024
27c7dba
Simplified to string so result is actually readable
PieterOlivier Dec 1, 2024
c87c199
Recover all productions that match the prefix
PieterOlivier Dec 1, 2024
415d536
Added python script to summarize test stats
PieterOlivier Dec 1, 2024
9d09eab
Added tests for prefix-shared production recovery and related perform…
PieterOlivier Dec 1, 2024
c1e93c7
Added visualization of cycle nodes
PieterOlivier Dec 7, 2024
1ab40fe
Added parse tree equality checker and visualizer
PieterOlivier Dec 7, 2024
99b8411
Implemented parse tree visualization
PieterOlivier Dec 7, 2024
46eae75
Implemented memoization of skipped nodes
PieterOlivier Dec 7, 2024
b92410b
Removed stats summary in R
PieterOlivier Dec 7, 2024
0ff2209
Focused SlowExceptionBug test
PieterOlivier Dec 7, 2024
aa93410
Implemented support to disable memoization of parse nodes
PieterOlivier Dec 7, 2024
248cf0b
Added cycle tests
PieterOlivier Dec 7, 2024
3b8b787
Added @Override annotations
PieterOlivier Dec 8, 2024
8446fcf
Simplified test and fixed cycleMark checking in ListContainerNodeFlat…
PieterOlivier Dec 9, 2024
77191f8
Merge branch 'feat/error-recovery' into recovery/recover-all-productions
PieterOlivier Dec 9, 2024
0d25755
Added support for checking memoization correctness during error recov…
PieterOlivier Dec 11, 2024
cc1384e
Added test for missing memoization in cycles
PieterOlivier Dec 11, 2024
1426791
Implemented verification of memoization approaches during parse graph…
PieterOlivier Dec 13, 2024
f845509
Removed debug prints when comparing two parse trees
PieterOlivier Dec 13, 2024
18d89f4
Added simple node count function
PieterOlivier Dec 13, 2024
53074d1
Implemented memoization configuration using query params
PieterOlivier Dec 13, 2024
ec5a76a
Added debug configuration flags, will be removed later.
PieterOlivier Dec 13, 2024
ecc1412
Fixed test to succeed independent of amb alternative order
PieterOlivier Dec 14, 2024
c8ae256
Added error recovery test for non-ASCII grammar/input
PieterOlivier Jan 6, 2025
5e881ce
Renamed conditional memoization booleans
PieterOlivier Jan 6, 2025
5856af6
Implemented visualization of AbstractNode trees
PieterOlivier Jan 8, 2025
9dbe936
Implemented support for parse result visualization
PieterOlivier Jan 8, 2025
6ea7144
Fixed comment on memoization disabling
PieterOlivier Jan 8, 2025
41d05a5
Switched to lexical rules to make result graph simpler
PieterOlivier Jan 8, 2025
@@ -0,0 +1,54 @@
module lang::rascal::tests::concrete::recovery::CycleTest

import ParseTree;
import vis::Text;
import IO;
import util::ErrorRecovery;
import Node;

lexical S = T | U;

lexical T = X T? | "$";

lexical U = X T? | "$";

lexical X = "b"? | "c";

void testCycles() {
str input = "bc$";
//str input = "bcbcbcccbb$";
Tree t1 = parse(#S, input, |unknown:///|, allowAmbiguity=true);
Tree t2 = parse(#S, input, |unknown:///?parse-memoization=none&visualize-parse-result|, allowAmbiguity=true);
println(prettyTree(t1));
println(prettyTree(t2));

if (treeEquality(t1, t2)) {
println("equal");
} else {
println("NOT EQUAL");
}

if ({appl1Level1, *_ } := getChildren(t1)[0] && {appl2Level1, *_ } := getChildren(t2)[0]) {
println("appl1Level1:\n<prettyTree(appl1Level1)>");
println("appl2Level1:\n<prettyTree(appl2Level1)>");

if ([amb({appl1Level2,*_})] := getChildren(appl1Level1)[1] && [amb({appl2Level2,*_})] := getChildren(appl2Level1)[1]) {
//println("Child 1:");
//iprintln(appl1Level2);
//println("Child 2:");
//iprintln(appl2Level2);

println("child 1 tree:\n<prettyTree(appl1Level2)>");
println("child 2 tree:\n<prettyTree(appl2Level2)>");

println("yield1: <appl1Level2>");
println("yield2: <appl2Level2>");
}
}

//if (set[Tree] amb1Level1 := getChildren(t1)[0]) {
// println("children: <typeOf(childLevel1)>");
//}


}
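The grammar in CycleTest is cyclic: `X` can match the empty string through `"b"?`, and `T?` can be empty, so `T = X T?` lets `T` derive itself via `T?`. The Python sketch below is a generic textbook check written for illustration, not code from this PR. It assumes each optional is modeled as a helper nonterminal (`T?`, `b?`) with an empty alternative and a one-symbol alternative, computes the nullable nonterminals by fixpoint iteration, and then reports every nonterminal that can derive itself through productions whose other symbols are all nullable.

```python
# Toy model of the CycleTest grammar. Assumption: optionals are modeled as
# helper nonterminals "T?" and "b?" with an empty and a one-symbol alternative.
GRAMMAR = {
    "S": [["T"], ["U"]],
    "T": [["X", "T?"], ["$"]],
    "U": [["X", "T?"], ["$"]],
    "X": [["b?"], ["c"]],
    "T?": [[], ["T"]],
    "b?": [[], ["b"]],
}


def nullable_symbols(grammar):
    """Fixpoint: a nonterminal is nullable if some alternative has only nullable symbols."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            # all([]) is True, so an empty alternative makes lhs nullable.
            if any(all(sym in nullable for sym in alt) for alt in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable


def unit_edges(grammar, nullable):
    """Edge A -> B if A has an alternative `alpha B beta` with alpha and beta nullable."""
    edges = {lhs: set() for lhs in grammar}
    for lhs, alternatives in grammar.items():
        for alt in alternatives:
            for i, sym in enumerate(alt):
                if sym not in grammar:
                    continue  # terminals cannot take part in a cycle
                rest = alt[:i] + alt[i + 1:]
                if all(s in nullable for s in rest):
                    edges[lhs].add(sym)
    return edges


def cyclic_symbols(grammar):
    """Nonterminals A with A =>+ A, found by searching the unit edges from A back to A."""
    edges = unit_edges(grammar, nullable_symbols(grammar))
    result = set()
    for start in grammar:
        stack, seen = list(edges[start]), set()
        while stack:
            sym = stack.pop()
            if sym == start:
                result.add(start)
                break
            if sym not in seen:
                seen.add(sym)
                stack.extend(edges[sym])
    return result


print(sorted(cyclic_symbols(GRAMMAR)))  # ['T', 'T?']
```

With this modeling every nonterminal of the test grammar turns out to be nullable, and only `T` and its optional `T?` lie on a cycle; `U` reaches the cycle but is not part of it.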
@@ -0,0 +1,31 @@
/**
* Copyright (c) 2024, NWO-I Centrum Wiskunde & Informatica (CWI)
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
*
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**/

module lang::rascal::tests::concrete::recovery::NonAsciiTest

import lang::rascal::tests::concrete::recovery::RecoveryTestSupport;


syntax S = T;

syntax T = A B C;

syntax A = "ª";
syntax B = "ß" "ß";
syntax C = "©";

test bool nonAsciiOk() = checkRecovery(#S, "ªßß©", []);

test bool nonAsciiError() = checkRecovery(#S, "ªßxß©", ["xß"]);

@@ -0,0 +1,30 @@
module lang::rascal::tests::concrete::recovery::PrefixSharingTest

syntax Stat = Expr ";";

syntax Expr = N "+" N | N "-" N;

syntax N = [0-9];

import ParseTree;
import util::ErrorRecovery;
import lang::rascal::tests::concrete::recovery::RecoveryTestSupport;
import vis::Text;
import IO;

Tree parseStat(str input, bool visualize=false)
= parser(#Stat, allowRecovery=true, allowAmbiguity=true)(input, |unknown:///?visualize=<"<visualize>">|);

test bool exprOk() = checkRecovery(#Stat, "1+2+3;", []);

test bool exprUnknownTerminator() = checkRecovery(#Stat, "1+2:", [":"], visualize=false);

test bool exprUnknownOperator() = checkRecovery(#Stat, "1*2;", ["*2"], visualize=false);

test bool exprPrefixSharing() {
Tree t = parseStat("1*2;", visualize=false);
println(prettyTree(t));
return true;
}
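In `exprUnknownOperator` and `exprPrefixSharing`, both `Expr` alternatives have recognized the same prefix `N` (the `1`) when the parser reaches `*`, and the test expects `*2` as the skipped text. The commit "Recover all productions that match the prefix" suggests that recovery now considers every alternative sharing the recognized prefix rather than only one. The Python toy below only illustrates that selection criterion on plain symbol sequences; the `Production` representation and the rule that a candidate must still have symbols left are assumptions, not the parser's actual data structures.

```python
from typing import List, Sequence, Tuple

# (left-hand side, right-hand side symbols); a hypothetical representation.
Production = Tuple[str, Tuple[str, ...]]

EXPR: List[Production] = [
    ("Expr", ("N", "+", "N")),
    ("Expr", ("N", "-", "N")),
]


def productions_matching_prefix(productions: List[Production], matched: Sequence[str]) -> List[Production]:
    """Every production whose right-hand side starts with `matched` and is not yet complete."""
    k = len(matched)
    return [(lhs, rhs) for lhs, rhs in productions if rhs[:k] == tuple(matched) and len(rhs) > k]


# After recognizing "1" as N, both alternatives are recovery candidates.
for lhs, rhs in productions_matching_prefix(EXPR, ["N"]):
    print(lhs, "=", " ".join(rhs))
```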


@@ -25,6 +25,8 @@ import analysis::statistics::Descriptive;
import util::Math;
import Set;
import List;
import Exception;
import vis::Text;

import lang::rascal::grammar::definition::Modules;

@@ -56,14 +58,42 @@ public data TestStats = testStats(
FrequencyTable errorCounts=(),
FrequencyTable errorSizes=());

// When this is turned on, memoization is checked for correctness. This makes the tests take much longer.
// Note that this test requires the possibility to disable memoization.
// In the current test version this can be done by including a "parse-memoization=none" query parameter.
// We expect this feature to be removed eventually and then the `verifyMemoizationCorrectness` flag becomes useless.
bool verifyMemoizationCorrectness = false;

// When verifying memoization we need to be able to time out the conversion from parse graph to parse forest.
// We can do this by using a parse filter.
int timeoutLimit = 0;

void setTimeout(int limit) {
timeoutLimit = limit;
}

void clearTimeout() {
timeoutLimit = 0;
}

Tree timeoutFilter(Tree tree) {
if (timeoutLimit != 0 && realTime() > timeoutLimit) {
throw Timeout();
}
return tree;
}

private TestMeasurement testRecovery(&T (value input, loc origin) standardParser, &T (value input, loc origin) recoveryParser, str input, loc source, loc statFile, int referenceParseTime) {
int startTime = 0;
int duration = 0;
int disambDuration = -1;
int errorCount = 0;
int errorSize=0;
str result = "?";
str verificationResult = "-";

TestMeasurement measurement = successfulParse();
clearTimeout();
try {
startTime = realTime();
Tree t = standardParser(input, source);
@@ -76,26 +106,87 @@ private TestMeasurement testRecovery(&T (value input, loc origin) standardParser
Tree t = recoveryParser(input, source);
int parseEndTime = realTime();
duration = parseEndTime - startTime;

if (verifyMemoizationCorrectness) {
bool noMemoTimeout = false;
bool linkCorrect = true;
bool nodeMemoTimeout = false;
bool nodeCorrect = true;
//bool nodeLinkEqual = true;

noMemoSource = source;
noMemoSource.query = noMemoSource.query + "&parse-memoization=none";
setTimeout(realTime() + 2000);
Tree noMemoTree = char(0);
try {
Tree noMemo = recoveryParser(input, noMemoSource);
noMemoTree = noMemo;
linkCorrect = treeEquality(t, noMemoTree);
} catch Timeout(): {
print("#");
noMemoTimeout = true;
}
clearTimeout();

nodeMemoSource = source;
nodeMemoSource.query = nodeMemoSource.query + "&parse-memoization=node";
setTimeout(realTime() + 2000);
try {
Tree nodeMemoTree = recoveryParser(input, nodeMemoSource);
//nodeLinkEqual = treeEquality(t, nodeMemoTree); // Too expensive
if (!noMemoTimeout) {
nodeCorrect = treeEquality(noMemoTree, nodeMemoTree);
}
} catch Timeout(): {
print("@");
nodeMemoTimeout = true;
}
clearTimeout();

if (!linkCorrect) {
if (nodeMemoTimeout) {
println("\nlink memoization incorrect, node memoization timeout for <source>");
verificationResult = "linkFailed:nodeTimeout";
} else if (nodeCorrect) {
verificationResult = "linkFailed:nodeSucceeded";
println("\nonly link memoization incorrect for <source>");
} else {
verificationResult = "linkFailed:nodeFailed";
println("\nboth node memoization and link memoization incorrect for <source>");
}
} else if (!nodeCorrect) {
verificationResult = "linkSucceeded:nodeFailed";
println("\nonly node memoization incorrect for <source>");
} else if (noMemoTimeout) {
verificationResult="noMemoTimeout";
} else {
verificationResult = "linkSucceeded:nodeSucceeded";
}
}

list[Tree] errors = findBestErrors(t);
errorCount = size(errors);
disambDuration = realTime() - parseEndTime;
result = "recovery";
if ("<t>" != input) {
throw "Yield of recovered tree does not match the original input";
}
if (errors == []) {
measurement = successfulDisambiguation(source=source, duration=duration);
} else {
errorSize = (0 | it + size(getErrorText(err)) | err <- errors);
measurement = recovered(source=source, duration=duration, errorCount=errorCount, errorSize=errorSize);
}
} catch ParseError(_): {
result = "error";
result = "recovery";
} catch ParseError(_): {
result = "error";
duration = realTime() - startTime;
measurement = parseError(source=source, duration=duration);
}
}

if (statFile != |unknown:///|) {
int ratio = percent(duration, referenceParseTime);
appendToFile(statFile, "<source>,<size(input)>,<result>,<duration>,<ratio>,<disambDuration>,<errorCount>,<errorSize>\n");
appendToFile(statFile, "<source>,<size(input)>,<result>,<duration>,<ratio>,<disambDuration>,<errorCount>,<errorSize>,<verificationResult>\n");
}

return measurement;
@@ -237,11 +328,15 @@ FileStats testSingleCharDeletions(&T (value input, loc origin) standardParser, &

FileStats testDeleteUntilEol(&T (value input, loc origin) standardParser, &T (value input, loc origin) recoveryParser, loc source, str input, int referenceParseTime, int recoverySuccessLimit, int begin=0, int end=-1, loc statFile=|unknown:///|) {
FileStats stats = fileStats();
int lineStart = begin;
int lineStart = 0;
list[int] lineEndings = findAll(input, "\n");

int line = 1;
int line = 0;
for (int lineEnd <- lineEndings) {
line = line+1;
if (lineEnd < begin) {
lineStart = lineEnd+1; continue;
}
lineLength = lineEnd - lineStart;
for (int pos <- [lineStart..lineEnd]) {
// Check boundaries (only used for quick bug testing)
@@ -258,7 +353,6 @@ FileStats testDeleteUntilEol(&T (value input, loc origin) standardParser, &T (va
}
lineStart = lineEnd+1;
println();
line = line+1;
}

return stats;
@@ -391,7 +485,8 @@ FileStats testErrorRecovery(loc syntaxFile, str topSort, loc testInput, str inpu
if (sym:\start(\sort(topSort)) <- gram.starts) {
type[value] begin = type(sym, gram.rules);
standardParser = parser(begin, allowAmbiguity=true, allowRecovery=false);
recoveryParser = parser(begin, allowAmbiguity=true, allowRecovery=true);
set[Tree(Tree)] filters = verifyMemoizationCorrectness ? {timeoutFilter} : {};
recoveryParser = parser(begin, allowAmbiguity=true, allowRecovery=true, filters=filters);

// Initialization run
standardParser(input, testInput);
@@ -434,7 +529,7 @@ TestStats batchRecoveryTest(loc syntaxFile, str topSort, loc dir, str ext, int m
fromFile = from;

if (statFile != |unknown:///|) {
writeFile(statFile, "source,size,result,duration,ratio,disambiguationDuration,errorCount,errorSize\n");
writeFile(statFile, "source,size,result,duration,ratio,disambiguationDuration,errorCount,errorSize,memoVerification\n");
}

return runBatchRecoveryTest(syntaxFile, topSort, dir, ext, maxFiles, minFileSize, maxFileSize, statFile, testStats());
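In the revised `testDeleteUntilEol`, line numbers are counted from the start of the input and lines that end before `begin` are skipped. A Python sketch of that bookkeeping follows. It advances `line_start` for skipped lines as well, so the start offset always belongs to the current line. The deletion rule in the hypothetical `delete_until_eol_variants` (remove everything from a position up to, but not including, the newline) is an assumption based on the function name, since the loop body is collapsed in this diff.

```python
def lines_from(text, begin=0):
    """Yield (line_number, line_start, line_end) for each newline-terminated line whose
    newline is at or after offset `begin`. Line numbers are 1-based and count every line,
    including skipped ones, so they stay correct when begin > 0."""
    line_endings = [i for i, ch in enumerate(text) if ch == "\n"]
    line_start = 0
    line = 0
    for line_end in line_endings:
        line += 1
        if line_end >= begin:
            yield line, line_start, line_end
        # Advance for skipped lines too, so line_start always marks the current line.
        line_start = line_end + 1


def delete_until_eol_variants(text, begin=0):
    """Hypothetical: for each position p at or after `begin`, delete text[p:line_end]."""
    for line, start, end in lines_from(text, begin):
        for pos in range(max(start, begin), end):
            yield line, text[:pos] + text[end:]


sample = "ab\ncd\nef\n"
print(list(lines_from(sample, begin=4)))  # [(2, 3, 5), (3, 6, 8)]
```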
@@ -0,0 +1,29 @@
module lang::rascal::tests::concrete::recovery::bugs::CycleMemoizationBug

import lang::rascal::tests::concrete::recovery::RecoveryTestSupport;
import lang::rascal::\syntax::Rascal;
import ParseTree;
import IO;
import String;
import vis::Text;
import util::ErrorRecovery;
import util::Benchmark;

/**
* Originally memoization inside cycles was turned off. This caused this test to take a long time and then crash with an out-of-memory error.
* With the new link memoization this test should run fine.
*/
void testCycleMemoizationFailure() {
recoveryParser = parser(#start[Module], allowRecovery=true, allowAmbiguity=true);
loc source = |std:///lang/aterm/syntax/ATerm.rsc|;

input = readFile(source);
modifiedInput = substring(input, 0, 369) + substring(input, 399);

begin = realTime();
Tree t1 = recoveryParser(modifiedInput, source);
duration = realTime() - begin;
println("with memoization duration: <duration>");

assert "<t1>" == modifiedInput;
}
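The comment attributes the long run time and the out-of-memory error to missing memoization inside cycles. A generic way to see why memoization matters when a shared parse graph is flattened into a tree: without it, a node reachable along many paths is converted once per path. The Python toy below uses an acyclic graph in which every level points twice to the same child; it does not model cycles or the PR's link and node memoization, only the cost difference between converting shared nodes repeatedly and converting each node once.

```python
class Node:
    """Minimal graph node; several parents may share the same child object."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def diamond_chain(depth):
    """Build `depth` levels where each node has two edges to the same child."""
    node = Node("leaf")
    for i in range(depth):
        node = Node(f"level{i}", [node, node])
    return node


def convert(node, memo, counter):
    """Turn the graph into nested tuples. counter[0] counts real conversions;
    pass memo=None to disable memoization, or a dict to enable it."""
    if memo is not None and node in memo:
        return memo[node]
    counter[0] += 1
    result = (node.label, tuple(convert(child, memo, counter) for child in node.children))
    if memo is not None:
        memo[node] = result
    return result


root = diamond_chain(10)
plain, memoized = [0], [0]
tree_plain = convert(root, None, plain)
tree_memo = convert(root, {}, memoized)
print(plain[0], memoized[0], tree_plain == tree_memo)  # 2047 11 True
```

Without memoization the count follows 2^(depth+1) - 1; with it, each of the depth + 1 distinct nodes is converted once, and both runs produce equal trees, which is the kind of equality `treeEquality` checks in the tests above.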
@@ -0,0 +1,14 @@
module lang::rascal::tests::concrete::recovery::bugs::PreludeOutOfMemoryBug

import lang::rascal::tests::concrete::recovery::RecoveryTestSupport;
import lang::rascal::\syntax::Rascal;
import ParseTree;
import IO;

void testBug() {
standardParser = parser(#start[Module], allowRecovery=false, allowAmbiguity=true);
recoveryParser = parser(#start[Module], allowRecovery=true, allowAmbiguity=true);
loc source = |std:///Prelude.rsc|;
input = readFile(source);
testSingleCharDeletions(standardParser, recoveryParser, source, input, 200, 150, begin=1312, end=1313);
}
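`testSingleCharDeletions` presumably re-parses the input once per deleted character in the `begin`..`end` window; `substring(input, 0, 1744) + substring(input, 1745)` in SlowExceptionBug builds one such variant by hand. A Python sketch of a variant generator follows. Treating `end` as inclusive is an assumption, supported only by the commented-out call with `begin=1744, end=1744`; reading `-1` as "no upper bound" is borrowed from the `end=-1` default of `testDeleteUntilEol`.

```python
def single_char_deletions(text, begin=0, end=-1):
    """Yield (pos, variant) where variant is `text` with the character at pos removed.
    `end` is treated as inclusive and -1 means "up to the last character" (assumptions)."""
    last = len(text) - 1 if end == -1 else min(end, len(text) - 1)
    for pos in range(begin, last + 1):
        yield pos, text[:pos] + text[pos + 1:]


print(list(single_char_deletions("abc")))  # [(0, 'bc'), (1, 'ac'), (2, 'ab')]
```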
@@ -0,0 +1,21 @@
module lang::rascal::tests::concrete::recovery::bugs::SlowExceptionBug

import lang::rascal::tests::concrete::recovery::RecoveryTestSupport;
import lang::rascal::\syntax::Rascal;
import ParseTree;
import IO;
import util::Benchmark;
import String;

void testBug() {
standardParser = parser(#start[Module], allowRecovery=false, allowAmbiguity=true);
recoveryParser = parser(#start[Module], allowRecovery=true, allowAmbiguity=true);
loc source = |std:///Exception.rsc|;
input = readFile(source);
int begin = realTime();
str modifiedInput = substring(input, 0, 1744) + substring(input, 1745);
Tree t = recoveryParser(modifiedInput, source);
//testSingleCharDeletions(standardParser, recoveryParser, source, input, 200, 150, begin=1744, end=1744);
int duration = realTime() - begin;
println("duration: <duration> ms.");
}