Add a wasm browser based playground #41
…o accept "'\''" literal.
… emit an error message.
…ype to store bitfields.
…w it only work on literals and only for ASCII
…st is failing and need review.
Thanks very much Domingo. There are lots of great changes here. I had a go with the playground and it's amazing. Great work! Do you mind if I share the playground link with a few people?

Having seen the railroad diagram generator at https://www.bottlecaps.de/rr/ui I'm convinced that's a useful addition to

It's much easier for me, and you're more likely to get a prompt response, if I can deal with these queries and changes in smaller chunks. If you email small queries directly, instead of commenting on the #11, I'll be able to respond faster. If you have smaller PRs then we can also get things merged or given feedback faster too.

I've taken the changes that I could and merged them to main. I've rebased the remaining changes to the branch playground-2023-07-16 to hopefully make it convenient for you. But I'll also reply with my thoughts to the remaining commits here. Some of those changes I think are better placed in a separate repository with the playground itself rather than in lalr.

Thanks,
@@ -387,6 +387,7 @@ void Parser<Iterator, UserData, Char, Traits, Allocator>::parse( Iterator start,
const ParserSymbol* symbol = reinterpret_cast<const ParserSymbol*>( lexer_.symbol() );
while ( parse(symbol, lexer_.lexeme(), lexer_.line(), lexer_.column()) )
{
if(lexer_.full()) break;
Is this fixing a bug?
Yes, there are some grammars that enter an endless loop because the lexer doesn't advance.
I don't know exactly which ones trigger the bug but you can try it with this script:
#!/bin/sh
basep=playground
checkGrammar() {
# Quote the arguments so file names containing spaces are passed through intact.
echo "Now testing $1 $2"
/usr/bin/time ./grammar_test-clang -g "$basep/$1" -i "$basep/$2"
}
checkGrammar json3.g test.json.txt
checkGrammar lua.g test.lua
checkGrammar carbon-lang.g prelude.carbon
checkGrammar postgresql-16.g test.sql
#checkGrammar cxx-parser.g test.cpp
checkGrammar lsl_ext.g test.lsl
checkGrammar bison.g carbon-lang.y
checkGrammar bison-bug.g carbon-lang.y
checkGrammar dparser.g test.dparser
checkGrammar parse_gen.g test.parse_gen
checkGrammar tameparser.g test.tameparser
checkGrammar javascript.g test.js
checkGrammar javascript-core.g test.js
checkGrammar cparser.g test.c
checkGrammar java11.g test.java
checkGrammar rust.g test.rs
checkGrammar go.g test.go
checkGrammar php-8.2.g test.php
checkGrammar gringo-ng.g test.clingo
checkGrammar ada-adayacc.g test.adb
Build script:
#!/bin/sh
umask 022
myflags="-O2 -g"
#myflags="-O2 -g -m32"
#myflags="-g"
clang-16-env clang++ \
-std=c++17 $myflags -Wall -Wextra -Wno-unused-function -pedantic \
-Isrc -DLALR_NO_THREADS \
src/lalr/ErrorPolicy.cpp \
src/lalr/Grammar.cpp \
src/lalr/GrammarCompiler.cpp \
src/lalr/GrammarGenerator.cpp \
src/lalr/GrammarParser.cpp \
src/lalr/GrammarState.cpp \
src/lalr/GrammarSymbol.cpp \
src/lalr/GrammarSymbolSet.cpp \
src/lalr/GrammarTransition.cpp \
src/lalr/RegexCompiler.cpp \
src/lalr/RegexGenerator.cpp \
src/lalr/RegexItem.cpp \
src/lalr/RegexNode.cpp \
src/lalr/RegexParser.cpp \
src/lalr/RegexState.cpp \
src/lalr/RegexSyntaxTree.cpp \
src/lalr/RegexToken.cpp \
src/lalr/lalr_examples/grammar_test.cpp \
-o grammar_test-clang
grammar_test.cpp:
#include <stdio.h>
#include <stdlib.h> // EXIT_SUCCESS, EXIT_FAILURE
#include <stdarg.h>
#include <string>
#include <vector>
#include <lalr/GrammarCompiler.hpp>
#include <lalr/Parser.hpp>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>
#include <time.h>
static int errors_ = 0;
typedef unsigned char mychar_t;
static void show_error( const char* format, ... )
{
++errors_;
va_list args;
va_start( args, format );
vfprintf( stderr, format, args );
va_end( args );
}
int read_file(const char *fname, std::vector<mychar_t> &content)
{
struct stat stat;
int result = ::stat( fname, &stat );
if ( result != 0 )
{
show_error( "Stat failed on '%s' - result=%d\n", fname, result );
return EXIT_FAILURE;
}
FILE* file = fopen( fname, "rb" );
if ( !file )
{
show_error( "Opening '%s' to read failed - errno=%d\n", fname, errno );
return EXIT_FAILURE;
}
int size = stat.st_size;
content.resize( size+1 );
int read = int( fread(&content[0], sizeof(mychar_t), size, file) );
fclose( file );
file = nullptr;
if ( read != size )
{
show_error( "Reading grammar from '%s' failed - read=%d\n", fname, int(read) );
return EXIT_FAILURE;
}
content[size] = '\0';
return EXIT_SUCCESS;
}
static clock_t start_time;
clock_t myShowDiffTime(const char *title)
{
clock_t now = clock();
clock_t diff = now - start_time;
int msec = diff * 1000 / CLOCKS_PER_SEC;
printf("%s: Time taken %d seconds %d milliseconds\n", title, msec/1000, msec%1000);
start_time = now;
return now;
}
struct C_MultLineCommentLexer
{
static lalr::PositionIterator<const mychar_t*> string_lexer( const lalr::PositionIterator<const mychar_t*>& begin,
const lalr::PositionIterator<const mychar_t*>& end,
std::basic_string<mychar_t>* lexeme,
const void** /*symbol*/ )
{
LALR_ASSERT( lexeme );
lexeme->clear();
//printf("C_MultLineCommentLexer : %s\n", lexeme->c_str());
bool done = false;
lalr::PositionIterator<const mychar_t*> i = begin;
while ( i != end && !done)
{
switch( *i )
{
case '*':
++i;
if(i != end && *i == '/') done = true;
continue;
break;
}
++i;
}
if ( i != end )
{
LALR_ASSERT( *i == '/' );
++i;
}
return i;
}
};
struct AstUserDataDbg {
int index;
int stack_index;
static int next_index;
static int total;
AstUserDataDbg():index(total++), stack_index(next_index++) {}
};
int AstUserDataDbg::next_index = 0;
int AstUserDataDbg::total = 0;
static bool astMakerDbg( AstUserDataDbg& result, const AstUserDataDbg* start, const lalr::ParserNode<mychar_t>* nodes, size_t length )
{
// //printf("astMaker: %s\n", nodes[0].lexeme().c_str());
// const char *lexstr = (length > 0 ? (const char *)nodes[0].lexeme().c_str() : "::lnull");
// const char *idstr = (length > 0 ? nodes[0].symbol()->identifier : "::inull");
// int line = (length > 0 ? nodes[0].line() : 0);
// int column = (length > 0 ? nodes[0].column() : 0);
// //const char *stateLabel = (length > 0 ? nodes[0].state()->label : "::inull");
// printf("astMaker: %p\t%zd:%d:%d\t%p\t%zd\t->\t%s : %s :%d:%d\n", start, length,
// length ? start->index : -1, length ? start->stack_index : -1,
// nodes, length, idstr, lexstr, line, column);
printf("----\n");
for(size_t i=0; i< length; ++i)
printf("%zu:%d\t%p\t%d:%d\t%p <:> %s <:> %s <:> %s <:> %d:%d\n", i, nodes[i].symbol()->type,
start+i, start[i].index, start[i].stack_index, nodes+i,
nodes[i].symbol()->identifier, nodes[i].symbol()->lexeme,
nodes[i].lexeme().c_str(), nodes[i].line(), nodes[i].column());
return true;
}
struct ParseTreeUserData {
std::vector<ParseTreeUserData> children;
const lalr::ParserSymbol *symbol;
std::basic_string<mychar_t> lexeme; ///< The lexeme at this node (empty if this node's symbol is non-terminal).
ParseTreeUserData():children(0),symbol(nullptr) {};
};
static bool parsetreeMaker( ParseTreeUserData& result, const ParseTreeUserData* start, const lalr::ParserNode<mychar_t>* nodes, size_t length )
{
if(length == 0) return false;
result.symbol = nodes[length-1].state()->transitions->reduced_symbol;
for(size_t i_node = 0; i_node < length; ++i_node)
{
const lalr::ParserNode<mychar_t>& the_node = nodes[i_node];
switch(the_node.symbol()->type)
{
case lalr::SymbolType::SYMBOL_TERMINAL:
{
ParseTreeUserData& udt = result.children.emplace_back();
udt.symbol = the_node.symbol();
udt.lexeme = the_node.lexeme();
//printf("TERMINAL: %s : %s\n", udt.symbol->identifier, udt.lexeme.c_str());
}
break;
case lalr::SymbolType::SYMBOL_NON_TERMINAL:
{
if(the_node.symbol() == result.symbol)
{
const ParseTreeUserData& startx = start[i_node];
for (std::vector<ParseTreeUserData>::const_iterator child = startx.children.begin(); child != startx.children.end(); ++child)
{
result.children.push_back( std::move(*child) );
}
}
else
{
ParseTreeUserData& udt = result.children.emplace_back();
udt.symbol = the_node.symbol();
if(udt.symbol == start[i_node].symbol)
{
udt.children = start[i_node].children;
}
else
udt.children.push_back(std::move(start[i_node]));
}
//printf("NON_TERMINAL: %s\n", result.symbol->identifier);
}
break;
default:
//LALR_ASSERT( ?? );
printf("Unexpected symbol %p\n", the_node.symbol());
}
}
return true;
}
static void indent( int level )
{
for ( int i = 0; i < level; ++i )
{
printf( " |" );
}
}
static void print_parsetree( const ParseTreeUserData& ast, int level )
{
if(ast.symbol)
{
indent( level );
switch(ast.symbol->type)
{
case lalr::SymbolType::SYMBOL_TERMINAL:
if(ast.lexeme.size())
{
//indent( level -1);
printf("%s -> %s\n", ast.symbol->identifier, ast.lexeme.c_str());
}
break;
case lalr::SymbolType::SYMBOL_NON_TERMINAL:
//indent( level );
printf("%s\n", ast.symbol->lexeme);
break;
}
}
for (std::vector<ParseTreeUserData>::const_iterator child = ast.children.begin(); child != ast.children.end(); ++child)
{
print_parsetree( *child, ast.symbol ? (level + 1) : level );
}
}
#include <locale.h>
int main(int argc, char *argv[])
{
const char *grammar_fn = nullptr;
const char *input_fn = nullptr;
bool dumpLexer = false;
start_time = clock();
setlocale(LC_NUMERIC, "");
std::vector<char> grammar_txt;
std::vector<mychar_t> input_txt;
if ( argc < 2 )
{
printf( "%s -g|--grammar grammar_fname -i|--input input_fname -d|--dumpLex\n", argv[0] );
printf( "\n" );
return EXIT_FAILURE;
}
int argi = 1;
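while ( argi < argc )
{
if ( (strcmp(argv[argi], "-g") == 0 || strcmp(argv[argi], "--grammar") == 0) && argi + 1 < argc )
{
grammar_fn = argv[argi + 1];
argi += 2;
}
else if ( (strcmp(argv[argi], "-i") == 0 || strcmp(argv[argi], "--input") == 0) && argi + 1 < argc )
{
input_fn = argv[argi + 1];
argi += 2;
}
else if ( strcmp(argv[argi], "-d") == 0 || strcmp(argv[argi], "--dumpLex") == 0 )
{
dumpLexer = true;
argi += 1;
}
else
{
// Unknown option, or an option missing its value: without this branch
// argi never advances and the loop spins forever.
show_error( "Unknown or incomplete option '%s'\n", argv[argi] );
return EXIT_FAILURE;
}
}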
if(grammar_fn != nullptr)
{
int rc = read_file(grammar_fn, (std::vector<mychar_t>&)grammar_txt);
if(rc != EXIT_SUCCESS) return rc;
size_t grammar_txt_size = grammar_txt.size()-1; //-1 to account for the '\0' terminator
myShowDiffTime("read grammar");
printf("Grammar size = %d\n", (int)grammar_txt_size);
lalr::GrammarCompiler compiler;
lalr::ErrorPolicy error_policy;
int errors = compiler.compile( &grammar_txt[0], &grammar_txt[0] + grammar_txt_size, &error_policy );
myShowDiffTime("compile grammar");
if(errors != 0)
{
printf("Error count = %d\n", errors);
return EXIT_FAILURE;
}
compiler.showStats();
if(input_fn != nullptr)
{
rc = read_file(input_fn, input_txt);
if(rc != EXIT_SUCCESS) return rc;
size_t input_txt_size = input_txt.size()-1; //-1 to account for the '\0' terminator
myShowDiffTime("read input");
printf("Input size = %d\n", (int)input_txt_size);
lalr::ErrorPolicy error_policy_input;
lalr::Parser<const mychar_t*, ParseTreeUserData> parser( compiler.parser_state_machine(), &error_policy_input );
parser.set_default_action_handler(parsetreeMaker);
//lalr::Parser<const mychar_t*, AstUserDataDbg> parser( compiler.parser_state_machine(), &error_policy_input );
//parser.set_default_action_handler(astMakerDbg);
//lalr::Parser<const mychar_t*, int> parser( compiler.parser_state_machine(), &error_policy_input );
parser.lexer_action_handlers()
( "C_MultilineComment", &C_MultLineCommentLexer::string_lexer )
;
if(dumpLexer) parser.dumpLex( &input_txt[0], &input_txt[0] + input_txt_size );
else parser.parse( &input_txt[0], &input_txt[0] + input_txt_size );
myShowDiffTime("parse input");
printf( "accepted = %d, full = %d\n", parser.accepted(), parser.full());
if(parser.accepted() && parser.full())
{
print_parsetree( parser.user_data(), 0 );
}
}
}
return EXIT_SUCCESS;
}
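The endless loop mentioned earlier in this thread comes from a scan step that returns without consuming any input. Here is a minimal sketch of a progress guard, independent of lalr's actual Lexer and Parser classes. `scan_token` and `count_tokens` are hypothetical names made up for this illustration; the patch under review checks `lexer_.full()` instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical scanner step (not lalr's API): consumes one run of letters or
// one run of digits and returns the new position. Any other character
// consumes nothing, which mimics a lexer that fails to advance.
std::size_t scan_token(const std::string& input, std::size_t pos)
{
    auto is_alpha = [](char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); };
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    std::size_t i = pos;
    if (i < input.size() && is_alpha(input[i]))
    {
        while (i < input.size() && is_alpha(input[i])) ++i;
    }
    else if (i < input.size() && is_digit(input[i]))
    {
        while (i < input.size() && is_digit(input[i])) ++i;
    }
    return i;
}

// Counts tokens and stops as soon as a step makes no progress.
// Returns the token count; *stopped_at receives the final position.
std::size_t count_tokens(const std::string& input, std::size_t* stopped_at)
{
    std::size_t pos = 0;
    std::size_t tokens = 0;
    while (pos < input.size())
    {
        const std::size_t next = scan_token(input, pos);
        if (next == pos) break; // no progress: bail out instead of spinning
        pos = next;
        ++tokens;
    }
    if (stopped_at) *stopped_at = pos;
    return tokens;
}
```

Comparing the position before and after each step catches any stalled scan, whatever the cause, and the caller can report where scanning stopped.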
Of course I don't mind you sharing the playground link; ideally it'll be moved to GitHub Pages. I'm glad that we can join efforts to build an amazing tool that makes it easier to write, debug, and develop grammars. Thank you again for your great work!
Actually I can't comment on individual commits from the PR so I'll just do it here:

Fix to detect identifiers referenced in rules but not defined:
Make possible to accept associativity/precedence syntax like bison/byacc:
Check if '%whitespace' directive is present in the grammar and if not…:
Add code to allow generate an EBNF for railroad diagram generation:
Add method to dump the input from the lexer:
Add a method to show grammar compilation stats.:
Add a naive implementation of "%case_insensitive" directive, right no…:
Make trivial methods inline.:
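For context on the "%case_insensitive" commit, which is described as working only on literals and only for ASCII, here is a minimal sketch of ASCII-only, case-insensitive literal matching. It is not the code from the PR: `ascii_lower` and `matches_literal_ci` are hypothetical helpers written for this illustration.

```cpp
#include <cstddef>
#include <string>

// Lowercases only 'A'..'Z'; every other byte, including any non-ASCII byte,
// is returned unchanged. This is why the approach is ASCII-only.
inline char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Returns true if `input` contains `literal` starting at `pos`, ignoring
// ASCII case. Hypothetical helper, not part of lalr.
bool matches_literal_ci(const std::string& input, std::size_t pos, const std::string& literal)
{
    if (pos > input.size() || input.size() - pos < literal.size()) return false;
    for (std::size_t i = 0; i < literal.size(); ++i)
    {
        if (ascii_lower(input[pos + i]) != ascii_lower(literal[i])) return false;
    }
    return true;
}
```

Folding both sides to lower case at comparison time leaves the grammar's literals untouched, but it does nothing for characters outside the ASCII range or for regular-expression tokens.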
Generally I think the playground directory should be a separate repository that uses a submodule or some dependency mechanism to bring in lalr. Then all of the output specific to the playground can go there too. I like that because that keeps lalr as a smaller, simpler C++ library. I think that also frees you up to not depend on me for PRs and feedback in a lot of cases. Thanks heaps,
…ring like 'error' inside 'errors'
…ay I can generate a better parse tree.
…the line end is multi character like of '\r\n'
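The last commit title above is truncated, so its exact change is not known here. As a general illustration of the issue it names, the sketch below computes a line and column while treating "\r\n" as a single line end. `Position` and `position_at` are hypothetical names and are not part of lalr.

```cpp
#include <cstddef>
#include <string>

// 1-based line and column of a character offset.
struct Position
{
    int line;
    int column;
};

// Walks `text` up to `offset`, counting "\r\n", a lone '\n', and a lone '\r'
// each as exactly one line end. Hypothetical helper for illustration only.
Position position_at(const std::string& text, std::size_t offset)
{
    Position p{1, 1};
    std::size_t i = 0;
    while (i < offset && i < text.size())
    {
        const char c = text[i];
        if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
        {
            i += 2; // consume both characters of the two-character line end
            ++p.line;
            p.column = 1;
        }
        else if (c == '\n' || c == '\r')
        {
            ++i;
            ++p.line;
            p.column = 1;
        }
        else
        {
            ++i;
            ++p.column;
        }
    }
    return p;
}
```

Consuming the pair in one step avoids counting a Windows line end as two lines, which would otherwise shift every reported line number after it.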
This is the first version of a wasm browser based playground for lalr.