-
Notifications
You must be signed in to change notification settings - Fork 0
/
ll1parsingalgorithm.html
285 lines (285 loc) · 10.9 KB
/
ll1parsingalgorithm.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title>RIPAL: Responsive and Intuitive Parsing for the Analysis of Language</title>
<link href="styles/styles.css" rel="stylesheet" type="text/css" media="all"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
</head>
<body>
<h1>RIPAL: Responsive and Intuitive Parsing for the Analysis of Language</h1>
<h2>Pages</h2>
<nav aria-label="Pages">
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="theory.html">Theory</a></li>
<li><a href="contact.html">Contact</a></li>
</ul>
</nav>
<h2>The LL(1) parsing algorithm</h2>
<div class="section">
<h3>Background</h3>
<p>We've seen what an LL(1) parse table looks like. Now we can solidify a parsing algorithm using an LL(1) parse table.</p>
</div>
<div class="section">
<h3>Caveat</h3>
<p>For now, we will ignore CFGs with ε productions to keep the setup of this algorithm easier to explain. CFGs with ε productions will be covered later.</p>
</div>
<div class="section">
<h3>Setup</h3>
<p>To apply the LL(1) parsing algorithm, we will need the following data structures set up:</p>
<p><em>The LL(1) parse table</em></p>
<p>As shown in the previous section, we will need a parse table that maps combinations of nonterminal symbols and terminal symbols to the application of a production rule.</p>
<p><em>The input queue</em></p>
<p>We place our input string into an input queue, with one input symbol in each slot.</p>
<p>The queue looks like:</p>
<table class="stackorqueue">
<tr>
<td>(head of queue)</td>
<td>σ<sub>1</sub></td>
<td>σ<sub>2</sub></td>
<td>...</td>
<td>σ<sub>2</sub></td>
</tr>
</table>
<p>σ<sub>i</sub> ∈ Σ</p>
<div class="subsection example">
<p><em>Example</em></p>
<p>Σ = {e, s, t}</p>
<p>s = test</p>
<p>Our queue looks like:</p>
<table class="stackorqueue">
<tr>
<td>(head of queue)</td>
<td>t</td>
<td>e</td>
<td>s</td>
<td>t</td>
</tr>
</table>
</div>
<p>The operations we need to support on the input queue are:</p>
<ol>
<li>Initialize a queue with a fixed set of symbols</li>
<li>Dequeue - remove and return the element at the head of the queue</li>
<li>Check if empty - determine whether the queue is empty</li>
</ol>
<p><em>The parse stack</em></p>
<p>We use a parse stack to store the symbols we are still in the process of handling.</p>
<p>The stack looks like:</p>
<table class="stackorqueue">
<tr>
<td>(top of stack)</td>
<td>sym<sub>1</sub></td>
<td>sym<sub>2</sub></td>
<td>...</td>
<td>sym<sub>n</sub></td>
</tr>
</table>
<p>sym<sub>i</sub> ∈ T ∪ N</p>
<p>The operations we need to support on the parse stack are:</p>
<ol>
<li>Initialize a stack with a fixed set of symbols</li>
<li>Push - add an element to the top of the stack</li>
<li>Pop - remove and return the element at the top of the stack</li>
<li>Check if empty - determine whether the stack is empty</li>
</ol>
</div>
<div class="section">
<h3>Initialization</h3>
<p>Initialize the input queue with the symbols from the input string.</p>
<p>Initialize the parse stack with the grammar's start symbol.</p>
</div>
<div class="section">
<h3>Input processing</h3>
<p>At each step of our parsing process, we pop and observe the top symbol from the parse stack.</p>
<p>If the top symbol is a terminal:</p>
<ol>
<li>Dequeue the first terminal in the input queue (t<sub>next</sub>)
<ol>
<li>If it matches the symbol from the top of the stack, continue parsing, discarding both symbols</li>
<li>If it does not match the symbol from the top of the stack, reject</li>
</ol>
</li>
</ol>
<p>If the top symbol is a nonterminal (n<sub>next</sub>):</p>
<ol>
<li>Look up the parse table entry (a<sub>next</sub>) corresponding to the top symbol from the stack (n<sub>next</sub>) and the first terminal from the input queue (t<sub>next</sub>)
<ol>
<li>If a<sub>next</sub> = ø, reject</li>
<li>Otherwise, look up our next production p<sub>next</sub> = n<sub>next</sub> → sym<sub>next<sub>1</sub></sub> sym<sub>next<sub>2</sub></sub> ... sym<sub>next<sub>n</sub></sub></li>
<li>Add sym<sub>next<sub>1</sub></sub>, sym<sub>next<sub>2</sub></sub>, ..., sym<sub>next<sub>n</sub></sub> to the top of the parse stack in reverse order such that sym<sub>next<sub>1</sub></sub> is at the top of the stack</li>
</ol>
</li>
</ol>
</div>
<div class="section">
<h3>Handing the end of input</h3>
<p>Outside of an input symbol or table lookup mismatch, the following conditions signal and end of processing:</p>
<ol>
<li>If the input queue and parse stack are both empty at the end of a processing step, accept</li>
<li>If the input queue is empty and the parse stack is not empty at the end of a processing step, reject</li>
<li>If the input queue is not empty and the parse stack is empty at the end of the processing step, reject</li>
</ol>
</div>
<div class="section">
<h3>Examples</h3>
<p>
S → ab
</p>
<table class="parsetable">
<tr>
<td></td>
<td>a</td>
<td>b</td>
</tr>
<tr>
<td>S</td>
<td>1</td>
<td></td>
</tr>
</table>
<div class="subsection example">
<p><em>Example</em></p>
<p>Parsing input string ab:</p>
<table class="parsetrace">
<tr>
<th>Input queue</th>
<th>Parse stack</th>
<th>Action</th>
</tr>
<tr>
<td>ab</td>
<td>S</td>
<td>Apply rule 1 since S and a correspond to this production in our table</td>
</tr>
<tr>
<td>ab</td>
<td>ab</td>
<td>Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches</td>
</tr>
<tr>
<td>b</td>
<td>b</td>
<td>Remove b from head of input queue and parse stack and continue parsing since terminal symbol matches</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Accept, since input queue and parse stack are both empty</td>
</tr>
</table>
</div>
<div class="subsection example">
<p><em>Example</em></p>
<p>Parsing input string b:</p>
<table class="parsetrace">
<tr>
<th>Input queue</th>
<th>Parse stack</th>
<th>Action</th>
</tr>
<tr>
<td>b</td>
<td>S</td>
<td>Reject, since S and b correspond to ø in our table</td>
</tr>
</table>
</div>
<div class="subsection example">
<p><em>Example</em></p>
<p>Parsing input string aa:</p>
<table class="parsetrace">
<tr>
<th>Input queue</th>
<th>Parse stack</th>
<th>Action</th>
</tr>
<tr>
<td>aa</td>
<td>S</td>
<td>Apply rule 1 since S and a correspond to this production in our table</td>
</tr>
<tr>
<td>aa</td>
<td>ab</td>
<td>Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches</td>
</tr>
<tr>
<td>a</td>
<td>b</td>
<td>Reject, since the symbol from the head of the input queue (a) doesn't match the symbol from the top of the parse stack (b)</td>
</tr>
</table>
</div>
<div class="subsection example">
<p><em>Example</em></p>
<p>Parsing input string a:</p>
<table class="parsetrace">
<tr>
<th>Input queue</th>
<th>Parse stack</th>
<th>Action</th>
</tr>
<tr>
<td>a</td>
<td>S</td>
<td>Apply rule 1 since S and a correspond to this production in our table</td>
</tr>
<tr>
<td>a</td>
<td>ab</td>
<td>Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches</td>
</tr>
<tr>
<td></td>
<td>b</td>
<td>Reject, since input queue is empty but parse stack is not</td>
</tr>
</table>
</div>
<div class="subsection example">
<p><em>Example</em></p>
<p>Parsing string abc:</p>
<table class="parsetrace">
<tr>
<th>Input queue</th>
<th>Parse stack</th>
<th>Note</th>
</tr>
<tr>
<td>abc</td>
<td>S</td>
<td>Apply rule 1 since S and a correspond to this production in our table</td>
</tr>
<tr>
<td>abc</td>
<td>ab</td>
<td>Remove a from head of input queue and parse stack and continue parsing since terminal symbol matches</td>
</tr>
<tr>
<td>bc</td>
<td>b</td>
<td>Remove b from head of input queue and parse stack and continue parsing since terminal symbol matches</td>
</tr>
<tr>
<td>c</td>
<td></td>
<td>Reject, since parse stack is empty but input queue is not</td>
</tr>
</table>
</div>
</div>
<div class="section">
<h3>Intuitive explanation</h3>
<p>When applying the LL(1) parsing algorithm, we are making a selection of a production rule to apply based on the next terminal from the input string and the next nonterminal from the parse stack.</p>
<p>In general, we need the string produced by replacing nonterminals with terminals to match the input string exactly for a parse to be successful. Although an LL(1) parse table is a powerful tool, it is not always powerful enough to parse all context-free languages correctly.</p>
</div>
<div class="section">
<h3>Conclusion</h3>
<p>We've now learned the basic rules of LL(1) parsing. Next, we will look at how to interpret the results.</p>
</div>
<hr/>
<p>GitHub Repository: <a href="https://github.com/bprollinson/ripal">https://github.com/bprollinson/ripal</a></p>
<p>Copyright © 2017 Brendan Rollinson-Lorimer</p>
</body>
</html>