Showing 3 changed files with 202 additions and 0 deletions.
@@ -0,0 +1,91 @@
-- A simple parsing library that imitates the naming of the Megaparsec
-- API, but is implemented differently. In particular, this is a
-- backtracking-based parser, because this is easier to explain.
module QuasiParsec
  ( -- Parser interface
    Parser,
    parse,
    -- Primitive parsers
    satisfy,
    chunk,
    notFollowedBy,
    choice,
    eof,
    -- Parser combinators
    many,
    some,
  )
where

import Control.Monad (ap)

-- A parser of things
-- is a function from strings
-- to a list of pairs
-- of things and strings
newtype Parser a = Parser {runParser :: String -> [(a, String)]}

parse :: Parser a -> String -> [(a, String)]
parse (Parser f) s = f s

instance Functor Parser where
  fmap f x = do
    x' <- x
    pure $ f x'

instance Applicative Parser where
  (<*>) = ap
  pure x = Parser $ \s -> [(x, s)]

instance Monad Parser where
  Parser f >>= g = Parser $ \s ->
    concatMap
      ( \(x, s') ->
          let Parser g' = g x
           in g' s'
      )
      (f s)

instance MonadFail Parser where
  fail _ = Parser $ \_ -> []

-- Primitive parsers

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where
    f [] = []
    f (c : cs) =
      if p c
        then [(c, cs)]
        else []

notFollowedBy :: Parser a -> Parser ()
notFollowedBy (Parser f) = Parser $ \s ->
  case f s of
    [] -> [((), s)]
    _ -> []

choice :: [Parser a] -> Parser a
choice ps = Parser $ \s -> concatMap (\p -> runParser p s) ps

chunk :: String -> Parser String
chunk = sequence . map (satisfy . (==))

eof :: Parser ()
eof = Parser $ \s ->
  case s of
    "" -> [((), "")]
    _ -> []

-- Parser combinators

many :: Parser a -> Parser [a]
many p =
  choice
    [ (:) <$> p <*> many p,
      pure []
    ]

some :: Parser a -> Parser [a]
some p = (:) <$> p <*> many p
@@ -0,0 +1,23 @@
-- Random code related to Week 3

import Data.Char (isDigit, ord)

charInteger :: Char -> Integer
charInteger c = toInteger $ ord c - ord '0'

readInteger :: String -> Integer
readInteger s = loop 1 $ reverse s
  where
    loop _ [] = 0
    loop w (c : cs) = charInteger c * w + loop (w * 10) cs

readIntegerMaybe :: String -> Maybe Integer
readIntegerMaybe s = loop 1 $ reverse s
  where
    loop _ [] = Just 0
    loop w (c : cs)
      | isDigit c = do
          x <- loop (w * 10) cs
          pure $ charInteger c * w + x
      | otherwise =
          Nothing
@@ -1 +1,89 @@
# Monadic and Applicative Parsing

It is sometimes the case that we have to operate on data that is not
already in the form of a richly typed Haskell value, but is instead
stored in a file or transmitted across the network in some serialised
format - usually in the form of a sequence of bytes or characters.
Although such situations are always regrettable, in this chapter we
shall see a flexible technique for making sense out of unstructured
data: *parser combinators*.

## Parsing Integers Robustly

To start with, consider turning a `String` of digit characters into
the corresponding `Integer`. That is, we wish to construct the
following function:

```Haskell
readInteger :: String -> Integer
```

The function `ord :: Char -> Int` from `Data.Char` converts a
character into its numeric code. Since the digit characters `'0'` to
`'9'` have consecutive codes, subtracting the code of `'0'` turns a
digit character into its numeric value. Note that we have to convert
the `Int` produced by `ord` into an `Integer`:

```Haskell
import Data.Char (ord)

charInteger :: Char -> Integer
charInteger c = toInteger $ ord c - ord '0'
```

Using `charInteger`, we can implement `readInteger` with a
straightforward recursive loop over the characters of the string, from
right to left. The weight `w` of the current digit starts at 1 and
grows by a factor of 10 at every step:

```Haskell
readInteger :: String -> Integer
readInteger s = loop 1 $ reverse s
  where
    loop _ [] = 0
    loop w (c : cs) = charInteger c * w + loop (w * 10) cs
```
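The same result can also be computed from left to right, without
reversing the string. The following is a minimal sketch and not part
of the original code: `readIntegerFold` is a hypothetical name, and it
makes the same assumption as `readInteger` that every character is a
digit. Each step multiplies the accumulated value by 10 and adds the
next digit.

```Haskell
import Data.Char (ord)
import Data.List (foldl')

charInteger :: Char -> Integer
charInteger c = toInteger $ ord c - ord '0'

-- Hypothetical left-to-right variant of readInteger (illustration only).
-- Assumes that the input consists solely of digit characters.
readIntegerFold :: String -> Integer
readIntegerFold = foldl' step 0
  where
    -- Shift the digits seen so far one place to the left, then add
    -- the value of the current digit.
    step acc c = acc * 10 + charInteger c
```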

Example use:

```
> readInteger "123"
123
```

However, see what happens if we pass in invalid input:

```
> readInteger "xyz"
8004
```
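The result is not random: `charInteger` maps each letter to its
distance from `'0'` in the character encoding, and the loop then
weights these values as if they were digits. The sketch below spells
out the arithmetic; `digitValues` is a hypothetical helper introduced
only for this illustration.

```Haskell
import Data.Char (ord)

charInteger :: Char -> Integer
charInteger c = toInteger $ ord c - ord '0'

-- Hypothetical helper: the "digit" value that charInteger assigns to
-- each character of the input, valid digit or not.
digitValues :: String -> [Integer]
digitValues = map charInteger

-- digitValues "xyz" == [72, 73, 74], since ord 'x' == 120 and
-- ord '0' == 48. readInteger then computes
-- 72 * 100 + 73 * 10 + 74 == 8004.
```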

Silently producing garbage on invalid input is usually considered poor
engineering. Instead, our function should return a `Maybe` type,
indicating invalid input by returning `Nothing`. This can be done by
using `isDigit` from `Data.Char` to check whether each character is a
proper digit:

```Haskell
readIntegerMaybe :: String -> Maybe Integer
readIntegerMaybe s = loop 1 $ reverse s
  where
    loop _ [] = Just 0
    loop w (c : cs)
      | isDigit c = do
          x <- loop (w * 10) cs
          pure $ charInteger c * w + x
      | otherwise =
          Nothing
```

Note how we are using the fact that `Maybe` is a monad to avoid
explicitly checking whether the recursive call to `loop` fails.
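To see what the `do` block saves us, here is a sketch of the same
function with the check written out by hand. The name
`readIntegerMaybeExplicit` is hypothetical and does not appear in the
original code; its behaviour is intended to match `readIntegerMaybe`.

```Haskell
import Data.Char (isDigit, ord)

charInteger :: Char -> Integer
charInteger c = toInteger $ ord c - ord '0'

-- Hypothetical variant of readIntegerMaybe without do-notation.
readIntegerMaybeExplicit :: String -> Maybe Integer
readIntegerMaybeExplicit s = loop 1 $ reverse s
  where
    loop _ [] = Just 0
    loop w (c : cs)
      | isDigit c =
          -- Inspect the recursive result by hand; this is the case
          -- analysis that (>>=) for Maybe performs for us.
          case loop (w * 10) cs of
            Nothing -> Nothing
            Just x -> Just $ charInteger c * w + x
      | otherwise = Nothing
```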

We now obtain the results we would expect:

```
> readIntegerMaybe "123"
Just 123
> readIntegerMaybe "xyz"
Nothing
```
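One edge case is worth noting: because the base case of `loop` is
`Just 0`, the call `readIntegerMaybe ""` returns `Just 0` rather than
`Nothing`. If empty input should count as invalid, one option is a
thin wrapper. This is only a sketch under that assumption;
`readIntegerNonEmpty` is a hypothetical name that does not appear in
the original code.

```Haskell
import Data.Char (isDigit, ord)

charInteger :: Char -> Integer
charInteger c = toInteger $ ord c - ord '0'

-- Same definition as in the text above.
readIntegerMaybe :: String -> Maybe Integer
readIntegerMaybe s = loop 1 $ reverse s
  where
    loop _ [] = Just 0
    loop w (c : cs)
      | isDigit c = do
          x <- loop (w * 10) cs
          pure $ charInteger c * w + x
      | otherwise =
          Nothing

-- Hypothetical wrapper: reject the empty string before delegating.
readIntegerNonEmpty :: String -> Maybe Integer
readIntegerNonEmpty "" = Nothing
readIntegerNonEmpty s = readIntegerMaybe s
```

Whether the empty string should be accepted is a design decision; the
wrapper merely makes the stricter choice explicit.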