Haskell - Setting begin and end of multi-line input in Parsec
I am new to Parsec and would appreciate pointers on the problem here. Say I have a CSV file with a fixed number of headers. Instead of parsing each line separately, I want to look for a token at the beginning of a line and take all lines until the next line with a non-empty token. Example below:
```
token,flag,values
a,1,
,,a
,,f
b,2,
```
The rule for valid input is: if the token field is filled in, take all lines until the next non-empty token field. So I want Parsec to take the multiple lines below as the first input (those multiple lines can then be parsed by another rule):
```
a,1,
,,a
,,f
```
Then the process starts again at the next line with a non-empty token field (the last line in this example). I am trying to figure out whether there is a simple way to specify a rule in Parsec that says: take all lines that meet the rule and hand them off to another parser. Basically, it looks like some kind of lookahead rule is needed to specify valid multi-line input. Did I get that right?
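Parsec does offer explicit lookahead through `lookAhead`, which runs a parser without consuming input. The block below is a minimal sketch of the rule above, not the solution from the answer. It assumes every line ends in `\n` and that a line beginning with anything other than `,`, a space or a newline starts a new entry; `anyLine`, `entryStart`, `entry` and `entries` are hypothetical names. `manyTill` collects continuation lines until the lookahead (or `eof`) succeeds.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: one physical line; the newline is consumed and dropped.
anyLine :: Parser String
anyLine = many (noneOf "\n") <* char '\n'

-- Hypothetical helper: succeeds WITHOUT consuming input when the next line
-- begins a new entry (assumption: its first character is not ',', ' ' or '\n').
entryStart :: Parser ()
entryStart = () <$ lookAhead (noneOf ", \n")

-- One entry: the token line, then every following line up to (but not
-- including) the next entry start, or up to the end of input.
entry :: Parser [String]
entry = (:) <$> anyLine <*> manyTill anyLine (entryStart <|> eof)

-- All entries in the input; eof makes leftover input an error.
entries :: Parser [[String]]
entries = many entry <* eof
```

Explicit lookahead is not strictly required, though: Parsec's `many p` stops as soon as `p` fails without consuming input, so a continuation-line parser that fails on the first character of a token line ends the entry by itself.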
We can ignore the comma separator for now and say that an input begins when a character is found at the beginning of a line, and ends just before the next line that has a character at its beginning.
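The simplified rule can also be applied before any parsing: group the raw lines first, then hand each group to a separate parser. This is a plain-Haskell sketch rather than Parsec. `startsEntry` and `groupEntries` are hypothetical names, and treating a line that starts with whitespace or `,` as having no token is an assumption.

```haskell
import Data.Char (isSpace)

-- Hypothetical predicate for the simplified rule: a line starts a new
-- entry when its first character is neither whitespace nor the separator.
startsEntry :: String -> Bool
startsEntry (c:_) = not (isSpace c) && c /= ','
startsEntry []    = False

-- Split lines into entries. Each entry is one line plus every following
-- line that does not itself start an entry. A continuation line that
-- appears before any token line simply heads its own group.
groupEntries :: [String] -> [[String]]
groupEntries []       = []
groupEntries (l : ls) = (l : continuation) : groupEntries rest
  where
    (continuation, rest) = break startsEntry ls
```

For example, `groupEntries (lines "a,1,,\n,,a\n,,f\nb,2,,\n")` gives `[["a,1,,",",,a",",,f"],["b,2,,"]]`; each inner list could then be passed to its own parser.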
I solved the problem based on the basic outline @user2407038 suggested in a comment. The solution and explanation are below (please see the comments after each function; they show how the function behaves on sample input):
```haskell
{-# LANGUAGE FlexibleContexts #-}

import Control.Monad
import Text.Parsec
import Control.Applicative hiding ((<|>), many)

-- | Accepts everything up to a newline, and discards the newline.
-- | Used as a building block in the functions below.
restOfLine :: Stream s m Char => ParsecT s u m [Char]
restOfLine = many1 (satisfy (\x -> not $ x == '\n')) <* char '\n'

-- | A token line is "many alphanumeric characters" followed by
-- | all characters up to the newline.
tokenLine :: Stream s m Char => ParsecT s u m [Char]
tokenLine = (++) <$> many1 alphaNum <*> restOfLine

-- | GHCi test:
-- | *Main Text.Parsec> parseTest tokenLine "a,1,,\n"
-- | "a,1,,"
-- | *Main Text.Parsec> parseTest tokenLine ",1,,\n"
-- | parse error at (line 1, column 1):
-- | unexpected ","
-- | expecting letter or digit

-- | A non-token line is any number of spaces followed by
-- | ",", then all characters up to the newline.
nonTokenLine :: Stream s m Char => ParsecT s u m [Char]
nonTokenLine = (++) <$> (many space) <*> ((:) <$> char ',' <*> restOfLine)

-- | GHCi test:
-- | *Main Text.Parsec> parseTest nonTokenLine ",1,,\n"
-- | ",1,,"
-- | *Main Text.Parsec> parseTest nonTokenLine "a,1,,\n"
-- | parse error at (line 1, column 1):
-- | unexpected "a"
-- | expecting space or ","

-- | One entry is a tokenLine followed by any number of nonTokenLines.
oneEntry :: Stream s m Char => ParsecT s u m [[Char]]
oneEntry = (:) <$> tokenLine <*> (many nonTokenLine)

-- | GHCi test (note that it drops the last line, as expected):
-- | *Main Text.Parsec> parseTest oneEntry "a,1,,\n,,a\n,,f\nb,2,,\n"
-- | ["a,1,,",",,a",",,f"]

-- | Add 'many' to oneEntry to parse the entire file and match multiple entries.
multiEntries :: Stream s m Char => ParsecT s u m [[String]]
multiEntries = many oneEntry

-- | GHCi test (note that it gets 2 entries, as expected):
-- | *Main Text.Parsec> parseTest multiEntries "a,1,,\n,,a\n,,f\nb,2,,\n"
-- | [["a,1,,",",,a",",,f"],["b,2,,"]]
```
The parse errors shown in the comments are expected on invalid input and can be handled. The code above is a basic building block to get started with.
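One way to handle those errors, sketched here as an assumption rather than as part of the original solution, is to run the parser with Parsec's `parse` and require `eof` at the end. Without `eof`, `many oneEntry` stops at the first line it cannot parse and still returns `Right`, so a malformed line can go unnoticed. The block restates the parsers above specialised to `String` input (via `Text.Parsec.String.Parser`) so it compiles on its own; `allEntries` and `parseEntries` are hypothetical names.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same shape as the parsers in the answer, specialised to String input.
restOfLine :: Parser String
restOfLine = many1 (satisfy (/= '\n')) <* char '\n'

tokenLine :: Parser String
tokenLine = (++) <$> many1 alphaNum <*> restOfLine

nonTokenLine :: Parser String
nonTokenLine = (++) <$> many space <*> ((:) <$> char ',' <*> restOfLine)

oneEntry :: Parser [String]
oneEntry = (:) <$> tokenLine <*> many nonTokenLine

-- Requiring eof turns a malformed line into a reported error instead of
-- silently ending the list of entries.
allEntries :: Parser [[String]]
allEntries = many oneEntry <* eof

-- Hypothetical driver: Left carries Parsec's error (with its position),
-- Right carries the grouped entries.
parseEntries :: String -> Either ParseError [[String]]
parseEntries = parse allEntries "input"
```

With this, `parseEntries "a,1,,\n!!\n"` returns a `Left` reporting the bad second line, whereas `parse (many oneEntry) "" "a,1,,\n!!\n"` returns `Right [["a,1,,"]]`.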