Reflections on writing 3 parser-combinator libraries

December 13, 2023

I have this article to thank for dragging me down the parser combinator rabbit hole. The result has been three separate attempts at writing a parsing library, so here are some thoughts.

Warning: Unsubstantiated claims ahead.

You might be better off without error handling

A lot of parsing crates out there tout their ability to let you configure how errors are returned from a parse call. This always mucks up the return type, forces the library to grow a bunch of error-related combinators, and makes the code harder to read. Leave error handling out of the library, and let users handle errors themselves.
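Here is a minimal sketch of what leaving error handling out of the library could look like. It assumes a parser that reports only "matched" or "didn't match" through `Option`. The names `tag`, `ConfigError`, and `parse_header` are hypothetical and are not taken from any particular crate. The point is that the caller, not the library, decides what each failure means.

```rust
/// A match: (bytes consumed, bytes left over).
/// `None` means "did not match"; the library says nothing more.
type Match<'a> = Option<(&'a [u8], &'a [u8])>;

/// Match an exact byte sequence at the start of `input`.
fn tag<'a>(input: &'a [u8], expected: &[u8]) -> Match<'a> {
    if input.starts_with(expected) {
        Some(input.split_at(expected.len()))
    } else {
        None
    }
}

/// Hypothetical user-side error type: the library never sees it.
#[derive(Debug, PartialEq)]
enum ConfigError {
    MissingMagic,
    MissingVersion,
}

/// Hypothetical user code that maps each failed match to its own error.
fn parse_header(input: &[u8]) -> Result<&[u8], ConfigError> {
    let (_, rest) = tag(input, b"CFG").ok_or(ConfigError::MissingMagic)?;
    let (_, rest) = tag(rest, b"v1").ok_or(ConfigError::MissingVersion)?;
    Ok(rest)
}
```

The library surface stays a single type (`Option<(matched, rest)>`). Meanwhile `parse_header` attaches whatever error detail the application actually needs, using nothing more than `ok_or` and `?`.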

You don't need to be generic over input

Wouldn't it be nice if your library could accept any type of “input stream” and still work? Yes...but no. It isn't worth the pain. The input is almost always bytes. Discard the “abstract stream” concept and force your users to give you bytes.
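Committing to bytes means every combinator can take a plain `&[u8]`, with no stream trait or associated item type to thread through the bounds. The following is a sketch under that assumption. `take_while` and `digits` are illustrative names, and the `(matched, rest)` return convention is an assumption of this example.

```rust
/// Consume the longest prefix of `input` whose bytes all satisfy `pred`.
/// Always succeeds; the matched part may be empty.
fn take_while(input: &[u8], pred: impl Fn(u8) -> bool) -> (&[u8], &[u8]) {
    // Index of the first byte that fails `pred`, or the end of the input.
    let end = input.iter().position(|&b| !pred(b)).unwrap_or(input.len());
    input.split_at(end)
}

/// A run of at least one ASCII digit, or `None` if there is none.
fn digits(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (matched, rest) = take_while(input, |b| b.is_ascii_digit());
    if matched.is_empty() {
        None
    } else {
        Some((matched, rest))
    }
}
```

Because `&[u8]` already provides `starts_with`, `split_at`, and iteration, the standard library does most of the work that a generic stream abstraction would otherwise have to reimplement behind trait bounds.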

You don't need to be generic over output

One of the draws of parser combinators is that they let you write declarative parsers in an imperative language. After all the setup and ceremony is done, it feels a little bit like magic. But the magic comes at the cost of debuggability. console.log-style debugging is out the window because the execution looks like a tree, with each node having no context about how it got there. Even stepping through with a debugger is a pain. One way out of this mess is to always output bytes from your parsers. This forces library users to parse things one token at a time, in an imperative manner.
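The sketch below shows the imperative style this encourages. It assumes hypothetical `tag` and `take_while` helpers that only ever return byte slices. The function `parse_setting` (also hypothetical) reads a `key=value` pair one token at a time. It converts the bytes to a `String` and a `u32` itself, so each intermediate result is an ordinary local variable that you can print or inspect in a debugger.

```rust
use std::str;

/// Hypothetical byte-level helpers: each returns (matched, rest) as bytes.
fn tag<'a>(input: &'a [u8], expected: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if input.starts_with(expected) {
        Some(input.split_at(expected.len()))
    } else {
        None
    }
}

fn take_while(input: &[u8], pred: impl Fn(u8) -> bool) -> (&[u8], &[u8]) {
    let end = input.iter().position(|&b| !pred(b)).unwrap_or(input.len());
    input.split_at(end)
}

#[derive(Debug, PartialEq)]
struct Setting {
    key: String,
    value: u32,
}

/// Parse `key=value` one token at a time. Each step is an ordinary
/// statement, so a print or a breakpoint between steps shows exactly
/// which bytes were matched and what is left.
fn parse_setting(input: &[u8]) -> Option<(Setting, &[u8])> {
    let (key, rest) = take_while(input, |b| b.is_ascii_alphabetic());
    if key.is_empty() {
        return None;
    }
    let (_, rest) = tag(rest, b"=")?;
    let (num, rest) = take_while(rest, |b| b.is_ascii_digit());
    // Turning bytes into typed values is the caller's job, not the library's.
    let value: u32 = str::from_utf8(num).ok()?.parse().ok()?;
    let key = str::from_utf8(key).ok()?.to_string();
    Some((Setting { key, value }, rest))
}
```

No step is hidden inside a combinator tree. If `parse_setting` fails, the culprit is the line whose `?` or early `return` produced `None`. Adding `eprintln!("{:?}", rest)` between any two lines shows what is still left to parse.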

Taken together, these ideas lead to designing a library whose only focus is “identifying patterns in bytes”. Bytes in, bytes out. Simple. Understandable. Still useful.
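To make "bytes in, bytes out" concrete, here is a sketch of how combinators can stay inside that shape. It assumes that a pattern is a plain function from `&[u8]` to `Option<(matched, rest)>`. `Pattern`, `seq`, `either`, `many1`, and the small `decimal` grammar are hypothetical names chosen for illustration. Every combinator returns a sub-slice of the original input, so composing patterns never introduces a new output type.

```rust
/// A pattern: on success, (matched bytes, remaining bytes).
type Pattern = fn(&[u8]) -> Option<(&[u8], &[u8])>;

fn digit(input: &[u8]) -> Option<(&[u8], &[u8])> {
    match input.first() {
        Some(b) if b.is_ascii_digit() => Some(input.split_at(1)),
        _ => None,
    }
}

fn dot(input: &[u8]) -> Option<(&[u8], &[u8])> {
    if input.first() == Some(&b'.') { Some(input.split_at(1)) } else { None }
}

/// Run `a` then `b`; the match is the contiguous span both consumed.
fn seq(input: &[u8], a: Pattern, b: Pattern) -> Option<(&[u8], &[u8])> {
    let (_, rest) = a(input)?;
    let (_, rest) = b(rest)?;
    Some(input.split_at(input.len() - rest.len()))
}

/// Try `a`; if it does not match, try `b` on the same input.
fn either(input: &[u8], a: Pattern, b: Pattern) -> Option<(&[u8], &[u8])> {
    a(input).or_else(|| b(input))
}

/// One or more repetitions of `p`, returned as a single span.
fn many1(input: &[u8], p: Pattern) -> Option<(&[u8], &[u8])> {
    let (_, mut rest) = p(input)?;
    while let Some((m, r)) = p(rest) {
        // Stop if `p` matched nothing, to avoid looping forever.
        if m.is_empty() {
            break;
        }
        rest = r;
    }
    Some(input.split_at(input.len() - rest.len()))
}

// A tiny grammar built only from the combinators above.
fn number(input: &[u8]) -> Option<(&[u8], &[u8])> { many1(input, digit) }
fn fraction(input: &[u8]) -> Option<(&[u8], &[u8])> { seq(input, dot, number) }
fn with_fraction(input: &[u8]) -> Option<(&[u8], &[u8])> { seq(input, number, fraction) }
fn decimal(input: &[u8]) -> Option<(&[u8], &[u8])> { either(input, with_fraction, number) }
```

Calling `decimal(b"3.14x")` identifies the span `3.14` and hands back `x` as the rest. Whether that span becomes an `f64`, a `String`, or nothing at all is left to the caller.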

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.