Winnow is a parser-combinator library for Rust, and 0.5 is now out. I last wrote about the 0.3.0 release, so I'll be covering all of the releases since then.
## What is Winnow?
To recap my last post, you can think of Winnow as a parser toolbox, making it easier to get up and running with your parser without getting in the way of hand-writing the trickier parts. My hope is that this can serve as a "do anything parser" much like how people treat regex.
Winnow started as a fork of nom, as I had found its toolbox model of parsers worked much better for me than the framework model used by other parser libraries like combine. The original goals for the fork were to improve the developer experience and to remove a corner case that had a performance cliff you could fall off of.
Being the fastest was a non-goal of the fork: that is a competition I didn't want to get distracted by, and I felt some of those optimizations ran counter to my goals. Specifically, chumsky uses GATs to allow the calling parser to communicate to sub-parsers when there are steps that can be discarded (see niko's post "Many modes: a GATs pattern"). This works well for the framework model: you put together the building blocks and the framework takes care of the rest. The problem when applying GATs to the toolbox model is that hand-written parsers are a black box to the system, so refactoring a parser to be hand-written could mysteriously lead to a dramatic drop in performance. If/when a user is aware of this, working around it requires writing some fairly advanced Rust code with a decent amount of boilerplate.
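To make that concrete, here is a minimal sketch of the idea behind that pattern (loosely adapted from niko's post; the names are illustrative and chumsky's actual implementation differs): the caller threads a `Mode` type through every parser, and the mode decides whether intermediate outputs are materialized at all.

```rust
// A minimal sketch of the "many modes" GAT pattern; not chumsky's actual API.
trait Mode {
    type Output<T>;
    // Only construct a value if this mode keeps outputs.
    fn bind<T>(f: impl FnOnce() -> T) -> Self::Output<T>;
}

/// Materialize outputs (normal parsing).
struct Emit;
impl Mode for Emit {
    type Output<T> = T;
    fn bind<T>(f: impl FnOnce() -> T) -> T {
        f()
    }
}

/// Discard outputs (e.g. only checking that the input matches).
struct Check;
impl Mode for Check {
    type Output<T> = ();
    fn bind<T>(_f: impl FnOnce() -> T) {}
}
```

A hand-written parser can't participate in this dispatch unless it is itself written against `Mode`, which is the boilerplate mentioned above.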
## What changed since 0.3.0...
Jump to the bottom if you just want to see the performance numbers, and check out the changelog for more details.
### Usability: Improving type inference
One area of weakness for Winnow was that type inference didn't work as often as I felt it should. We had code that looked roughly like:
```rust
pub trait Parser<I, O, E> {
    fn parse_next(&self, input: I) -> IResult<I, O, E> {
        // ...
    }
    // ...
}

impl<E> Parser<&str, char, E> for char {
    fn parse_next(&self, input: &str) -> IResult<&str, char, E> {
        one_of(*self).parse_next(input)
    }
}
```
This allows `delimited('(', expr, ')')` to be a short-hand for `delimited(one_of('('), expr, one_of(')'))`.
However, there are many parser combinators directly on the `Parser` trait, much like `Iterator`, and type inference would fail for these, like `'n'.value('\n')`.
I had hoped that using associated types would help but that had its own complexity and downsides.
In the end, I found that we were assuming Rust would be able to infer types between the calls to `Value::parse_next` and `Parser::value` when it wouldn't. Instead, we could use `PhantomData` to do this bookkeeping for Rust to make type inference work.
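For illustration, here is roughly what that looks like for a `value`-style combinator, sketched against the simplified `Parser` trait above (not Winnow's exact definition):

```rust
use std::marker::PhantomData;

// `Value` carries `I` and `O` as `PhantomData` so every type parameter is
// pinned down when the combinator is constructed, rather than hoping a later
// call constrains them.
struct Value<F, I, O, O2> {
    parser: F,
    value: O2,
    i: PhantomData<I>,
    o: PhantomData<O>,
}

impl<F, I, O, O2, E> Parser<I, O2, E> for Value<F, I, O, O2>
where
    F: Parser<I, O, E>,
    O2: Clone,
{
    fn parse_next(&self, input: I) -> IResult<I, O2, E> {
        self.parser
            .parse_next(input)
            .map(|(remaining, _)| (remaining, self.value.clone()))
    }
}
```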
#222 was released in 0.4.0.
### Usability: Clearer top-level parser
In 0.3, we renamed nom's `Parser::parse` to `Parser::parse_next` to make it more obvious that this is an incremental step and parsing isn't finished (I've personally run into problems with this in some of my projects).
`Parser::parse_next` left people with a `Result` of `(rest_input, output)` or `ErrMode<Error>`, and they needed to:

- Ensure `rest_input` is empty and discard it (easy to overlook)
- Unify the `ErrMode::Backtrack` and `ErrMode::Cut` variants
In Winnow 0.3, we provided `FinishIResult::finish`:
```rust
impl std::str::FromStr for Color {
    // The error must be owned
    type Err = winnow::error::Error<String>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        hex_color
            .parse_next(s)
            .finish()
            .map_err(winnow::error::Error::into_owned)
    }
}
```
The main problem with using an extension trait is discoverability: effectively, the only real option is to hope people read the right part of the docs and discover it.
We moved the logic from `FinishIResult::finish` to `Parser::parse` now that the name was available:
```rust
impl std::str::FromStr for Color {
    // The error must be owned
    type Err = winnow::error::Error<String>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        hex_color
            .parse(s)
            .map_err(winnow::error::Error::into_owned)
    }
}
```
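With that, callers go through the standard library like with any other `FromStr` type. A quick sketch (assuming `hex_color` accepts strings like `"#2F14DF"`):

```rust
let color: Color = "#2F14DF".parse().unwrap();

// `Parser::parse` also checks that all input was consumed,
// so trailing garbage is an error rather than silently ignored.
assert!("#2F14DF trailing".parse::<Color>().is_err());
```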
#221 was released in 0.4.0.
### Usability: Naming and organization
I've found that nom's naming was confusing and its parsers were spread across so many `mod`s that it was hard to find what you were looking for: you were always looking through a pinhole view of the API (#95).
While I don't think we are at a perfect state, I think things are getting better.
For the module hierarchy, we decided to focus on the high-level need, so instead of
```rust
pub mod bits;
pub mod branch;
pub mod bytes;
pub mod character;
pub mod combinator;
pub mod multi;  // a couple of parsers are almost exclusively for binary data
pub mod number; // a mix of ASCII and binary parsers
pub mod sequence;
pub mod trace;
```
we now have:
```rust
pub mod ascii;
pub mod binary;
pub mod combinator;
pub mod token;
pub mod trace; // probably should just be folded into `combinator`
```
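As a rough illustration of how imports read now (token-level parsers live under `token`, text-oriented ones under `ascii`, binary ones under `binary`):

```rust
use winnow::ascii::digit1;       // text / ASCII parsers
use winnow::binary::le_u16;      // binary parsers
use winnow::combinator::repeat;  // general-purpose combinators
use winnow::token::one_of;       // token-level parsers
```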
I'm always open to people's ideas for how to further improve the discoverability and clarity of parser code.
#239, #240 were released in 0.4.0.
### Usability: Ranged parsers
Are `take_while_m_n` and `repeat_m_n` inclusive or exclusive for `m` and `n`? In Rust parlance, I think it's `m..=n`, but I don't even remember for sure.
We provide `repeat0`, `repeat1`, and `repeat_m_n`, so if you need a `repeat5`, you might not immediately think to wrap `repeat_m_n`.
And for myself, I think it'd be nice to reduce the redundancy so we have a less overwhelming list of parsers.
So we switched from having separate `0..`, `1..`, and `m..=n` parsers to a single version that takes in an `impl Into<Range>`, replacing

```rust
repeat0(parser::key_value).parse_next(i)
```

with

```rust
repeat(0.., parser::key_value).parse_next(i)
```
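For a feel of the range forms this accepts, here is a sketch against the 0.5 API (the function names are mine):

```rust
use winnow::combinator::repeat;
use winnow::token::one_of;
use winnow::{PResult, Parser};

// Zero or more, like the old `repeat0`
fn digits0(input: &mut &str) -> PResult<Vec<char>> {
    repeat(0.., one_of('0'..='9')).parse_next(input)
}

// Between 2 and 4 repetitions, inclusive on both ends
fn digits2_to_4(input: &mut &str) -> PResult<Vec<char>> {
    repeat(2..=4, one_of('0'..='9')).parse_next(input)
}

// Exactly 5 repetitions
fn digits5(input: &mut &str) -> PResult<Vec<char>> {
    repeat(5, one_of('0'..='9')).parse_next(input)
}
```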
In contrast, chumsky takes the approach of `key_value.repeated()`, with additional functions for controlling the range. I felt that within a parser's grammar, the fact that it is repeating is important information, and I wanted to make that more obvious by leading with it.
I had hoped that we could do `(Repeat(0..) * parser::key_value).parse_next(i)` but I ran into issues.
#241 was released in 0.4.2 and #247 was released in 0.4.6.
### Usability: Better out-of-the-box errors
`IResult` defaulted the `E` generic parameter to `InputError`, but I found that was rarely useful (#103), with the output being:

```
error OneOf at:
name = "foo"
```
- This characterized the error by the `ErrorKind`, which is debug information at best
- This dumped all of the output from where the failure started to occur (in this case, there was a missing `]` before a newline)
- We could not add additional context. For example, with `combine` my errors would list what tokens were expected
So I took the error I made for `toml_edit` and pulled it in as `ContextError`:

```
parse error at line 1, column 9
  |
1 | [package
  |         ^
invalid table
expected ']'
```
In providing this new, rich error that is nearly as fast as no-op errors, I went ahead and removed `VerboseError`, which by itself makes parsing json take twice as long. This is a case where GATs would likely remove that overhead.
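For a flavor of how a parser feeds `ContextError`, here is a sketch in the spirit of toml_edit's table-header parsing (not its actual code):

```rust
use winnow::combinator::cut_err;
use winnow::error::{StrContext, StrContextValue};
use winnow::token::take_while;
use winnow::{PResult, Parser};

// After `[`, `cut_err` commits to this branch so the context labels below are
// reported instead of the parser silently backtracking.
fn table_header<'i>(input: &mut &'i str) -> PResult<&'i str> {
    (
        '[',
        cut_err(take_while(1.., |c: char| c != ']' && c != '\n')),
        cut_err(']').context(StrContext::Expected(StrContextValue::CharLiteral(']'))),
    )
        .context(StrContext::Label("table"))
        .map(|(_, name, _)| name)
        .parse_next(input)
}
```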
#282 was released in 0.5.0.
### Performance: Inlining
As I mentioned, being the absolute fastest parser-combinator library is not our goal; we fell into being one of the fastest by happenstance. When working on API cleanups after the fork from nom, I sprinkled some `#[inline]`s where they intuitively made sense and things got faster. I initially didn't advertise this performance gain: my attention was elsewhere, so I hadn't fully characterized it and didn't know how much of a fluke it was.
Since then, we've found some other places where sprinkling around `#[inline]` made a big difference. Rather than trying to take the time to characterize each set of changes we made, I'll leave that for the performance numbers at the end.
### Performance: Direct people to the happy path
`nom::one_of` would match a token from a set:
```rust
fn hexadecimal(input: &str) -> IResult<&str, &str> {
    preceded(
        alt((tag("0x"), tag("0X"))),
        recognize(
            many1(
                terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
            )
        )
    )(input)
}
```
We carried this behavior forward and generalized it to also accept ranges of tokens (`'a'..='z'`) and tuples of token sets (`('a'..='z', 'A'..='Z')`).
However, when trying to find the root cause of a performance regression, we found that using `&str` as a set of `char` tokens was 7x slower than other methods in micro-benchmarks, likely due to unpacking variable-length UTF-8 bytes into a `char` (#226).
So we decided to remove the trait impl that allowed `&str` to serve as a token set, to avoid people accidentally taking this performance hit. Our parser instead looks like:
```rust
fn hexadecimal(input: &str) -> IResult<&str, &str> {
    preceded(
        alt(("0x", "0X")),
        repeat(
            0..,
            terminated(one_of(('0'..='9', 'a'..='f', 'A'..='F')), repeat(0.., '_'))
        ).recognize()
    ).parse_next(input)
}
```
#252 was released in 0.5.0.
### Performance: Imperative, rather than pure-functional parsing
We started with the parsing model of nom:

```rust
fn foo<I, O, E>(input: I) -> Result<(I, O), ErrMode<E>> {}
```
This works well! As you parse `input`, you return the mutated version of it. This makes it easy to reason about and test.
But there is a problem and, from this section's title, you can guess it: performance.
The good news is that this isn't a general performance issue: in fixing it, most benchmarks didn't see a change. I suspect this specifically affects parsers dealing with a large `I` and `O`, making the `(I, O)` in the return type very large and causing a performance loss.
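A quick way to see the effect (the types here are illustrative stand-ins, and exact sizes are layout dependent):

```rust
use std::mem::size_of;

// Stand-ins: an input like `Located<&str>` is about two string slices
// (4 pointers), and a value plus its formatting is a large output.
type Input<'i> = (&'i str, &'i str);
struct BigOutput([u64; 8]);

fn main() {
    // Old shape: the (large) input is hauled back out alongside the output.
    let old = size_of::<Result<(Input<'static>, BigOutput), u32>>();
    // New shape: the size no longer depends on the input type at all.
    let new = size_of::<Result<BigOutput, u32>>();
    assert!(old > new);
    println!("old: {old} bytes, new: {new} bytes");
}
```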
In `toml_edit`, we hit this because we use `I = Located<&str>`, which is `(&str, &str)` or 4 pointers, combined with large `O`s from tracking values and their formatting.
We can instead switch the parser signature to:

```rust
fn foo<I, O, E>(input: &mut I) -> Result<O, ErrMode<E>> {}
```
and now the return type's size is independent of the size of `I`.
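Putting it together, a 0.5-style parser reads like this (a small sketch using the `PResult` alias Winnow provides for this shape):

```rust
use winnow::ascii::digit1;
use winnow::{PResult, Parser};

// `input` is advanced in place; only the output comes back.
fn integer(input: &mut &str) -> PResult<i64> {
    digit1.parse_to().parse_next(input)
}

fn main() {
    let mut input = "123 abc";
    let n = integer(&mut input).unwrap();
    assert_eq!(n, 123);
    assert_eq!(input, " abc"); // on success, `input` points past the match
}
```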
The primary technical challenge is dealing with the error case.
Translating nom's model to this, on any error, `input` should revert back to its original position.
Instead, I found we could have error cases leave `input` as-is. This doesn't just simplify things and reduce the overhead of making `input` transactional; it also allows `E` to not track the original location where the error occurred, instead using the final location `input` was left at.
But do we lose the spirit of what we are trying to accomplish?
I'm still unsure. I was pleasantly surprised to find that a lot of code was simplified because of this model.
However, there are cases where explicit lifetimes are required:
```rust
fn foo<'i, E>(input: &mut &'i str) -> Result<&'i str, ErrMode<E>> {}
```
I'm concerned this style of parser is not as friendly to Rust beginners. That needs to be counter-balanced with my suspicion that this new API will allow users to avoid `RefCell`. Maybe it's also ok because zero-copy parsing might already be more advanced? At this point, I feel like the only way to know for sure is to give it a spin.
## Performance by the numbers
Using chumsky's `json` benchmark:
| library | version | time | note |
|---|---|---|---|
| pom | 3.3.0 | 7.3021 ms | |
| pest | 2.7.0 | 1.2894 ms | |
| nom | 7.1.3 | 341.28 µs | |
| serde_json | 1.0.102 | 317.31 µs | |
| chumsky (zero copy) | 7a2e2e6 | 191.06 µs | |
| sn | 0.1.2 | 154.84 µs | |
| winnow | 0.3.6 | 125.54 µs | |
| winnow | 0.4.0 | 138.71 µs | |
| winnow | 0.4.1 | 251.97 µs | |
| winnow | 0.4.2 | 242.56 µs | |
| winnow | 0.4.3 | 240.69 µs | |
| winnow | 0.4.4 | 230.01 µs | |
| winnow | 0.4.5 | 257.89 µs | |
| winnow | 0.4.6 | 262.45 µs | |
| winnow | 0.4.7 | 231.48 µs | |
| winnow | 0.4.8 | 234.59 µs | |
| winnow | 0.4.9 | 244.81 µs | |
| winnow | 0.4.9 | 245.02 µs | migrated bench to non-deprecated APIs |
| winnow | 0.4.9 | 245.82 µs | made bench more declarative |
| winnow | 0.4.9 | 96.408 µs | migrated bench away from slow API |
| winnow | 0.5.0 | 97.328 µs | |
Using `toml_edit`'s `cargo_manifest/toml_edit/medium` benchmark:
| toml_edit | parser | version | time | note |
|---|---|---|---|---|
| 0.15.0 | combine | 4.6.6 | 91.160 µs | |
| 0.16.0 | nom8 | 0.1.0 | 79.140 µs | |
| 0.16.1 | nom8 | 0.1.0 | 79.813 µs | |
| 0.16.2 | nom8 | 0.2.0 | 79.971 µs | |
| 0.19.3 | nom8 | 0.2.0 | 96.821 µs | |
| 0.19.4 | winnow | 0.3.0 | 78.559 µs | |
| 0.19.7 | winnow | 0.3.0 | 78.502 µs | |
| 0.19.7 | winnow | 0.3.6 | 79.326 µs | |
| 0.19.8 | winnow | 0.4.0 | 77.498 µs | |
| 0.19.8 | winnow | 0.4.1 | 76.795 µs | |
| 0.19.8 | winnow | 0.4.2 | 78.532 µs | |
| 0.19.8 | winnow | 0.4.3 | 78.682 µs | |
| 0.19.8 | winnow | 0.4.4 | 78.562 µs | |
| 0.19.8 | winnow | 0.4.5 | 78.572 µs | |
| 0.19.8 | winnow | 0.4.6 | 83.424 µs | |
| 0.19.9 | winnow | 0.4.6 | 83.856 µs | |
| 0.19.9 | winnow | 0.4.7 | 73.803 µs | |
| 0.19.9 | winnow | 0.4.8 | 73.605 µs | |
| 0.19.9 | winnow | 0.4.9 | 74.002 µs | |
| 0.19.12 | winnow | 0.4.9 | 74.029 µs | |
| 68be6ab | winnow | 0.4.9 | 74.085 µs | prep for winnow 0.5.0 |
| 68be6ab+ | winnow | 0.5.0 | 63.075 µs | |
Methodology:

I ran `cargo bench` on the individual bench until I observed a second cluster of results and picked the lowest. I've observed that on my machine, jitter is fairly tight around two clusters, and without waiting for that second cluster, the results can be misleading if only the slower cluster has manifested. Based on the variance I saw, I would recommend not getting caught up in differences of up to 10%. I should look into icount benchmarks...
Observations:

- The hassle of running benchmarks can lead to blindspots. I tended to focus on the performance impact to a Winnow benchmark and a `toml_edit` benchmark. Winnow uses lower-level parsers to better line the code up with the TOML BNF grammar, and I was able to catch most performance regressions there. Similarly, we fixed all reported regressions, and yet they didn't affect the json bench list above. Even if I ran all of the benchmarks, I was unlikely to catch some of these problems. Switching to icount benchmarks with a benchmark CI service would be a big help.
- Seemingly minor refactors can have big effects. Some came from what was inlined or not (and adding more inlines helped) but I never figured it all out.
- The performance of some parsers seems dependent on the context they are called in. In micro-benchmarks, I would not see a problem that was reproducible within a file-format parser using winnow. Again, I suspect inlining differences are at play.
So if Winnow is the fastest, why is the title of this post a question?
- chumsky's benchmark is missing yap, which seems promising on performance. For more parsers, see parse-rosetta-rs.
- Performance will depend on your use case and needed feature set. I could make Winnow faster by changing the error type in the bench. The bench could also be slower if the parser was instrumented with span tracking or error recovery.
Note: every change that has led to a performance improvement has been reported back to nom.