toml v0.6 is now out with a new parser and renderer, addressing several existing
issues and passing the TOML 1.0 compliance tests. This was done by leveraging
the toml_edit crate.
Why toml_edit
Last March, alexcrichton put out a call for a maintainer for the toml crate. I
had become a maintainer of the toml_edit crate as part of my work on cargo add,
getting toml_edit into good enough shape that it became the only TOML parser in
cargo (see rust-lang/cargo#10086), as the cargo team wanted consistent parse
behavior. I offered to take over maintenance of toml with the goal of migrating
toml onto toml_edit.
toml fills a role similar to serde_json, providing serde support for the TOML
format and a default data structure to deserialize into. toml_edit is more
complex and slower because it needs to preserve all end-user formatting from
parse to display. As part of the cargo work, we got toml_edit a lot closer to
toml in performance and offered the easy module as a toml compatibility layer.
In theory, we could just migrate everyone from toml to toml_edit and be done.
In practice, there is no way to help people through a crate rename. Even though
structopt was absorbed into clap in 2021, we are still seeing people use
structopt, unaware of the changeover. Additionally, there was interest in a more
stable API than what toml_edit offers, as we had a lot of churn due to the
low-level details in the API and as we figure out how best to allow editing of
TOML documents.
So keeping the toml crate around was worthwhile and we could lighten the
community's overall maintenance burden by combining efforts and code.
Impact on toml users
toml now passes all compliance tests for TOML 1.0, including:
- Not erroring when a table appends to dotted keys
- No more stray , when writing arrays of tables
- No more ValueAfterTable errors when writing top-level key-value pairs, which previously required users to opt in to a fix
Error information also improved. Most notably, the error messages are changing from the old
invalid type: string "a", expected isize\nin `foo.bar`
to toml_edit's
TOML parse error at line 2, column 7
  |
2 | bar = "a"
  |       ^^^
invalid type: string "a", expected isize
Callers can also render the errors as they wish, like with ariadne. We also improved the quality of the span information being reported.
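To make the custom-rendering point concrete, here is a minimal, std-only sketch of how a caller might turn a byte span and a message into a report shaped like the one above. The `line_col` and `render_error` helpers are hypothetical names for this illustration, not part of toml or toml_edit, and counting columns in characters is an assumption.

```rust
use std::ops::Range;

/// Convert a byte offset into a 1-based (line, column) pair.
/// Columns are counted in characters, an assumption made for this sketch.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = &source[..offset];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    (line, column)
}

/// Render a message together with the offending line and a caret underline.
fn render_error(source: &str, span: Range<usize>, message: &str) -> String {
    let (line, column) = line_col(source, span.start);
    let text = source.lines().nth(line - 1).unwrap_or("");
    // The gutter is as wide as the line number so the `|` bars line up.
    let gutter = " ".repeat(line.to_string().len());
    // Underline at least one character and stop at the end of the line.
    let width = source[span.start..span.end]
        .chars()
        .take_while(|c| *c != '\n')
        .count()
        .max(1);
    format!(
        "TOML parse error at line {line}, column {column}\n\
         {gutter} |\n\
         {line} | {text}\n\
         {gutter} | {pad}{carets}\n\
         {message}\n",
        pad = " ".repeat(column - 1),
        carets = "^".repeat(width),
    )
}
```

A caller that prefers a different layout only needs the span and the message; the line and column are derived from the original source text.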
Pulling in toml_edit also helped highlight some issues with toml's API.
toml::Value would allow you to Display anything. If it looked like a document,
it would be rendered as such. Otherwise, it would be rendered as a value. This
dynamic API makes it easy to get things wrong. Instead, toml::Value now always
Displays a TOML value while toml::Table will Display a TOML document. The same
applies to parsing. A concrete example of what this allows is unambiguously
parsing ["a", "b"] as either a document or a value.
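The idea of letting the type, rather than the data, decide the rendering can be sketched with simplified stand-in types. The `Value` and `Table` below are hypothetical miniatures, not the real toml::Value and toml::Table, and they assume bare keys and only a few value kinds.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical, heavily simplified stand-ins for `toml::Value` and `toml::Table`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    String(String),
    Array(Vec<Value>),
    Table(Table),
}

#[derive(Debug, Clone, PartialEq, Default)]
struct Table(BTreeMap<String, Value>);

/// A `Value` always renders as an inline TOML value, even when it holds a table.
impl fmt::Display for Value {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Value::Integer(i) => write!(f, "{i}"),
            // Debug formatting is only a rough approximation of TOML string escaping.
            Value::String(s) => write!(f, "{s:?}"),
            Value::Array(items) => {
                write!(f, "[")?;
                for (i, item) in items.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{item}")?;
                }
                write!(f, "]")
            }
            Value::Table(table) => {
                write!(f, "{{ ")?;
                for (i, (key, value)) in table.0.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{key} = {value}")?;
                }
                write!(f, " }}")
            }
        }
    }
}

/// A `Table` always renders as a document: one `key = value` per line.
/// Nested tables are rendered inline to keep the sketch short.
impl fmt::Display for Table {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for (key, value) in &self.0 {
            writeln!(f, "{key} = {value}")?;
        }
        Ok(())
    }
}
```

With this split, the same data renders differently depending on whether it is held as a `Table` or wrapped in `Value::Table`, so the caller states the intent through the type instead of relying on what the data looks like.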
Users should also expect maintenance going forward to improve as the code base
is easier to support, and not just because of the two-for-one maintenance.
Before, toml had a handwritten parser that had to deal with the non-linear
nature of TOML. Now, toml parses everything to an AST and then deserializes to
the end-user's data types. Separating the steps of parsing and deserializing
simplifies them, making it easier to confidently make changes. The parser is
also easier to update as it is higher level, using a parser combinator crate.
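The two-stage split can be illustrated with a toy, std-only sketch. The `Node` AST, the `Package` struct, and the `parse` and `deserialize` functions are hypothetical and cover only `key = value` lines holding integers or simple quoted strings; the real crates are far more complete.

```rust
use std::collections::BTreeMap;

/// Stage 1 output: a tiny, hypothetical AST (only strings and integers).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Integer(i64),
    String(String),
}

/// Stage 1: parse `key = value` lines into the AST. Knows nothing about user types.
fn parse(input: &str) -> Result<BTreeMap<String, Node>, String> {
    let mut doc = BTreeMap::new();
    for (idx, line) in input.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, raw) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `key = value`", idx + 1))?;
        let raw = raw.trim();
        let node = if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
            Node::String(raw[1..raw.len() - 1].to_string())
        } else {
            let n = raw
                .parse::<i64>()
                .map_err(|e| format!("line {}: {e}", idx + 1))?;
            Node::Integer(n)
        };
        doc.insert(key.trim().to_string(), node);
    }
    Ok(doc)
}

/// The end user's data type (hypothetical).
#[derive(Debug, PartialEq)]
struct Package {
    name: String,
    edition: i64,
}

/// Stage 2: walk the finished AST. Key order in the input no longer matters.
fn deserialize(doc: &BTreeMap<String, Node>) -> Result<Package, String> {
    let name = match doc.get("name") {
        Some(Node::String(s)) => s.clone(),
        Some(other) => return Err(format!("invalid type for `name`: {other:?}")),
        None => return Err("missing field `name`".to_string()),
    };
    let edition = match doc.get("edition") {
        Some(Node::Integer(i)) => *i,
        Some(other) => return Err(format!("invalid type for `edition`: {other:?}")),
        None => return Err("missing field `edition`".to_string()),
    };
    Ok(Package { name, edition })
}
```

Because stage 2 only ever sees a complete tree, syntax problems and type mismatches are handled in separate places, which is the simplification described above.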
In rough terms, we expect compiles to be slightly slower as more code across
more dependencies is being built. Parse time is also about twice what it was
before (62us to 110us for cargo's Cargo.toml on my machine). We do have ideas
on how to further improve parse times. We do not track Display time for TOML,
assuming it isn't in a critical path.
Impact on toml_edit users
As already mentioned, toml_edit users now have a more stable subset of the API
that shares behavior and compile-time.
Otherwise, the biggest gain is span support. We now track the location of each
Item within the original document while parsing. This allows you to deserialize
to serde_spanned::Spanned<T> to capture that location. Deserialize errors will
now look more like parse errors, showing the error location, and allow you to
look up the span programmatically.
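As a rough illustration of what capturing a location while parsing means, the std-only sketch below records the byte range of each raw value. The `Spanned` struct here is a hypothetical stand-in for serde_spanned::Spanned<T>, and `parse_with_spans` is a toy that handles only `key = value` lines.

```rust
use std::ops::Range;

/// Hypothetical stand-in for `serde_spanned::Spanned<T>`: a value plus its byte range.
#[derive(Debug, Clone, PartialEq)]
struct Spanned<T> {
    span: Range<usize>,
    value: T,
}

/// Parse `key = value` lines, recording where each raw value sits in `input`.
fn parse_with_spans(input: &str) -> Vec<(String, Spanned<String>)> {
    let mut entries = Vec::new();
    let mut offset = 0; // byte offset of the current line within `input`
    for line in input.split_inclusive('\n') {
        if let Some((key, raw)) = line.split_once('=') {
            let value = raw.trim();
            if !value.is_empty() {
                // Skip the key, the `=`, and any leading whitespace before the value.
                let leading = raw.len() - raw.trim_start().len();
                let start = offset + key.len() + 1 + leading;
                entries.push((
                    key.trim().to_string(),
                    Spanned {
                        span: start..start + value.len(),
                        value: value.to_string(),
                    },
                ));
            }
        }
        offset += line.len();
    }
    entries
}
```

Slicing the original input with a recorded span gives back exactly the text the value came from, which is what makes location-aware error reports possible after parsing has finished.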
Unfortunately, span information is only exposed through serde and errors at
this time. To maintain performance, we only capture spans while parsing rather
than Strings for all of the format-preserving information, avoiding allocations
for serde support. Document::from_str has to replace those spans with strings
to allow editing. We don't keep the spans around for the editing API to keep
the size of each Item in memory smaller. The spans also present a lot of
challenges in an editing API as we can't guarantee what source they are
associated with.
Also, a lot of small pieces of polish were found through toml's tests, e.g.
inconsistent casing and newlines in errors.
We did speed up toml_edit::de, with parsing cargo's Cargo.toml going from 115us
to 111us. However, Document::from_str slowed down, going from 85us to 103us. We
hope to recover the performance loss.
Long term, the toml_edit::easy API is going to go away in favor of toml.