toml v0.6 is now out with a new parser and renderer, addressing several existing issues and passing the TOML 1.0 compliance tests. This was done by leveraging the toml_edit crate.

Why toml_edit

Last March, alexcrichton put out a call for a maintainer for the toml crate. I had become a maintainer of the toml_edit crate as part of my work on cargo add, getting toml_edit into good enough shape that it became the only TOML parser in cargo (see rust-lang/cargo#10086), as the cargo team wanted consistent parse behavior. I offered to take over maintenance of toml with the goal of migrating toml onto toml_edit.

toml fills a similar role as serde_json, providing serde support for the TOML format and a default data structure to deserialize into. toml_edit is more complex and slower because it needs to preserve all end-user formatting from parse to display. As part of the cargo work, we got toml_edit a lot closer to toml in performance and offered the easy module as a toml compatibility layer.

In theory, we could just migrate everyone from toml to toml_edit and be done.

In practice, there is no way to help people through a crate rename. With structopt being absorbed into clap in 2021, we are still seeing people use structopt, unaware of the changeover. Additionally, there was interest in a more stable API than what toml_edit offers, as we have had a lot of churn due to the low-level details in the API and as we figure out how best to allow editing of TOML documents.

So keeping the toml crate around was worthwhile and we could lighten the community's overall maintenance burden by combining efforts and code.

Impact on toml users

toml now passes all compliance tests for TOML 1.0, including

Error information has also improved. Most notably, the error messages are changing from the old

invalid type: string "a", expected isize\nin ``

to toml_edit's

TOML parse error at line 2, column 7
2 | bar = "a"
  |       ^^^
invalid type: string "a", expected isize

Callers can also render the errors as they wish, like with ariadne. We also improved the quality of the span information being reported.
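To give a feel for what rendering from a span involves, here is a minimal, std-only sketch that reproduces the layout of the message above from a source string, a byte range, and a message. The `render_error` function is hypothetical and is not part of toml or toml_edit; it also assumes the span sits on a single line of ASCII text so that byte offsets and columns line up.

```rust
use std::ops::Range;

/// Render `message` with a pointer to `span`, a byte range into `source`.
/// Hypothetical helper: assumes a single-line span over ASCII text.
fn render_error(source: &str, span: Range<usize>, message: &str) -> String {
    // Byte offsets of the start and end of the line containing the span.
    let line_start = source[..span.start].rfind('\n').map_or(0, |i| i + 1);
    let line_end = source[span.start..]
        .find('\n')
        .map_or(source.len(), |i| span.start + i);

    // Line and column are reported 1-based.
    let line_no = source[..span.start].matches('\n').count() + 1;
    let column = span.start - line_start + 1;

    // Keep the gutter as wide as the line number so the `|` markers align.
    let gutter = " ".repeat(line_no.to_string().len());
    let padding = " ".repeat(column - 1);
    let carets = "^".repeat(span.len().max(1));

    format!(
        "TOML parse error at line {line_no}, column {column}\n{line_no} | {line}\n{gutter} | {padding}{carets}\n{message}",
        line = &source[line_start..line_end],
    )
}

fn main() {
    let source = "foo = 1\nbar = \"a\"\n";
    let start = source.find("\"a\"").expect("value is present");
    let rendered = render_error(
        source,
        start..start + 3,
        "invalid type: string \"a\", expected isize",
    );
    println!("{rendered}");
}
```

A dedicated diagnostics crate would also handle multi-line spans, Unicode widths, and color, which this sketch deliberately ignores.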

Pulling in toml_edit also helped highlight some issues with toml's API. toml::Value would allow you to Display anything. If it looked like a document, it would be rendered as such. Otherwise, it would be rendered as a value. This dynamic API makes it easy to get things wrong. Now, toml::Value always Displays a TOML value while toml::Table will Display a TOML document. The same applies to parsing. A concrete example of what this allows is unambiguously parsing ["a", "b"] as either a document or a value.

Users should also expect maintenance going forward to improve as the code base is easier to support, and not just because of the two-for-one maintenance. Before, toml had a handwritten parser that had to deal with the non-linear nature of TOML. Now, toml parses everything to an AST and then deserializes to the end-user's data types. Separating the steps of parsing and deserializing simplifies them, making it easier to confidently make changes. The parser is also easier to update as it is higher level, using a parser combinator crate.
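The benefit of separating the two steps can be sketched with a toy, std-only example. This is not how toml or toml_edit is implemented: `Ast`, `parse`, `Package`, and `deserialize` are hypothetical names, and the "parser" only understands `[table]` headers and `key = "string"` lines. The point is that once the first stage has collected everything into a tree, the second stage no longer cares in what order tables and keys appeared in the file.

```rust
use std::collections::BTreeMap;

/// Stage 1 output: a toy AST mapping table name -> (key -> string value).
type Ast = BTreeMap<String, BTreeMap<String, String>>;

/// Stage 1: parse the text into the AST without knowing the target types.
/// Only `[table]` headers and `key = "value"` lines are supported.
fn parse(input: &str) -> Result<Ast, String> {
    let mut ast = Ast::new();
    let mut current = String::new(); // "" stands for the root table
    for (idx, line) in input.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some(name) = line.strip_prefix('[').and_then(|l| l.strip_suffix(']')) {
            current = name.trim().to_string();
            ast.entry(current.clone()).or_default();
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `key = value`", idx + 1))?;
        let value = value
            .trim()
            .strip_prefix('"')
            .and_then(|v| v.strip_suffix('"'))
            .ok_or_else(|| format!("line {}: expected a quoted string", idx + 1))?;
        ast.entry(current.clone())
            .or_default()
            .insert(key.trim().to_string(), value.to_string());
    }
    Ok(ast)
}

#[derive(Debug, PartialEq)]
struct Package {
    name: String,
    version: String,
}

/// Stage 2: build the caller's type from the finished AST.
fn deserialize(ast: &Ast) -> Result<Package, String> {
    let table = ast.get("package").ok_or("missing table `package`")?;
    let field = |key: &str| {
        table
            .get(key)
            .cloned()
            .ok_or_else(|| format!("missing field `{key}`"))
    };
    Ok(Package {
        name: field("name")?,
        version: field("version")?,
    })
}

fn main() {
    // `[package]` comes last and its keys are not in struct order.
    let input = "[dependencies]\nserde = \"1\"\n\n[package]\nversion = \"0.6.0\"\nname = \"toml\"\n";
    let ast = parse(input).expect("valid input");
    let package = deserialize(&ast).expect("package table present");
    println!("{package:?}");
}
```

Each stage can be tested and changed on its own: `parse` only reports syntax problems, while `deserialize` only reports missing or mismatched data.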

In rough terms, we expect compiles to be slightly slower as more code across more dependencies is being built. Parse time is also about twice what it was before (62us to 110us for cargo's Cargo.toml on my machine). We do have ideas on how to further improve parse times. We do not track Display time for TOML, assuming it isn't in a critical path.

Impact on toml_edit users

As already mentioned, toml_edit users now have a more stable subset of the API that shares behavior and compile-time.

Otherwise, the biggest gain is span support. We now track the location of each Item within the original document while parsing. This allows you to deserialize to serde_spanned::Spanned<T> to capture that location. Deserialize errors will now look more like parse errors, showing the error location, and allow you to look up the span programmatically.

Unfortunately, span information is only exposed through serde and errors at this time. To maintain performance, we only capture spans while parsing rather than Strings for all of the format-preserving information, avoiding allocations for serde support. Document::from_str has to replace those spans with strings to allow editing. We don't keep the spans around for the editing API to keep the size of each Item in memory smaller. The spans also present a lot of challenges in an editing API as we can't guarantee what source they are associated with.

Also, a lot of small pieces of polish were found through toml's tests, e.g. inconsistent casing and newlines in errors.

We did speed up toml_edit::de, with parsing cargo's Cargo.toml going from 115us to 111us. However, Document::from_str slowed down, going from 85us to 103us. We have hopes to recover the performance loss.

Long term, the toml_edit::easy API is going to go away in favor of toml.