tl;dr Cache your code-gen results with the
Lately, there has been talk about improving build times, with a focus on
reducing bloat, like regex breaking out logic into features that can be
cargo-bloat going on a
new cargo features to identify slow-to-build
The area that has been impacting me lately is
build.rs. I've been
code-generating compile-time hash tables (phf),
which adds several dependencies to my build and takes a while to compile.
Let's take imperative as an example.
imperative is a simple way to check whether a word is in the imperative mood. The
logic was taken from pydocstyle, where it
was used to ensure the subject line of a doc-comment starts with an imperative verb. The check uses:
- A set of blacklisted words (73 words).
- A map of word-stems to acceptable full-words (227 full-words).
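To make the two data sets concrete, here is a minimal sketch of how a blacklist plus a stem-to-words map could drive the check. Everything in it is hypothetical. `BLACKLIST`, `STEMS`, `naive_stem`, and `is_imperative` are illustrative names, not the crate's API. The entries are samples rather than the real 73 and 227 words. `naive_stem` is a toy suffix-stripper standing in for a real stemming algorithm. Linear search over static slices stands in for the phf tables.

```rust
/// Sample entries only: words that are never accepted as imperative.
static BLACKLIST: &[&str] = &["a", "an", "the", "this", "that"];

/// Sample entries only: a word stem mapped to its acceptable full words.
static STEMS: &[(&str, &[&str])] = &[
    ("add", &["add"]),
    ("check", &["check"]),
    ("return", &["return"]),
];

/// Hypothetical stand-in for a real stemmer: strips the first matching
/// suffix from a short list, as long as something is left over.
fn naive_stem(word: &str) -> &str {
    for suffix in ["ing", "ed", "s"] {
        if let Some(stem) = word.strip_suffix(suffix) {
            if !stem.is_empty() {
                return stem;
            }
        }
    }
    word
}

/// Some(false): known non-imperative (blacklisted, or the wrong form of a
/// known stem). Some(true): an accepted full word. None: unknown stem.
fn is_imperative(word: &str) -> Option<bool> {
    let word = word.to_lowercase();
    if BLACKLIST.contains(&word.as_str()) {
        return Some(false);
    }
    let stem = naive_stem(&word);
    // Look up the acceptable full words for this stem; bail out if unknown.
    let (_, forms) = STEMS.iter().find(|(s, _)| *s == stem)?;
    Some(forms.contains(&word.as_str()))
}

fn main() {
    for word in ["Add", "Adds", "The", "frobnicate"] {
        println!("{}: {:?}", word, is_imperative(word));
    }
}
```

Returning `None` for an unknown stem leaves it to the caller to decide how strict to be, rather than guessing.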
It also relies on the following unique dependencies:
What if, instead of code-generating in
build.rs as part of
every dependent's build, we checked in the result? The main risks are:
- Contributors forgetting to re-run code-gen.
- Contributors changing the code-gen output without modifying the code-generator.
We can mitigate these risks by having CI run the
code-generator in a --check mode
that verifies the checked-in output matches what would be generated.
- Add a dependency on
- Write the code-generator.
- Update the CI to check the generated output.
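Here is a minimal sketch of such a code-generator. It assumes a standalone binary that either rewrites the checked-in file or, when passed `--check`, only compares against it and exits with a non-zero status on a mismatch so that CI fails. The function names `generate` and `write_or_check`, the path `src/generated.rs`, and the sorted-slice output format are all hypothetical. A generator for imperative would presumably still emit phf tables; plain `Vec`/`String` code keeps this sketch dependency-free.

```rust
use std::env;
use std::fs;
use std::path::Path;
use std::process;

/// Render the word list as Rust source for a sorted, de-duplicated static
/// slice. Hypothetical output format; a real generator might emit phf tables.
fn generate(words: &[&str]) -> String {
    let mut sorted: Vec<&str> = words.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    let mut out = String::from("// This file is @generated; do not edit by hand.\n");
    out.push_str("pub static BLACKLIST: &[&str] = &[\n");
    for word in sorted {
        // `{:?}` quotes and escapes the word, producing a valid string literal.
        out.push_str(&format!("    {:?},\n", word));
    }
    out.push_str("];\n");
    out
}

/// In check mode, compare `expected` with the file on disk and report drift;
/// otherwise overwrite the file with `expected`.
fn write_or_check(path: &Path, expected: &str, check: bool) -> Result<(), String> {
    if check {
        let actual = fs::read_to_string(path)
            .map_err(|e| format!("failed to read {}: {}", path.display(), e))?;
        if actual != expected {
            return Err(format!(
                "{} is out of date; re-run the code-generator",
                path.display()
            ));
        }
        Ok(())
    } else {
        fs::write(path, expected)
            .map_err(|e| format!("failed to write {}: {}", path.display(), e))
    }
}

fn main() {
    let check = env::args().skip(1).any(|arg| arg == "--check");
    let words = ["the", "a", "an", "this"]; // sample data only
    let path = Path::new("src/generated.rs"); // hypothetical location
    if let Err(msg) = write_or_check(path, &generate(&words), check) {
        eprintln!("{}", msg);
        process::exit(1);
    }
}
```

Contributors would run the binary without flags after changing the generator. The CI job would run the same binary with `--check`, for example via `cargo run --bin codegen -- --check` (where `codegen` is a hypothetical binary name). Because the comparison is exact, this catches both of the risks listed earlier: a stale file, or a hand-edited one.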
Now let's look at some very unscientific numbers for clean builds of
|  |  |  |
|--|--|--|
|  | 39.94 s | 35.65 s |
|  | 22.55 s | 26.06 s |
Note that this technique might also help make crates work better with alternative build systems.