tl;dr Cache your code-gen results with the codegenrs crate.
Lately, there has been talk about improving build times, with a focus on reducing bloat: regex breaking out logic into features that can be disabled, cargo-bloat going on a diet, and new cargo features to identify slow-to-build dependencies.
The area that has been impacting me lately is build.rs. I've been code-generating compile-time hash tables (phf), which has added several dependencies to my build and takes a while.
Let's use imperative as an example. imperative is a simple way to check whether a word is in the imperative mood. The logic was taken from pydocstyle, where it was used to ensure that the subject line of a doc-comment starts with an imperative-mood verb.
imperative uses:
- A set of blacklisted words (73 words).
- A map of word-stems to acceptable full-words (227 full-words).
It also relies on the following unique dependencies:
- phf_codegen
- multimap
What if, instead of code-generating in build.rs as part of the build of imperative and of all its dependents, we checked in the result? The main risks are:
- Contributors forgetting to re-run code-gen
- Contributors changing the code-gen output without modifying the code-generator.
We can mitigate these risks by having the CI run a --check mode in the code-generator that ensures the checked-in output matches what should be generated.
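The idea behind such a check mode can be sketched with only the standard library. This is a minimal illustration, not the codegenrs API: the names write_or_check and Outcome are hypothetical, and the generated text is assumed to already be available as a string.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Result of comparing freshly generated text with the checked-in file.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The file on disk already matches the generated text.
    UpToDate,
    /// The file was missing or different and has been rewritten.
    Written,
    /// Check mode only: the file is missing or different; nothing was written.
    Stale,
}

/// Write `generated` to `path`, or, when `check` is true, only compare it
/// with what is on disk (hypothetical helper, not part of codegenrs).
fn write_or_check(path: &Path, generated: &str, check: bool) -> io::Result<Outcome> {
    // A missing file counts as stale output rather than as an I/O error.
    let existing = match fs::read_to_string(path) {
        Ok(text) => Some(text),
        Err(err) if err.kind() == io::ErrorKind::NotFound => None,
        Err(err) => return Err(err),
    };
    if existing.as_deref() == Some(generated) {
        return Ok(Outcome::UpToDate);
    }
    if check {
        // Never touch the file in check mode; the caller reports the failure.
        return Ok(Outcome::Stale);
    }
    fs::write(path, generated)?;
    Ok(Outcome::Written)
}
```

A generator binary could pass check = true when it sees --check on the command line and exit with a non-zero status on Outcome::Stale. That covers both risks: a forgotten re-run and a hand-edited output file each produce a mismatch that fails the CI job.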
To set up imperative-codegen:
- Set package.publish to false.
- Add a dependency on codegenrs.
- Write the code-generator.
- Update the CI to check the generated output.
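For the "write the code-generator" step, the core is a function that turns the source data into Rust source text. Here is a minimal sketch. To stay dependency-free it emits a plain sorted slice instead of the phf tables that imperative actually generates, and render_word_list and the static's name are hypothetical.

```rust
use std::collections::BTreeSet;
use std::fmt::Write;

/// Render `words` as Rust source defining a sorted static slice called `name`.
/// (Hypothetical helper; a sorted slice stands in for a phf set here.)
fn render_word_list(name: &str, words: &[&str]) -> String {
    // BTreeSet drops duplicates and iterates in sorted order, so the same
    // input data always produces byte-identical output.
    let unique: BTreeSet<&str> = words.iter().copied().collect();

    let mut out = String::new();
    // Writing to a String cannot fail, so unwrap is safe here.
    writeln!(out, "// This file is code-generated; edit the generator, not this file.").unwrap();
    writeln!(out, "pub static {}: &[&str] = &[", name).unwrap();
    for word in &unique {
        // `{:?}` prints the word as a quoted, escaped string literal.
        writeln!(out, "    {:?},", word).unwrap();
    }
    writeln!(out, "];").unwrap();
    out
}
```

Deterministic output matters because a checked-in file is compared against a fresh run: any ordering that changes between runs would show up as a spurious difference. With a sorted slice, the consuming crate can look words up with binary_search and needs no code-generation dependencies at build time.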
Now let's look at some very unscientific numbers for clean builds of imperative (a lib crate):
imperative | cargo check | cargo build |
---|---|---|
build.rs | 39.94 s | 35.65 s |
codegenrs | 22.55 s | 26.06 s |
Note that this technique might also help make crates work better with alternative build systems.