Speeding Up Rust Builds: Code-Gen Edition

tl;dr Cache your code-gen results with the codegenrs crate.

Lately, there has been talk talk about improving build times, with a focus on reducing bloat like regex breaking out logic into features that can be disabled, cargo-bloat going on a diet, new cargo features to identify slow-to-build dependencies. The area that has been impacting me lately is build.rs. I've been code-generating compile-time hash tables (phf) which has added several dependencies to my build and takes a while.

Let's use imperative as an example. imperative is a simple way to check a word is in the imperative-mood. The logic was taken from pydocstyle where it was used to ensure the subject-line for a doc-comment should start with an imperative-mood verb.

imperative uses:

A set of blacklisted words (73 words).
A map of word-stems to acceptable full-words (227 full-words).

And relies on the following unique dependencies:

phf_codegen
multimap

What if instead of code-generating in build.rs as part of imperative and all dependents' builds, we checked in the result? The main risks are:

Contributors forgetting to re-run code-gen
Contributors changing the code-gen output without modifying the code-generator.

We can mitigate these risks by having the CI run a --check mode in the code-generator that ensures the output matches what should be generated.

To setup imperative-codegen:

Set package.publish to false.
Add a dependency on codegenrs.
Write the code-generator.
Update the CI to check the generated output.

Now let's look at some very unscientific numbers for clean builds of imperative (a lib crate):

`imperative`	`cargo check`	`cargo build`
`build.rs`	39.94 s	35.65 s
`codegenrs`	22.55 s	26.06 s

Note that this technique might also help make crates work better with alternative build systems.