Recently, there was an announcement for
a port of C++
transrangers to Rust with a follow up
the author of
Seeing the performance numbers, I was curious what the experience was like with the different techniques compared in the followup and how the performance worked out in a real world application.
I decided to port
typos, a source code
spell checker, to
For more on what
pushgen bring, see:
typos scans a git repo, spell checking file names and content. It will
automatically ignore binary files.
Spell checking involves:
- Extracting text that looks like identifiers from the file. To lower false positives, we try to detect and ignore emails, URLs, hashes, etc.
- Checking identifiers against a dictionary (atm has no content)
- If the dictionary could not confirm nor deny the spelling, we split the
identifier into individual words (e.g.
- Check each word against against a dictionary
typos looks like:
def walk_path(repo): for path in repo: check_file(path) def check_file(path): for typo in check_str(path.name): report(typo) content = path.read_text() if not is_binary(content): for typo in check_bytes(content): report(typo) def check_bytes(buffer): # similar for `check_str` for identifier in parser.parse_bytes(buffer): if identifier in dictionary: yield dictionary[identifier] else: for word in identifier.split(): if identifier in dictionary: yield dictionary[identifier] def parse_bytes(buffer): if is_ascii(buffer): for identifier in ascii_parser(buffer): yield identifier else: for text in utf8_chunks(buffer): for identifier in unicode_parser(buffer): yield identifier def parse_str(buffer): if is_ascii(buffer): for identifier in ascii_parser(buffer): yield identifier else: for identifier in unicode_parser(buffer): yield identifier
- Starting with
check_bytes, everything is composed of iterators nested in each other with transforms applied to them.
- Not as obvious in pseudo-code, extra layers are added to unify the iterator
types between each branch. In a language like Python, this is done by
typos, I chose
I need to first say, that I seem to have some jitter on my machine considering
showed parsing taking 7% more time in some cases despite not changing. I am
running in WSL on a laptop that is plugged in with most applications closed. I
didn't spend much time making the environment less jittery.
With that said,
try_for_each seems to have slowed down
check_file more than that jitter:
check_file/Typos/code time: [147.52 us 150.23 us 153.51 us] thrpt: [1.8886 MiB/s 1.9298 MiB/s 1.9653 MiB/s] change: time: [+26.226% +29.951% +34.477%] (p = 0.00 < 0.05) thrpt: [-25.638% -23.048% -20.777%] Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe check_file/Typos/corpus time: [50.001 ms 50.776 ms 51.638 ms] thrpt: [11.257 MiB/s 11.448 MiB/s 11.626 MiB/s] change: time: [+21.952% +24.403% +26.950%] (p = 0.00 < 0.05) thrpt: [-21.229% -19.616% -18.001%] Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe
check_file is doing a lot more than running iterators, like IO.
One thing hurting
try_for_each is that iterator authors can't provide
their own optimized implementation yet because the
Try trait isn't stable.
Since all iterators composed together and were only consumed in
this made it easy to port to
try_for_each (technically overriding the provided
try_fold) is going to be required to get the performance gains, that is
unfortunate. When implementing
Iterator, its already easy to overlook what
all other features we should implement for being a good citizen (e.g. size
hints). Besides forgetting
try_for_each, we'll also have to implement our
algorithms twice and keep them in sync. A good test harness will help but all
of this increases the friction for taking advantage of
pushgen, the improvements from lowering iteration overhead are not noticeable compared to everything else, like IO:
check_file/Typos/code time: [113.92 us 115.85 us 118.02 us] thrpt: [2.4566 MiB/s 2.5025 MiB/s 2.5449 MiB/s] change: time: [-1.1247% +1.7826% +4.7350%] (p = 0.21 > 0.05) thrpt: [-4.5209% -1.7513% +1.1375%] No change in performance detected. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe check_file/Typos/corpus time: [38.571 ms 39.072 ms 39.617 ms] thrpt: [14.673 MiB/s 14.877 MiB/s 15.071 MiB/s] change: time: [-6.0200% -4.2708% -2.4899%] (p = 0.00 < 0.05) thrpt: [+2.5535% +4.4613% +6.4057%] Performance has improved. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe
When breaking out just parsing, we do see decent improvements, for example:
parse_str/unicode/code time: [13.505 us 13.705 us 13.909 us] thrpt: [20.844 MiB/s 21.153 MiB/s 21.468 MiB/s] change: time: [-19.030% -16.653% -14.273%] (p = 0.00 < 0.05) thrpt: [+16.650% +19.980% +23.503%] Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe parse_str/ascii/code time: [13.742 us 13.959 us 14.208 us] thrpt: [20.405 MiB/s 20.769 MiB/s 21.096 MiB/s] change: time: [-12.491% -10.529% -8.5378%] (p = 0.00 < 0.05) thrpt: [+9.3348% +11.768% +14.274%] Performance has improved. Found 5 outliers among 100 measurements (5.00%) 4 (4.00%) high mild 1 (1.00%) high severe
And the overhead for adapting word-splitting into a
pushgen::Generator was minimal:
split/words/code time: [1.4384 us 1.4654 us 1.4973 us] thrpt: [193.63 MiB/s 197.84 MiB/s 201.56 MiB/s] change: time: [-1.9718% +2.1027% +5.9130%] (p = 0.29 > 0.05) thrpt: [-5.5829% -2.0594% +2.0114%] No change in performance detected. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild split/words/corpus time: [2.7414 ms 2.7730 ms 2.8080 ms] thrpt: [207.02 MiB/s 209.63 MiB/s 212.05 MiB/s] change: time: [+2.0584% +3.7444% +5.7863%] (p = 0.00 < 0.05) thrpt: [-5.4698% -3.6092% -2.0169%] Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe
pushgen won't be a magic bullet to improve performance but it might help with
specific hot-loops. Let profilers and benchmarks be your guide.
pushgen is in its early stages, at v0.0.1. I had to implement a lot of
features before porting
typos to use
pushgen mostly involved importing traits and swapping types.
- I did port my custom
Iteratorto being a
- I adapted my word-splitting
pushgen::from_itersince I knew that would be messier and I wanted to know whether it was worth it first.
- Activating these
pushgen::Generators was taken care of by basing my branch off of my
See PR #311.
There are still areas to explore to match the feature set of
iterators, size hints, etc.
I imagine the name
Generator will become confusing as Rust