class: title

# Testing My Patience

## An exploration of testing in Rust

???

One of the strengths of Rust is that all of the basics are readily available,
from dependency management, to documentation, to testing.

---
name: history

## In the beginning...

```rust
*#[test]
fn some_case() {
*    assert_eq!(1, 2);
}
```

```console
awesomeness-rs/
  Cargo.toml
* src/      # whitebox tests go here
    lib.rs
    submodule.rs
    submodule/
      tests.rs
* tests/    # blackbox tests go here
    is_awesome.rs
```

???

There is little ceremony to testing: just drop an annotated function in the
relevant file.

And things haven't really changed since then.  Which isn't necessarily bad;
simplicity can be a strength:

- High value-to-ceremony ratio
- Exclusively running tests in parallel puts pressure on tests to scale
- Being the standard way to test makes it easy to jump between projects

---
name: pain

## Pain points

Custom test harnesses:

- `cargo-test-macro`
- `trybuild`
- `trycmd`
- `toml-test-rs`
- `criterion`

???

For me, it hasn't been all roses.  Day-to-day I'm dealing with libtest
workarounds.

---

## Holy envy

Stendahl's three rules of religious understanding:

1. When you are trying to understand another religion, you should ask the
   adherents of that religion and not its enemies.
2. Don't compare your best to their worst.
3. **Leave room for "holy envy."**

???

I have my own "holy envy": Python's testing.  The term was coined by a Church
of Sweden bishop for discussing religions, but it applies more broadly.

If you feel "holy envy" for other test tooling, I would love to hear why.

---
count: false

## Holy envy

```python
def pytest_addoption(parser):
*    parser.addoption(
        "--can-in-interface",
        default="None",
        action="store",
        help="The CAN interface to use with the tests")


*@pytest.fixture
def can_in_interface(request):
    interface = request.config.getoption("--can-in-interface")
    if interface.lower() == "none":
*        pytest.skip("Test requires a CAN board")
    return interface


def test_wait_for_intf_communicating(can_in_interface):
    # ...
```

???

In this short pytest sample, we've got:

- CLI extensions
- Fine-grained test fixtures
- Runtime skipping of tests

Among others.

However, Rust is not Python.  Some of Python's features that help make pytest
what it is include:

- Use of exceptions
- The dynamic nature of Python
- Decorators having a lower barrier of entry to build or use than proc macros
- Being third-party, making it easier to experiment and iterate
- A standard CLI parser

So let's step through some of these pain points.

---

## Pain point: conditional ignores

```rust
#[test]
fn simple_hg() {
    // ...
}
```

```console
$ cargo test
   Compiling cargo v0.72.0
    Finished test [unoptimized + debuginfo] target(s) in 0.62s
     Running /home/epage/src/cargo/tests/testsuites/main.rs

running 1 test
*test new::simple_hg ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```

???

Awesome, our test passes!

---
count: false

## Pain point: conditional ignores

```rust
#[test]
fn simple_hg() {
*    if !has_command("hg") {
*        return;
*    }
    // ...
}
```

```console
$ cargo test
   Compiling cargo v0.72.0
    Finished test [unoptimized + debuginfo] target(s) in 0.62s
     Running /home/epage/src/cargo/tests/testsuites/main.rs

running 1 test
*test new::simple_hg ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```

???
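For reference, a minimal sketch of what a `has_command` helper could look like
(hypothetical; the slide elides the actual helper):

```rust
use std::process::{Command, Stdio};

// Hypothetical helper: returns true if `cmd --version` can be spawned and
// exits successfully, i.e. the command is available on this machine.
fn has_command(cmd: &str) -> bool {
    Command::new(cmd)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

The helper is easy enough to write; the problem is that the early return makes
the test silently "pass" when `hg` is missing.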
Yes, if you are watching your coverage closely enough, you might identify this
sooner.

---

## Pain point: conditional ignores

```rust
#[test]
fn simple_hg() {
*    if !has_command("hg") {
*        return;
*    }
    // ...
}
```

```python
*@pytest.mark.skipif(not has_command("hg"), reason="requires `hg` CLI")
def test_simple_hg():
    pass
```

---

## Pain point: lack of fixtures

```rust
fn cargo_add_lockfile_updated() {
*    let scratch = tempfile::TempDir::new().unwrap();
    // ...
}
```

???

RAII handles the role of fixtures for us.

---
count: false

## Pain point: lack of fixtures

```rust
fn cargo_add_lockfile_updated() {
*    let scratch = tempfile::TempDir::new().unwrap();
    // ...
*    scratch.close().unwrap();
}
```

???

The error being ignored on implicit close actually masked errors in some of my
tests on Windows.

But how do we access the directory to debug failures?

- It's cleaned up
- The name is not predictable

Cargo instead only cleans up its temp directory fixture on the next run, but
that leads to a different problem: CI failing due to storage limits.  So how
do we identify which tests are taking up too much space?

---

## Pain point: lack of fixtures

```rust
fn cargo_add_lockfile_updated() {
*    let scratch = tempfile::TempDir::new().unwrap();
    // ...
*    scratch.close().unwrap();
}
```

```python
*def cargo_add_lockfile_updated(tmpdir):
    # ...
```

---

## Pain point: lack of test generation

```rust
#[test]
fn integers() {
    let cases = [
        ("+99", 99),
        ("42", 42),
        ("0", 0),
        ("-17", -17),
        ("1_2_3_4_5", 1_2_3_4_5),
        ("0xF", 15),
        ("0o0_755", 493),
        ("0b1_0_1", 5),
        (&std::i64::MIN.to_string()[..], std::i64::MIN),
        (&std::i64::MAX.to_string()[..], std::i64::MAX),
    ];
*    for &(input, expected) in &cases {
        let parsed = integer.parse(new_input(input));
        assert_eq!(parsed, Ok(expected));
    }
}
```

???

Data-driven tests are an easy way to cover a lot of cases (granted, property
testing is even better).

However:

- You don't get context on `input`
- You have to fix them in order, requiring careful ordering of the cases
- You don't get the bigger picture of what's working and failing
- You can't select a specific case to run / debug
- Debug output will be flooded from prior cases

Side note: Alternatively,

- you could write a test per case, calling a shared function
- you could write a macro to generate a test per case

Side note: Another version of this is criterion with its bench groups.

---

## Pain point: lack of test generation (part 2)

```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
*    t.compile_fail("tests/ui/*.rs");
}
```

.image-middle[![trybuild output](https://user-images.githubusercontent.com/1940490/57186576-7b0b5200-6e96-11e9-8bfd-2de705125108.png)]

???

Some projects help simplify large, complex test generation, including:

- trybuild
- trycmd
- libtest-mimic
- criterion

Custom harnesses are a second-class experience:

- They require their own test binary
- They have varying levels of support for the ways you normally interact with
  tests

---

## Pain point: lack of test generation

```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
*    t.compile_fail("tests/ui/*.rs");
}
```

```python
*@pytest.mark.parametrize("sample_rs", trybuild.find("tests/ui/*.rs"))
def test_ui(sample_rs):
*    trybuild.verify(sample_rs)
```

???

---

## Pain point: scaling up

```rust
#[cargo_test(requires_hg)]
fn simple_hg() {
    // ...
}
```

```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}
```

```rust
#[test]
fn cli_tests() {
    trycmd::TestCases::new()
        .case("tests/cmd/*.toml")
        .case("README.md");
}
```

???
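As a concrete example of these ad-hoc workarounds, here is a rough sketch of
the "macro to generate a test per case" alternative mentioned earlier
(hypothetical; it swaps std's `str::parse` in for that slide's elided
`integer` parser):

```rust
// Hand-rolled test generation: expand one #[test] per case so each case is
// individually selectable and reported.
macro_rules! integer_tests {
    ($($name:ident: $input:expr => $expected:expr,)*) => {
        $(
            #[test]
            fn $name() {
                // Stand-in for the slide's elided parser
                assert_eq!($input.parse::<i64>(), Ok($expected));
            }
        )*
    };
}

integer_tests! {
    positive_sign: "+99" => 99,
    plain: "42" => 42,
    negative: "-17" => -17,
}
```

It works, but every project ends up growing its own variant of this.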
While this wasn't even an exhaustive list from my own experience, I think the
common thread through it is "scaling up".

After enough pain and with enough contributors, projects work around these
problems, like Cargo having its own `cargo_test` macro with fixtures coupled
tightly to that macro.  This takes a toll on those projects until they say
enough is enough.

It also means each project solves these problems in its own way, losing the
transferability of experience that was one of the highlights of Rust's test
experience, and the solutions don't compose.

---
name: forward
class: title

# Path forward

???

I feel a pytest-like API would give us the extensibility needed to cover all
of the custom test harness use cases I know of.

The biggest issue is that this is a lot to stabilize for libtest.

The approach I'm taking is to develop the user-facing "pytest" features in an
external crate, experimenting with what extension points a "libtest2" would
need.

---

## Path forward

### Milestones

1. libtest2-mimic ← we are here
2. libtest2
3. pytest
4. criterion, trybuild, etc.
5. Merge libtest2 into libtest?

???

I'm trying to break this down into smaller milestones to help vet the design
as we go.

My minimum expectations for this effort are:

- A first-class external test harness to integrate around
- A stabilized JSON output to improve `cargo test` / `libtest` interactions
- I'm dogfooding the JSON format by implementing all the other output formats
  in terms of it

---

## Help?

Questions

- What do you have "holy envy" for?
- What oddball test scenarios have I not seen yet?

Get involved: github.com/epage/pytest-rs

Offload my work :)

???

Side note: interesting links

- See also [custom test harnesses](https://github.com/rust-lang/rust/issues/50297)
- [RFC 2011](https://github.com/rust-lang/rust/issues/44838) for nicer asserts
- [#5609](https://github.com/rust-lang/cargo/issues/5609) for running tests in parallel
- [#4324](https://github.com/rust-lang/cargo/issues/4324) for summarizing test results
- [#2832](https://github.com/rust-lang/cargo/issues/2832) for reducing test noise