class: title

# Testing My Patience

## An exploration of testing in Rust

???

One of the strengths of Rust is that all of the basics are readily available,
from dependency management, to documentation, to testing.

---
name: history

## In the beginning...

```rust
*#[test]
fn some_case() {
*    assert_eq!(1, 2);
}
```

```console
awesomeness-rs/
  Cargo.toml
* src/      # whitebox tests go here
    lib.rs
    submodule.rs
    submodule/
      tests.rs
* tests/    # blackbox tests go here
    is_awesome.rs
```

???

There is little ceremony to testing: just drop an annotated function in the
relevant file.

And things haven't really changed since then.  Which isn't necessarily bad;
simplicity can be a strength:

- High value-to-ceremony ratio
- Exclusively running tests in parallel puts pressure on tests to scale
- Being the standard way to test makes it easy to jump between projects

---
name: pain

## Pain points

Custom test harnesses:

- `cargo-test-macro`
- `trybuild`
- `trycmd`
- `toml-test-rs`
- `criterion`

???

For me, it hasn't been all roses.  Day-to-day I'm dealing with libtest
workarounds.

---

## Holy envy

Stendahl's three rules of religious understanding:

1. When you are trying to understand another religion, you should ask the
   adherents of that religion and not its enemies.
2. Don't compare your best to their worst.
3. **Leave room for "holy envy."**

???

I have my own "holy envy": Python's testing.  The term was coined by a Church
of Sweden bishop for discussing religions, but it applies more broadly.

If you feel "holy envy" for other test tooling, I would love to hear why.

---
count: false

## Holy envy

```python
def pytest_addoption(parser):
*    parser.addoption(
        "--can-in-interface",
        default="None",
        action="store",
        help="The CAN interface to use with the tests")


*@pytest.fixture
def can_in_interface(request):
    interface = request.config.getoption("--can-in-interface")
    if interface.lower() == "none":
*        pytest.skip("Test requires a CAN board")
    return interface


def test_wait_for_intf_communicating(can_in_interface):
    # ...
```

???

In this short pytest sample, we've got:

- CLI extensions
- Fine-grained test fixtures
- Runtime skipping of tests

Among others.

However, Rust is not Python.  Some of Python's features that help make pytest
what it is include:

- Use of exceptions
- The dynamic nature of Python
- Decorators having a lower barrier of entry to build or use than proc macros
- Being third-party, making it easier to experiment and iterate
- A standard CLI parser

So let's step through some of these pain points.

---

## Pain point: conditional ignores

```rust
#[test]
fn simple_hg() {
    // ...
}
```

```console
$ cargo test
   Compiling cargo v0.72.0
    Finished test [unoptimized + debuginfo] target(s) in 0.62s
     Running /home/epage/src/cargo/tests/testsuites/main.rs

running 1 test
*test new::simple_hg ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```

???

Awesome, our test passes!

---
count: false

## Pain point: conditional ignores

```rust
#[test]
fn simple_hg() {
*    if !has_command("hg") {
*        return;
*    }
    // ...
}
```

```console
$ cargo test
   Compiling cargo v0.72.0
    Finished test [unoptimized + debuginfo] target(s) in 0.62s
     Running /home/epage/src/cargo/tests/testsuites/main.rs

running 1 test
*test new::simple_hg ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```

???
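For reference, a minimal sketch of what a `has_command` helper could look like
(hypothetical; the slide elides the actual helper):

```rust
use std::process::{Command, Stdio};

// Hypothetical helper: returns true if `cmd --version` can be spawned and
// exits successfully, i.e. the command is available on this machine.
fn has_command(cmd: &str) -> bool {
    Command::new(cmd)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

The helper is easy enough to write; the problem is that the early return makes
the test silently "pass" when `hg` is missing.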
Yes, if you are watching your coverage closely enough, you might identify this
sooner.

---

## Pain point: conditional ignores

```rust
#[test]
fn simple_hg() {
*    if !has_command("hg") {
*        return;
*    }
    // ...
}
```

```python
*@pytest.mark.skipif(not has_command("hg"), reason="requires `hg` CLI")
def test_simple_hg():
    pass
```

---

## Pain point: lack of fixtures

```rust
fn cargo_add_lockfile_updated() {
*    let scratch = tempfile::TempDir::new().unwrap();
    // ...
}
```

???

RAII handles the role of fixtures for us.

---
count: false

## Pain point: lack of fixtures

```rust
fn cargo_add_lockfile_updated() {
*    let scratch = tempfile::TempDir::new().unwrap();
    // ...
*    scratch.close().unwrap();
}
```

???

The error being ignored on implicit close actually masked errors in some of my
tests on Windows.

But how do we access the directory to debug failures?

- It's cleaned up
- The name is not predictable

Cargo instead only cleans up its temp directory fixture on the next run, but
that leads to a different problem: CI failing due to storage limits.  So how
do we identify which tests are taking up too much space?

---

## Pain point: lack of fixtures

```rust
fn cargo_add_lockfile_updated() {
*    let scratch = tempfile::TempDir::new().unwrap();
    // ...
*    scratch.close().unwrap();
}
```

```python
*def cargo_add_lockfile_updated(tmpdir):
    # ...
```

---

## Pain point: lack of test generation

```rust
#[test]
fn integers() {
    let cases = [
        ("+99", 99),
        ("42", 42),
        ("0", 0),
        ("-17", -17),
        ("1_2_3_4_5", 1_2_3_4_5),
        ("0xF", 15),
        ("0o0_755", 493),
        ("0b1_0_1", 5),
        (&std::i64::MIN.to_string()[..], std::i64::MIN),
        (&std::i64::MAX.to_string()[..], std::i64::MAX),
    ];
*    for &(input, expected) in &cases {
        let parsed = integer.parse(new_input(input));
        assert_eq!(parsed, Ok(expected));
    }
}
```

???

Data-driven tests are an easy way to cover a lot of cases (granted, property
testing is even better).

However:

- You don't get context on `input`
- You have to fix them in order, requiring careful ordering of the cases
- You don't get the bigger picture of what's working and failing
- You can't select a specific case to run / debug
- Debug output will be flooded from prior cases

Side note: Alternatively,

- you could write a test per case, calling a shared function
- you could write a macro to generate a test per case

Side note: Another version of this is criterion with its bench groups.

---

## Pain point: lack of test generation (part 2)

```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
*    t.compile_fail("tests/ui/*.rs");
}
```

.image-middle[![trybuild output](https://user-images.githubusercontent.com/1940490/57186576-7b0b5200-6e96-11e9-8bfd-2de705125108.png)]

???

Some projects help simplify large, complex test generation, including:

- trybuild
- trycmd
- libtest-mimic
- criterion

Custom harnesses are a second-class experience:

- They require their own test binary
- They have varying levels of support for the ways you normally interact with
  tests

---

## Pain point: lack of test generation

```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
*    t.compile_fail("tests/ui/*.rs");
}
```

```python
*@pytest.mark.parametrize("sample_rs", trybuild.find("tests/ui/*.rs"))
def test_ui(sample_rs):
*    trybuild.verify(sample_rs)
```

???

---

## Pain point: scaling up

```rust
#[cargo_test(requires_hg)]
fn simple_hg() {
    // ...
}
```

```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}
```

```rust
#[test]
fn cli_tests() {
    trycmd::TestCases::new()
        .case("tests/cmd/*.toml")
        .case("README.md");
}
```

???
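As a concrete example of these ad-hoc workarounds, here is a rough sketch of
the "macro to generate a test per case" alternative mentioned earlier
(hypothetical; it swaps std's `str::parse` in for that slide's elided
`integer` parser):

```rust
// Hand-rolled test generation: expand one #[test] per case so each case is
// individually selectable and reported.
macro_rules! integer_tests {
    ($($name:ident: $input:expr => $expected:expr,)*) => {
        $(
            #[test]
            fn $name() {
                // Stand-in for the slide's elided parser
                assert_eq!($input.parse::<i64>(), Ok($expected));
            }
        )*
    };
}

integer_tests! {
    positive_sign: "+99" => 99,
    plain: "42" => 42,
    negative: "-17" => -17,
}
```

It works, but every project ends up growing its own variant of this.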
While this wasn't even an exhaustive list from my own experience, I think the
common thread through it is "scaling up".

After enough pain and with enough contributors, projects work around these
problems, like Cargo having its own `cargo_test` macro with fixtures coupled
tightly to that macro.  This takes a toll on those projects until they say
enough is enough.

It also means each project solves these problems in its own way, losing the
transferability of experience that was one of the highlights of Rust's test
experience, and the solutions don't compose.

---
name: forward
class: title

# Path forward

???

I feel a pytest-like API would give us the extensibility needed to cover all
of the custom test harness use cases I know of.

The biggest issue is that this is a lot to stabilize for libtest.

The approach I'm taking is to develop the user-facing "pytest" features in an
external crate, experimenting with what extension points a "libtest2" would
need.

---

## Path forward

### Milestones

1. libtest2-mimic ← we are here
2. libtest2
3. pytest
4. criterion, trybuild, etc.
5. Merge libtest2 into libtest?

???

I'm trying to break this down into smaller milestones to help vet the design
as we go.

My minimum expectations for this effort are:

- A first-class external test harness to integrate around
- A stabilized JSON output to improve `cargo test` / `libtest` interactions
- I'm dogfooding the JSON format by implementing all the other output formats
  in terms of it

---

## Help?

Questions

- What do you have "holy envy" for?
- What oddball test scenarios have I not seen yet?

Get involved: github.com/epage/pytest-rs

Offload my work :)

???

Side note: interesting links

- See also [custom test harnesses](https://github.com/rust-lang/rust/issues/50297)
- [RFC 2011](https://github.com/rust-lang/rust/issues/44838) for nicer asserts
- [#5609](https://github.com/rust-lang/cargo/issues/5609) for running tests in parallel
- [#4324](https://github.com/rust-lang/cargo/issues/4324) for summarizing test results
- [#2832](https://github.com/rust-lang/cargo/issues/2832) for reducing test noise