I recently got the chance to redo the error handling in two different crates I help maintain. For liquid, I decided to write the error types by hand rather than use something like error-chain. In the case of assert_cli, I decided to finally give failure a try.

From failure's announcement:

Failure is a Rust library intended to make it easier to manage your error types. This library has been heavily influenced by learnings we gained from previous iterations in our error management story, especially the Error trait and the error-chain crate.

I think failure does a great job iterating on Rust's error management story, fixing a lot of fundamental problems in the Error trait and making it easier for new developers and prototypers to be successful. While a replacement for the Error trait is a disruptive change to the ecosystem, I feel the reasons are well justified and the crate looks like it does a great job bridging the gap.

On the other hand, I do feel that parts of failure are a bit immature, particularly Context. This is concerning because the policies implemented by Context are coupled to the general Fail trait mechanism (see Separation of mechanism and policy). We cannot experiment and iterate on Context without breaking compatibility with Fail, and there is reasonable hesitance to break compatibility.

I'd recommend failure for anyone writing an application. I'd recommend that library authors exercise caution, particularly if they have richer requirements for their errors. To the crate author, I'd suggest either treating failure as not yet ready for "1.0" or being open to breaking changes sooner than "the distant future". Below I go into specific areas where I think failure can be improved.

I know it is less than ideal to receive this type of feedback "late" in the process (1.0 is expected to release next week). I was strapped for time and only recently had solid use cases to push failure's limits. I did try to provide theoretical feedback early on, based on reading the code and docs, out of an eagerness for failure to succeed. However, a thorough analysis from just reading the docs and code is limited, and such feedback reasonably carries less weight.

My Recent Failures

So let's step through my recent case studies in error management.

liquid

liquid is a Rust implementation of Shopify's liquid template language.

As mentioned above, I recently did a major revamp of liquid's errors. I ended up skipping failure. It was easier for me to write my error types from scratch than to take the time to figure out how to map my needs to failure. I was also concerned about making an unstable crate such a fundamental part of my API.

My goals were:

An example of a template backtrace:

Error: liquid: Expected whole number, found `fractional number`
  with:
    end=10.5
from: {% for i in (1..max) reversed %}
from: {% if "blue skies" == var %}
  with:
    var=10

Here is a rough sketch of liquid's errors:

#[derive(Fail, Debug)]
pub struct Error {
    inner: Box<InnerError>,
}

#[derive(Fail, Debug)]
struct InnerError {
    msg: borrow::Cow<'static, str>,
    user_backtrace: Vec<Trace>,
    #[cause] cause: Option<ErrorCause>,
}

#[derive(Clone, PartialEq, Eq, Debug, Default)]
pub struct Trace {
    trace: Option<String>,
    context: Vec<(borrow::Cow<'static, str>, String)>,
}
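As a rough illustration of how the template backtrace shown earlier could be produced from these fields, here is a std-only Display sketch. It is not liquid's actual implementation: it drops the cause, assumes the first Trace frame carries no template line (only the end=10.5 context), and leaves the leading "Error: " to whatever prints the error.

```rust
use std::borrow::Cow;
use std::fmt;

/// One frame of the user-facing template backtrace.
#[derive(Clone, Debug, Default)]
pub struct Trace {
    trace: Option<String>,
    context: Vec<(Cow<'static, str>, String)>,
}

/// Simplified error: the message plus the frames collected while unwinding.
#[derive(Debug)]
pub struct Error {
    msg: Cow<'static, str>,
    user_backtrace: Vec<Trace>,
}

/// Writes `  with:` followed by one indented `key=value` line per pair.
fn write_context(f: &mut fmt::Formatter, context: &[(Cow<'static, str>, String)]) -> fmt::Result {
    if context.is_empty() {
        return Ok(());
    }
    writeln!(f, "  with:")?;
    for (key, value) in context {
        writeln!(f, "    {}={}", key, value)?;
    }
    Ok(())
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // The error message comes first, then frames from innermost to outermost.
        writeln!(f, "liquid: {}", self.msg)?;
        for frame in &self.user_backtrace {
            if let Some(t) = &frame.trace {
                writeln!(f, "from: {}", t)?;
            }
            write_context(f, &frame.context)?;
        }
        Ok(())
    }
}
```

Because the frames are stored innermost-first, rendering is a plain forward walk, and the message the user most needs is always on the first line.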

I've created various "ResultExt" traits to make these errors somewhat ergonomic to create:

let value = value
    .as_scalar()
    .and_then(Scalar::to_integer)
    .ok_or_else(|| Error::with_msg(format!("Expected whole number, found `{}`", value.type_name())))
    .context_with(|| (arg_name.to_string().into(), value.to_string()))?;
...
let mut range = self.range
    .evaluate(context)
    .trace_with(|| self.trace().into())?;

While the context API needs some work, the overall approach worked great and provides very helpful error messages to the user.
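A ResultExt trait in this spirit can be sketched as follows. The trait name ResultLiquidExt, the seeded empty frame, and the exact closure signatures are assumptions for illustration, not liquid's real code. The point is that both methods mutate the concrete Error in place (pushing a frame or appending a key/value pair to the latest frame) instead of wrapping it, and that the closures only run on the error path.

```rust
use std::borrow::Cow;

#[derive(Debug, Default)]
pub struct Trace {
    trace: Option<String>,
    context: Vec<(Cow<'static, str>, String)>,
}

#[derive(Debug)]
pub struct Error {
    msg: Cow<'static, str>,
    user_backtrace: Vec<Trace>,
}

impl Error {
    pub fn with_msg<S: Into<Cow<'static, str>>>(msg: S) -> Self {
        // Seed one frame with no template line so context can be attached immediately.
        Error { msg: msg.into(), user_backtrace: vec![Trace::default()] }
    }
}

/// Hypothetical extension trait; closures are evaluated lazily, only on Err.
pub trait ResultLiquidExt<T> {
    fn trace_with<F: FnOnce() -> String>(self, trace: F) -> Result<T, Error>;
    fn context_with<F: FnOnce() -> (Cow<'static, str>, String)>(self, context: F) -> Result<T, Error>;
}

impl<T> ResultLiquidExt<T> for Result<T, Error> {
    fn trace_with<F: FnOnce() -> String>(self, trace: F) -> Result<T, Error> {
        self.map_err(|mut err| {
            // Entering an enclosing template construct starts a new frame.
            err.user_backtrace.push(Trace { trace: Some(trace()), context: Vec::new() });
            err
        })
    }

    fn context_with<F: FnOnce() -> (Cow<'static, str>, String)>(self, context: F) -> Result<T, Error> {
        self.map_err(|mut err| {
            // Context always belongs to the most recent frame.
            if let Some(frame) = err.user_backtrace.last_mut() {
                frame.context.push(context());
            }
            err
        })
    }
}
```

Since the error type never changes as it propagates, every function in the crate can keep returning Result<T, Error>.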

assert_cli

assert_cli provides assertions for program behavior to help with testing.

This week, I started on a refactoring of assert_cli in an effort to move it to "1.0" for the CLI working group. I needed some new errors, and rather than continuing to extend the existing brittle system based on error-chain, I thought I'd give failure a try. I figured breaking changes in failure would have minimal impact on my users because 99% of assert_cli users are just unwrapping the error in their tests rather than passing it along.

I structured assert_cli's error-chain errors to leverage chaining. The chaining hierarchy is something like:

Most of these errors really just exist for the sake of adding context. This is greatly simplified by leveraging failure::Context and the ErrorKind idiom.

This is what it looks like, in Rust pseudo-code:

#[derive(Copy, Clone, Eq, PartialEq, Debug, Fail)]
pub enum AssertionKind {
    #[fail(display = "Spawn failed.")] Spawn,
    #[fail(display = "Status mismatch.")] StatusMismatch,
    #[fail(display = "Exit code mismatch.")] ExitCodeMismatch,
    #[fail(display = "Output mismatch.")] OutputMismatch,
}

#[derive(Debug)]
pub struct AssertionError {
    inner: failure::Context<AssertionKind>,
}

Most of the rest of this post goes into my initial experience in converting over to this.

Feedback on failure::Context

error-chain and friends have been around for a while and have given us different takes on how to write and chain errors, but there hasn't been much experimentation with adding context to errors. In failure's case, it is using roughly the same approach as when it was first announced.

Context serves two purposes in failure.

Decouple Roles

Coupling these roles is initially convenient. A user can quickly create an error from any piece of data they have, and there isn't a need for another struct that looks and acts similarly to Context.

The problem is in the details.

Say my errors are wrapped like this:

Context -> Context -> AssertionError -> io::Error

(remember: Context is a Fail)

Suggestions:

Errors are Displayed in Inverted Order

Because Context generically wraps a failure::Error, the ordering is inverted when rendering an error for a user.

If I were to naively switch liquid to failure, my errors would change from

Error: liquid: Expected whole number, found `fractional number`
  with:
    end=10.5
from: {% for i in (1..max) reversed %}
from: {% if "blue skies" == var %}
  with:
    var=10

to

end=10.5
cause: Error: liquid: Expected whole number, found `fractional number`
cause: {% for i in (1..max) reversed %}
cause: var=10
cause: {% if "blue skies" == var %}
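To see why the order flips, here is a std-only model in which each context layer wraps the error beneath it. The Context type here is a hypothetical stand-in for failure::Context, built on std::error::Error::source, and render_outside_in is an illustrative helper. Walking the chain from the outside in necessarily yields the most recently attached context first and the original error last; getting liquid's error-first order back means collecting the whole chain and reversing it.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for failure::Context: a displayable value wrapping an inner error.
#[derive(Debug)]
struct Context<D> {
    context: D,
    inner: Box<dyn Error>,
}

impl<D: fmt::Display> fmt::Display for Context<D> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Only the attached context is displayed; the inner error is reached via source().
        write!(f, "{}", self.context)
    }
}

impl<D: fmt::Display + fmt::Debug> Error for Context<D> {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.inner)
    }
}

/// Collects the Display output of every layer, outermost first.
fn render_outside_in(err: &dyn Error) -> Vec<String> {
    let mut lines = vec![err.to_string()];
    let mut next = err.source();
    while let Some(e) = next {
        lines.push(e.to_string());
        next = e.source();
    }
    lines
}
```

The last context attached ends up on top, so the message the user most needs is at the bottom unless every consumer knows to reverse the chain.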

Suggestion:

Better support type contracts

failure::Error is a fancy boxed version of failure::Fail. This is a great solution for prototyping and applications like Cobalt.

Libraries can be a different beast. In cases like assert_cli and liquid, I want to ensure:

Having a type-erased failure::Fail makes this impossible. Any ? could be turning an error from a dependency into a failure::Fail and passing it right back up to my user. The only way for me to know is to exercise every error path and inspect the output.

Instead, I prefer to return Result<T, AssertionError> rather than Result<T, failure::Error>.

That's fine; failure::Error is just a convenience that failure provides, like Box<Error>, except:

The only alternative is to reimplement the Context machinery that is built-in to the failure::Fail and failure::ResultExt traits.
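Here is a std-only sketch of the kind of contract a concrete return type buys. The two-variant AssertionKind, the cause field, and the run helper are hypothetical, trimmed-down stand-ins for assert_cli's types. Because there is deliberately no From<io::Error> impl, a bare ? on an io::Result would not compile inside run, so every dependency error has to be classified into an AssertionKind by hand.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::process::{Command, Output};

/// Trimmed-down kind enum (only two variants kept for the sketch).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum AssertionKind {
    Spawn,
    OutputMismatch,
}

/// Concrete error type: callers can only ever observe an AssertionKind.
#[derive(Debug)]
pub struct AssertionError {
    kind: AssertionKind,
    cause: Option<io::Error>,
}

impl AssertionError {
    pub fn kind(&self) -> AssertionKind {
        self.kind
    }
}

impl fmt::Display for AssertionError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self.kind {
            AssertionKind::Spawn => write!(f, "Spawn failed."),
            AssertionKind::OutputMismatch => write!(f, "Output mismatch."),
        }
    }
}

impl Error for AssertionError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.cause.as_ref().map(|e| e as &(dyn Error + 'static))
    }
}

// No `impl From<io::Error> for AssertionError` on purpose: the io::Error
// must be mapped explicitly, so it cannot leak to callers unclassified.
pub fn run(program: &str) -> Result<Output, AssertionError> {
    Command::new(program).output().map_err(|e| AssertionError {
        kind: AssertionKind::Spawn,
        cause: Some(e),
    })
}
```

The compiler, rather than exhaustive testing of every error path, now guarantees which errors reach the user; the cost is writing the context helpers yourself.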

Suggestion:

Support a ContextPair

When converting assert_cli to failure, I found failure::Context works great when you want to dump strings but has limitations for my more common cases:

// Good: failure::Context works well in this case
return Err(AssertionError::new(AssertionKind::OutputMismatch))
    .context("expected to contain");
// Bad: the values survive, but not which one is the needle and which is the haystack
return Err(AssertionError::new(AssertionKind::OutputMismatch))
    .context(needle)
    .context(haystack);
// Alternative: works but is a bit verbose, especially considering this is going to be 90% of my Contexts
return Err(AssertionError::new(AssertionKind::OutputMismatch))
    .with_context(|_| format!("needle: {}", needle))
    .with_context(|_| format!("haystack: {}", haystack));

So I added this:

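use std::fmt;

#[derive(Debug)]
pub struct ContextPair<D: fmt::Display>(pub &'static str, pub D);

// Render as `key: value` so each value is labeled in the error output.
impl<D: fmt::Display> fmt::Display for ContextPair<D> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}: {}", self.0, self.1)
    }
}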

which can be used like:

return Err(AssertionError::new(AssertionKind::OutputMismatch))
    .context("expected to contain")
    .context(ContextPair("needle", needle.to_owned()))
    .context(ContextPair("haystack", haystack.to_owned()));

Suggestion:

Feedback on failure::Error

failure::Error is a fancy boxed version of Fail. Their APIs generally deviate in ways that make sense for their different feature sets.

A quick summary of their behavior:

Method      Fail                Error
cause       child item          inner
causes      starts with self    starts with inner
root_cause  includes self       includes self
downcast    acts on self        acts on inner

failure::Error is not a Fail

While failure::Error acts as a boxed Fail, it isn't a Fail. This isn't possible without specialization: failure provides a blanket impl<F: Fail> From<F> for Error, and if Error were itself a Fail, that impl would conflict with the standard library's From<T> for T.

failure::Error::cause is a foot-gun

As noted above, Error and Fail have similar APIs, and failure::Error mostly behaves as a proxy for the inner Fail, with cause being the exception. Despite the two cause methods having different signatures (Fail::cause returns Option<&Fail> while Error::cause returns &Fail), I think this is going to trip up a lot of people.

Suggestion:

Feedback on failure::Fail

causes and root_cause are foot-guns

If cause is the child error, then causes would start with that and iterate beneath it, right? Unfortunately, no. The current Fail is the first cause.

Similarly, if your Fail has no cause, then it will be the root_cause.

I can understand the need for functions that behave like these, but not with names that imply they won't include your current Fail. Granted, I personally would find causes not returning the top Fail more convenient.
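The difference is easy to see with std::error::Error::source standing in for Fail::cause. The helper names below are hypothetical: causes_including_self mirrors what failure's causes does, causes_below is what the name causes suggests, and root_cause falls back to the error itself when there is nothing beneath it, as failure's root_cause does.

```rust
use std::error::Error;
use std::fmt;

/// A minimal error layer with an optional inner error.
#[derive(Debug)]
pub struct Layer {
    msg: &'static str,
    inner: Option<Box<dyn Error>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str(self.msg)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.inner.as_deref()
    }
}

/// Every error strictly beneath `err`, nearest first.
pub fn causes_below(err: &dyn Error) -> Vec<String> {
    let mut out = Vec::new();
    let mut next = err.source();
    while let Some(e) = next {
        out.push(e.to_string());
        next = e.source();
    }
    out
}

/// `err` itself followed by everything beneath it (failure's `causes` behavior).
pub fn causes_including_self(err: &dyn Error) -> Vec<String> {
    let mut out = vec![err.to_string()];
    out.extend(causes_below(err));
    out
}

/// The deepest error, or `err` itself if it has no source.
pub fn root_cause(err: &dyn Error) -> String {
    causes_below(err).pop().unwrap_or_else(|| err.to_string())
}
```

With distinct names, a caller who wants the top-level message included asks for it explicitly instead of discovering it in the output.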

Suggestion:

Addendum

Additional error management case studies

Cobalt

Cobalt is a static site generator that I maintain. As an end-user application, it is a perfect fit for failure. I haven't converted it over because doing so isn't pressing compared to other desired features.

Day Job

This is from a 15+ year old, multi-million LoC product that is 80% user-facing library and 20% user-facing application targeted at non-programmers.

Putting a flexible product into the hands of non-programmers means you need to provide a lot of guidance to help people get out of situations you couldn't predict. We consider good errors essential for our users.

The user-form of our errors look like (translated to Rust pseudo-code):

#[derive(Fail, Debug, Copy, Clone, ...)]
#[repr(C)]
pub enum ErrorKind {
    ...
}

#[derive(Fail, Debug)]
pub struct PublicError {
    kind: ErrorKind,
    context: String,
}

and the internal-form of our errors look like (translated to Rust pseudo-code):

#[derive(Fail, Debug, Copy, Clone, ...)]
#[repr(C)]
enum ContextKind {
    ...
}

struct Context {
    value: Vec<(ContextKind, Box<dyn Display>)>,
    backtrace: failure::Backtrace,
}

struct InternalError {
    kind: ErrorKind,
    context: Option<Box<Context>>,
}
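The post doesn't show how the internal form becomes the public one, so the following is only a sketch under assumptions: the ErrorKind and ContextKind variants, the with helper, and the one-line-per-pair formatting are all hypothetical, and the failure::Backtrace field is dropped so the example needs nothing beyond std. It illustrates the intent of the layout: the kind stays small and Copy, and the boxed context is only allocated once something is attached.

```rust
use std::fmt;

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum ErrorKind {
    FileNotFound,
    InvalidInput,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum ContextKind {
    Path,
    Line,
}

struct Context {
    value: Vec<(ContextKind, Box<dyn fmt::Display>)>,
}

struct InternalError {
    kind: ErrorKind,
    context: Option<Box<Context>>,
}

impl InternalError {
    fn new(kind: ErrorKind) -> Self {
        InternalError { kind, context: None }
    }

    /// Attaches a typed context value, allocating the context box on first use.
    fn with<D: fmt::Display + 'static>(mut self, key: ContextKind, value: D) -> Self {
        let value: Box<dyn fmt::Display> = Box::new(value);
        self.context
            .get_or_insert_with(|| Box::new(Context { value: Vec::new() }))
            .value
            .push((key, value));
        self
    }
}

#[derive(Debug)]
struct PublicError {
    kind: ErrorKind,
    context: String,
}

impl From<InternalError> for PublicError {
    /// Flattens the typed context into the user-facing string, one `Key: value` per line.
    fn from(err: InternalError) -> Self {
        let context = match err.context {
            Some(ctx) => ctx
                .value
                .iter()
                .map(|(k, v)| format!("{:?}: {}", k, v))
                .collect::<Vec<_>>()
                .join("\n"),
            None => String::new(),
        };
        PublicError { kind: err.kind, context }
    }
}
```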

Things of note: