I recently got the chance to redo the error handling in two different crates I
help maintain. For liquid, I decided to write the error types by
hand rather than use something like error-chain. In the case
of assert_cli, I decided to finally give failure a
try.
From failure's announcement:
Failure is a Rust library intended to make it easier to manage your error types. This library has been heavily influenced by learnings we gained from previous iterations in our error management story, especially the Error trait and the error-chain crate.
I think failure does a great job iterating on Rust's error management story,
fixing a lot of fundamental problems in the Error trait and making it easier
for new developers and prototypers to be successful. While a replacement for
the Error trait is a disruptive change to the ecosystem, I feel the reasons
are well justified and the crate looks like it does a great job bridging the
gap.
On the other hand, I do feel that there are parts of failure that are a bit
immature, particularly Context. This is concerning because
the implementation policies related to Context are coupled to the general
Fail trait mechanism (see Separation of mechanism and policy).
We cannot experiment and iterate on Context without breaking compatibility
with Fail and there is a reasonable hesitance in breaking
compatibility.
I'd recommend failure for anyone writing an application. I'd recommend
library authors exercise caution, particularly if you have richer requirements
for your errors. For the crate author, I'd suggest either holding off on
"1.0" or being more open to breaking changes than "the distant
future". Below I'll go into more specific areas where I think failure
can be improved.
I know it is less than ideal to receive this type of feedback "late" in the
process (1.0 is expected to release next week). I was strapped for
time and only recently had solid use cases to push failure's limits. I at
least tried to provide theoretical feedback early on based on reading the code
and docs out of an eagerness for failure to succeed. The ability to create a
thorough analysis from just reading the docs / code is limited and the weight
of such feedback is reasonably lower.
My Recent Failures
So let's step through my recent case studies in error management.
liquid
liquid is a Rust implementation of Shopify's liquid template language.
As mentioned above, I recently did a major revamp of liquid's
errors. I ended up
skipping failure. It was easier for me to write my error types from scratch
than to take the time to figure out how to map my needs to failure. I was
also concerned about making an unstable crate such a fundamental part of my
API.
My goals were:
- Low friction for detailed error reports.
- Give the user context to the errors with template back traces.
An example of a template backtrace:
Error: liquid: Expected whole number, found `fractional number`
with:
end=10.5
from: {% for i in (1..max) reversed %}
from: {% if "blue skies" == var %}
with:
var=10
Here is a rough sketch of liquid's errors:
#[derive(Fail, Debug)]
pub struct Error {
    inner: Box<InnerError>,
}
#[derive(Fail, Debug)]
struct InnerError {
    msg: borrow::Cow<'static, str>,
    user_backtrace: Vec<Trace>,
    #[cause] cause: Option<ErrorCause>,
}
#[derive(Clone, PartialEq, Eq, Debug, Default)]
pub struct Trace {
    trace: Option<String>,
    context: Vec<(borrow::Cow<'static, str>, String)>,
}
I've created various "ResultExt" traits to make these somewhat ergonomic to create:
let value = value
    .as_scalar()
    .and_then(Scalar::to_integer)
    .ok_or_else(|| Error::with_msg(format!("Expected whole number, found `{}`", value.type_name())))
    .context_with(|| (arg_name.to_string().into(), value.to_string()))?;
...
let mut range = self.range
    .evaluate(context)
    .trace_with(|| self.trace().into())?;
While the context API needs some work, the overall approach worked great and provides very helpful error messages to the user.
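One way the extension traits behind context_with and trace_with could be put together is sketched below. It is a simplified, std-only stand-in rather than liquid's actual code: the flattened Error struct (no boxed InnerError, no cause), the ResultExt trait name, the closure signatures, the indentation of the rendered output, and the demo function are all assumptions made for illustration.

```rust
use std::borrow::Cow;
use std::fmt;

/// One frame of the user-visible template backtrace.
#[derive(Clone, PartialEq, Eq, Debug, Default)]
pub struct Trace {
    trace: Option<String>,
    context: Vec<(Cow<'static, str>, String)>,
}

/// Simplified error: a message plus the frames collected while unwinding.
#[derive(Debug)]
pub struct Error {
    msg: Cow<'static, str>,
    user_backtrace: Vec<Trace>,
}

impl Error {
    pub fn with_msg<S: Into<Cow<'static, str>>>(msg: S) -> Self {
        // Start with one empty frame so context can be attached before any trace.
        Error { msg: msg.into(), user_backtrace: vec![Trace::default()] }
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "liquid: {}", self.msg)?;
        for frame in &self.user_backtrace {
            if let Some(trace) = &frame.trace {
                writeln!(f, "from: {}", trace)?;
            }
            if !frame.context.is_empty() {
                writeln!(f, "  with:")?;
                for (key, value) in &frame.context {
                    writeln!(f, "    {}={}", key, value)?;
                }
            }
        }
        Ok(())
    }
}

/// Hypothetical extension trait; the method names mirror the calls above.
pub trait ResultExt<T> {
    fn trace_with<F: FnOnce() -> String>(self, trace: F) -> Result<T, Error>;
    fn context_with<F>(self, context: F) -> Result<T, Error>
    where
        F: FnOnce() -> (Cow<'static, str>, String);
}

impl<T> ResultExt<T> for Result<T, Error> {
    fn trace_with<F: FnOnce() -> String>(self, trace: F) -> Result<T, Error> {
        // The closure only runs on the error path, so success stays cheap.
        self.map_err(|mut err| {
            err.user_backtrace.push(Trace { trace: Some(trace()), context: Vec::new() });
            err
        })
    }

    fn context_with<F>(self, context: F) -> Result<T, Error>
    where
        F: FnOnce() -> (Cow<'static, str>, String),
    {
        self.map_err(|mut err| {
            // Attach the key/value pair to the most recent frame.
            if let Some(frame) = err.user_backtrace.last_mut() {
                frame.context.push(context());
            }
            err
        })
    }
}

/// Builds an error the way a failing `for` tag might and renders it.
pub fn demo() -> String {
    let failed: Result<i32, Error> =
        Err(Error::with_msg("Expected whole number, found `fractional number`"));
    let err = failed
        .context_with(|| ("end".into(), "10.5".to_owned()))
        .trace_with(|| "{% for i in (1..max) reversed %}".to_owned())
        .unwrap_err();
    err.to_string()
}
```

Because the context lives inside the error instead of wrapping it, the message always renders first and each frame follows in the order it was added.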
assert_cli
assert_cli provides assertions for program behavior to help
with testing.
This week, I started on a refactoring of assert_cli in an effort to move it
to "1.0" for the CLI working group. I needed some new errors and rather than
continuing to extend the existing brittle system based on error-chain, I
thought I'd give failure a try. I figured breaking changes in failure would
have minimal impact on my users because 99% of assert_cli users are just
unwrapping the error in their tests rather than passing it along.
I structured assert_cli's error-chain errors to leverage chaining. The chaining hierarchy is something like:
- Spawn Failed
- Assertion Failed
  - Status (success / failure) Failed
  - Exit Code Failed
  - Output
    - Strings matched when shouldn't
    - Strings matched when should
    - Bytes matched when shouldn't
    - Bytes matched when should
    - Sub-strings matched when shouldn't
    - Sub-strings matched when should
    - Byte subset matched when shouldn't
    - Byte subset matched when should
    - Predicate failed
Most of these errors really just exist for the sake of adding context. This is
greatly simplified by leveraging failure::Context and the ErrorKind
idiom.
This is what it looks like, in Rust pseudo-code:
#[derive(Copy, Clone, Eq, PartialEq, Debug, Fail)]
pub enum AssertionKind {
    #[fail(display = "Spawn failed.")] Spawn,
    #[fail(display = "Status mismatch.")] StatusMismatch,
    #[fail(display = "Exit code mismatch.")] ExitCodeMismatch,
    #[fail(display = "Output mismatch.")] OutputMismatch,
}
#[derive(Debug)]
pub struct AssertionError {
    inner: failure::Context<AssertionKind>,
}
Most of the rest of this post goes into my initial experience in converting over to this.
Feedback on failure::Context
error-chain and friends have been around for a while and given us different
takes on how to write and chain errors but there hasn't been too much
experimentation with adding context to errors. In failure's case, it is using
roughly the same approach from when it was first announced.
Context serves two purposes in failure.
- Wrap a failure::Error, adding a single impl Display of context (see documentation).
- Quick and dirty way to create or chain an error (see documentation).
Decouple Roles
Coupling these roles is initially convenient. A user can quickly create an
error from any piece of data they have, and there is no need for another
struct that looks and acts similarly to Context.
The problem is in the details.
Say my errors are wrapped like this:
Context -> Context -> AssertionError -> io::Error
(remember: Context is a Fail)
- Which Fail's Termination trait should be respected for ? in main (when that becomes a thing)?
- How does a user of my library identify my documented error type so they can handle the error programmatically?
- If an application only wants to show the leaf error and not the causes, how does it identify what is a leaf error?
- How should adding Context in no_std work? With the current approach, the Context is passed back and the original error is dropped.
Suggestions:
- Separate the roles by providing an alternative "easy error".
- Separate the roles so a clearer no_std policy can exist.
- Remove Context from the error chain by moving the Context from an error decorator to an error member, like backtrace and cause.
- Experiment with ways to give no_std users more control over the Context policy like moving the Context from an error decorator to an error member, like backtrace and cause.
Errors are Displayed in Inverted Order
Because Context generically wraps a failure::Error, the ordering is inverted when rendering an error for a user.
If I were to naively switch liquid to failure, my errors would change from
Error: liquid: Expected whole number, found `fractional number`
with:
end=10.5
from: {% for i in (1..max) reversed %}
from: {% if "blue skies" == var %}
with:
var=10
to
end=10.5
cause: Error: liquid: Expected whole number, found `fractional number`
cause: {% for i in (1..max) reversed %}
cause: var=10
cause: {% if "blue skies" == var %}
Suggestion:
- Associate contexts with an error by, again, moving the Context from an error decorator to an error member, like backtrace and cause.
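To make the inversion concrete, here is a minimal std-only sketch of a context decorator. Decorated, Leaf, decorate, render, and demo are hypothetical stand-ins built on std::error::Error, not failure's types, and failure's exact output may differ. The point is only that a decorator becomes the new head of the chain, so context is printed ahead of the error it describes.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for a generic context decorator: it wraps another error and
/// becomes the new head of the chain.
#[derive(Debug)]
struct Decorated {
    context: String,
    inner: Box<dyn Error + 'static>,
}

impl fmt::Display for Decorated {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.context)
    }
}

impl Error for Decorated {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.inner)
    }
}

/// The error the user actually cares about.
#[derive(Debug)]
struct Leaf(&'static str);

impl fmt::Display for Leaf {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.0)
    }
}

impl Error for Leaf {}

/// Wraps `inner` so the context sits on top of it.
fn decorate<E: Error + 'static>(inner: E, context: &str) -> Decorated {
    Decorated { context: context.to_owned(), inner: Box::new(inner) }
}

/// Renders the head first, then every source prefixed with "cause: ".
fn render(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut next = err.source();
    while let Some(cause) = next {
        out.push_str("\ncause: ");
        out.push_str(&cause.to_string());
        next = cause.source();
    }
    out
}

/// Adds two pieces of context to a leaf error and renders the chain.
pub fn demo() -> String {
    let err = Leaf("Expected whole number, found `fractional number`");
    let err = decorate(err, "end=10.5");
    let err = decorate(err, "{% for i in (1..max) reversed %}");
    render(&err)
}
```

The most recently added context comes out on the first line and the real error message ends up last, which is the opposite of what a template author wants to read.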
Better support type contracts
failure::Error is a fancy boxed version of failure::Fail. This is a great
solution for prototyping and applications like Cobalt.
Libraries can be a different beast. In cases like assert_cli and liquid, I want to ensure:
- User has a backwards compatible, defined, finite set of errors to programmatically deal with.
- The errors are useful.
Having a type-erased failure::Fail makes this impossible. Any ? could be
turning an error from a dependency into a failure::Fail and passing it right
back up to my user. The only way for me to know is to exercise every error path
and inspect the output.
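A small std-only sketch of that escape hatch follows. It uses Box<dyn Error> as a stand-in for any type-erased error, and the AssertionError variants and both check functions are hypothetical, made up for this illustration.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical documented error type for a library.
#[derive(Debug, PartialEq)]
pub enum AssertionError {
    ExitCodeMismatch { expected: i32, got: i32 },
    BadExitCode(String),
}

impl fmt::Display for AssertionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AssertionError::ExitCodeMismatch { expected, got } => {
                write!(f, "expected exit code {}, got {}", expected, got)
            }
            AssertionError::BadExitCode(raw) => write!(f, "unreadable exit code: {}", raw),
        }
    }
}

impl Error for AssertionError {}

/// Type-erased return: `?` silently forwards the `ParseIntError` from the
/// standard library, so callers can receive an error that was never documented.
pub fn check_erased(raw: &str, expected: i32) -> Result<(), Box<dyn Error>> {
    let got = raw.parse::<i32>()?;
    if got != expected {
        return Err(AssertionError::ExitCodeMismatch { expected, got }.into());
    }
    Ok(())
}

/// Concrete return: there is no `From<ParseIntError>` impl, so a bare `?`
/// would not compile and the conversion has to be written out.
pub fn check_typed(raw: &str, expected: i32) -> Result<(), AssertionError> {
    let got = raw
        .parse::<i32>()
        .map_err(|e| AssertionError::BadExitCode(e.to_string()))?;
    if got != expected {
        return Err(AssertionError::ExitCodeMismatch { expected, got });
    }
    Ok(())
}
```

With the concrete signature the compiler enforces the contract; with the erased one, the only check is running every error path.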
I prefer to return Result<T, AssertionError> rather than Result<T, failure::Error>.
That's fine, failure::Error is just a convenience that failure provides, like with Box<Error>, except:
- If I want to add Context to my errors, the only reasonable ergonomic way to do so is to return Result<T, failure::Error>. Context only supports wrapping a failure::Error, making it so any error can escape through a Context.
The only alternative is to reimplement the Context machinery that is built-in to the failure::Fail and failure::ResultExt traits.
Suggestion:
- Allow using context without failure::Error by, yet again, moving the Context from an error decorator to an error member, like backtrace and cause.
Support a ContextPair
When converting assert_cli to failure, I found failure::Context works
great when you want to dump strings but has limitations for my more common
cases:
// Good: failure::Context works well in this case
return Err(AssertionError::new(AssertionKind::OutputMismatch))
    .context("expected to contain");
// Bad: failure::Context loses the context in this case
return Err(AssertionError::new(AssertionKind::OutputMismatch))
    .context(needle)
    .context(haystack);
// Alternative: Works but is a bit verbose, especially considering this is going to be 90% of my Contexts
return Err(AssertionError::new(AssertionKind::OutputMismatch))
    .with_context(|_| format!("needle: {}", needle))
    .with_context(|_| format!("haystack: {}", haystack));
So I added this:
#[derive(Debug)]
pub struct ContextPair<D: Display>(
    &'static str, // key
    D,            // context
);
which can be used like:
return Err(AssertionError::new(AssertionKind::OutputMismatch))
    .context("expected to contain")
    .context(ContextPair("needle", needle.to_owned()))
    .context(ContextPair("haystack", haystack.to_owned()));
Suggestion:
- Provide something like ContextPair so failure can help developers fall into the pit of success for giving helpful errors to users.
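To be usable as context, a ContextPair also needs a Display impl. Here is a minimal std-only sketch, assuming a generic tuple-struct layout and a "key: value" rendering; the describe helper is hypothetical and exists only to show the output.

```rust
use std::fmt::{self, Display};

/// A labelled piece of context; `D` is any displayable value.
#[derive(Debug)]
pub struct ContextPair<D: Display>(pub &'static str, pub D);

impl<D: Display> Display for ContextPair<D> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render as "key: value" so the label is never lost.
        write!(f, "{}: {}", self.0, self.1)
    }
}

/// Hypothetical helper showing what each pair renders to.
pub fn describe(needle: &str, haystack: &str) -> Vec<String> {
    vec![
        ContextPair("needle", needle.to_owned()).to_string(),
        ContextPair("haystack", haystack.to_owned()).to_string(),
    ]
}
```

Keeping the key as a &'static str means the label costs nothing to store, and only the value needs to be owned.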
Feedback on failure::Error
failure::Error is a fancy boxed version of Fail. Their APIs generally
deviate in ways that make sense for their different feature sets.
A quick summary of their behavior:
| | Fail | Error |
|---|---|---|
| cause | child item | inner |
| causes | starts with self | starts with inner |
| root_cause | includes self | includes self |
| downcast | acts on self | acts on inner |
failure::Error is not a Fail
While failure::Error acts as a boxed Fail, it isn't a Fail. This
isn't possible without specialization because, if Error implemented Fail, its blanket
impl<F: Fail> From<F> for Error would conflict with the standard library's
impl<T> From<T> for T.
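The conflict can be reproduced with a few lines of std-only Rust. The Fail trait and Error struct below are hypothetical stand-ins that only mimic the shape of failure's types; SpawnFailed, try_spawn, run, and message are made up for the example.

```rust
use std::fmt::{self, Debug, Display};

/// Stand-in for the `Fail` trait.
pub trait Fail: Display + Debug + 'static {}

/// Stand-in for the boxed, type-erased `Error`.
#[derive(Debug)]
pub struct Error {
    inner: Box<dyn Fail>,
}

impl Error {
    pub fn message(&self) -> String {
        self.inner.to_string()
    }
}

// This blanket conversion is what lets `?` turn any `Fail` into an `Error`.
impl<F: Fail> From<F> for Error {
    fn from(failure: F) -> Self {
        Error { inner: Box::new(failure) }
    }
}

// If `Error` itself implemented `Fail`, the blanket impl above would also
// cover `From<Error> for Error`, which overlaps with the standard library's
// `impl<T> From<T> for T` and is rejected by the compiler:
//
// impl Fail for Error {}

#[derive(Debug)]
pub struct SpawnFailed;

impl Display for SpawnFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Spawn failed.")
    }
}

impl Fail for SpawnFailed {}

fn try_spawn() -> Result<(), SpawnFailed> {
    Err(SpawnFailed)
}

pub fn run() -> Result<(), Error> {
    try_spawn()?; // `?` goes through `Error::from(SpawnFailed)`
    Ok(())
}
```

The sketch compiles only because Error does not implement Fail, which is the trade-off described above.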
failure::Error::cause is a foot-gun
As noted above, Error and Fail have similar APIs and failure::Error mostly
behaves as a proxy for the inner Fail, with cause being the
exception. Despite
the two cause methods having different signatures (Fail::cause returns an Option<&Fail>
while Error::cause returns a &Fail), I think this is going to trip up a lot of
people.
Suggestion:
- Error::cause should behave exactly like Fail::cause to avoid surprising developers.
Feedback on failure::Fail
causes and root_cause are foot-guns
If cause is the child error, then causes would start with that and iterate
beneath it, right? Unfortunately, no. The current Fail is the first cause.
Similarly, if your Fail has no cause, then it will be the root_cause.
I can understand the need for functions that behave like these but not with
names that imply they won't include your current Fail. Granted, I personally
would find causes not returning the top Fail more convenient.
Suggestion:
- Either find new names or change the behavior of the functions to avoid surprising developers.
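The two possible behaviors are easy to compare side by side. This sketch uses std::error::Error::source rather than failure's API, and Layer, chain_from_self, chain_below_self, and sample are hypothetical names chosen so the function names say whether the current error is included.

```rust
use std::error::Error;
use std::fmt;

/// A tiny error that optionally wraps another one.
#[derive(Debug)]
pub struct Layer {
    name: &'static str,
    source: Option<Box<Layer>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.name)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref().map(|s| s as &(dyn Error + 'static))
    }
}

/// Walks the chain starting with `head` itself.
pub fn chain_from_self(head: &(dyn Error + 'static)) -> Vec<String> {
    let mut out = Vec::new();
    let mut next = Some(head);
    while let Some(err) = next {
        out.push(err.to_string());
        next = err.source();
    }
    out
}

/// Walks the chain starting with the first child of `head`.
pub fn chain_below_self(head: &(dyn Error + 'static)) -> Vec<String> {
    match head.source() {
        Some(child) => chain_from_self(child),
        None => Vec::new(),
    }
}

/// Two-level example: an output mismatch caused by an I/O error.
pub fn sample() -> Layer {
    Layer {
        name: "output mismatch",
        source: Some(Box::new(Layer { name: "io error", source: None })),
    }
}
```

With names like these, a reader can tell from the call site alone whether the top-level error will show up in the result.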
Addendum
Additional error management case studies
Cobalt
Cobalt is a static site generator that I maintain. As an end-user application,
it is a perfect fit for failure. I haven't converted it over yet because other
desired features have been more pressing.
Day Job
This is from a 15+ year old, multi-million LoC product that is 80% user-facing library and 20% user-facing application targeted at non-programmers.
Putting a flexible product into the hands of non-programmers means you need to provide a lot of guidance to help people get out of situations you couldn't predict. We consider good errors essential for our users.
The user-form of our errors looks like (translated to Rust pseudo-code):
#[derive(Fail, Debug, Copy, Clone, ...)]
#[repr(C)]
pub enum ErrorKind {
    ...
}
#[derive(Fail, Debug)]
pub struct PublicError {
    kind: ErrorKind,
    context: String,
}
and the internal-form of our errors looks like (translated to Rust pseudo-code):
#[derive(Fail, Debug, Copy, Clone, ...)]
#[repr(C)]
enum ContextKind {
    ...
}
struct Context {
    value: Vec<(ContextKind, Box<Display>)>,
    backtrace: failure::Backtrace,
}
struct InternalError {
    kind: ErrorKind,
    context: Option<Box<Context>>,
}
Things of note:
- InternalError has a way of localizing and formatting Context into the PublicError::context.
- PublicError / InternalError have a way of localizing ErrorKind.
- ErrorKind always includes "what", "why", and "how to fix it".
- OS and library errors are manually converted to InternalError. Sometimes the root cause is captured in ContextKinds that we hide in release builds.
- Layers higher in the stack may add to Context. They can even conditionally add to context.value to avoid duplication.
- Performance is controlled by only heap allocating or capturing a back trace if context is added.
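A rough std-only sketch of the lazy-allocation and de-duplication points is below. The enum variants, the add_context, has_context, and to_public methods, and the "Kind=value" formatting are all hypothetical, since the real variants and formatting are elided above. Backtrace capture and localization are left out to keep the sketch short.

```rust
use std::fmt::Display;

/// Hypothetical variants; the real lists are elided in the post.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ErrorKind {
    NotFound,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ContextKind {
    Path,
}

struct Context {
    value: Vec<(ContextKind, Box<dyn Display>)>,
}

pub struct InternalError {
    kind: ErrorKind,
    // `None` until someone adds context, so the common path never allocates.
    context: Option<Box<Context>>,
}

pub struct PublicError {
    pub kind: ErrorKind,
    pub context: String,
}

impl InternalError {
    pub fn new(kind: ErrorKind) -> Self {
        InternalError { kind, context: None }
    }

    pub fn has_context(&self) -> bool {
        self.context.is_some()
    }

    /// Adds context, allocating on first use and skipping kinds that a lower
    /// layer already recorded.
    pub fn add_context<D: Display + 'static>(&mut self, kind: ContextKind, value: D) {
        let context = self
            .context
            .get_or_insert_with(|| Box::new(Context { value: Vec::new() }));
        if context.value.iter().any(|(existing, _)| *existing == kind) {
            return;
        }
        context.value.push((kind, Box::new(value)));
    }

    /// Formats the collected context into the user-facing form.
    pub fn to_public(&self) -> PublicError {
        let context = match &self.context {
            Some(ctx) => ctx
                .value
                .iter()
                .map(|(kind, value)| format!("{:?}={}", kind, value))
                .collect::<Vec<_>>()
                .join(", "),
            None => String::new(),
        };
        PublicError { kind: self.kind, context }
    }
}
```

Because the context is a member of the error, higher layers can inspect what is already there before adding more, which a wrapping decorator does not allow as easily.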