#Error codes and cross contract calls

35 messages · Page 1 of 1 (latest)

digital dawn
#

I know @visual steeple is already working on cleaning up error codes. Things that seem to be missing right now are:

  • which codes are "fatal" (as in cannot be caught with try_call -- those are panics basically, for example when running out of cpu budget)
  • rollback strategy when catching errors -- right now it looks like we never rollback side effects to the store and this seems odd (reason we originally did not want to allow people to catch errors was that this is basically requires implementing something similar to LedgerTxn's CoW in the C++ code base)
#

actually that last one is rollback strategy when there is an error in general, as not having a try_call somewhere does not mean that there is not one at a higher level

gaunt sleet
#

now it looks like we never rollback side effects to the store
what kind of side effects do you mean? gracefully handling try_call acts as if it never happened (modulo spent cpu time/memory)

digital dawn
#

side effects in the store

gaunt sleet
#

sorry, what is 'store'?

digital dawn
#

storage

gaunt sleet
#

hmm, we do rollback storage, as well as auth (so e.g. nonce won't be consumed in some scenarios)

#

I'm not 100% sure about the events though

digital dawn
#

oh that rollbackpoint stuff

gaunt sleet
#

I also would like to revive the old topic of actually storing the error in the tx result (or meta?) - whatever refactoring we do won't be noticed much outside of maybe preflight (which has a different environment from core anyway)

digital dawn
#

events are properly rolledback except for debug events

#

(that is by design that debug events see the whole trace of execution)

gaunt sleet
#

there is an annoying inconsistency with the changes I'm doing now since exceeding e.g. read/write bytes can be handled at core level, while exceeding the CPU limit currently just causes INVOKE_HOST_FUNCTION_TRAPPED

radiant folio
#

There is another bug with errors. It appears that auth accounts errors impersonate their calling contract.

#

i.e. an auth account that errors, will appear as an error of the contract that is trying to do the authentication.

radiant folio
digital dawn
#

They are rolled back yes

#

re: error code at the operation level due to things happening before or after the invoke. If we think people will care to distinguish (from traps inside the invoke), we can just use different error codes

#

otherwise we can fake it and just put the details as if it was the result of the invoke (this may be annoying depending on how we shove the invoke result inside events -- what @stone wind is working on)

slow mortar
gaunt sleet
#

I think we need a bit more granular error reporting from the host (currently it's a blanket 'host function trapped')

digital dawn
#

I don't know if we want more error codes (that need to be handled in code everywhere client side) vs more debug events (for diagnostics). I don't think clients will do something different if fees were too low

radiant folio
gaunt sleet
#

I don't think clients will do something different if fees were too low
I envision a lot of confusion when the limit enforcement is launched

#

even without that, there is a lot of confusion as to why the 'host function trapped'. given that preflight is imperfect, it would be valuable to get something more useful from core than the blanket error

visual steeple
#

I've been looking at the error codes recently (made a spreadsheet to organize them as well as all the cases we're currently doing err_general) and thinking about ways to mitigate between the desire for "more specificity" for diagnostic assistance and "less specificity" for future compatibility, less-brittle evolution, simplicity. one thing I've observed is that the current organization (a major number with a bunch of sub-codes per major number) might not be the best, at least insofar as the minor numbers are often similar between majors.

#

I was thinking instead it might makes sense to treat the major number as something "outermost" like "the most recent subsystem this error caused to fail" and the minor number as "innermost", like a code that's about the narrow cause that set off the sequence of error propagations.

#

in other words: as errors propagate we might change the major number but keep the minor number. if you do an out-of-bounds vector access deep in a contract, for example, the minor number would remain "vec out of bounds" but the major number would start with the host subsystem it failed in, then change to something that identifies that it happened in-the-host rather than in-the-contract, then change to "contract call failed" as it propagates out to the contract's caller.

#

(in other-other words: I'm proposing making the two numbers orthogonal, tracking the two most-important ways we expect users to want to think about an error, because .. we want errors to remain small, can't convey infinite information!)

#

(it's also 64 - 8 = 56 bits, so if there were several different orthogonal bits of information we wanted to cram into an error code, we could probably manage fine, eg. it would probably work ok to have as many as 8 different 7-bit fields, or 7 8-bit fields; we're nowhere near having hundreds much less thousands of error codes)

#

(eg. we could fairly readily put a 7 or 8-frame mini stacktrace prefix or suffix inside an error code if that was a more valuable way of looking at it)

#

eg. "A happened while doing B, while doing C, while doing D, while doing E, ... some frames omitted ... while doing Y, while doing Z"

#

each coded by an 8 bit error-or-context number

#

one advantage of doing that is we'd have such "mini stacktraces" available in non-local-testing builds -- currently we only get "full stacktraces" in local-testing builds -- and could cut a huge amount of the dependency tree out of soroban (it currently includes an unwinder which in turn includes a dwarf reader)