Using `Hash(ScVal)` as the contract data key | Stellar Developers | Page 1

earnest hornet Mar 24, 2023, 1:43 PM

#

In https://github.com/stellar/rs-soroban-env/issues/629 discussion of limiting contract data key size. It was proposed to change contract data LedgerKey's ScVal to Hash(ScVal), thus making the LedgerKey size bounded (to 64 bytes, contractID_hash + ScVal_hash). This will have the following implications: 1. the key is no longer transparent. If the contract wants a piece of info (that used to be in key) transparent, it has to be included in the ContractDataEntry. 2. Storage access (as is now) will become more expensive gas-wise, accessing the storage by ScVal will require the host to hash the key first, unless 3. contract developers keep the preimage of their storage key and use hash to interact with the storage directly (we will be providing storage interfaces that work with hash directly). I do not see any issue with this conceptually (aside from implementation details e.g. adding a cache for hashes in the host), but want to get the opinions of others @fluid vessel @vestal sand @viral cedar @abstract phoenix as well as contract developers.

GitHub

Contract Data: limit key size · Issue #629 · stellar/rs-soroban-env

We need to decide how to limit the key size: it's a scalability issue in any index (like buckets db), and will cause problems in any of the ledger space reduction approach. Right now the key an...

viral cedar Mar 24, 2023, 3:20 PM

#

adding a cache for hashes in the host
~~I think we can do the rawval-keyed map (instead of the ledger keys) and only hash once when we 'flush' this.~~ nvm, that's wrong. but in any case caching probably doesn't help that much in a general case- I expect the average number of ledger key accesses per contract call to be close to 1

#

he key is no longer transparent
IMO that's not a huge deal. I don't think monitoring the ledger state is a good idea most of the time (which is why I'm advocating for making the SDK struct representation to be vector-based without field names). block explorers will mostly rely on the events. it is also possible to standardize some keys (say, balance is always [symbol("balance"), address], which allows for building the keys off-chain), or (better) use metadata to expose read-only access to the 'interesting' entries. arguably everything else is the contract implementation detail and shouldn't necessarily be generally easy to parse

#

. If the contract wants a piece of info (that used to be in key) transparent, it has to be included in the ContractDataEntry.
I don't think this is ever needed. in order to build the key you need to have all the data required to build it, hence there is no need to read it again from the storage

#

actually nvm, this talks about transparency only... which I've commented about above

#

Storage access (as is now) will become more expensive gas-wise, accessing the storage by ScVal will require the host to hash the key first, unless
what is the cost of sha256 currently? I think that's the main tradeoff here

earnest hornet Mar 24, 2023, 4:41 PM

#

viral cedar > 2. Storage access (as is now) will become more expensive gas-wise, accessing t...

It is a few 1000s cpu instructions, on the similar level as invoking a small host function. Not cheap so we'll definitely need the cache.

viral cedar Mar 24, 2023, 4:44 PM

#

given that this is about 3% of a VM invocation, I'd say the overall impact seems marginal (especially given that this also saves fees on IO)

#

and also on the tx size, given that the ledger footprint can be compressed into 32-bytes/key

swift tapir Mar 24, 2023, 4:48 PM

#

I don't love this approach. It's true that we don't expect dapps to directly use the storage but having meaningful inspectable keys allows building powerful developer tooling

viral cedar Mar 24, 2023, 4:49 PM

#

I'm not sure building tooling based on the ledger state and on contract data entries is a good idea

#

are there any existing tools that are built using the full ledger scans?

#

FWIW contract devs might still build short obfuscated keys for the sake of lowering the fees/rent, so it's not like we have any guarantees on the contract storage interpretability; it's flexible and internal to the contract by design

abstract phoenix Mar 24, 2023, 7:20 PM

#

my preference here is not to hash automatically

#

I would put a bound on the size and let the user choose what they want to do

#

if they want to use hashes as keys they always can -- the hash function is right there -- but they can also just use integers starting from 0 or the name of their cat or whatever. the only plausible advantage to hashing things automatically is that they don't have to think about the size of a key -- all key sizes work because they can be hashed to a constant size. but the whole premise of the argument is that users will care about sizes.

#

I also think that 99% of contracts will store ledger data under simple symbol names which are shorter than a 32-byte hash. if in some case users try storing big highly-structured keys, they'll hit the limit and change strategy, possibly to hashing, possibly to something else. I don't mind setting the limit reasonably short.

viral cedar Mar 24, 2023, 7:25 PM

#

the upside of hashing is that we can bound the key size at 32 byes, which isn't achievable at the contract level. maybe it's not worth it, IDK, but I'm also not sure if it's worth pursuing some interpretability of contract data entries

abstract phoenix Mar 24, 2023, 7:26 PM

#

it's helpful for debugging if nothing else. and I think entry browsing will be a thing.

viral cedar Mar 24, 2023, 7:26 PM

#

(don't forget that every key is >32 bytes)

abstract phoenix Mar 24, 2023, 7:26 PM

#

what do you mean "isn't achievable at the contract level"?

viral cedar Mar 24, 2023, 7:27 PM

#

again, I wonder if people would use the actual bucket list for debugging... the sandbox ledger entries are json anyway, so they can be whatever

abstract phoenix Mar 24, 2023, 7:27 PM

#

yes people will use the actual ledger for debugging

viral cedar Mar 24, 2023, 7:27 PM

#

I mean that we can hash contract id + scval and have 32 byte keys

abstract phoenix Mar 24, 2023, 7:27 PM

#

100% certain "look at the data" is a way people debug things

viral cedar Mar 24, 2023, 7:27 PM

#

hmm, but how do they even do that?

abstract phoenix Mar 24, 2023, 7:28 PM

#

... by downloading it and/or requesting it from a server that has it on hand

viral cedar Mar 24, 2023, 7:28 PM

#

(I mean, there is xdrquery util I wrote for the core, but I doubt there are external users)

abstract phoenix Mar 24, 2023, 7:29 PM

#

viral cedar I mean that we can hash contract id + scval and have 32 byte keys

do you mean you want to store CONTRACT_DATA entries keyed by the hashed combination of their contract ID and scval, rather than as a 2-level key with contract ID followed by scval?

viral cedar Mar 24, 2023, 7:29 PM

#

yeah, if we're hashing, why not use hash as the sole key for the whole entry

#

I think hashing just scval is counter-productive anyway

abstract phoenix Mar 24, 2023, 7:30 PM

#

because then you can't find much less delete all the contract entries of a contract

#

(it also defeats IO locality)

#

you don't want to do that

viral cedar Mar 24, 2023, 7:30 PM

#

but you can't delete the contract entries anyway...

#

but if locality is an issue, then sure

abstract phoenix Mar 24, 2023, 7:31 PM

#

if I delete a contract instance I want to be able to delete its data entries

viral cedar Mar 24, 2023, 7:31 PM

#

then I'd say that I vote against scval hashing as well; it's pretty pointless size-wise

#

but you can't delete a contract instance either...

abstract phoenix Mar 24, 2023, 7:31 PM

#

???

viral cedar Mar 24, 2023, 7:32 PM

#

there are no deletion mechanisms for the contract instances/wasm blobs (besides state expiration)

abstract phoenix Mar 24, 2023, 7:32 PM

#

we're going to expire and GC them eventually

viral cedar Mar 24, 2023, 7:32 PM

#

and these wouldn't trigger the data deletion automatically, at least as far as I know

abstract phoenix Mar 24, 2023, 7:32 PM

#

I was certainly assuming they would!

viral cedar Mar 24, 2023, 7:33 PM

#

(that would be potentially slow and also wrong semantically; you don't want to lose all the entries just because the contract entry rent has expired)

abstract phoenix Mar 24, 2023, 7:33 PM

#

uhh

#

there's no point expiring a contract otherwise

viral cedar Mar 24, 2023, 7:33 PM

#

the entries will expire as well eventually

#

but we don't want to lose thousands of user balances just because the rent on the instance has somehow defaulted

#

it's easy to unarchive the contract entry; it's much harder to unarchive the user entries

abstract phoenix Mar 24, 2023, 7:34 PM

#

hmm

viral cedar Mar 24, 2023, 7:35 PM

#

I don't think we've considered 'cascading' expirations; sometimes they're viable but we decided that for these cases just bundling data in the same entry is probably enough

#

(that's mostly concerning some security concerns, not purging all the contract data automatically)

abstract phoenix Mar 24, 2023, 7:37 PM

#

ok. I still think there's a likely need to go from contract => entries, for debugging if nothing else, and at very least a locality argument. hashing everything into a "I have no idea what this key even is" key is IMO a big design error in eth

#

it optimizes for one thing (uniformity) at the expense of everything else

viral cedar Mar 24, 2023, 7:39 PM

#

yeah, then there is really no point in hashing scvals; that will result in more fees more often than not. I have my doubts on usability of contract data scans (at least for now), but I agree that it's useful to have this at least as an extension point

#

that said, what is a viable limit then? would 1K be feasible? or should we be more conservative? I think it should be at least 256 bytes

abstract phoenix Mar 24, 2023, 7:43 PM

#

I think it should be the same limit as the topic limit, and yeah somewhere around there seems reasonable. symbol is max 64 bytes, so imagine 1 hash + 1 symbol + vec overhead. 128 bytes? 256?

viral cedar Mar 24, 2023, 7:44 PM

#

symbol is 32 bytes, isn't it?

#

I'm think more along the lines of 2-3 addresses (each of which is 36 bytes IIUC)

#

plus a symbol

#

and vec/tags

abstract phoenix Mar 24, 2023, 7:45 PM

#

const SCSYMBOL_LIMIT = 32 guess so!

#

so yeah you can easily bump into 128 but .. 256 is likely to cover say 5 addresses and 2 symbols and a vector

#

when we've discussed size limits before we talked about making it predictable by doing it in terms of total number of leaf objects and max size of a leaf object

#

eg. 8 objects each of one of the following leaf types (any number type, symbol, address, bytes <= 32, string <= 32)

viral cedar Mar 24, 2023, 7:48 PM

#

hmm, again, why not just look at the size of the resulting ledger key? looking at the leaf objects makes it unpredictable because it ignres the non-leaf structure

#

and your actual key sizes might deviate arbitrarily

abstract phoenix Mar 24, 2023, 7:49 PM

#

because making it depend on XDR makes it hard for a user to predict

#

they have to build a mental model of serialization

viral cedar Mar 24, 2023, 7:49 PM

#

validation is especially trivial for the ledger keys - we can check it at tx validation point

abstract phoenix Mar 24, 2023, 7:49 PM

#

which isn't present at the API level

#

we can account for the non-leaf structure too (eg. no more than 16 objects total)

viral cedar Mar 24, 2023, 7:50 PM

#

this seems too arbitrary for me...

abstract phoenix Mar 24, 2023, 7:50 PM

#

making users think of XDR is worse. we've been over this multiple times in threads.

#

this whole topic is about an arbitrary limit, but it has to be comprehensible

#

https://github.com/stellar/stellar-protocol/issues/1298

GitHub

CAP-0056 - Introduce a limited ScVal? · Issue #1298 · stellar/stell...

CAP-0056 introduces the concept of a topic, which is an ScVal, but restricted to types that aren't Vecs, Maps, or Binaries > 32 bytes. This restriction will be enforced by the host, but fail...

viral cedar Mar 24, 2023, 7:52 PM

#

the serialization format is, say +20 bytes overhead... I don't think users need to think about this for the most part, as long as we have tooling that tells about exceeding the size limits soon enough

abstract phoenix Mar 24, 2023, 7:53 PM

#

it's not that they can't find out when it happens, it's that the boundary between it happening and not-happening has to do with the sum across a serialization pass, which means they might fairly easily have code that works in testing but fails in production because people use slightly longer symbols or strings

#

if it's structural it's more likely the tested and production code use the same structure to build a key

viral cedar Mar 24, 2023, 8:09 PM

#

hmm, failure due to too large inputs is expected

#

especially for the unbounded types

#

like if something uses bytes or string as input, then it's bound to fail for some user inputs, which is expected

#

or vector

#

I wouldn't be overly concerned about keys, but losing flexibility in ledger entry values is problematic

#

like why can't I store an array of a 1000 i128?

#

given that there is a limit for bytes, there is incentive to just serialize scval to bytes and store that...

abstract phoenix Mar 24, 2023, 8:12 PM

#

well, this was a conversation about keys not vals

#

and I'm just reflecting previous conversation. "depending on the user knowing XDR serialization" wasn't a popular choice.

viral cedar Mar 24, 2023, 8:12 PM

#

true, that's a neighbor thread that discusses this

abstract phoenix Mar 24, 2023, 8:13 PM

#

I gotta go now, though, I think we're at least on the same page about "don't force hashing"?

#Using `Hash(ScVal)` as the contract data key