In https://github.com/stellar/rs-soroban-env/issues/629 discussion of limiting contract data key size. It was proposed to change contract data LedgerKey's ScVal to Hash(ScVal), thus making the LedgerKey size bounded (to 64 bytes, contractID_hash + ScVal_hash). This will have the following implications: 1. the key is no longer transparent. If the contract wants a piece of info (that used to be in key) transparent, it has to be included in the ContractDataEntry. 2. Storage access (as is now) will become more expensive gas-wise, accessing the storage by ScVal will require the host to hash the key first, unless 3. contract developers keep the preimage of their storage key and use hash to interact with the storage directly (we will be providing storage interfaces that work with hash directly). I do not see any issue with this conceptually (aside from implementation details e.g. adding a cache for hashes in the host), but want to get the opinions of others @fluid vessel @vestal sand @viral cedar @abstract phoenix as well as contract developers.
#Using `Hash(ScVal)` as the contract data key
91 messages · Page 1 of 1 (latest)
adding a cache for hashes in the host
I think we can do the rawval-keyed map (instead of the ledger keys) and only hash once when we 'flush' this.nvm, that's wrong. but in any case caching probably doesn't help that much in a general case- I expect the average number of ledger key accesses per contract call to be close to 1
he key is no longer transparent
IMO that's not a huge deal. I don't think monitoring the ledger state is a good idea most of the time (which is why I'm advocating for making the SDK struct representation to be vector-based without field names). block explorers will mostly rely on the events. it is also possible to standardize some keys (say, balance is always [symbol("balance"), address], which allows for building the keys off-chain), or (better) use metadata to expose read-only access to the 'interesting' entries. arguably everything else is the contract implementation detail and shouldn't necessarily be generally easy to parse
. If the contract wants a piece of info (that used to be in key) transparent, it has to be included in the ContractDataEntry.
I don't think this is ever needed. in order to build the key you need to have all the data required to build it, hence there is no need to read it again from the storage
actually nvm, this talks about transparency only... which I've commented about above
- Storage access (as is now) will become more expensive gas-wise, accessing the storage by ScVal will require the host to hash the key first, unless
what is the cost of sha256 currently? I think that's the main tradeoff here
It is a few 1000s cpu instructions, on the similar level as invoking a small host function. Not cheap so we'll definitely need the cache.
given that this is about 3% of a VM invocation, I'd say the overall impact seems marginal (especially given that this also saves fees on IO)
and also on the tx size, given that the ledger footprint can be compressed into 32-bytes/key
I don't love this approach. It's true that we don't expect dapps to directly use the storage but having meaningful inspectable keys allows building powerful developer tooling
I'm not sure building tooling based on the ledger state and on contract data entries is a good idea
are there any existing tools that are built using the full ledger scans?
FWIW contract devs might still build short obfuscated keys for the sake of lowering the fees/rent, so it's not like we have any guarantees on the contract storage interpretability; it's flexible and internal to the contract by design
my preference here is not to hash automatically
I would put a bound on the size and let the user choose what they want to do
if they want to use hashes as keys they always can -- the hash function is right there -- but they can also just use integers starting from 0 or the name of their cat or whatever. the only plausible advantage to hashing things automatically is that they don't have to think about the size of a key -- all key sizes work because they can be hashed to a constant size. but the whole premise of the argument is that users will care about sizes.
I also think that 99% of contracts will store ledger data under simple symbol names which are shorter than a 32-byte hash. if in some case users try storing big highly-structured keys, they'll hit the limit and change strategy, possibly to hashing, possibly to something else. I don't mind setting the limit reasonably short.
the upside of hashing is that we can bound the key size at 32 byes, which isn't achievable at the contract level. maybe it's not worth it, IDK, but I'm also not sure if it's worth pursuing some interpretability of contract data entries
it's helpful for debugging if nothing else. and I think entry browsing will be a thing.
(don't forget that every key is >32 bytes)
what do you mean "isn't achievable at the contract level"?
again, I wonder if people would use the actual bucket list for debugging... the sandbox ledger entries are json anyway, so they can be whatever
yes people will use the actual ledger for debugging
I mean that we can hash contract id + scval and have 32 byte keys
100% certain "look at the data" is a way people debug things
hmm, but how do they even do that?
... by downloading it and/or requesting it from a server that has it on hand
(I mean, there is xdrquery util I wrote for the core, but I doubt there are external users)
do you mean you want to store CONTRACT_DATA entries keyed by the hashed combination of their contract ID and scval, rather than as a 2-level key with contract ID followed by scval?
yeah, if we're hashing, why not use hash as the sole key for the whole entry
I think hashing just scval is counter-productive anyway
because then you can't find much less delete all the contract entries of a contract
(it also defeats IO locality)
you don't want to do that
but you can't delete the contract entries anyway...
but if locality is an issue, then sure
if I delete a contract instance I want to be able to delete its data entries
then I'd say that I vote against scval hashing as well; it's pretty pointless size-wise
but you can't delete a contract instance either...
???
there are no deletion mechanisms for the contract instances/wasm blobs (besides state expiration)
we're going to expire and GC them eventually
and these wouldn't trigger the data deletion automatically, at least as far as I know
I was certainly assuming they would!
(that would be potentially slow and also wrong semantically; you don't want to lose all the entries just because the contract entry rent has expired)
the entries will expire as well eventually
but we don't want to lose thousands of user balances just because the rent on the instance has somehow defaulted
it's easy to unarchive the contract entry; it's much harder to unarchive the user entries
hmm
I don't think we've considered 'cascading' expirations; sometimes they're viable but we decided that for these cases just bundling data in the same entry is probably enough
(that's mostly concerning some security concerns, not purging all the contract data automatically)
ok. I still think there's a likely need to go from contract => entries, for debugging if nothing else, and at very least a locality argument. hashing everything into a "I have no idea what this key even is" key is IMO a big design error in eth
it optimizes for one thing (uniformity) at the expense of everything else
yeah, then there is really no point in hashing scvals; that will result in more fees more often than not. I have my doubts on usability of contract data scans (at least for now), but I agree that it's useful to have this at least as an extension point
that said, what is a viable limit then? would 1K be feasible? or should we be more conservative? I think it should be at least 256 bytes
I think it should be the same limit as the topic limit, and yeah somewhere around there seems reasonable. symbol is max 64 bytes, so imagine 1 hash + 1 symbol + vec overhead. 128 bytes? 256?
symbol is 32 bytes, isn't it?
I'm think more along the lines of 2-3 addresses (each of which is 36 bytes IIUC)
plus a symbol
and vec/tags
const SCSYMBOL_LIMIT = 32 guess so!
so yeah you can easily bump into 128 but .. 256 is likely to cover say 5 addresses and 2 symbols and a vector
when we've discussed size limits before we talked about making it predictable by doing it in terms of total number of leaf objects and max size of a leaf object
eg. 8 objects each of one of the following leaf types (any number type, symbol, address, bytes <= 32, string <= 32)
hmm, again, why not just look at the size of the resulting ledger key? looking at the leaf objects makes it unpredictable because it ignres the non-leaf structure
and your actual key sizes might deviate arbitrarily
because making it depend on XDR makes it hard for a user to predict
they have to build a mental model of serialization
validation is especially trivial for the ledger keys - we can check it at tx validation point
which isn't present at the API level
we can account for the non-leaf structure too (eg. no more than 16 objects total)
this seems too arbitrary for me...
making users think of XDR is worse. we've been over this multiple times in threads.
this whole topic is about an arbitrary limit, but it has to be comprehensible
the serialization format is, say +20 bytes overhead... I don't think users need to think about this for the most part, as long as we have tooling that tells about exceeding the size limits soon enough
it's not that they can't find out when it happens, it's that the boundary between it happening and not-happening has to do with the sum across a serialization pass, which means they might fairly easily have code that works in testing but fails in production because people use slightly longer symbols or strings
if it's structural it's more likely the tested and production code use the same structure to build a key
hmm, failure due to too large inputs is expected
especially for the unbounded types
like if something uses bytes or string as input, then it's bound to fail for some user inputs, which is expected
or vector
I wouldn't be overly concerned about keys, but losing flexibility in ledger entry values is problematic
like why can't I store an array of a 1000 i128?
given that there is a limit for bytes, there is incentive to just serialize scval to bytes and store that...
well, this was a conversation about keys not vals
and I'm just reflecting previous conversation. "depending on the user knowing XDR serialization" wasn't a popular choice.
true, that's a neighbor thread that discusses this
I gotta go now, though, I think we're at least on the same page about "don't force hashing"?