#Rethinking LIFETIME_EXTENSIONS

1 messages · Page 1 of 1 (latest)

wet sapphire
#

Recently, there have been several issues with LIFETIME_EXTENSION implementation details. To briefly summarize, the idea behind LIFETIME_EXTENSION entries is to provide a way to bump large, primarily read-only data without having to rewrite the entire LedgerEntry. This is intended for WASM blobs, which are almost never changed and where the write costs dominate lifetime extension costs. The current design has the following issues:

  1. LIFETIME_EXTENSION entries are not concurrent in their current implementation. Every entry in the RO footprint is bumpable via a LIFETIME_EXTENSION, meaning there is a possible, implicit LIFETIME_EXTENSION write for each RO key. This implicit write of a read-only structure is complicated and diffcult to implement in a parallel way.

  2. BucketList invariants must be relaxed. Because LIFETIME_EXTENSION entries have a separate LedgerKey than the DATA_ENTRY they reference, but can still merge into a DATA_ENTRY, we have weird edge cases. For example, currently an INIT + INIT entry with no DEADENTRY in between is valid given the behavior of LIFETIME_EXTENSION merging. This also has negative effects for downstream system ingestion invariants.

  3. Meta is difficult to consume. Downstream systems must process a new type of LedgerEntry (LIFETIME_ENTRY) that is fundamentally different than other ledger entry types.

The issue is LIFETIME_EXTENSIONS break too many assumptions of the system. Namely, that a key can be marked as RO but is still modified (at least to some extent), and that a LedgerEntry can merge with another LedgerEntry that is is related to, but that has a different key.

I think the core issue is that our solution to the given problem is too general. In practice, ContractCode is the only entry type where read-only bumps are strictly necessary. Other entry types are either frequently written, or if they are read-only, are small enough such that write fees during bumps are still low (like auth entries).

#

Recently I've been trying to address each problem individually, but I think the system breaks too many assumptions. Instead, I'm proposing a simplified approach that only applies to the primary use case, Contract Code entries. The proposal is as follows:

Remove LIFETIME_EXTENSION entries all together. For ContractData entries, a lifetime bump is considered a write. To bump a Contract Data entry, the key must be in the footprint RW set and the invoker must pay a write fee.

Contract Code entries no longer store an expirationLedgerSeq, they only store the WASM blob (and other non-lifetime related metadata). Contract Code entries still have a "lifetime", but this lifetime is stored in a new type of LedgerEntry called the LifetimeEntry. Each Contract Code entry must also have a LifetimeEntry associated with it. The LedgerKey of the LifetimeEntry is different than the ContractCode ledgerkey, but is based off of it deterministically using something like a hash.

Unlike in the previous proposal, LifetimeEntries do not merge with the ContractCode entry they are associated with. This means that the storage overhead for Contract Code increases, but only by a small ammount. LifetimeEntries should be very lightweight and only contain a key, expirationLedgerSeq, and lastModifiedLedger. If this interface was extended to ContractData entries as well, this overhead would be significant. However, given that there should be signiifcantly less ContractCode entries, and that the ContractCode entries are significantly larger, this overhead should be manageable.

In order to extend the lifetime of a ContractCode entry, the corresponding LifetimeEntry must be included in the RW set. However, to just use a ContractCode entry, the LifetimeEntry does not need to be included in the RO set. This prevents future data dependencies on bumps.

#

We can have this asymetry in RO and RW sets because if contract code is in the RO set to be executed, the exact value of the LifetimeEntry is not important. As long as the lifetime is non zero (i.e., not expired) the contract code can be read and executed. The exact value of the LifetimeEntry is only necessary if the lifetime is potentially being extended. If the lifetime is 0 (i.e., the entry has expired) both the lifetime bump and WASM read for execution will fail anyway. To make a LifetimeEntry with a zero value non-zero again, a restore operation is required. This is a Stellar Classic Op, so it has no concurrency issues.

#

I believe this interface solves all major issues. By separating the WASM execution data dependency and the WASM lifetime extension data dependency, we can have concurrency friendly lifetime extension behavor for WASM code. For ContractData bumps, I suspect there will be significantly less contention, so concurrency should not be a major concern. Because LifetimeEntries have separate keys and do not merge into ContractCode entries, we maintain all current BucketList invariants with no modification or other weird edge cases. Finally, while downstream systems will have to maintain a new entry type (LifetimeEntry), this is a persisted LedgerEntry that follows the normal create, update, delete logic. This is significantly easier to deal with than the current LIFETIME_EXTENSION interface, which breaks the standard create, update, delete logic due to the way they are merged into DATA_ENTRY that has a different LedgerKey.

This approach also does not eliminate RO bumps from ContractData entries later on. This should be enough functionality for launch, but if the community expresses a need for ContractData entries that support RO bumps, we can later extend ContractData entries to optionally use this LifetimeEntry type.

#

While I think this solution solves the major issues, there are some drawbacks. Namely:

  1. ContractCode requires more disk overhead to store. Given the size of ContractCode entries, I think this overhead will be relatively small.
  2. ContractData can not recieve inexpensive, RO rent bumps (at least for V1). I think this is fine. For most RO entries, the size of the DATA_ENTRY and LIFETIME_EXTENSION entry is relatively comporable, so I doubt the savings would actually be significant in practice
  3. This requires XDR changes and a non-trivial amount of work, both for core and downstream systems. This will push our release dates.

That being said, I think this addresses the root issue (ContractCode rent bumps) and simplifies the implementation for core and downstream systems. It's late in the game but I think it's worth doing

sterile sapphire
#

I don't think this addresses the true root issue - concurrency support

#

any sort of RW bumps breaks the concurrency

#

cost is of course an issue as well, but the main thing is that we don't really want bumps to break concurrency

#

so if we really want to solve the problem, we need to rethink our bump approach first, not the storage layout

#

we could quickly hack the current approach to move the expiration extension entires to RW set and thus 'fix' most of the invariants. but, as discussed previously, we don't really want to do that

cunning jackal
#

If we expect contracts to bump the instance and contract code on most calls (like the SAC) wouldn't this solution effectively remove parallel access to a contract (I think this is Dima's point as well but not 100% sure)?

sterile sapphire
#

yes, it's exactly my point

willow leaf
#

@cunning jackal what exactly do you mean by that? like are you saying you expect most calls to an SAC instance X, say to transfer funds from account A to B, to carry with them sufficient fees to perform a lifetime bump on X?

sterile sapphire
#

to be more general, this is a bump model that we're currently suggesting as a 'best practice' for most of the contracts

#

of course we could say that it's not a 'best practice' (if we can come up with something else), but that's still a potential footgun that will either break concurrency for someone, or would promote very unstable contracts that would frequently expire due to lack of bumping infrastructure

willow leaf
#

ok so .. this is interesting! when I last talked to Garand about this, and about concurrency and contention issues, we came to a somewhat different conclusion, that most users will actually not want to risk personally shouldering the cost of keeping a token alive with their individual transaction fee, so they will set their fee below the bump threshold, and if their tx is asked to bump it will just fail and they will retry once someone else (the token admin) does the bump.

#

(which produces a fairly different model of bump-induced contention -- if a tx is going to fail for lack of fees anyways, it's not any different if it fails for lack of having a given bump-target in its footprint)

sterile sapphire
#

that doesn't look like good UX to me and I'm also not sure how would users achieve that

willow leaf
#

IOW we concluded that the likely practice would be for there to be two sets of operations on an instance, those that are cheap and don't bump, and some (possibly specific) expensive ones that do bump, and the admin would be the only person calling the latter

sterile sapphire
#

preflight just returns the necessary refundable fee to spend

#

this seems pretty different from the bump model we landed on. I'd say if we go with 'admin manages the contract expiration' approach, I'd say we should remove contract-driven bumps completely and only leave the respective operation. but I'm not sure that's a good approach in the first place, given that not every contract has a clear admin role that also has a financial responsibility over contract's lifetime

willow leaf
#

ok. I mean .. I am not trying to dictate terms or anything here, I'm saying (a) I think it's worth trying to understand what users will do in terms of free riding / not wanting to personally pay the fee to keep a contract alive and (b) more importantly I think some of the implementation choices Garand and I settled on were based on this understanding so if it's not actually going to happen we should revisit.

sterile sapphire
#

it's hard to predict what will the user behaviors be. however, I think that doing something weird like setting lower fees with a risk of failing a transaction will likely be inefficient. you save a bit on not bumping the contract instance, but you also waste all the fees for doing what you actually wanted to do. I think in majority of the cases non-refundable fee will be higher than the refundable fee

#

on a side note, having bumps encapsulated only in a single operation would make things much simpler - we could create one more phase just for bumps, so the contract phase is concurrent and doesn't care about bumps, and bump phase is sequential, but fast

#

(but during the previous discussions the consensus was that people would want to bump data from contracts because contracts know which keys to bump)

wet sapphire
#

I think the disconnect here is between the bump function we expose. When talking with Graydon, the suggested bump interface looked like bump(key, lowerBoundLifetime, bumpAmount), with an inteded use case being something like bump(instance, 1 month, 6 months). This bump would bump the instance to 6 months if the lifetime fell below 1 month. Given this interface, bumps are large and occasional, so it makes sense that many TXs may be preflighted before the bump was required and thus have to low of a fee. I believe Dima is working off of the current interface of bump(key, bumpTo). With that interface, the moment you fall below the threshold, every TX will bump the key by just a single ledger (assuming it's called every ledger).

willow leaf
#

@wet sapphire can you recall what part of our conversation concerning implementation choices rested on this assumption? I'm having trouble reconstructing it from memory and I think it's important.

#

(I should emphasize I have no desire to perturb anything about any part of the effort here, just get something (a) coherent (b) implementable (c) not destroying concurrency or unrealistic about contention (d) not unrealistic in terms of UX)

wet sapphire
#

This is important because the bump(key, bumpAmount) interface leads to very small, frequent bumps, where the invoker may not notice or care about the small bump. If we have the bump(key, lowerBound, bumpAmount) interface, you have infrequent, large and expensive bumps, where a user is much more likely to notice and want to put in some extra work wrt to fee setting to "free load"

#

I discussed this with Nico and at the planning meeting this morning and we came to the following conclusions:

  1. The current LIFETIME_EXTENSION interface is broken for several reasons not related to concurrency.
  2. The "fix" to concurrency that we discussed earlier doesn't actually work and is difficult for downstream systems to manage. This fix was where we "snapshotted" the expirationLedgerSeq at the beginning of the ledger such that all TX bumps were relative to the snapshot.
#

Fundamentally, we have a data dependency on an entry's lifetime. We've tried to get around this dependency through special case rules like the snapshot approach, seperating the lifetime value into a seperate entry, etc.). This is getting very complicated for everyone involved and still doesn't fully address all edge cases, like an entry that gets resized after extending it's lifetime. The conclusion we came to is instead of trying to solve for concurrency, just avoid read-write conflicts as much as possible. I think bump(key, lowerBoundLifetime, bumpAmount) accomplishes this. Assuming the SAC says something like bump(instance_and_wasm, 1_month, 6_moths), we will have a single ledger with bad parallelism every 6 months. I think (at least for V1) we can live with this. Down the line we could try to improve this by adding different phases where we apply bumps after TX application, have some sort of merge rules to resolve bump conflicts, etc, but for V1 I think this will work if we change the host function to bump(key, lowerBoundLifetime, bumpAmount). There are non concurrency related issues with LIFETIME_EXTENSION that requires a redesign and I should have a doc circulating later this afternoon with details on that

sterile sapphire
#

This is important because the bump(key, bumpAmount) interface leads to very small, frequent bumps, where the invoker may not notice or care about the small bump.
this is intentional and a product of the previous discussions, isn't it? the point was to more or less evenly load the bump fees on the contract users. with the proposed interface you would basically have one unlucky user every once in a while who needs to pay for the bump.

#

bump(key, lowerBoundLifetime, bumpAmount)
what is the lower bound on lowerBoundLifetime? if it's low enough, then nothing changes

willow leaf
#

it sounds like he's saying (Garand please correct me) that he and Nico concluded the only real options are "lots of little bumps with lots of contention" or "a few big bumps with a few periodic instances of contention"

#

which is .. disappointing, and exactly what I was hoping we were going to avoid 😦

sterile sapphire
#

if we are in this kind of situation, should we reconsider getting rid of programmatic bumps, as they don't seem to achieve much, but do lead to awkward usage scenarios? or at least programmatic bumps of the instances and code? (for the regular contract data concurrency can anyway be broken with 'hot' RW entries, so we can just make programmatic bumps of contract data to be just the regular RW ops)

#

and as discussed above, bumps with a separate op can just be all moved to a separate sequential phase

wet sapphire
#

I thnk we should definietely have contract defined bumps for data entries, but I could see a case for not allowing contract instance bumps programatically

#

That fixes the concurrency concerns since we could always apply the bump ops in another phase once we eneable concurrency in V2

#

Sounds like a developer UX question though, @karmic obsidian ?

sterile sapphire
#

to be clear, that's not good UX for contract maintainers. I'm just not sure if automatic bumps with large lower bound will be useful, as they would create bad UX for the contract users

karmic obsidian
#

We need to be able to support autonomous contracts. These are contracts that don't require a maintainer or benevolent clients in order to survive (assuming that they're actively being used)

karmic obsidian
sterile sapphire
#

right, but then the question is if that's ok that the majority of the costs would be paid by a handful of arbitrary contract users

karmic obsidian
#

No, I don't think that's acceptable

sterile sapphire
#

so basically we want to have the following requirements:

  • metadata/bucketlist semantics consistency
  • not breaking concurrency
  • 'smooth' bumps
    and it seems like we can only do 2 out of 3
wet sapphire
#

We could go back to lifetime "snapshots", but only for ExpirationEntries. This would give us concurrency and we wouldn't have to worry about weird cases with resizing, etc

#

The drawback is this would break the assumption that all meta are sequential changes, they would have to take the max of all ExpirationEntry changes when writing the final value to the DB

sterile sapphire
#

yeah, I've been thinking about this... resize is still is problem though, need to think more

#

(because while we could just offload bumps to a separate phase or pseudo-phase, we can't do so about writes, but we still need to compute fees)

wet sapphire
#

Right now ExpirationEntries only apply to ContractCode, so no resize possible. Still need to think about how we could apply this to instances in V2 though

sterile sapphire
#

as usual, I'm not sure if not building the right semantics in v1 is a good idea

#

maybe we just need Rust borrow semantics when building tx sets 🙂 one RW or any number of RO

wet sapphire
#

Hmm actually I think I have a concurrency model that will work, see "Concurrency Model" section of the doc

sterile sapphire
#

I'm not sure I understand it. I thought the approach is just that we put expiration in RW footprint

wet sapphire
#

This is a different approach where the expriationEntry key is neither in the RW or RO set

sterile sapphire
#

I left a comment, but basically I don't see how this can work.

#

(without the same kinds of issues we encounter with the current approach, that is)

willow leaf
#

@wet sapphire hmm, I'm not sure I really get the constraints on design / implementation you're trying to avoid or achieve, but one thing that comes to mind reading the doc is that you're describing "lanes" as a single concept when in practice I believe (and experiments so far strongly suggest) they are two separate concepts: first splitting a txset into logical partitions, and second scheduling those logical partitions for execution on physical cores.

#

the logical partitioning should happen during txqueue-to-nomination, before consensus. in other words a consensus txset should carry a logical partitioning. this is to ensure an maximum size for any given logical partition, and thus a maximum latency per ledger under the assumption of some consensus minimum on the number of physical cores each participant can have.

#

but the maximum number of cores on a node may differ, and in any case the mapping from logical partition to physical schedule is likely to be something like "dynamic work stealing" on each node, because a lot of partitions will be small.

#

I'm mentioning all this because if you are assuming it's safe to "inherit" sequential semantics from co-scheduling of partitions that share these "hidden" r-w dependencies, that creates the risk of observably different execution on different peers that run the same logical partitions in a different physical schedule

wet sapphire
#

To clarify, when I talk about "execution lanes," I mean txset logical partitions. The actual scheduling of those partitions should not matter I agree. Within each logical partition, the individual TX order is deterministic, right?

willow leaf
#

yes. but if you are place txs A and B in separate partitions P and Q, and A and B both have a hidden shared r-w dependency on a ledger entry E but you ignored it when putting A in P and B in Q, then running P and Q sequentially on a single core will, if I understand your plan correctly, produce different results than running P and Q separately on separate cores.

wet sapphire
#

Yeah, it looks like my solution assumes that each partition begins with the same snapshot of state and ends with the "conflict resolution join" once the entire partition has been applied

willow leaf
#

ok. I agree that for any type that can live with the semantics of everyone's-prestate-is-the-same-snapshot-read + everyone's-poststate-gets-a-deterministic-merge, concurrent and sequential .. probably mean the same thing.

#

I wasn't sure if that fits in the current implementation. it sounded like it doesn't when the snapshots happen at the tx level. but maybe it does at the partition level?

#

(is that .. really the crux of your proposal here? shift the boundary of snapshot-read and merge from per-tx to per-partition?)

#

(at least with respect to expiration entries)

wet sapphire
#

Yes, the snapshot is at the parition level such that each logical parition executes on state as if it was the only partition being executed during that ledger

willow leaf
#

ok. this does make partitioning "observable" to transactions in a way it's not presently. but maybe it doesn't matter, it's not like partitioning is secret.

#

just to clarify motives: your motivation for this semantics is that it's simpler than resetting each tx's view of this entry to a snapshot read, or that it's less likely to produce double-charges of bumps on otherwise-contended contracts, or .. both? neither?

#

I think an entry that's highly contended only on expiry will .. produce lots of double charges. but if that's coupled with only small charges, maybe that's ok / intended / the price of concurrency?

sterile sapphire
#

yeah, it seems unavoidable and I think it's ok as long as we go with the small bumps approach. but the initial proposal here went for low-frequency bumps (which I don't like much), so not sure what's the final state we're aiming at

willow leaf
#

(I should weaken the word "presently" to "in the previous plan", of course there is no partitioning presently)

wet sapphire
#

3 motivations: first it's a lot simpler and requires no semantic changes for V1 to TX application. While it does produce less double-charges, I'm not really too concerned about that because they should be infrequent, small, and not a huge deal. 2nd, it makes meta ingestion simpler for downstream systems. When I pitched the per-tx snapshot, there was significant pushback from the platform people. 3rd and most important reason, this solves our resize issue.

Suppose we have TX A and TX B in the same partition. Suppose we have some entry e that is 1 byte large. In TX A, e is bumped to the maximum lifetime. This is very cheap because e is small. In TX B, e is resized to be very large, but is not bumped. TX B does not see the lifetime extension from TX A, so it is not charged lifetime related fees for the resize. If you have a per-tx snapshot interface, the end result will be that e is resized with the mximum lifetime, even though you only paid a fee for that lifetime as if e was very small.

Because we have per-partition snapshots, this is no longer an issue. If there is any modification to entry e, it will force all other TXs that access e into the same partition. This means that when TX B resizes e, it sees the extended lifetime, and must pay a large lifetime fee for the resize (as it should).

sterile sapphire
#

on a somewhat unrelated note: I'm a bit suprised that we're considering building logical partitioning into txset. I'm really not sure if there is a fair way to build a txset while also maintaining high concurrency. we'll need to discuss this more when we actually start designing this, but at least at conceptual level the approach seems ok even if the partitioning is implicitly built by validators

willow leaf
#

leaving it to the validators risks nominating / confirming a txset that is too big to process within the alloted latency target (5s per ledger)

#

if it does not split

#

(this happens on solana, it is not hypothetical)

#

specifically: if we say "ah, the network can usually handle 16 way parallelism, so let us admit 16,000 txs in a txset" and we admit them and they happen to all contend on one LE, there is only one partition and the ledger takes 80 seconds to close.

#

the point is to say "we will not let any partition get larger than 1000 txs" and admit txs accordingly

sterile sapphire
#

I think that's orthogonal to the tx set contents - we still can put validity restrictions on the generated partition (as long as algorithm is specified by the protocol). but anyway, I don't want to steal this thread, just wanted to point out that this should not break any possible bandwidth optimizations for partitions

willow leaf
#

failing txs due to contention is a lot more disruptive than delaying (and if we were comfortable with that we could just avoid footprints altogether -- give out locks first-come-first-serve and fail any conflicts; we rejected this approach like 2 years ago when doing early design work on parallelism)

#

but, sure, let us not derail this thread too much 🙂

sterile sapphire
#

I want to come back to this, @wet sapphire

the initial proposal here went for low-frequency bumps (which I don't like much), so not sure what's the final state we're aiming at

wet sapphire
#

Yeah so with this change high frequency, small bumps are back on the menu

willow leaf
#

If there is any modification to entry e, it will force all other TXs that access e into the same partition

is this true if one tx modifies e and another tx modifies the expiry of e? I thought you were proposing expiry-of-e is an untracked dependency, in which case it could be in a different partition

wet sapphire
#

We can leave autobump alone, but I still think we probably want to change the host function so we don't bump on literally every single ledger. Like, even bumping every 10 ledgers is fine

sterile sapphire
#

I'm ok with leaving autobump out for v1. but we still need to figure out what to do with the contract data entries

willow leaf
#

like if A in partition P extends E's lifetime, and B in partition Q resizes E, don't you still have "the resize problem"?

sterile sapphire
#

(I'm fine with adding some minimum bump value, but I think it should be within 10-1000 ledgers, not within months)

wet sapphire
sterile sapphire
#

like if A in partition P extends E's lifetime, and B in partition Q resizes E, don't you still have "the resize problem"?
how is this possible? I thought all RO entries go into the same partition given an least one RW access

wet sapphire
#

Yeah that's what I though

wet sapphire
#

This gives us parallel WASM and instance bumps

sterile sapphire
#

yay, consistency!

wet sapphire
#

Assuming the isntance isn't being modified by some TX and makes its way into the RW set

willow leaf
#

I thought all RO entries go into the same partition given an least one RW access

oh, ok, so these will still count as RO accesses, not "accesses that are totally invisible to the partitioning function"

willow leaf
wet sapphire
#

Yeah, to bump requires the ContractCode/ContractData being bumped to be in the RO set

willow leaf
#

how can I square that with "Expiry of e is untracked, yes"?

#

(in what sense is it untracked?)

sterile sapphire
#

IIUC we still omit expiration entries from the footprint (for the sake of brevity mostly), but logically we add the same access, as for 'parent' entry, so if we have key K in RW set, then we treat Expiration(K) as if it's also in RW set

wet sapphire
#

Consider LedgerEntry LE with ExpirationEntry EE. EE dependencies are untracked such that 2 read-writes to the same EE may occur in different partitions. However, the underlying LE that the EE is associated with is tracked. Concretely, this means the following:
Case 1:
TX A: read only bump of LE, RO: {LE}, RW: {EE (implicit)}
TX B: read only bump of LE, RO: {LE}, RW: {EE (implicit)}

partition 0: {TX A}
partition 1: {TX B}

Case 2:
TX A: read only bump of LE, RO: {LE}, RW: {EE (implicit)}
TX B: write and bump of LE, RO: {}, RW: {LE, EE (implicit)}

partition 0: {TX A, TX B}

willow leaf
#

ok, I guess this hinges on the rule that bumping requires LE in RO. that seems .. odd? somehow artificial? like the LE is not being accessed right? what about it is being read?

#

bumping is just writing to EE, right?

wet sapphire
#

Essentially we have the following dependency graph. For some key k, we have LE = ContractCode(k), EE = ExpirationEntry(LE).

Data dependency graph:
LE -> LE
EE -> LE

But notably
EE !-> EE

wet sapphire
# willow leaf bumping is just writing to EE, right?

Correct, but we need some way to model that the ExpirationEntry has a data dependency on it's associated ContractCode/Data entry, but that the ExpirationEntry does not have a data dependency/conflict on itself, which is weird

willow leaf
#

well .. ok! if this is the easiest to implement, I don't think I find the semantics problematic. a little odd, but if they're sufficiently clearly described it seems within reason to me!

#

I gotta step out for a couple hours -- no further comments on this from me 🙂

#

(I mean, I guess also "check with everyone else" / "try sketching it in code to see what breaks"? idk what the next level of seriousness is. also there is like .. schedule-impact / delivery timing stuff, I assume you've talked to others about that..)

wet sapphire
#

Ya I think from a developer point of view, "read only lifetime bump" seems to make the most sense so we shove the key in RO. With the exception of the BumpFootprintOp (which we can specialize) I doubt there's a large use case for programatically bumping but not reading an entry