#What does the sqlite db in soroban-rpc store?

polar pulsar
#

What does the sqlite db in soroban-rpc store?

Is it a cache of data that's been streamed from core?

When I query the tables I see this, which makes me think it's a cache of previously streamed data.

sqlite> select * from sqlite_master;
table|ledger_entries|ledger_entries|4|CREATE TABLE ledger_entries (
    key BLOB NOT NULL PRIMARY KEY,
    entry BLOB NOT NULL
)
index|sqlite_autoindex_ledger_entries_1|ledger_entries|5|
table|metadata|metadata|6|CREATE TABLE metadata (
    key TEXT NOT NULL PRIMARY KEY,
    value TEXT NOT NULL
)
index|sqlite_autoindex_metadata_1|metadata|7|
table|ledger_close_meta|ledger_close_meta|8|CREATE TABLE ledger_close_meta (
    sequence INTEGER NOT NULL PRIMARY KEY,
    meta BLOB NOT NULL
)

cc @vocal ice @topaz crest
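
For anyone who wants to poke at the same layout without the sqlite shell, here is a minimal Python sketch. It recreates the three tables from the dump above in an in-memory database so it runs anywhere, then counts rows per table. `table_row_counts` is a made-up helper name, and pointing `sqlite3.connect` at a real soroban-rpc db file instead of `:memory:` is an assumption (the path depends on the deployment).

```python
import sqlite3

# Schema copied from the sqlite_master dump above.
SCHEMA = """
CREATE TABLE ledger_entries (
    key BLOB NOT NULL PRIMARY KEY,
    entry BLOB NOT NULL
);
CREATE TABLE metadata (
    key TEXT NOT NULL PRIMARY KEY,
    value TEXT NOT NULL
);
CREATE TABLE ledger_close_meta (
    sequence INTEGER NOT NULL PRIMARY KEY,
    meta BLOB NOT NULL
);
"""

def table_row_counts(conn):
    """Return {table_name: row_count} for every table listed in sqlite_master."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Names come straight from sqlite_master, so double-quoting them is enough here.
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names}

# In-memory stand-in for the real db file (hypothetical data, one placeholder row).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO ledger_close_meta VALUES (?, ?)", (100, b"placeholder"))
counts = table_row_counts(conn)
print(counts)
```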

vocal ice
#

It's storing the txclosemeta

#

As well as the ledger entries.

#

That's it.

polar pulsar
#

Current state ledger entries?

vocal ice
#

No processed information.

#

Yes, current state

#

So that getLedgerEntries can fetch the data

#

As well as the backing storage for simulateTransaction
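
Roughly speaking, a current-state lookup against that table boils down to a primary-key fetch. A sketch, assuming keys and entries are opaque serialized bytes (the real encoding isn't shown in this thread) and using a hypothetical helper `get_ledger_entries`, which is not the rpc's actual code:

```python
import sqlite3

def get_ledger_entries(conn, keys):
    """Fetch current-state entries for the given keys; keys with no row are skipped."""
    found = {}
    for key in keys:
        row = conn.execute(
            "SELECT entry FROM ledger_entries WHERE key = ?", (key,)).fetchone()
        if row is not None:
            found[key] = row[0]
    return found

# In-memory stand-in with the same table definition as the dump; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger_entries (key BLOB NOT NULL PRIMARY KEY, entry BLOB NOT NULL)")
conn.execute("INSERT INTO ledger_entries VALUES (?, ?)", (b"key-a", b"entry-a"))
result = get_ledger_entries(conn, [b"key-a", b"key-missing"])
```

Because `key` is the primary key, each lookup goes through the automatic index rather than scanning the table.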

thin geyser
#

I have two fairly orthogonal questions:

  1. Why store txclosemeta in a db instead of fetching it from history archives using the attached captive core, similar to what horizon does?
  2. If we already have a db, why not just put events and transactions in it instead of having a two-layer on-disk/in-memory storage mechanism?
vocal ice
#
  1. There are a few advantages to having a local db containing the latest txclosemeta; some of these are longer-term goals. Short-term benefits include faster bootstrapping, as the local data is much more readily available than remote storage. Long-term goals include deploying an rpc that wouldn't use a captive core at all as its txclosemeta stream; instead, it would use another rpc. That configuration could be useful to work around the limitation where the captive core has to run on the same machine as the rpc server.
#
  2. Placing the events & transactions on disk is a guaranteed way to repeat the same mistakes that were made with horizon. It would spend a huge amount of storage on data that no one would ever care about. Instead, we implemented a light rpc endpoint that provides all the information that a defi application needs. The existing implementation can guarantee very high, consistent request throughput, which wouldn't be possible with an on-disk approach.
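
To make the in-memory side a bit more concrete, here is a toy sketch of a store that only keeps events for the most recent N ledgers. The class name, the window size, and the event shape are all made up for illustration; this is not the rpc's actual data structure.

```python
from collections import deque

class LedgerWindow:
    """Keeps events for the most recent `max_ledgers` ledgers only."""

    def __init__(self, max_ledgers):
        self.max_ledgers = max_ledgers
        self._buckets = deque()  # (sequence, [events]) in ascending sequence order

    def ingest(self, sequence, events):
        self._buckets.append((sequence, list(events)))
        # Drop the oldest ledgers once the window is full.
        while len(self._buckets) > self.max_ledgers:
            self._buckets.popleft()

    def events_since(self, start_sequence):
        """Return all retained events from ledgers at or after start_sequence."""
        return [event
                for sequence, events in self._buckets
                if sequence >= start_sequence
                for event in events]

window = LedgerWindow(max_ledgers=2)
for seq in (10, 11, 12):
    window.ingest(seq, [f"event@{seq}"])
recent = window.events_since(10)  # ledger 10 has already been evicted
```

Memory use is bounded by the window size rather than by the full history, which is the trade-off being argued for above.
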
polar pulsar
#

Another advantage of storing the source data is that if the rpc changes the data it needs to serve, there's no backfill process.

#

Another advantage is reducing the number of data formats that exist, which presumably will mean lower maintenance cost.

#

Does rpc need to do a decent amount of processing during boot? Do we have a sense of what the boot up time will be on a network similar to pubnet's size?

#

I'm really interested then in what the tx lookup process is. Do we keep an in-memory index? Or have some way to know which meta to look at for it?

torpid atlas
wheat cave
#

What about scalability? Does it prune? SQLite gets slow as it gets large.

torpid atlas
#

There is a configurable retention window for the tx meta table. By default we only keep 24 hours of history in it, and as rows in that table expire we delete them.

However, the ledger entries table contains every ledger entry in the network, so as the ledger grows, so will the ledger entries table.
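
A minimal sketch of what that kind of pruning could look like against the `ledger_close_meta` table. The helper name and the ledgers-per-day figure are assumptions (about 5 seconds per ledger, so 24 hours is roughly 17,280 ledgers); the rpc's real pruning logic and config option aren't shown in this thread.

```python
import sqlite3

# Assumption: ~5 s per ledger, so 24 hours is roughly 17,280 ledgers.
RETENTION_LEDGERS = 24 * 60 * 60 // 5

def prune_ledger_close_meta(conn, latest_sequence, retention=RETENTION_LEDGERS):
    """Delete meta rows outside the retention window; return the number deleted."""
    cutoff = latest_sequence - retention
    cur = conn.execute(
        "DELETE FROM ledger_close_meta WHERE sequence <= ?", (cutoff,))
    conn.commit()
    return cur.rowcount

# In-memory stand-in: ten fake ledgers, keep only the latest three.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger_close_meta (sequence INTEGER NOT NULL PRIMARY KEY, meta BLOB NOT NULL)")
conn.executemany("INSERT INTO ledger_close_meta VALUES (?, ?)",
                 [(seq, b"meta") for seq in range(1, 11)])
deleted = prune_ledger_close_meta(conn, latest_sequence=10, retention=3)
remaining = [row[0] for row in conn.execute(
    "SELECT sequence FROM ledger_close_meta ORDER BY sequence")]
```

Since `sequence` is the primary key, the delete is a range operation on the index, so this table stays bounded. Nothing comparable applies to `ledger_entries`, which holds current state rather than history.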

vocal ice
#

One note on the ledger entries growth: it will always be smaller than core's bucket db, so I wouldn't worry about exploding disk size (or to rephrase: I would start by worrying about the bucketdb size first).

wheat cave
#

Not disk size, but the performance of sqlite

#

Eventually, if the ledger entries table grows big enough, queries could start taking too long

#

Probably somewhere around a couple hundred MB before it becomes a concern, but that isn't that big

vocal ice
#

Yes, I believe that sizing argument for the rpc is true. It will get slower as the ledger entries table grows, which would affect simulateTransaction performance (somewhat). At that point, I hope that we will have enough rpc instances and that they will be load-balanced correctly.

#

This is a "challenge" we will need to tackle one day, but we're a long way from that point.