#node startup times


hidden pivot
#

just starting a thread not to clutter the chat

#

do the same nodes use similar startup times or is it random on each start?

#

do the startup times correlate with cometbft db size?

jolly cloud
#

no, it's completely random. you never know if one is gonna take a few minutes or 10

hidden pivot
#

hm that's a bit weird

#

and system load with other processes running does not seem to be influencing it?

jolly cloud
#

nope

hidden pivot
#

interesting

jolly cloud
#

all hosts have the exact same services on them

hidden pivot
#

have to admit I never paid attention to whether my fullnodes took 2 or 7 minutes - just that they take a while

jolly cloud
#

I am not worried about it.. seen this on celestia, somm, nym, atom, etc. I think it's cometbft.

hidden pivot
#

could be depending on db state when it shut down I guess

jolly cloud
#

agreed

hidden pivot
#

in any case, I think it's still correlated to db size - those who have pruned experience significantly shortened start times

#

I'm a little cautious of pruning, so I suffer the longer starts

#

🙂

jolly cloud
#

well kind of, it could be a sign the DB is not shutting down properly.. and then has to patch state or something.

#

I also haven't pruned anything yet.

hidden pivot
#

I know when shutting down a node, sometimes it exits cleanly and sometimes it's an appcrash.

#

haven't monitored for whether that makes a diff on next start

#

maybe worth doing some benchmarks on?

jolly cloud
#

I will pay closer attention next time we do an upgrade (app hash means longer startup time <- is what I will be testing)

jolly cloud
#

cc @flint jasper to track ^

hidden pivot
#

not to mention the client appcrashes when it doesn't like parameter choices 🙂

#

I'm thinking it's a fair point. maybe it should be looked into if it can reduce startup-recovery times.

flint jasper
#

@jolly cloud was TuDudes one of those nodes? If so, VN? Fullnode?

jolly cloud
#

No those are some of our internal nodes. We see this on our relay and rpc nodes. Relay nodes are just telling our signer what to sign and then broadcasting.

thin maple
#

I don't know a great deal about cometbft or cosmos/tendermint stuff, any ideas on what/where to look in cometbft data for signs of problems?

dawn lichen
#

hey guys, i'm trying to restore from snapshot of Polkachu (I need an archive node), but my node is crashing in the crediting part - has anyone had a similar problem after the hard fork?

INFO namada_node::shell::init_chain: Crediting 106.880076 nam tokens to tnam1qzzzpksl4lym3wy2gp0j8c7rvgtjrqdhwusruc3s
INFO namada_node::shell::init_chain: Crediting 106.880076 nam tokens to tnam1qzzzqjs4kjcpghvsgdz0yjr7f5zmsxww8vl0gjgs
INFO namada_node::shell::init_chain: Crediting 106.880076 nam tokens to tnam1qzzzrfwwyv0kz5svprr428t05fwhu7rq5u89sxgm
INFO namada_node::shell::init_chain: Crediting 135.922579 nam tokens to tnam1qzzzt8mk8ynxfhyae6rcadtcwshssrkcw5j3a5kt

hidden pivot
#

guess I could try restarting the same node a number of times for stats 😅

thin maple
#

@hidden pivot @jolly cloud managed to do some brief investigation here on my mainnet node. I'm not seeing any variation in startup times. Not sure how I can tell whether namadan shut down cleanly or not, though (what do you guys see in logs to indicate an unclean shutdown?)

#

I did however realise that I got it inverted: it's cometbft waiting on namadan to finish initialising

#

I took a flamegraph of namadan during initialisation:

#

full zoomable flamegraph svg file:

#

ugh, not sure how to prevent discord from providing unwanted preview of text files...

jolly cloud
#
2025-06-02T16:19:10.630013Z  INFO namada_node: Done loading MASP verifying keys.
2025-06-02T16:19:10.630604Z  INFO namada_node::storage::rocksdb: Using 1 compactions threads for RocksDB.
...
{"_msg":"abci.socketClient failed to connect to tcp://0.0.0.0:26658.  Retrying after 3s...","connection":"query","err":"dial tcp 0.0.0.0:26658: connect: connection refused","level":"error","module":"abci-client","ts":"2025-06-02T16:28:28.847284106Z"}
...
2025-06-02T16:28:34.111657Z  INFO tower_abci::v037::server: ABCI server starting on tcp socket addr=0.0.0.0:26658

I am noticing it taking forever on creating the connection.. Why it's happening, I am not sure yet.
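
To put a number on that gap, the namadan log timestamps can be subtracted directly. The following is a minimal sketch, assuming the two marker strings seen in the excerpt above ("Done loading MASP verifying keys" and "ABCI server starting") bracket the slow phase; the function names are hypothetical and other namadan versions may log different messages.

```python
import re
from datetime import datetime

# Matches the timestamp at the start of a namadan log line,
# e.g. "2025-06-02T16:19:10.630013Z". Extra sub-microsecond digits are ignored.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6})\d*Z")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def startup_gap_seconds(lines,
                        start_marker="Done loading MASP verifying keys",
                        end_marker="ABCI server starting"):
    """Seconds between the first start_marker line and the next end_marker line.

    The marker strings are assumptions taken from the log excerpt above.
    Lines without a leading timestamp (such as cometbft's JSON entries)
    are skipped. Returns None if either marker is missing.
    """
    start = end = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if start is None and start_marker in line:
            start = ts
        elif start is not None and end_marker in line:
            end = ts
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


sample = [
    "2025-06-02T16:19:10.630013Z  INFO namada_node: Done loading MASP verifying keys.",
    "2025-06-02T16:19:10.630604Z  INFO namada_node::storage::rocksdb: Using 1 compactions threads for RocksDB.",
    "...",
    "2025-06-02T16:28:34.111657Z  INFO tower_abci::v037::server: ABCI server starting on tcp socket addr=0.0.0.0:26658",
]
print(startup_gap_seconds(sample))
```

On the excerpt above this comes to roughly 563 seconds, i.e. a bit over nine minutes before the ABCI server starts listening. Running it over the logs of several restarts would give the per-start statistics discussed earlier in the thread.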

thin maple
#

those json log entries are coming from cometbft right?

#

this is the same problem, cometbft unable to connect to namadan's ABCI server on port 26658

#

just to clarify on my node I have:

```
tcp        0      0 127.0.0.1:26658         0.0.0.0:*               LISTEN      418840/namadan
tcp        0      0 127.0.0.1:26657         0.0.0.0:*               LISTEN      418881/cometbft
tcp6       0      0 :::26656                :::*                    LISTEN      418881/cometbft
```
#

i.e. namada listens on 26658
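
For comparing listeners across hosts, the listing above can be turned into structured records. This is a small sketch that assumes netstat-style columns (protocol, queues, local address, foreign address, state, PID/program) as in the output shown; `parse_listeners` is a hypothetical helper, not part of any Namada or CometBFT tooling.

```python
def parse_listeners(text):
    """Parse netstat-style LISTEN lines into a list of dicts.

    Assumes seven whitespace-separated columns with the state in column 6
    and "PID/program" in column 7. Lines that do not fit are skipped.
    """
    listeners = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        # Split on the last colon so IPv6 forms like ":::26656" work too.
        host, _, port = fields[3].rpartition(":")
        pid, _, program = fields[6].partition("/")
        listeners.append({
            "proto": fields[0],
            "host": host,
            "port": int(port),
            # netstat prints "-" when it cannot see the owning process.
            "pid": int(pid) if pid.isdigit() else None,
            "program": program,
        })
    return listeners


sample = """\
tcp        0      0 127.0.0.1:26658         0.0.0.0:*               LISTEN      418840/namadan
tcp        0      0 127.0.0.1:26657         0.0.0.0:*               LISTEN      418881/cometbft
tcp6       0      0 :::26656                :::*                    LISTEN      418881/cometbft
"""

for entry in parse_listeners(sample):
    print(entry["program"], entry["host"], entry["port"])
```

With the sample above, port 26658 maps to namadan and ports 26657 and 26656 map to cometbft, which matches the reading that cometbft is the side dialing 26658.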

#

I suspect you have the same issue and that the flamegraph above is relevant to your long connection time

jolly cloud
#

seems accurate!

thin maple
#

so the vast majority of time is spent in the sparse_merkle_tree::validate function

#

cc @vapid hull any idea if it is normal for namadan to do this validation on every startup?

thin maple
#

not sure how terminal vs systemd launching changes that

hidden pivot
#

in a terminal

#

terminate with ctrl+c

#

sometimes it exits cleanly, often it crashes

#

we were wondering above if the longer startup times were after unclean exits

thin maple
#

ahh ok, do you have an example of logs from a crash?

hidden pivot
#

imma shut down all my nodes soon and see if we can find any 🙂

hidden pivot
#

that was quick lol:
`2025-06-02T20:21:02.676515Z INFO namada_node::shell::finalize_block: txs executed: 0
2025-06-02T20:21:02.705991Z INFO namada_node: Tendermint node is no longer running.
2025-06-02T20:21:02.706038Z INFO namada_node: Namada ledger node has shut down.
2025-06-02T20:21:02.706071Z INFO namada_node: Shutting down ABCI server...
The application panicked (crashed).
Message: called Result::unwrap() on an Err value: Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }
Location: /home/runner/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tower-abci-0.19.1/src/v037/server.rs:179

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.`

hidden pivot
#

yes it's dirty, I know.. 🥴

#

ok startup time wasn't too bad. around 1:40m on a not very hw-specced node (aux node, unimportant use case)

thin maple
#

what about db size? what is the output of `du -sh $NAMADA_DIR/db`?

hidden pivot
#

different system calls apparently hm interesting

hidden pivot
#

interesting again: namada db size is 24G

#

cometbft db size around 102G

thin maple
#

a systemctl stop sends SIGTERM, waits up to 90 seconds by default, then sends SIGKILL if the process has not exited, which is much more brutal
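
That stop sequence can be reproduced outside systemd to test whether a forced kill leads to a slower next start. The following is a rough sketch, assuming default unit settings (SIGTERM first, 90 second stop timeout); `stop_like_systemd` is a hypothetical helper and how namadan itself reacts to each signal is not established in this thread.

```python
import signal
import subprocess


def stop_like_systemd(proc, grace_seconds=90.0):
    """Stop a child process the way a default systemd unit would.

    Sends SIGTERM, waits up to grace_seconds for the process to exit,
    and falls back to SIGKILL if it is still running.
    Returns "terminated" for a voluntary exit and "killed" otherwise.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        # SIGKILL cannot be caught, so the process gets no chance to
        # flush or close its databases.
        proc.kill()
        proc.wait()
        return "killed"


# Example (not run here): start the node as a child process with
# subprocess.Popen([...]) and later call stop_like_systemd(child),
# then record the return value next to the following startup time.
```

Note that Ctrl+C in a terminal sends SIGINT instead of SIGTERM, so a terminal shutdown and a systemd shutdown do not necessarily take the same code path in the node.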

thin maple
#

mine is similar 25GB