#troubleshooting
hi, yes
did you save your priv validator state json?
agreed
or was that reset too when you did the ledger reset?
yes, it's saved after the reset. I also backed up the entire namada data dir before running the namadan ledger reset
i.e. I took a backup then ran the reset command on the original directory
so what I would do is
I would make sure you have the validator state with a high block count (just inspect the file)
then restore from snapshot (obvs make sure it's down when you do)
restore validator state and boot
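For anyone following along, the "just inspect the file" step can be scripted. A minimal sketch in Python, assuming the file is CometBFT's `priv_validator_state.json` with `height` stored as a string and `round`/`step` as numbers. The function names `last_signed` and `pick_latest` are made up for illustration:

```python
import json
from pathlib import Path


def last_signed(state_file):
    """Return (height, round, step) from a priv_validator_state.json file.

    Assumption: CometBFT writes "height" as a JSON string and
    "round"/"step" as numbers; missing keys are treated as 0.
    """
    data = json.loads(Path(state_file).read_text())
    return (
        int(data.get("height", 0)),
        int(data.get("round", 0)),
        int(data.get("step", 0)),
    )


def pick_latest(*state_files):
    """Return the path whose state reflects the most recent signing."""
    return max(state_files, key=last_signed)
```

Calling `pick_latest` on the backup copy and the post-reset copy would show which one to restore. A height of 0 suggests the file was reset and should not be used on a validator that has already signed blocks.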
I hadn't upgraded to 1.1.5 when the problem started. It OOM'd at around 16:00 UTC and went into a restart loop (failing to connect to abci client) last night. I've been busy with family and so didn't realise till now
ok I'll take a look at the file now...
are there any docs anywhere for this?
not really, but it's pretty simple
if you get a snapshot from let's say itrocket or mandragora, you basically replace the contents of only the db and cometbft/data folders - that's all a snapshot is..
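The "replace only db and cometbft/data" step could look roughly like the sketch below. It assumes the snapshot is already extracted into a directory with the same `db` and `cometbft/data` layout, that the node is stopped, and that `priv_validator_state.json` lives under `cometbft/data` (so it has to be saved and put back). `restore_snapshot` and both path arguments are hypothetical, not part of any Namada tooling:

```python
import shutil
from pathlib import Path

# Assumed locations, relative to the chain directory. Verify on your own node.
STATE_FILE = Path("cometbft/data/priv_validator_state.json")
SNAPSHOT_DIRS = ("db", "cometbft/data")


def restore_snapshot(chain_dir, snapshot_dir):
    """Replace db and cometbft/data with the snapshot, keeping the signing state."""
    chain_dir, snapshot_dir = Path(chain_dir), Path(snapshot_dir)

    # Refuse to delete anything unless the snapshot has both folders.
    for rel in SNAPSHOT_DIRS:
        if not (snapshot_dir / rel).is_dir():
            raise FileNotFoundError(f"snapshot is missing {rel}")

    # Keep the local validator state so the last signed height is not lost.
    state_path = chain_dir / STATE_FILE
    saved_state = state_path.read_bytes() if state_path.exists() else None

    for rel in SNAPSHOT_DIRS:
        dst = chain_dir / rel
        if dst.exists():
            shutil.rmtree(dst)
        shutil.copytree(snapshot_dir / rel, dst)

    # Put the local state back over whatever the snapshot shipped.
    if saved_state is not None:
        state_path.write_bytes(saved_state)
```

This keeps the state file in memory only for the duration of the call, so taking a separate backup of the whole data dir first is still the safer habit.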
I could recommend using a monitor like tenderduty or something else to keep an eye on your validator for you.
yes thanks, definitely need to do that!
ok so it looks like the reset did in fact delete the validator config json. I have a copy, so I'll stop the node, then copy it over and restart
would be good to know it's possible to resync from height 0 without using a snapshot though, should I report the issue?
I mean, there isn't much of a practical point to doing such a resync, since imo it would require you to run multiple versions of the node as you go along. I doubt it's 1) impossible, but I also doubt it's 2) worth the effort for most. I wouldn't report it until there is more certainty that it actually can't be done.
just make sure it's the one that reflects last sign
and again I would def restore from a trusted snapshot. there is nothing wrong with doing that
oh really? the latest versions are not able to sync the chain from genesis, that's a shame, hopefully that can be addressed in the future
it's a problem long term if namada gains traction that it's not possible to verify the full chain and new nodes need to trust someone's snapshot, but yeah I realise it's an engineering effort to make that happen and the team have more pressing priorities I imagine
can you provide a link to a snapshot you would trust please? I am somewhat unfamiliar with where to find a trustworthy one
can't say for sure but I think it's often like that due to how the blockchain works. it's not a bug just a natural thing with the functionality of it. (others may know the details better)
I don't think verification comes from being able to sync from scratch, it comes from present state consensus on-chain imo (my terminology may be imprecise)
I would use itrocket or mandragora
ok I'm following Daniel's instructions to use Mandragora snapshot, I had to change fast_sync/[fastsync] to blocksync in the configs (seems familiar, I'm sure I've done that sometime already!)
thanks for taking the time to help, it's much appreciated
I don't think the fastsync is that important - the snapshot will make sure you are almost up to date. take care in replacing the priv-validator-state before booting your node..
i agree
remember to unjail once you are back up and running btw - assuming you got jailed already
sadly, the problem remains:
E[2025-04-13|23:49:38.340] abci.socketClient failed to connect to tcp://127.0.0.1:26658. Retrying after 3s... module=abci-client connection=query err="dial tcp 127.0.0.1:26658: connect: connection refused"
E[2025-04-13|23:49:41.343] abci.socketClient failed to connect to tcp://127.0.0.1:26658. Retrying after 3s... module=abci-client connection=query err="dial tcp 127.0.0.1:26658: connect: connection refused"
E[2025-04-13|23:49:44.346] abci.socketClient failed to connect to tcp://127.0.0.1:26658. Retrying after 3s... module=abci-client connection=query err="dial tcp 127.0.0.1:26658: connect: connection refused"
2025-04-13T22:49:47.316123Z ERROR namada_node::broadcaster: Broadcaster failed to connect to CometBFT node
2025-04-13T22:49:47.316193Z ERROR namada_node::broadcaster: Broadcaster unexpectedly shut down.
2025-04-13T22:49:47.316199Z INFO namada_node::broadcaster: Shutting down broadcaster...
2025-04-13T22:49:47.316205Z INFO namada_node: Broadcaster is no longer running.
seems that cometbft is not opening a listening socket
what happens if you disable the service and try to boot the ledger in terminal?
service is not stopping, I'll need to kill -9 I think
you need to make sure you disable the service so it's not running twice.
you are running cometbft 0.37.15?
@shrewd gyro if you want to dm me your config file, I could compare with some of my nodes and see if any differences
you're not running a dockerized setup by any chance?
I'm pretty sure it's 0.37.15 unless updating to namada 1.1.5 changed the cometbft version
it didn't, but would verify just to be sure (cometbft version)
yes 0.37.15
ok so namadad.service:
cat /etc/systemd/system/namadad.service
[Unit]
Description=namada
After=network-online.target
[Service]
User=bod
WorkingDirectory=/home/bod/.local/share/namada
Environment=CMT_LOG_LEVEL=p2p:none,pex:error
Environment=NAMADA_CMT_STDOUT=true
ExecStart=/usr/local/bin/namadan ledger run
Restart=always
RestartSec=3
StandardOutput=journal
StandardError=journal
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target
I'll DM you the rest of configs now
btw running from terminal just hangs as follows:
bod@scraptop:~/.local/share/namada$ /usr/local/bin/namadan ledger run
2025-04-13T23:15:05.085116Z INFO namada_node: Available logical cores: 8
2025-04-13T23:15:05.085149Z INFO namada_node: Using 4 threads for Rayon.
2025-04-13T23:15:05.085155Z INFO namada_node: Using 4 threads for Tokio.
2025-04-13T23:15:05.122645Z INFO namada_node: VP WASM compilation cache size not configured, using 1/6 of available memory.
2025-04-13T23:15:05.122825Z INFO namada_node: Available memory: 22.327224731445313 GiB
2025-04-13T23:15:05.122838Z INFO namada_node: VP WASM compilation cache size: 3.7212041215971112 GiB
2025-04-13T23:15:05.122850Z INFO namada_node: Tx WASM compilation cache size not configured, using 1/6 of available memory.
2025-04-13T23:15:05.122854Z INFO namada_node: Tx WASM compilation cache size: 3.7212041215971112 GiB
2025-04-13T23:15:05.122858Z INFO namada_node: Block cache size not configured, using 1/3 of available memory.
2025-04-13T23:15:05.122861Z INFO namada_node: RocksDB block cache size: 7.4424082431942225 GiB
2025-04-13T23:15:05.122988Z INFO namada_node: Loading MASP verifying keys.
2025-04-13T23:15:05.123050Z INFO namada_node::ethereum_oracle: Ethereum event oracle is starting url="http://127.0.0.1:8545"
2025-04-13T23:15:05.124911Z INFO namada_node::ethereum_oracle: Oracle is awaiting initial configuration
2025-04-13T23:15:05.141030Z INFO namada_node::tendermint_node: CometBFT node started
2025-04-13T23:15:05.696349Z INFO namada_node: Done loading MASP verifying keys.
2025-04-13T23:15:05.696931Z INFO namada_node::storage::rocksdb: Using 2 compactions threads for RocksDB.
2025-04-13T23:15:05.697760Z INFO namada_node::broadcaster: Starting broadcaster.
you have nginx running? doing some port forwarding? could you try disabling nginx for a sec and just trying again to see if that's the culprit?
ok but that's interesting. it doesn't give you those port connection errors....
I bet if you waited a bit longer in terminal it would start running
netstat -tulnp does not show cometbft opening a listener socket
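If netstat isn't handy, the same "is anything listening?" check can be done with a plain TCP connect. A minimal sketch using only the Python standard library; `is_listening` is a made-up helper name, and a successful connect only shows that some process accepts connections on that port, not which one:

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

For example, `is_listening("127.0.0.1", 26658)` probes the ABCI address from the error log above. Port 26657 as the CometBFT RPC port is an assumption based on the usual default and should be checked against your own config.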
oh, now it exited:
2025-04-13T23:15:05.697760Z INFO namada_node::broadcaster: Starting broadcaster.
2025-04-13T23:18:05.697962Z ERROR namada_node::broadcaster: Broadcaster failed to connect to CometBFT node
2025-04-13T23:18:05.698025Z ERROR namada_node::broadcaster: Broadcaster unexpectedly shut down.
2025-04-13T23:18:05.698032Z INFO namada_node::broadcaster: Shutting down broadcaster...
2025-04-13T23:18:05.698042Z INFO namada_node: Broadcaster is no longer running.
2025-04-13T23:19:01.855644Z INFO namada_node::abortable: Broadcaster has exited, shutting down...
2025-04-13T23:19:01.855688Z INFO namada_node::tendermint_node: Shutting down Tendermint node...
2025-04-13T23:19:01.855744Z INFO namada_node: Namada ledger node started.
2025-04-13T23:19:01.855798Z INFO namada_node: This node is a validator
2025-04-13T23:19:01.855777Z INFO tower_abci::v037::server: ABCI server starting on tcp socket addr=127.0.0.1:26658
2025-04-13T23:19:01.864985Z INFO namada_node: Tendermint node is no longer running.
2025-04-13T23:19:01.865037Z INFO namada_node: Namada ledger node has shut down.
2025-04-13T23:19:01.865124Z INFO namada_node: Shutting down ABCI server...
2025-04-13T23:19:01.936083Z INFO namada_node::ethereum_oracle: Ethereum event oracle is no longer running url="http://127.0.0.1:8545"
something weird has happened, I might try setting up a new node from scratch
problem is I've already lost a delegator
maybe set up a scratch node, compare a vanilla config with your config file.
you're already jailed so I would say take your time to get it running again
more people have this issue so would be good to pinpoint
indeed, I'll get a fresh node setup with a fresh install of Ubuntu, I have a feeling something weird might have happened to the OS that also caused namadan to get OOM'd
possibly an Ubuntu update
in case anyone else still has this issue, it's worth trying this workaround: #mainnet-operations message