toubleshooting | Namada | Page 1

pure escarp Apr 13, 2025, 9:07 PM

#

is this a validator?

#

I think using threads for these is better

shrewd gyro Apr 13, 2025, 9:07 PM

#

hi, yes

pure escarp Apr 13, 2025, 9:08 PM

#

did you save your priv validator state json?

shrewd gyro Apr 13, 2025, 9:08 PM

#

pure escarp I think using threads for these is better

agreed

pure escarp Apr 13, 2025, 9:08 PM

#

or was that reset too when you did the ledger reset?

shrewd gyro Apr 13, 2025, 9:08 PM

#

yes, it's saved after the reset. I also backed up the entire namada data dir before running namadan ledger reset

#

i.e. I took a backup then ran the reset command on the original directory

pure escarp Apr 13, 2025, 9:10 PM

#

so what I would do is

#

I would make sure you have the validator state with a high block count (just inspect the file)

#

then restore from snapshot (obvs make sure it's down when you do)

#

restore validator state and boot
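A minimal sketch of the "just inspect the file" step, assuming the file is the standard CometBFT priv_validator_state.json, where height is stored as a JSON string. The path in the usage note is a placeholder, not taken from this thread.

```python
import json
from pathlib import Path


def read_sign_state(path):
    """Return (height, round, step) from a priv_validator_state.json file."""
    state = json.loads(Path(path).read_text())
    # CometBFT writes height as a JSON string; round and step are integers.
    return int(state["height"]), int(state.get("round", 0)), int(state.get("step", 0))
```

For example, read_sign_state("backup/cometbft/data/priv_validator_state.json") (hypothetical path) returns the last signed position. A height near the chain tip suggests the file reflects the last signature; a height of 0 suggests it was reset.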

shrewd gyro Apr 13, 2025, 9:11 PM

#

I hadn't upgraded to 1.1.5 when the problem started. It OOM'd at around 16:00 UTC and went into a restart loop (failing to connect to the abci client) last night. I've been busy with family, so I didn't realise till now

#

ok I'll take a look at the file now...

shrewd gyro Apr 13, 2025, 9:11 PM

#

pure escarp then restore from snapshot (obvs make sure it's down when you do)

is there some docs anywhere for this?

pure escarp Apr 13, 2025, 9:12 PM

#

shrewd gyro is there some docs anywhere for this?

not really, but it's pretty simple

#

if you get a snapshot from let's say itrocket or mandragora, you basically replace the contents of only the db and cometbft/data folders - that's all a snapshot is..
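The folder swap described above could be sketched roughly as follows. This assumes the node is stopped, that the snapshot has already been extracted into a directory containing db and cometbft/data (providers may package theirs differently), and that the existing priv_validator_state.json should be left in place. The function names are illustrative only.

```python
import shutil
from pathlib import Path

STATE_FILE = "priv_validator_state.json"


def replace_dir_contents(src, dst, keep=()):
    """Replace everything in dst with a copy of src, except names listed in keep."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for entry in dst.iterdir():
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    for entry in src.iterdir():
        if entry.name in keep:
            continue  # never let the snapshot overwrite a preserved file
        target = dst / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target)
        else:
            shutil.copy2(entry, target)


def restore_snapshot(snapshot_dir, chain_dir):
    """Swap in the snapshot's db and cometbft/data, keeping the local sign state."""
    snapshot_dir, chain_dir = Path(snapshot_dir), Path(chain_dir)
    replace_dir_contents(snapshot_dir / "db", chain_dir / "db")
    replace_dir_contents(
        snapshot_dir / "cometbft" / "data",
        chain_dir / "cometbft" / "data",
        keep=(STATE_FILE,),
    )
```

Keeping the state file out of the swap is a design choice in this sketch: it avoids replacing the validator's signing record with whatever the snapshot happens to ship.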

pure escarp Apr 13, 2025, 9:13 PM

#

shrewd gyro I hadn't upgraded to 1.1.5 when the problem started, it OOM'd at around 16:00 UT...

I could recommend using a monitor like tenderduty or something else to keep an eye on your validator for you.
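To illustrate the kind of check a monitor performs (this is not how tenderduty itself is implemented), here is a small sketch that polls the CometBFT RPC /status endpoint and flags a node whose latest block is too old. The RPC URL and the 60-second threshold are assumptions.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_status(rpc_url="http://127.0.0.1:26657"):
    """Fetch /status from a CometBFT RPC endpoint (the URL is an assumption)."""
    with urllib.request.urlopen(f"{rpc_url}/status", timeout=5) as resp:
        return json.load(resp)


def is_stalled(status, now, max_lag_seconds=60):
    """True if the node is catching up or its latest block is older than the limit."""
    sync = status["result"]["sync_info"]
    # Block times are RFC 3339 with nanoseconds; trim to whole seconds before parsing.
    block_time = datetime.strptime(
        sync["latest_block_time"][:19], "%Y-%m-%dT%H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    lag = (now - block_time).total_seconds()
    return bool(sync["catching_up"]) or lag > max_lag_seconds
```

A scheduled job could call is_stalled(fetch_status(), datetime.now(timezone.utc)) and send an alert when it returns True. A node that has crashed will not answer at all, so a connection error from fetch_status should be treated as an alert too.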

shrewd gyro Apr 13, 2025, 9:14 PM

#

pure escarp I could recommend using a monitor like tenderduty or something else to keep an e...

yes thanks, definitely need to do that!

#

ok, so it looks like the reset did in fact delete the validator config json. I have a copy, so I'll stop the node, then copy it over and restart

#

would be good to know it's possible to resync from height 0 without using a snapshot though, should I report the issue?

pure escarp Apr 13, 2025, 9:17 PM

#

shrewd gyro would be good to know it's possible to resync from height 0 without using a snap...

I mean, there isn't much of a practical point to doing such a resync, since imo it would require you to run multiple versions of the node as you go along. I doubt it's 1) impossible, but I also doubt it's 2) worth the effort for most. I wouldn't report until there is more certainty it actually can't be done.

pure escarp Apr 13, 2025, 9:18 PM

#

shrewd gyro ok so it looks like the reset did in fact delete the validator config json , I h...

just make sure it's the one that reflects last sign
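One way to check which copy "reflects last sign" when several backups exist is to compare their (height, round, step) values and keep the highest. This is a sketch under the assumption that all candidates are standard CometBFT priv_validator_state.json files; the helper names are made up for illustration.

```python
import json
from pathlib import Path


def sign_position(path):
    """Return (height, round, step) recorded in a priv_validator_state.json file."""
    state = json.loads(Path(path).read_text())
    return int(state["height"]), int(state.get("round", 0)), int(state.get("step", 0))


def latest_sign_state(paths):
    """Return the candidate file with the highest (height, round, step)."""
    return max(paths, key=sign_position)
```

Restoring a file that is older than the validator's real last signature risks double signing, which is why the highest position is the one to keep.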

#

and again I would def restore from a trusted snapshot. there is nothing wrong with doing that

shrewd gyro Apr 13, 2025, 9:19 PM

#

oh really? the latest versions are not able to sync the chain from genesis, that's a shame, hopefully that can be addressed in the future

shrewd gyro Apr 13, 2025, 9:22 PM

#

pure escarp and again I would def restore from a trusted snapshot. there is nothing wrong wi...

it's a problem long term if namada gains traction that it's not possible to verify the full chain and new nodes need to trust someone's snapshot, but yeah I realise it's an engineering effort to make that happen and the team have more pressing priorities I imagine

#

can you provide a link to a snapshot you would trust please? I am somewhat unfamiliar with where to find a trustworthy one 🙂

pure escarp Apr 13, 2025, 9:42 PM

#

shrewd gyro oh really? the latest versions are not able to sync the chain from genesis, that...

can't say for sure but I think it's often like that due to how the blockchain works. it's not a bug just a natural thing with the functionality of it. (others may know the details better)

pure escarp Apr 13, 2025, 9:43 PM

#

shrewd gyro it's a problem long term if namada gains traction that it's not possible to veri...

I don't think verification comes from being able to sync from scratch, it comes from present state consensus on-chain imo (my terminology may be imprecise)

pure escarp Apr 13, 2025, 9:44 PM

#

shrewd gyro can you provide a link to a snapshot you would trust please? I am somewhat unfam...

I would use itrocket or mandragora

shrewd gyro Apr 13, 2025, 9:45 PM

#

ok I'm following Daniel's instructions to use Mandragora snapshot, I had to change fast_sync/[fastsync] to blocksync in the configs (seems familiar, I'm sure I've done that sometime already!)
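The config change mentioned above might look like the following text rewrite. The target names block_sync and [blocksync] are an assumption based on CometBFT 0.37 naming; the message only says "blocksync", so check them against the snapshot provider's instructions before applying anything.

```python
import re


def rename_fastsync_keys(config_text):
    """Rename the old fast-sync key and section header in config.toml text."""
    # Top-level key: fast_sync = ...  ->  block_sync = ...  (assumed new name)
    text = re.sub(r"(?m)^fast_sync(\s*=)", r"block_sync\1", config_text)
    # Section header: [fastsync]  ->  [blocksync]
    return re.sub(r"(?m)^\[fastsync\]", "[blocksync]", text)
```

The function only transforms a string, so the result can be diffed against the original config before anything is written back to disk.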

#

thanks for taking the time to help it's much appreciated 🙏

pure escarp Apr 13, 2025, 9:53 PM

#

I don't think the fastsync is that important - the snapshot will make sure you are almost up to date. take care in replacing the priv-validator-state before booting your node..

floral mango Apr 13, 2025, 9:53 PM

#

pure escarp I don't think the fastsync is that important - the snapshot will make sure you a...

i agree

pure escarp Apr 13, 2025, 9:54 PM

#

remember to unjail once you are back up and running btw - assuming you got jailed already

shrewd gyro Apr 13, 2025, 10:52 PM

#

sadly, the problem remains:
E[2025-04-13|23:49:38.340] abci.socketClient failed to connect to tcp://127.0.0.1:26658. Retrying after 3s... module=abci-client connection=query err="dial tcp 127.0.0.1:26658: connect: connection refused"
E[2025-04-13|23:49:41.343] abci.socketClient failed to connect to tcp://127.0.0.1:26658. Retrying after 3s... module=abci-client connection=query err="dial tcp 127.0.0.1:26658: connect: connection refused"
E[2025-04-13|23:49:44.346] abci.socketClient failed to connect to tcp://127.0.0.1:26658. Retrying after 3s... module=abci-client connection=query err="dial tcp 127.0.0.1:26658: connect: connection refused"
2025-04-13T22:49:47.316123Z ERROR namada_node::broadcaster: Broadcaster failed to connect to CometBFT node
2025-04-13T22:49:47.316193Z ERROR namada_node::broadcaster: Broadcaster unexpectedly shut down.
2025-04-13T22:49:47.316199Z INFO namada_node::broadcaster: Shutting down broadcaster...
2025-04-13T22:49:47.316205Z INFO namada_node: Broadcaster is no longer running.

#

seems that cometbft is not opening a listening socket
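A quick way to test whether anything is accepting connections on a local port is a TCP connect probe, sketched below. Port 26658 comes from the log above; 26657 is CometBFT's default RPC port and is only an assumption for this node.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure.
        return sock.connect_ex((host, port)) == 0
```

For example, is_listening("127.0.0.1", 26658) probes the address the abci client is dialing, and is_listening("127.0.0.1", 26657) probes the assumed RPC port.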

pure escarp Apr 13, 2025, 10:58 PM

#

what happens if you disable the service and try to boot the ledger in terminal?

shrewd gyro Apr 13, 2025, 10:59 PM

#

service is not stopping, I'll need to kill -9 I think 😬

pure escarp Apr 13, 2025, 11:00 PM

#

you need to make sure you disable the service so it's not running twice.

#

you are running cometbft 0.37.15?

#

@shrewd gyro if you want to dm me your config file, I could compare with some of my nodes and see if any differences

#

you're not running a dockerized setup by any chance?

shrewd gyro Apr 13, 2025, 11:09 PM

#

I'm pretty sure it's 0.37.15 unless updating to namada 1.1.5 changed the cometbft version

pure escarp Apr 13, 2025, 11:11 PM

#

it didn't, but would verify just to be sure (cometbft version)

shrewd gyro Apr 13, 2025, 11:12 PM

#

yes 0.37.15

#

ok so namadad.service:
cat /etc/systemd/system/namadad.service
[Unit]
Description=namada
After=network-online.target
[Service]
User=bod
WorkingDirectory=/home/bod/.local/share/namada
Environment=CMT_LOG_LEVEL=p2p:none,pex:error
Environment=NAMADA_CMT_STDOUT=true
ExecStart=/usr/local/bin/namadan ledger run
Restart=always
RestartSec=3
StandardOutput=journal
StandardError=journal
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target

#

I'll DM you the rest of configs now

#

btw running from terminal just hangs as follows:
bod@scraptop:~/.local/share/namada$ /usr/local/bin/namadan ledger run
2025-04-13T23:15:05.085116Z INFO namada_node: Available logical cores: 8
2025-04-13T23:15:05.085149Z INFO namada_node: Using 4 threads for Rayon.
2025-04-13T23:15:05.085155Z INFO namada_node: Using 4 threads for Tokio.
2025-04-13T23:15:05.122645Z INFO namada_node: VP WASM compilation cache size not configured, using 1/6 of available memory.
2025-04-13T23:15:05.122825Z INFO namada_node: Available memory: 22.327224731445313 GiB
2025-04-13T23:15:05.122838Z INFO namada_node: VP WASM compilation cache size: 3.7212041215971112 GiB
2025-04-13T23:15:05.122850Z INFO namada_node: Tx WASM compilation cache size not configured, using 1/6 of available memory.
2025-04-13T23:15:05.122854Z INFO namada_node: Tx WASM compilation cache size: 3.7212041215971112 GiB
2025-04-13T23:15:05.122858Z INFO namada_node: Block cache size not configured, using 1/3 of available memory.
2025-04-13T23:15:05.122861Z INFO namada_node: RocksDB block cache size: 7.4424082431942225 GiB
2025-04-13T23:15:05.122988Z INFO namada_node: Loading MASP verifying keys.
2025-04-13T23:15:05.123050Z INFO namada_node::ethereum_oracle: Ethereum event oracle is starting url="http://127.0.0.1:8545"
2025-04-13T23:15:05.124911Z INFO namada_node::ethereum_oracle: Oracle is awaiting initial configuration
2025-04-13T23:15:05.141030Z INFO namada_node::tendermint_node: CometBFT node started
2025-04-13T23:15:05.696349Z INFO namada_node: Done loading MASP verifying keys.
2025-04-13T23:15:05.696931Z INFO namada_node::storage::rocksdb: Using 2 compactions threads for RocksDB.
2025-04-13T23:15:05.697760Z INFO namada_node::broadcaster: Starting broadcaster.

pure escarp Apr 13, 2025, 11:18 PM

#

you have nginx running? doing some port forwarding? could you try disabling nginx for a sec and just trying again to see if that's the culprit?

pure escarp Apr 13, 2025, 11:19 PM

#

shrewd gyro btw running from terminal just hangs as follows: `bod@scraptop:~/.local/share/na...

ok but that's interesting. it doesn't give you those port connection errors....

#

I bet if you waited a bit longer in terminal it would start running

shrewd gyro Apr 13, 2025, 11:20 PM

#

netstat -tulnp does not show cometbft opening a listener socket

#

oh, now it exited:
2025-04-13T23:15:05.697760Z INFO namada_node::broadcaster: Starting broadcaster.
2025-04-13T23:18:05.697962Z ERROR namada_node::broadcaster: Broadcaster failed to connect to CometBFT node
2025-04-13T23:18:05.698025Z ERROR namada_node::broadcaster: Broadcaster unexpectedly shut down.
2025-04-13T23:18:05.698032Z INFO namada_node::broadcaster: Shutting down broadcaster...
2025-04-13T23:18:05.698042Z INFO namada_node: Broadcaster is no longer running.
2025-04-13T23:19:01.855644Z INFO namada_node::abortable: Broadcaster has exited, shutting down...
2025-04-13T23:19:01.855688Z INFO namada_node::tendermint_node: Shutting down Tendermint node...
2025-04-13T23:19:01.855744Z INFO namada_node: Namada ledger node started.
2025-04-13T23:19:01.855798Z INFO namada_node: This node is a validator
2025-04-13T23:19:01.855777Z INFO tower_abci::v037::server: ABCI server starting on tcp socket addr=127.0.0.1:26658
2025-04-13T23:19:01.864985Z INFO namada_node: Tendermint node is no longer running.
2025-04-13T23:19:01.865037Z INFO namada_node: Namada ledger node has shut down.
2025-04-13T23:19:01.865124Z INFO namada_node: Shutting down ABCI server...
2025-04-13T23:19:01.936083Z INFO namada_node::ethereum_oracle: Ethereum event oracle is no longer running url="http://127.0.0.1:8545"

#

something weird has happened, I might try setting up a new node from scratch

#

problem is I've already lost a delegator 😦

pure escarp Apr 13, 2025, 11:22 PM

#

maybe set up a scratch node, compare a vanilla config with your config file.

#

you're already jailed so I would say take your time to get it running again

#

more people have this issue so would be good to pinpoint

shrewd gyro Apr 13, 2025, 11:25 PM

#

pure escarp more people have this issue so would be good to pinpoint

indeed, I'll get a fresh node set up on a fresh install of Ubuntu. I have a feeling something weird might have happened to the OS that also caused namadan to get OOM'd

#

possibly an Ubuntu update

shrewd gyro Apr 14, 2025, 2:23 PM

#

in case anyone else still has this issue, it's worth trying this workaround: #🚀-mainnet-operations message
