# Netdata Node claims fine, but reports as `unseen`


median umbra
#

Hi, so I just wanted to add a new node to a new room. The node itself is on ubuntu.
The node claimed successfully, but reports as unseen in the cloud.
I can access the local netdata web interface successfully.
I can also curl https://app.netdata.cloud.
Is there anything I'm missing?

root@Ubuntu-1804-bionic-64-minimal ~ # netdatacli aclk-state
ACLK Available: Yes
ACLK Version: 2
Protocols Supported: Protobuf
Protocol Used: Protobuf
MQTT Version: 5
Claimed: Yes
Claimed Id: 5ae86db2-83a6-11ee-9dd4-4b4a4695fe4b
Cloud URL: https://app.netdata.cloud
Online: No
Reconnect count: 0
Banned By Cloud: No
Next Connection Attempt At: 2023-11-15 12:45:44
Last Backoff: 347.983

root@Ubuntu-1804-bionic-64-minimal ~ # cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
root@Ubuntu-1804-bionic-64-minimal ~ # uname -a
Linux Ubuntu-1804-bionic-64-minimal 5.4.0-165-generic #182-Ubuntu SMP Mon Oct 2 19:43:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
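
The `Online: No` line in the aclk-state output above is the field that matters here. As a minimal sketch (the helper name `aclk_online` is made up for illustration and is not part of Netdata), the value can be pulled out of the output like this:

```shell
# Print the value of the "Online" field from `netdatacli aclk-state`
# output read on stdin. Hypothetical helper, not a Netdata command.
aclk_online() {
  awk -F': ' '$1 == "Online" { print $2; exit }'
}

# Usage on a node with a running agent (not executed here):
#   netdatacli aclk-state | aclk_online
```

This only reads the text that `netdatacli aclk-state` already prints, so it can be used in a quick loop across several nodes.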
strange viper
#

👍

median umbra
#

The netdata service is up and running.
I already tried restarting netdata a couple of times
I tried briefly disabling ufw
I did not try explicitly establishing a websocket however, mostly because I don't know how to do that via cli.
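
A secure websocket starts with a TLS handshake, so one check that is possible from the CLI is to look at the certificate verification result that `openssl s_client` reports. This is only a sketch of the TLS layer, not a full websocket test, and the helper name `verify_code` is an assumption for illustration:

```shell
# Extract the first numeric "Verify return code" from `openssl s_client`
# output read on stdin. Hypothetical helper for illustration.
verify_code() {
  sed -n 's/^[[:space:]]*Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Live check (needs network access, not executed here):
#   openssl s_client -connect app.netdata.cloud:443 \
#     -servername app.netdata.cloud </dev/null 2>/dev/null | verify_code
# A result of 0 means the chain verified against the local trust store.
```

A non-zero code points at the local certificate store rather than at the firewall or the network path.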

#

wget https://my-netdata.io

--2023-11-15 12:54:59--  https://my-netdata.io/
Resolving my-netdata.io (my-netdata.io)... 2a06:98c1:3120::3, 2a06:98c1:3121::3, 188.114.97.3, ...
Connecting to my-netdata.io (my-netdata.io)|2a06:98c1:3120::3|:443... connected.
ERROR: cannot verify my-netdata.io's certificate, issued by ‘CN=GTS CA 1P5,O=Google Trust Services LLC,C=US’:
  Unable to locally verify the issuer's authority.
To connect to my-netdata.io insecurely, use `--no-check-certificate'.
#

curl https://my-netdata.io

<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.22.1</center>
</body>
</html>
#

Just tried reinstalling ca-certificates and wget. the wget behaviour has not changed.

#

Example of another node in a different room that works fine:

root@Ubuntu-1804-bionic-64-minimal ~ # cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
root@Ubuntu-1804-bionic-64-minimal ~ # netdatacli aclk-state
ACLK Available: Yes
ACLK Version: 2
Protocols Supported: Protobuf
Protocol Used: Protobuf
MQTT Version: 5
Claimed: Yes
Claimed Id: f1e3df90-768b-4112-81a9-7b7d22f858c7
Cloud URL: https://app.netdata.cloud
Online: Yes
Reconnect count: 0
Banned By Cloud: No
Last Connection Time: 2023-11-10 10:50:31
Last Connection Time + 3 PUBACKs received: 2023-11-10 10:50:31
Received Cloud MQTT Messages: 15627
MQTT Messages Confirmed by Remote Broker (PUBACKs): 16990

> Node Instance for mGUID: "e6bc928a-305b-11ee-8e5b-39cbc4e67206" hostname "Ubuntu-1804-bionic-64-minimal"
        Claimed ID: f1e3df90-768b-4112-81a9-7b7d22f858c7
        Node ID: 690cb550-6f2a-4e8a-a84c-4f241df345cf
        Streaming Hops: 0
        Relationship: self
        Alert Streaming Status:
                Updates: 1
                Pending Min Seq ID: 0
                Pending Max Seq ID: 0
                Last Submitted Seq ID: 8741
root@Ubuntu-1804-bionic-64-minimal ~ # uname -a
Linux Ubuntu-1804-bionic-64-minimal 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
#

need anything else?

strange viper
#

I think for now is enough, will ping if we need anything extra

median umbra
#

cool!

strange viper
#

actually one more thing, which is the agent version?

median umbra
#
root@Ubuntu-1804-bionic-64-minimal ~ # apt info netdata
Package: netdata
Version: 1.43.2
Priority: optional
Section: net
Maintainer: Netdata Builder <[email protected]>
Installed-Size: 51.4 MB
Pre-Depends: adduser, dpkg (>= 1.17.14), libcap2-bin (>= 1:2.0), lsb-base (>= 3.1-23.2)
Depends: openssl, libbson-1.0-0 (>= 1.21.0), libc6 (>= 2.35), libelf1 (>= 0.144), libgcc-s1 (>= 4.0), libjson-c5 (>= 0.15), liblz4-1 (>= 1.8.2), libmongoc-1.0-0 (>= 1.21.0), libprotobuf23 (>= 3.12.4), libsnappy1v5 (>= 1.1.8), libssl3 (>= 3.0.0~~alpha1), libstdc++6 (>= 11), libuuid1 (>= 2.16), libuv1 (>= 1.4.2), zlib1g (>= 1:1.2.3.3), netdata-plugin-ebpf (= 1.43.2), netdata-plugin-apps (= 1.43.2), netdata-plugin-pythond (= 1.43.2), netdata-plugin-go (= 1.43.2), netdata-plugin-debugfs (= 1.43.2), netdata-plugin-nfacct (= 1.43.2), netdata-plugin-chartsd (= 1.43.2), netdata-plugin-slabinfo (= 1.43.2), netdata-plugin-perf (= 1.43.2)
Recommends: netdata-plugin-systemd-journal (= 1.43.2)
Suggests: netdata-plugin-cups (= 1.43.2), netdata-plugin-freeipmi (= 1.43.2)
Conflicts: netdata-core, netdata-plugins-bash, netdata-plugins-python, netdata-web
Homepage: https://netdata.cloud
Download-Size: 12.9 MB
APT-Manual-Installed: yes
APT-Sources: http://repo.netdata.cloud/repos/stable/ubuntu jammy/ Packages
Description: real-time charts for system monitoring
 Netdata is a daemon that collects data in realtime (per second)
 and presents a web site to view and analyze them. The presentation
 is also real-time and full of interactive charts that precisely
 render all collected values.

N: There are 13 additional records. Please use the '-a' switch to see them.
root@Ubuntu-1804-bionic-64-minimal ~ # netdata -V
netdata v1.43.2
strange viper
#

getting some questions from the team

#

if curl doesn't have a problem, let's check which shared libraries wget is using

ldd /usr/bin/wget

vs

ldd /usr/bin/curl
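
To make the two `ldd` listings easier to compare, a small filter can reduce each one to the TLS-related libraries. This is a sketch under the assumption that the interesting libraries are named `libssl`, `libcrypto` or `libgnutls`; the function name `tls_libs` is hypothetical:

```shell
# Reduce `ldd` output (stdin) to the names of TLS-related shared libraries.
# The library name prefixes are an assumption, adjust them as needed.
tls_libs() {
  awk '{ print $1 }' | grep -E '^lib(ssl|crypto|gnutls)' | sort
}

# Usage (not executed here):
#   ldd /usr/bin/wget | tls_libs
#   ldd /usr/bin/curl | tls_libs
```

If the two tools link against different TLS libraries, they may also look for CA certificates in different places.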

median umbra
#

(the '```' at the start and end were inserted by me, because that is discord formatting. Discord just didn't allow me to send that as text, cause the message is too large)

strange viper
#

I was looking at the node with Claimed Id: f1e3df90-768b-4112-81a9-7b7d22f858c7 and I see that this one is running v1.42.4

#

would you be able to install the same agent version on the node that fails to connect to Cloud?

strange viper
#

ok, something strange on our internal events then

median umbra
#

I'll restart netdata on that node rq

#

mhmm...

root@Ubuntu-1804-bionic-64-minimal ~ # systemctl status netdata
● netdata.service - Real time performance monitoring
     Loaded: loaded (/lib/systemd/system/netdata.service; enabled; vendor preset: enabled)
     Active: deactivating (stop-sigterm) since Wed 2023-11-15 14:45:04 CET; 50s ago
   Main PID: 2387004 (netdata)
      Tasks: 102 (limit: 76957)
     Memory: 256.4M
        CPU: 4h 20min 59.848s
     CGroup: /system.slice/netdata.service
             ├─2387004 /usr/sbin/netdata -D -P /var/run/netdata/netdata.pid
             └─2387044 /usr/sbin/netdata --special-spawn-server

Nov 10 10:50:28 Ubuntu-1804-bionic-64-minimal systemd[1]: Started Real time performance monitoring.
Nov 10 10:50:29 Ubuntu-1804-bionic-64-minimal sudo[2388184]:  netdata : command not allowed ; PWD=/etc/netdata ; USER=root ; COMMAND=validate
Nov 15 14:45:04 Ubuntu-1804-bionic-64-minimal systemd[1]: Stopping Real time performance monitoring...

That node now has issues as well

strange viper
#

😅

#

so something around v1.43.2

median umbra
#

probably

#

I'll remove netdata and then reinstall 1.42.4 then ig

#

it gets weirder....

root@Ubuntu-1804-bionic-64-minimal ~ # apt install netdata=1.42.4
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 netdata : Depends: netdata-plugin-ebpf (= 1.42.4) but it is not going to be installed
           Depends: netdata-plugin-apps (= 1.42.4) but it is not going to be installed
           Depends: netdata-plugin-pythond (= 1.42.4) but it is not going to be installed
           Depends: netdata-plugin-go (= 1.42.4) but it is not going to be installed
           Depends: netdata-plugin-debugfs (= 1.42.4) but it is not going to be installed
           Depends: netdata-plugin-nfacct (= 1.42.4) but it is not going to be installed
           Depends: netdata-plugin-chartsd (= 1.42.4) but it is not going to be installed
           Depends: netdata-plugin-slabinfo (= 1.42.4) but it is not going to be installed
           Depends: netdata-plugin-perf (= 1.42.4) but it is not going to be installed
           Recommends: netdata-plugin-systemd-journal (= 1.42.4) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
root@Ubuntu-1804-bionic-64-minimal ~ # apt-mark showhold
#

okay this works for installing an older version: apt install netdata=1.42.4 netdata-plugin-ebpf=1.42.4 netdata-plugin-apps=1.42.4 netdata-plugin-pythond=1.42.4 netdata-plugin-go=1.42.4 netdata-plugin-debugfs=1.42.4 netdata-plugin-nfacct=1.42.4 netdata-plugin-chartsd=1.42.4 netdata-plugin-slabinfo=1.42.4 netdata-plugin-perf=1.42.4 netdata-ebpf-code-legacy=1.42.4

#

okay the node with the claim-id f1e3df90-768b-4112-81a9-7b7d22f858c7 is downgraded to 1.42.4 and the start succeeded

strange viper
#

👍

median umbra
#

on another node apt autopurge netdata is taking way too long with the last message being Removing netdata (1.43.2) ...

#

downgraded c57aaa16-4781-47f5-b48f-901f633c413a also from 1.43.2 to 1.42.4 (same room as f1e3df90-768b-4112-81a9-7b7d22f858c7)

strange viper
#

did this one connect successfully as well?

median umbra
#

my downgrading procedure was:

apt autopurge netdata
apt install netdata=1.42.4 netdata-plugin-ebpf=1.42.4 netdata-plugin-apps=1.42.4 netdata-plugin-pythond=1.42.4 netdata-plugin-go=1.42.4 netdata-plugin-debugfs=1.42.4 netdata-plugin-nfacct=1.42.4 netdata-plugin-chartsd=1.42.4 netdata-plugin-slabinfo=1.42.4 netdata-plugin-perf=1.42.4 netdata-ebpf-code-legacy=1.42.4
apt-mark hold netdata-ebpf-code-legacy
apt-mark hold netdata-plugin-apps
apt-mark hold netdata-plugin-chartsd
apt-mark hold netdata-plugin-debugfs
apt-mark hold netdata-plugin-ebpf
apt-mark hold netdata-plugin-go
apt-mark hold netdata-plugin-nfacct
apt-mark hold netdata-plugin-perf
apt-mark hold netdata-plugin-pythond
apt-mark hold netdata-plugin-slabinfo
apt-mark hold netdata
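
The procedure above repeats the same version for every package, because apt otherwise picks the newest candidate for each plugin and then reports unmet dependencies. A minimal sketch that generates the pinned arguments from one list (the variable `NETDATA_PKGS` and the function `netdata_pins` are names made up for this example; the package set is copied from the command above):

```shell
# Package set copied from the downgrade command in this thread.
NETDATA_PKGS="netdata netdata-plugin-ebpf netdata-plugin-apps netdata-plugin-pythond netdata-plugin-go netdata-plugin-debugfs netdata-plugin-nfacct netdata-plugin-chartsd netdata-plugin-slabinfo netdata-plugin-perf netdata-ebpf-code-legacy"

# Print one "package=version" argument per line for the given version.
netdata_pins() {
  for pkg in $NETDATA_PKGS; do
    printf '%s=%s\n' "$pkg" "$1"
  done
}

# Usage (as root, not executed here):
#   apt install $(netdata_pins 1.42.4)
#   apt-mark hold $NETDATA_PKGS
```

`apt-mark hold` accepts several package names at once, so the eleven separate calls can be collapsed into one.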
median umbra
#

also downgraded the one node that I was initially writing about...

#

that one did not connect strangely enough

#
root@Ubuntu-1804-bionic-64-minimal ~ # netdatacli aclk-state
ACLK Available: Yes
ACLK Version: 2
Protocols Supported: Protobuf
Protocol Used: Protobuf
MQTT Version: 5
Claimed: Yes
Claimed Id: 5ae86db2-83a6-11ee-9dd4-4b4a4695fe4b
Cloud URL: https://app.netdata.cloud
Online: No
Reconnect count: 0
Banned By Cloud: No
Next Connection Attempt At: 2023-11-15 15:05:57
Last Backoff: 43.863
root@Ubuntu-1804-bionic-64-minimal ~ # netdata -v
netdata v1.42.4
#

I should really rename one of the two machines with the identical ubuntu names

#

f1e3df90-768b-4112-81a9-7b7d22f858c7 is now named PixelMain
c57aaa16-4781-47f5-b48f-901f633c413a is named Pixel1

5ae86db2-83a6-11ee-9dd4-4b4a4695fe4b is named the default ubuntu thing.
Let's just call that by the room name it's in, so ApolloNetwork.

median umbra
#

If you have any updates or need anything else please don't hesitate to @ me. Additionally @jovial burrow is the actual owner of the server ApolloNetwork. I am right now only acting on behalf of him (and cannot myself make sense of why this is not working).

#

additionally I think I just stumbled across something important... Lemme just push a logfile here (if discord allows)

#

oops. 2Mib logfile. But one thing should immediately become apparent:

2023-11-15 12:46:10: netdata INFO  : ACLK_MAIN : Attempting connection now
2023-11-15 12:46:11: netdata ERROR : ACLK_MAIN : Cert Chain verify error:num=20:unable to get local issuer certificate:depth=2:/C=US/O=Internet Security Research Group/CN=ISRG Root X1
2023-11-15 12:46:11: netdata ERROR : ACLK_MAIN : SSL_write Err: SSL_ERROR_SSL
2023-11-15 12:46:11: netdata ERROR : ACLK_MAIN : Couldn't write HTTP request header into SSL connection
2023-11-15 12:46:11: netdata ERROR : ACLK_MAIN : Couldn't process request
2023-11-15 12:46:11: netdata ERROR : ACLK_MAIN : Error trying to contact env endpoint
2023-11-15 12:46:11: netdata ERROR : ACLK_MAIN : Failed to Get ACLK environment (cannot contact ENV endpoint)
2023-11-15 12:46:11: netdata INFO  : ACLK_MAIN : Wait before attempting to reconnect in 0.916 seconds
2023-11-15 12:46:11: netdata ERROR : UV_WORKER[86] : DBENGINE: error while reading extent from datafile 2 of tier 0, at offset 3600384 (33085 bytes) to extract page (PD) from 1700047628 (2023-11-15 12:27:08) to 1700048651 (2023-11-15 12:44:11)  of metric 9911f37a-d0d1-4ea2-94de-23a3cc6a33e0: header is INVALID (similar messages repeated 77 times in the last 0 secs)
2023-11-15 12:46:11: netdata INFO  : ACLK_MAIN : Attempting connection now

etc. etc. etc.

#

so something with the node's certificates is funky

#

like really funky...
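
With a large logfile, it can help to reduce the ACLK certificate errors to the distinct error number and reason. A sketch, assuming the log lines have the same shape as the excerpt above; the function name `aclk_cert_errors` is hypothetical and the log path in the usage comment is an assumption about the default location:

```shell
# Print each distinct "num reason" pair from ACLK certificate errors
# found in a Netdata log read on stdin. Hypothetical helper.
aclk_cert_errors() {
  grep 'ACLK_MAIN' |
    sed -n 's/.*Cert Chain verify error:num=\([0-9]*\):\([^:]*\):.*/\1 \2/p' |
    sort -u
}

# Usage (path is an assumption, not executed here):
#   aclk_cert_errors < /var/log/netdata/error.log
```

For the excerpt above this yields error 20, "unable to get local issuer certificate", which means the agent could not find the issuing root certificate in the directory it was looking at.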

cyan frost
#

Hi! So the bottom line of this is that 1.43.2 does not connect (certificate issue) vs 1.42.4 that works.... I'll try to replicate this on a 18.04 and see

median umbra
#

1.42.4 works on 2/3 nodes

#

1 having the cert issue

#

1.43.2 doesn’t work on 2/3 nodes tested (one being the cert issue one, so really 1/2 tested)

#

Honestly I'm not sure how to fix the cert issue. I already tried reinstalling ca-certificates

#

okay, I actually just fixed wget's certificate issues, by adding ca_directory = /etc/ssl/certs/ to /etc/wgetrc

#

however the node I've dubbed ApolloNetwork still has issues connecting
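
The wget fix mentioned above can be applied so that it is safe to run more than once. A minimal sketch (the function name `set_wgetrc_option` is made up for illustration; the option and value are the ones quoted above):

```shell
# Append "key = value" to a wgetrc-style file unless the key is already set.
# Hypothetical helper, arguments: file, key, value.
set_wgetrc_option() {
  file=$1
  key=$2
  value=$3
  if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s = %s\n' "$key" "$value" >> "$file"
}

# Usage (as root, not executed here):
#   set_wgetrc_option /etc/wgetrc ca_directory /etc/ssl/certs/
```

This only changes wget's behaviour. The Netdata agent does not read /etc/wgetrc, which is consistent with the node still failing to connect.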

cyan frost
#

Hmmm, not sure if there is a way to specify it globally, e.g. by an env variable or similar

#

(load models etc?) ok, that's a different issue, has to do with ML

median umbra
#

not too worried about that for now

#

just found it odd (which is why I included it)

cyan frost
#

I haven't tried this myself, just googling around, but could you try specifying the path of the certs in SSL_CERT_DIR?

median umbra
#

you mean as in env variable?

cyan frost
#

yes

median umbra
#

I appended that to /etc/default/netdata. let's see

#

that worked...
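
The exact line appended to /etc/default/netdata is not shown in the thread. Assuming it was something like `SSL_CERT_DIR=/etc/ssl/certs` (the same directory that fixed wget), one way to confirm that the restarted agent picked it up is to read the variable from the process environment. The helper name `environ_get` is hypothetical:

```shell
# Print the value of one variable from a NUL-separated environment dump
# (the format of /proc/PID/environ) read on stdin. Hypothetical helper.
environ_get() {
  tr '\0' '\n' | sed -n "s/^$1=//p"
}

# Usage (as root on Linux, not executed here):
#   environ_get SSL_CERT_DIR < "/proc/$(pidof -s netdata)/environ"
```

If nothing is printed, the service was probably started without reading the file, for example because it was not restarted after the edit.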

#

still have those ml module issues tho

#

though I could just enable log flood protection again and not worry about it

cyan frost
#

Ah, nice 🙂

#

It could be that because of installing a previous netdata version, the db is then "mixed" up let's say

median umbra
#

ah. got it.

cyan frost
#

You could try to remove /var/cache/netdata/ml something db (sorry don't remember the name of the sqlite db for ml)

median umbra
#
2023-11-16 10:38:05: netdata ERROR : HEALTH : HEALTH [Ubuntu-1804-bionic-64-minimal]: Got null family field. Ignoring it.
2023-11-16 10:38:05: netdata LOG FLOOD PROTECTION too many logs (201 logs in 0 seconds, threshold is set to 200 logs in 1200 seconds). Preventing more logs from process 'netdata' for 1200 seconds.
cyan frost
#

Ah, so no more Failed to load modules, etc ?

median umbra
#

ohh never mind

#

I just disabled log flood protection, so I can actually see shit

#

and no more failed to load modules

#

(those errors have come after that logline in the past, so I couldn't tell what was going on)

cyan frost
#

Ah, okay, nice, thanks!

median umbra
#

any chance I can fix that null family field?

cyan frost
#

Hmm, let me check one sec...

#

You are running 1.42.X or 1.43 ?

median umbra
#

got that one on other nodes too

#

1.42.4

cyan frost
#

okay, that should be fixed in 1.43

median umbra
#

I'm gonna try 1.43.2 then

cyan frost
#

and 1.43.2 was def the one having cloud connection issues, right?

median umbra
#

1.43.2 was having starting issues

#

the cloud connection issues were solely because of the ssl certs

#

and node specific

cyan frost
#

We could check the starting issue...

median umbra
#

okay updated ApolloNetwork to 1.43.2 without issue

cyan frost
#

Cool!

median umbra
#

updated all nodes to 1.43.2 now. dunno what systemctl was on back then

cyan frost
#

🙂

median umbra
#

I see no further issues that need addressing now. Thanks for your help!

#

I closed the post now, which apparently removes it from the post view in #1026434195024781412.