#Data retention after upgrade to 4.0

reef shadow
#

It seems like data retention has changed after upgrading from version 3 to 4.
Maybe it's just some of the dashboards. Right now I am looking at the Volumes and SVM dashboards, and they are blank for the period before the update. The same is true for Power details, while Aggregates seems to hold much older data. Is this a bug? I think we need to revert if this upgrade has truncated data.

reef shadow
#

I have been looking at /vm/ and it looks like "some" old data is there, but data from before the upgrade has an instance called "nabox-harvest2", whereas the new data has an instance that is just "harvest". I guess this makes a difference in the Grafana graphs. Is there any way to "fix" this?

obsidian narwhal
#

That’s on custom panels ?

#

Ah maybe not. I’m not sure there is a way to rename the labels

reef shadow
#

As shown here... but it's not all data points as you can also see in this picture...

#

...we are looking back 7 days in the example, and you can clearly see that something is missing in the three lower graphs, but the three above seem OK?

obsidian narwhal
#

Ok, so one possible fix would be to change the container host name

#

It wouldn't change the data already collected under "harvest", but new data would go back to "nabox-harvest2" from then on.

reef shadow
obsidian narwhal
#

You'll get those graphs if you set a time period prior to the migration. Maybe that's good enough?

reef shadow
#

ok, it seems like it... but we cannot create a graph that spans from now to 90 days back... not great, I'm afraid...

#

isn't it possible to reuse the same instance name as the old installation? I guess that's what "breaks" it?

obsidian narwhal
#

It’s possible it’s actually the container host name.

Maybe @rotund matrix or @dense canopy can chime in. Is there a way to make the harvest instance name less sticky ?

rotund matrix
obsidian narwhal
#

Sounds interesting. I could implement a blanket replacement of nabox-harvest2 with the new name

#

Let me do some tests.

obsidian narwhal
#

If that's the "instance" causing the problem, I see the port number is also part of it

obsidian narwhal
#

Something I didn't quite get: relabeling is done at scrape time, not at query time

rotund matrix
#

yep, it changes what is stored in the time series db

obsidian narwhal
#

dang it

obsidian narwhal
#

@reef shadow do you still have NAbox 3 running with full historical data ?

reef shadow
obsidian narwhal
#

Ok. I'm exploring some options with the Harvest team. One of them would be to relabel data at import time to normalize it to the new convention in NAbox 4.
Another option would be to change the instance name in 4.0.3 to revert to the same name as v3, but that would "break" data collected by new instances of NAbox 4. Compromises are necessary, I'm afraid. Still debating the best option.
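
As a rough illustration of the first option (relabeling at import time), here is a minimal Python sketch. It is not the NAbox migration code. It assumes the data is handled as JSON lines in the shape VictoriaMetrics uses for export and import (one object per series with `metric`, `values` and `timestamps`), and that only the host part of the `instance` label differs ("nabox-harvest2" before the upgrade, "harvest" after, as reported above). The function name, the metric name and the port number in the sample are hypothetical.

```python
import json

OLD_HOST = "nabox-harvest2"  # instance host on pre-upgrade data (from this thread)
NEW_HOST = "harvest"         # instance host on NAbox 4 data (from this thread)


def normalize_instance(line: str, old: str = OLD_HOST, new: str = NEW_HOST) -> str:
    """Rewrite the host part of the `instance` label in one JSON line."""
    series = json.loads(line)
    labels = series.get("metric", {})
    # The instance label may carry a port ("host:port"); keep the port unchanged.
    host, sep, port = labels.get("instance", "").partition(":")
    if host == old:
        labels["instance"] = f"{new}{sep}{port}"
    return json.dumps(series)


# Hypothetical sample line; metric name and port are made up for illustration.
sample = (
    '{"metric": {"__name__": "example_metric", "instance": "nabox-harvest2:12990"}, '
    '"values": [1.0], "timestamps": [1720000000000]}'
)
print(normalize_instance(sample))
```

Series whose `instance` host is not the old name pass through unchanged, so the same function could be run over every exported line.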

reef shadow
obsidian narwhal
reef shadow
obsidian narwhal
#

You can upgrade your current v4 node, and delete all data in /data/victoria-metrics-data/* or you can deploy a new one and upgrade it

reef shadow
#

Then again... it seems to be the root that is full?

reef shadow
#

...I installed a new 4.0.3, applied the main.migrate update you provided, and started a new migration... let's see how it goes... (takes 4-5 hours) 😉

obsidian narwhal
#

I had that before with another user: root filling up. It happens if the connection to VictoriaMetrics breaks, which seems odd as it's local

reef shadow
obsidian narwhal
#

ah...

#

did you restart services after ?

reef shadow
#

PS: Just ran into the DNS issue again with the new installation... I have never had any issues with other linux hosts running DHCP.. not sure if it's the name servers or the search domain that is not set... after I set them manually and rebooted, it seems to work... and I got the old data as well 🙂

reef shadow
obsidian narwhal
#

could be it then... I guess !

reef shadow
#

Your migration upgrade package has pushed the installation "back" to 4.0.2 right? Would it be OK to upgrade to 4.0.3 then?

obsidian narwhal
#

Not sure I understand? The "migration" binary is part of the NAbox 4 distribution, that's why you start by pulling it to the NAbox 3 instance

#

as I remember it, there are no pending changes that are not part of 4.0.3, so yes, you can upgrade to 4.0.3

#

now, you still need to release capacity in root correct ?

upbeat tendon
obsidian narwhal
#

if you dc down it should free up capacity, then dc up -d

obsidian narwhal
reef shadow
# obsidian narwhal So what happens exactly ? With DHCP you're not getting the DNS server ? nor the ...

To be honest, I didn't even check. It was installed from OVF with DHCP enabled. Then, as I started to import from the old installation, I noticed that the nodes were not installed. I tried to install them, but they complained because I was unable to add them using their DNS name (even the full DNS name), although I could add them by IP with no problem. I then remembered that I had had this issue before, changed to static IP/DNS etc., and it worked OK after that.

#

If you want, I can try to replicate it?

obsidian narwhal
#

If you get a chance, I'd be curious. I know that domain names are tricky to propagate through DHCP with some OSes, but the DNS server is pretty basic stuff

#

And I never had that issue in any of the lab deployments

#

I can push a version that stores temporary migration data to /data if you need. It shouldn't be necessary but well...

reef shadow
obsidian narwhal
#

that's the build I provided

#

not 4.0.3 correct ?

reef shadow
reef shadow
obsidian narwhal
#

lol.

reef shadow
#

my guess is that it uses a global dns somewhere

obsidian narwhal
#

resolvectl status ?

reef shadow
#

naboxtest /home/admin # resolvconf status
Expected either -a or -d on the command line.

obsidian narwhal
#

uh ??

reef shadow
#

resolv.conf shows:

#

nameserver 127.0.0.53
options edns0 trust-ad
search .

#

naboxtest /home/admin # resolvconf -h
resolvconf -a INTERFACE < FILE
resolvconf -d INTERFACE

Register DNS server and domain configuration with systemd-resolved.

-h --help Show this help
--version Show package version
-a Register per-interface DNS server and domain data
-d Unregister per-interface DNS server and domain data
-f Ignore if specified interface does not exist
-x Send DNS traffic preferably over this interface

This is a compatibility alias for the resolvectl(1) tool, providing native
command line compatibility with the resolvconf(8) tool of various Linux
distributions and BSD systems. Some options supported by other implementations
are not supported and are ignored: -m, -p, -u. Various options supported by other
implementations are not supported and will cause the invocation to fail:
-I, -i, -l, -R, -r, -v, -V, --enable-updates, --disable-updates,
--updates-are-enabled.

See the resolvectl(1) man page for details.

obsidian narwhal
#

no search domain is ok I guess

#

as it might not be part of DHCP config

#

but resolvectl status not working is extremely weird

#

ls -al /bin/resolvectl

reef shadow
#

naboxtest /home/admin # ls -la /bin/resolvectl
-rwxr-xr-x. 1 root root 149776 Jul 1 23:17 /bin/resolvectl

obsidian narwhal
#

Ok, nothing weird

#

I'm surprised by your prompt

#

I have "admin@localhost ~ $ "

#

maybe your terminal emulator is overriding PS1 ?

reef shadow
#

I did a "sudo bash" 🙂

obsidian narwhal
#

ok makes sense

#

though resolvectl status should work

reef shadow
#

but same thing if I go back as admin

#

well sorry, but not here 🙂

obsidian narwhal
#

Wait, did you type resolvconf ?

reef shadow
#

aha 🙂

obsidian narwhal
#

that explains it

reef shadow
#

I use <TAB> because I'm lazy...

#

admin@naboxtest ~ $ resolvectl status
Global
Protocols: -LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (ens192)
Current Scopes: DNS
Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.0.2.101
DNS Servers: 10.0.2.101 10.0.2.102

obsidian narwhal
#

and 10.0.2.101 should definitely resolve your local host names?

reef shadow
#

no 10.0.2.101 is one of my DNS servers...

#

admin@naboxtest ~ $ nslookup 10.0.2.101
101.2.0.10.in-addr.arpa name = dc01.bbaas.local.

obsidian narwhal
#

that's what I'm saying, it should be able to resolve your local host names

reef shadow
#

admin@naboxtest ~ $ nslookup dc01.bbaas.local
Server: 127.0.0.53
Address: 127.0.0.53#53

** server can't find dc01.bbaas.local: SERVFAIL

admin@naboxtest ~ $ ping dc01.bbaas.local
ping: dc01.bbaas.local: Temporary failure in name resolution

obsidian narwhal
#

SERVFAIL is interesting

reef shadow
#

reverse lookup seems to work...

obsidian narwhal
#

sudo journalctl -u systemd-resolved

reef shadow
#

the DHCP server is running on a Mikrotik with RouterOS, and DNS is a Windoze box.... but as mentioned I have never seen an issue with any of our other Linux dists... mostly Ubuntu and Rocky...

obsidian narwhal
#

Yes that's definitely a host side issue

reef shadow
#

admin@naboxtest ~ $ sudo journalctl -u systemd-resolved
Jul 08 15:14:52 localhost systemd[1]: Starting systemd-resolved.service - Network Name Resolution...
Jul 08 15:14:52 localhost systemd-resolved[303]: Positive Trust Anchors:
Jul 08 15:14:52 localhost systemd-resolved[303]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Jul 08 15:14:52 localhost systemd-resolved[303]: Negative trust anchors: home.arpa 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.1>
Jul 08 15:14:52 localhost systemd-resolved[303]: Defaulting to hostname 'linux'.
Jul 08 15:14:52 localhost systemd[1]: Started systemd-resolved.service - Network Name Resolution.
Jul 08 15:14:56 localhost systemd[1]: Stopping systemd-resolved.service - Network Name Resolution...
Jul 08 15:14:56 localhost systemd[1]: systemd-resolved.service: Deactivated successfully.
Jul 08 15:14:56 localhost systemd[1]: Stopped systemd-resolved.service - Network Name Resolution.
Jul 08 15:15:00 localhost systemd[1]: Starting systemd-resolved.service - Network Name Resolution...
Jul 08 15:15:00 localhost systemd-resolved[1347]: Positive Trust Anchors:
Jul 08 15:15:00 localhost systemd-resolved[1347]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Jul 08 15:15:00 localhost systemd-resolved[1347]: Negative trust anchors: home.arpa 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.>
Jul 08 15:15:00 localhost systemd-resolved[1347]: Defaulting to hostname 'linux'.
Jul 08 15:15:00 localhost systemd[1]: Started systemd-resolved.service - Network Name Resolution.
Jul 08 15:15:01 naboxtest systemd-resolved[1347]: System hostname changed to 'naboxtest'.
Jul 08 15:14:46 naboxtest systemd-resolved[1347]: Clock change detected. Flushing caches.
Jul 08 15:15:02 naboxtest systemd-resolved[1347]: Clock change detected. Flushing caches.

obsidian narwhal
#

dig @127.0.0.53 dc01.bbaas.local

#

mmmm. Maybe .local is throwing it off, that's supposed to be reserved for mDNS (multicast DNS)

#

do you have something else than .local to try and resolve ? That would be internal ?

#

shot in the dark. Can you edit /etc/systemd/resolved.conf and put MulticastDNS=no. Then sudo systemctl restart systemd-resolved

reef shadow
#

all our local domain are .local

obsidian narwhal
#

Yes. Change it.

#

😄

reef shadow
#

The MulticastDNS=no doesn't fix it...

obsidian narwhal
#

dang it

#

I think it's safe to assume that specifically .local names don't resolve. Public names are fine

reef shadow
#

just created a test domain and I am able to lookup "host.test.netapp" ok..

#

(and ping it)

obsidian narwhal
#

ok that would confirm the theory
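
To repeat this comparison on another host, a small Python sketch like the one below could be used. It is only an illustration: the function name is hypothetical, and the names in the commented example call are the two from this thread. `socket.getaddrinfo` goes through the system resolver path (the one `ping` uses), so a failure here corresponds to the "Temporary failure in name resolution" seen above rather than to a direct query against the DNS server.

```python
import socket


def check_names(names):
    """Map each name to its sorted resolved addresses, or None if the lookup fails."""
    results = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None)
        except socket.gaierror:
            # The system resolver could not resolve this name.
            results[name] = None
            continue
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
        results[name] = sorted({info[4][0] for info in infos})
    return results


# Example call with the two names tried in this thread:
# for name, addrs in check_names(["dc01.bbaas.local", "host.test.netapp"]).items():
#     print(name, "->", addrs if addrs else "lookup failed")
```

If the `.local` name returns None while the other name returns addresses, that matches the behaviour described above.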

reef shadow
#

so .local is an issue with the dist you are using I guess?

obsidian narwhal
#

Now how do we tell resolved to treat .local as regular tld

#

It seems like it

#

But I couldn't google-fu it yet

reef shadow
#

Seems that nsswitch.conf needs to be changed

obsidian narwhal
#

Maybe there is a combination that would work there for the hosts line

reef shadow
#

but it's handled by something... it cannot be altered directly

#

"hosts: mymachines resolve [!UNAVAIL=return] files usrfiles dns": this line needs to be changed so that "dns" comes before the UNAVAIL...

obsidian narwhal
#

ah yes, this is in a readonly partition

#

/etc is a symlink

reef shadow
#

so it's a remount with "rw" or something

obsidian narwhal
#

No that's an immutable part of the system. Well, actually you can remove the symlink and put it directly in etc
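
To make the suggested `nsswitch.conf` change concrete, here is a small Python sketch that reorders the `hosts:` line quoted above so that `dns` is consulted before `resolve [!UNAVAIL=return]`. Placing `dns` directly before `resolve` is one reading of the suggestion and is an assumption. The function name is hypothetical, and nothing in this thread confirms that the reordering fixes `.local` lookups on NAbox.

```python
def move_dns_before_resolve(hosts_line: str) -> str:
    """Return the hosts line with the `dns` source placed just before `resolve`."""
    key, _, rest = hosts_line.partition(":")
    tokens = rest.split()
    if "dns" in tokens and "resolve" in tokens:
        tokens.remove("dns")
        # Insert before `resolve` so its [!UNAVAIL=return] action stays attached to it.
        tokens.insert(tokens.index("resolve"), "dns")
    return f"{key}: {' '.join(tokens)}"


original = "hosts: mymachines resolve [!UNAVAIL=return] files usrfiles dns"
print(move_dns_before_resolve(original))
# prints: hosts: mymachines dns resolve [!UNAVAIL=return] files usrfiles
```

Lines without both `dns` and `resolve` are returned with their sources in the original order.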

reef shadow
#

ok, not sure how much time I would like to put into this, as a simple fix for me is just to set a static IP... I guess you are aware what the issue is and possibly how to fix it?

obsidian narwhal
#

Yes, you can leave it to me. I'll set up a .local zone and do some tests

reef shadow
#

thanks

#

...and it was OK to upgrade to 4.0.3 right?

reef shadow
#

ermh... if I reboot my new 4.0.3 host, which was installed with DHCP and then set to a static IP (via the GUI), it doesn't seem to remember it and returns to the DHCP setup?? ...

#

Managed to set this by using the vApp settings on the VM..

#

...somewhat confusing way to configure IP if you ask me 😉

obsidian narwhal
#

Normally it should not reapply vmx config if it has not been changed

reef shadow
#

well, I "fixed" it by setting the vApp values... (remembering to set the IP with the "10.10.10.10/24" syntax 😉)

obsidian narwhal
#

From my research so far, it seems the "fix" is to list ".local" in the DNS domains, which is what you're doing, but I don't think you have to set up a static IP for this.

I consider it a "good enough" workaround for now but I'm not closing the issue and will keep looking

obsidian narwhal
#

And I'm fixing the bug with OVF configuration reloading

obsidian narwhal
reef shadow
obsidian narwhal
#

Mmmm. And now it’s there. Maybe GitHub caching !

reef shadow