#NAbox v4 migrate: Failed to take Prometheus snapshot

lucid lodge
#

Just tried the migrate utility from v3 to v4 and it failed with the following message.

```
Taking snapshot...time=2024-11-12T12:18:33.118Z level=ERROR source=migrator.go:59 msg="Failed to take Prometheus snapshot" error="Failed to take prometheus snapshot: %!w(<nil>)"
```

So I tried following the manual process but get stumped on the first CLI command:
```
$ cd /usr/local/nabox/docker-compose
-bash: cd: /usr/local/nabox/docker-compose: No such file or directory
```

My goal is to migrate the historical data from the v3 virtual appliance to the v4 virtual appliance.
Any advice?
prime lion
#

@winter skiff

winter skiff
#

You need to be root.

lucid lodge
#

I am root for the migrate utility.

winter skiff
#

But it’s safer to connect as root. You have an incomplete environment if you simply su from admin

#

If you did « su - » it should be fine though

#

Regarding the cd command, that’s extremely weird. Are you on v3 when running it?

lucid lodge
#

I did "su -" but also tried it now by connecting as root (since that is allowed) and got the same error.

#

It's not mentioned in the instructions but yeah, I figured out I need to run the manual process on v3 but get stuck here instead:

```
WARN[0000] The "SNAPSHOT_PATH" variable is not set. Defaulting to a blank string.
WARN[0000] The "NABOX4_IP" variable is not set. Defaulting to a blank string.
[+] Running 1/0
 ✔ Container prometheus  Running
naboxv3:/usr/local/nabox/docker-compose# curl -k -X POST https://localhost/prometheus/api/v1/admin/tsdb/snapshot
{"status":"error","errorType":"unavailable","error":"admin APIs disabled"}
```
winter skiff
#

do you use docker-compose.override.yaml by any chance ?

lucid lodge
#

Not that I am aware of.

#

Where are the SNAPSHOT_PATH and NABOX4_IP variables supposed to be set?

winter skiff
#

no that's fine here

#

it looks like there is an extra carriage return before -d prometheus, can you check ?

#

Also, what is your v3 minor release?

lucid lodge
#

NAbox 3.5.4

#

I think the carriage return is just a formatting issue, it does not look like that in the terminal.

#

I ran the command again and updated the message

lucid lodge
#

Am I perhaps missing the "--web.enable-admin-api" parameter?

winter skiff
#

That's possible. What's the prometheus block in docker-compose.yaml? Can you do an ls -l too?

lucid lodge
#
```
    image: prom/prometheus:latest
    container_name: prometheus
    hostname: prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.route-prefix=/
      - --web.external-url=http://example.com/prometheus
      - --storage.tsdb.retention.time=2y
      - ${WEB_ENABLE_ADMIN_API:---log.level=info}
    volumes:
      - ../files/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ../files/prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml:ro
      - ../files/prometheus/ems_alert_rules.yml:/etc/prometheus/ems_alert_rules.yml:ro
      - "${NABOX_DATA}/prometheus:/prometheus"
      - "${NABOX_OPT}/prometheus-scrapers:/scrapers"
    labels:
      - nabox.core=true
      - traefik.enable=true
      - traefik.http.middlewares.stripPrometheus.stripprefix.prefixes=/prometheus
      - traefik.http.middlewares.stripPrometheus.stripprefix.forceslash=false
      - traefik.http.routers.prometheus.middlewares=stripPrometheus
      - traefik.http.routers.prometheus.rule=PathPrefix(`/prometheus`)
      - traefik.http.routers.prometheus.entrypoints=${TRAEFIK_ENDPOINTS}
      - traefik.http.routers.prometheus.tls=${TRAEFIK_TLS}

    restart: always
```
#
```
total 24
-rw-r--r-- 1 root root   879 Oct 25 09:08 docker-compose.override.yaml
-rw-r--r-- 1 root root   879 Oct 25 09:08 docker-compose.override.yaml.bak
-rw-r--r-- 1 root root 11265 Aug 12 13:03 docker-compose.yaml
-rwxr-xr-x 1 root root   200 Aug 12 13:03 nabox-traefik-entrypoint.sh
```
winter skiff
#

ok you do have an override file

#

That's the problem, you need - ${WEB_ENABLE_ADMIN_API:---log.level=info} in the command: block in there as well.

lucid lodge
#

Interesting! I'll try it.
Then I run the WEB_ENABLE_ADMIN_API command again, right?

austere glade
winter skiff
#

It works but you need to have that line in there

#

Otherwise, since the command block in the override doesn't include it, the line from the main compose file is dropped.

#

In other words, the whole "command" block from the override takes precedence. It’s not merged.
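
A quick way to confirm this on the v3 appliance is to check both compose files for the placeholder. The sketch below is only an illustration: the helper names `has_admin_api_line` and `check_compose_dir` are made up for this example, and the directory is the one quoted earlier in this thread. It only reports whether the string is present; it does not verify that the line sits under the prometheus `command:` block.

```shell
#!/bin/sh
# Sketch: report whether each compose file mentions the admin-API placeholder.
# Assumption: the compose files live in the directory quoted in this thread.
COMPOSE_DIR="${COMPOSE_DIR:-/usr/local/nabox/docker-compose}"

# has_admin_api_line FILE: succeeds if FILE mentions WEB_ENABLE_ADMIN_API.
has_admin_api_line() {
    grep -q 'WEB_ENABLE_ADMIN_API' "$1" 2>/dev/null
}

# check_compose_dir DIR: print one status line per compose file found in DIR.
check_compose_dir() {
    for f in "$1"/docker-compose.yaml "$1"/docker-compose.override.yaml; do
        [ -f "$f" ] || continue
        if has_admin_api_line "$f"; then
            echo "ok: $f"
        else
            echo "missing: $f"
        fi
    done
}

check_compose_dir "$COMPOSE_DIR"
```

A "missing" line for docker-compose.override.yaml would point at the situation described here: the override's command block replaces the one from docker-compose.yaml, so the placeholder has to be repeated in the override.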

lucid lodge
#

Awesome, that made it go to the next step. Tried the "migrate" utility again but it assumes Internet access to download a docker image

winter skiff
#

Mmmm, you should reapply the 3.5.4 update, sorry about that. Sometimes adding the docker image fails.

lucid lodge
#

Ooh cool, thanks! Trying it. 🙂

#

Sweet, it's migrating!

lucid lodge
#

Migration finished after almost 10 hours.
However.. if I check the ONTAP: Volume dashboard I can't see the migrated samples.
Should they not appear automatically for historical data?

winter skiff
#

Yes, you should see migrated data. Can you check that the ports in the harvest configuration are the same at the source and destination?

lucid lodge
#

Can you please elaborate on where I see the ports used for harvest?

winter skiff
#

You would have to check harvest.yml on both sides

#

/etc/nabox/harvest/harvest.yml on v4 side, /opt/harvest2-conf/harvest.yml on v3 side

lucid lodge
#

For the Pollers the prometheus port is different for all entries, 12990, 12991 etc. in v3 but in v4 the range is 12000, 12001 etc.

winter skiff
#

Were those clusters added by the migrate tool or manually?

#

Change the ports in NAbox 4 so they match NAbox 3

#

And dc restart havrest

lucid lodge
#

Thank you very much! That worked great. Migration complete!

#

But why is it called "havrest"? 😄
Seems like there's a story there.

lucid lodge
# austere glade Does the migration not work if i have an override file? This would be our case a...

My migration is done so I leave you with the summarized notes and wish you good luck:

  1. Make sure the WEB_ENABLE_ADMIN_API line is in the override file.
  2. Run the migrate tool and if it fails on pulling docker image, reapply latest v3 update file.
  3. If resorting to manual migration, only the first and last steps in the process are done on v4. The rest is on v3.
  4. We don't have many systems and keep data for 180 days, but it still took 10 hours to migrate. Make sure to run it in "screen".
  5. Make sure the Poller prometheus ports are the same after migration tool is done so the historical data will be displayed.
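
For step 5, the port comparison could be sketched as below. This is a rough illustration, not part of the migrate tool: it assumes the v3 harvest.yml has been copied next to the v4 one, and that each poller's port sits on a line of the form `...port: <number>`. The function names and the file name `harvest-v3.yml` are hypothetical.

```shell
#!/bin/sh
# list_ports FILE: print the numeric port lines of a harvest.yml, whitespace-trimmed.
# Assumption: poller ports appear as "<key ending in port>: <number>".
list_ports() {
    grep -E 'port:[[:space:]]*[0-9]+' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# ports_match V3_FILE V4_FILE: succeeds only if both files list
# the same ports in the same order.
ports_match() {
    [ "$(list_ports "$1")" = "$(list_ports "$2")" ]
}
```

On the v4 appliance this could be used as `ports_match ./harvest-v3.yml /etc/nabox/harvest/harvest.yml && echo "ports match" || echo "ports differ"`. Because it compares in order, it will also report a difference if the pollers are listed in a different sequence on the two sides.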
austere glade
#

@winter skiff We have the same problem.
We have followed all the steps, put the parameter for the admin API in both files, and rebooted NAbox.
NABOX Version 3.5.2

Restarting Prometheus with admin API enabled... OK
Waiting for Prometheus.................... Failed
Taking snapshot...time=2024-11-15T08:33:19.616Z level=ERROR source=migrator.go:59 msg="Failed to take Prometheus snapshot" error="Failed to take prometheus snapshot: %!w(<nil>)"
~ #
version: "3.7"
services:
  grafana:
    environment:
      - GF_SMTP_ENABLED=true
      - GF_SMTP_HOST=mail.muenchen.de:25
      - GF_SMTP_USER=storage.reporting
      - GF_SMTP_PASSWORD=VXrU7fuhQBNzilZdDoKh
      - GF_SMTP_FROM_ADDRESS=storage.reporting@muenchen.de
      - GF_SMTP_FROM_NAME=Grafana
      - GF_SMTP_SKIP_VERIFY=true
  prometheus:
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.route-prefix=/
      - --web.external-url=http://example.com/prometheus
      - --storage.tsdb.retention.time=545d
      - ${WEB_ENABLE_ADMIN_API:---log.level=info}
lucid lodge
#

Yeah, you are missing the line for admin API

austere glade
#

Sorry this was after edit back

lucid lodge
#

Add this at the bottom there:
- ${WEB_ENABLE_ADMIN_API:---log.level=info}

austere glade
lucid lodge
#

Are you sure the formatting was correct?

austere glade
winter skiff
lucid lodge
#

Aah, clever! haha

winter skiff
#

@austere glade what about manual steps ?

austere glade
winter skiff
#

No I'm more interested in what fails when you try to do it manually

austere glade
#

Ah lol ok.
Will try Monday.
How much space does the snapshot require?
Our used space is 1.1T and free space is 400G

winter skiff
#

I think it's fine, it's going to run for a couple of days maybe

austere glade
#

Ok

#

Then I will try manually and report back if any error occurs

winter skiff
#

It's important to let the migrate script create the cluster, and double check that the ports used are the same.
Don't miss the step to copy migrate from the NAbox 4 appliance

austere glade
#

Have done it. It creates the systems on our nabox4. It fails on snapshot creation

winter skiff
#

Ok Can you show me ?

austere glade
#

Is that enough?
Otherwise i have to restart.

```
Waiting for Prometheus.................... Failed
Taking snapshot...time=2024-11-15T08:33:19.616Z level=ERROR source=migrator.go:59 msg="Failed to take Prometheus snapshot" error="Failed to take prometheus snapshot: %!w(<nil>)"
~ #
```
winter skiff
#

Looks like Prometheus is taking a very long time to start

#

dc logs prometheus ?

austere glade
winter skiff
#

Ok what about the manual steps?

austere glade
#

I'll try the manual steps next week. Today we don't have a change window, so we must request a new one.

winter skiff
#

ok, you can try migrate -debug also if you want

austere glade
#

Sent you the debug log as PM

winter skiff
#

Yep, nothing much that's new. Let's see how the manual steps go.

late shoal
#

Running into the same issue. Re-applied the 3.5.4 update. Tried the manual steps, checked if I use an override file (I don't).
Checked the ports, they are different and were added by migrate. Will set them to the same values.

sm103959:/usr/local/nabox/docker-compose# WEB_ENABLE_ADMIN_API=--web.enable-admin-api docker compose --env-file .env --env-file .env.custom up -d prometheus
WARN[0000] The "SNAPSHOT_PATH" variable is not set. Defaulting to a blank string.
WARN[0000] The "NABOX4_IP" variable is not set. Defaulting to a blank string.
[+] Running 1/0
 ✔ Container prometheus Running 0.0s
sm103959:/usr/local/nabox/docker-compose# curl -k -X POST https://localhost/prometheus/api/v1/admin/tsdb/snapshot
{"status":"error","errorType":"unavailable","error":"admin APIs disabled"}

winter skiff
#

Is it possible that you used migrate on an old v4 version, or forgot to curl -o /usr/bin/migrate -k https://<NAbox 4 ip>/migrate && chmod 755 /usr/bin/migrate ?

#

can you grep WEB_ENABLE_ADMIN_API /usr/local/nabox/docker-compose/*.y*

late shoal
#

There was an override file...
Added the line under command, the migration is running.

winter skiff
#

Ha ! 😄 Excellent !

late shoal
#

Just calculated, it will run about 90 hours 😉

winter skiff
#

just in time for 2025 !

late shoal
#

Nope

winter skiff
#

today is off, doesn't count

late shoal
#

Still going strong

late shoal
#

Yeah!
2025/01/03 19:06:59 Import finished!
2025/01/03 19:06:59 VictoriaMetrics importer stats:
idle duration: 3h28m54.835022287s;
time spent while importing: 100h11m5.066899245s;
total samples: 654942658406;
samples/s: 1815930.40;
total bytes: 12.3 TB;
VM worker 0:↓ 899483 samples/s
VM worker 1:↓ 875179 samples/s
Processing blocks: 41 / 41 [██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████] 100.00%
2025/01/03 19:06:59 Total time: 100h11m5.822046426s

#

More than 50% space savings 😉