# Upgrade to 22.11
docker compose
When I go to http://harvest-app001:9090/alerts, it's showing things as inactive.
I was following the upgrade steps at https://netapp.github.io/harvest/22.11/install/containers/ and https://github.com/NetApp/harvest/blob/main/docs/MigratePrometheusDocker.md
Basically, I stopped all the containers, did the data migration, and then brought everything back up.
[root@harvest-app001 current]# docker volume ls
DRIVER VOLUME NAME
local harvest-21080-6_linux_amd64_grafana_data
local harvest-21080-6_linux_amd64_harvest
local harvest-21110-1_linux_amd64_grafana_data
local harvest-21110-1_linux_amd64_harvest
local harvest-21111-1_linux_amd64_grafana_data
local harvest-21111-1_linux_amd64_harvest
local harvest-22020-4_linux_amd64_grafana_data
local harvest-22020-4_linux_amd64_harvest
local harvest-22110-1_linux_amd64_grafana_data
local harvest-22110-1_linux_amd64_harvest
local harvest-22110-1_linux_amd64_prometheus_data
local harvest-220401-nightly_linux_amd64_grafana_data
local harvest-220401-nightly_linux_amd64_harvest
local harvest_prometheus_data
Those are the volumes I currently have.
There are a lot of older ones for older versions, so I'm not sure whether I need them, or whether those also need their data moved over.
For the Prometheus data migration, I moved everything to the "harvest_prometheus_data" volume.
Thanks! Yep, that looks good. It's probably sufficient to cp the data from the most recent version you were using previously to 22.11. Those older volumes can be cleaned up, but there's no rush if you want to move slower. Cleaning up is step 5 on that page, but the more important thing for us to figure out is why you aren't seeing data in 22.11.
That would be great
So there is a build-specific volume, but then is harvest_prometheus_data more of an archive one? It'd be nice not to have to worry about moving data between upgrades.
Agreed. You will not need to do that in the future. That's the change we made in 22.11: using named volumes that are not named after the release.
Even though I see one for this version?
local harvest-22110-1_linux_amd64_grafana_data
local harvest-22110-1_linux_amd64_harvest
local harvest-22110-1_linux_amd64_prometheus_data
plus this one: local harvest_prometheus_data
Let's check your prom-stack. Can you paste the output of head -10 prom-stack.tmpl
and head -10 prom-stack.yml?
version: '3.7'
volumes:
  prometheus_data: {}
  grafana_data: {}
  harvest: {}
networks:
  frontend:
  backend:
humm, seems a bit different from the template
Was that from the .yml or the .tmpl? That's the old version that we fixed in 22.11. It should look like this:
OK, that makes sense. Can you check the tmpl file and paste its contents too?
version: '3.7'
volumes:
  prometheus_data:
    name: harvest_prometheus_data
  grafana_data:
    name: harvest_grafana_data
networks:
  frontend:
  backend:
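The difference between the two files is the name: key. With the old prometheus_data: {} form, Docker Compose prefixes the volume with the project name (by default derived from the directory name), which is consistent with the per-release volumes such as harvest-22110-1_linux_amd64_prometheus_data listed above. An explicit name: keeps the same volume across releases. A minimal sketch of how a prometheus service attaches to that volume, assuming the container keeps its data at /prometheus (the prom/prometheus image default); the image line is a placeholder, so keep whatever your prom-stack.yml already uses.

```yaml
version: '3.7'

volumes:
  prometheus_data:
    # An explicit name stops Compose from prefixing the volume with the
    # project (directory) name, so every release reuses the same volume.
    name: harvest_prometheus_data

services:
  prometheus:
    image: prom/prometheus  # placeholder: keep the image/tag from your prom-stack.yml
    volumes:
      # Left side is the Compose key defined above, not the Docker volume name.
      - prometheus_data:/prometheus
```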
Can you rerun this command: bin/harvest generate docker full --port --output harvest-compose.yml
and double-check that your prom-stack.yml now matches the tmpl?
No, I think our documentation should be explicit about that step. We said this,
and that's too vague 🙂 Sorry about the confusion. If you run docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans now, do you see your earlier copied Prometheus data?
humm, missing our pollers now
Also getting "Error updating options: Bad Gateway" in Grafana.
oh, it regenerated the harvest.yml
When you regenerated harvest-compose.yml, it used the harvest.yml file in the current directory by default. If you downloaded a new release, that will be an example harvest.yml; we don't rewrite that file. Maybe you did not copy your earlier one into this directory?
that could be
let me recopy that in and do it again
Prometheus seems to be having an issue and is restarting:
ts=2022-12-08T15:25:48.160Z caller=main.go:798 level=info msg="Stopping scrape discovery manager..."
ts=2022-12-08T15:25:48.160Z caller=main.go:812 level=info msg="Stopping notify discovery manager..."
ts=2022-12-08T15:25:48.160Z caller=main.go:834 level=info msg="Stopping scrape manager..."
ts=2022-12-08T15:25:48.160Z caller=main.go:808 level=info msg="Notify discovery manager stopped"
ts=2022-12-08T15:25:48.160Z caller=main.go:794 level=info msg="Scrape discovery manager stopped"
ts=2022-12-08T15:25:48.160Z caller=manager.go:945 level=info component="rule manager" msg="Stopping rule manager..."
ts=2022-12-08T15:25:48.160Z caller=manager.go:955 level=info component="rule manager" msg="Rule manager stopped"
ts=2022-12-08T15:25:48.160Z caller=notifier.go:600 level=info component=notifier msg="Stopping notification manager..."
ts=2022-12-08T15:25:48.160Z caller=main.go:1054 level=info msg="Notifier manager stopped"
ts=2022-12-08T15:25:48.160Z caller=main.go:828 level=info msg="Scrape manager stopped"
ts=2022-12-08T15:25:48.160Z caller=main.go:1063 level=error err="opening storage failed: get segment range: segments are not sequential"
That makes it sound like the earlier "Copy the historical Prometheus data" step had issues. We kind of skipped over it, but which of the many Prometheus volumes did you copy in step 4? I wonder whether that was a "good" source to copy from: https://github.com/NetApp/harvest/blob/main/docs/MigratePrometheusDocker.md#copy-the-historical-prometheus-data
We can keep helping you migrate the Prometheus data if you want. Or, if you don't care about that data, we can blow it away, recreate the volume, and bring everything up again to get you going.
One idea: the previous Prometheus volume with the most data is probably the one you want. If so,
docker system df -v will show you the largest one in the "Local Volumes space usage:" section.
There were about 4 older volumes that I copied from to get all the data into the new one.
Ah! OK, that makes sense. I doubt Prometheus supports copying multiple data directories into the same folder, which would be consistent with the "segments are not sequential" error above.
What if we delete the new volume, recreate it, copy in the most recent or the largest one, bring everything back up, and see if that unblocks you?
Sure. If we can get it going and then not have to worry about data migrations again, we could just create a new one.
yes, we should never need to have this conversation again in the future 😆
docker volume rm harvest_prometheus_data will remove the recently created one
oh i see it in the docs
docker volume create --name harvest_prometheus_data
then what you said
yep, the rm first and then that volume create, you got it
OK, better. docker ps shows the pollers running, but I'm not sure they are collecting. Harvest Metadata is showing only the unix poller.
maybe just need to wait a bit?
Oh, that's just that chart...
Sounds promising! If you want, you can check one of the pollers with docker logs -f name-of-poller-from-docker-ps
so much data from perf
🙂
Here's what I captured; two changes to the documentation:
- You must regenerate your harvest-compose file.
- Mention that you should copy only one previous prometheus_data volume into the new volume, not multiple.
sounds good
and sounds like you're all set?
Had a couple other questions if you could
sure, shoot
When we log in to Grafana, is there a way to use better authentication, such as LDAP to AD?
How do we add back the "ALL" option to select multiple DCs and clusters?
you're in luck - that was one of the features in 22.11 😄
Is there a way to keep data longer,
now that there's a new volume for Prometheus, and assuming Grafana gets its data from Prometheus?
Can we keep data for something like 13 months?
Yes, Grafana uses Prometheus as its data source. You can change Prometheus's retention settings: https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
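For the 13-month question, a sketch of the relevant part of prom-stack.yml, assuming the prometheus service passes its flags through a command: list. Prometheus keeps 15 days by default, and its duration syntax has no month unit, so roughly 13 months is written as 400d here. Setting command: replaces the image's default arguments, so the config file and storage path flags (the prom/prometheus image defaults) are restated; check them against the flags your own prom-stack.yml already passes.

```yaml
services:
  prometheus:
    command:
      # Keep any flags your prom-stack.yml already passes; these two are the
      # prom/prometheus defaults and must be restated once command is set.
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      # Retain roughly 13 months of samples (the default is 15d).
      - '--storage.tsdb.retention.time=400d'
```

Longer retention means a larger harvest_prometheus_data volume; docker system df -v shows how much space it uses. Re-run the docker-compose ... up -d command afterwards so Prometheus restarts with the new flag.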
I'm trying to get LDAP going based on the above example.
t=2022-12-08T22:45:19+0000 lvl=eror msg="Failed to read plugin provisioning files from directory" logger=provisioning.plugins path=/etc/grafana/provisioning/plugins error="open /etc/grafana/provisioning/plugins: no such file or directory"
t=2022-12-08T22:45:19+0000 lvl=eror msg="Can't read alert notification provisioning files from directory" logger=provisioning.notifiers path=/etc/grafana/provisioning/notifiers error="open /etc/grafana/provisioning/notifiers: no such file or directory"
t=2022-12-08T22:45:19+0000 lvl=info msg="warming cache for startup" logger=ngalert
t=2022-12-08T22:45:19+0000 lvl=info msg="starting MultiOrg Alertmanager" logger=ngalert.multiorg.alertmanager
t=2022-12-08T22:45:19+0000 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=http subUrl= socket=
t=2022-12-08T22:45:53+0000 lvl=info msg="LDAP enabled, reading config file" logger=ldap file=/etc/grafana/ldap.toml
t=2022-12-08T22:45:53+0000 lvl=eror msg="Error while trying to authenticate user" logger=context userId=0 orgId=0 uname= error="LDAP Result Code 200 "Network Error": dial tcp 127.0.0.1:389: connect: connection refused" remote_addr=10.94.8.240
t=2022-12-08T22:45:53+0000 lvl=eror msg="Request Completed" logger=context userId=0 orgId=0 uname= method=POST path=/login status=500 remote_addr=10.94.8.240 time_ms=3 size=53 referer=http://harvest-app001:3000/login
t=2022-12-08T22:45:54+0000 lvl=info msg="LDAP enabled, reading config file" logger=ldap file=/etc/grafana/ldap.toml
t=2022-12-08T22:45:54+0000 lvl=eror msg="Error while trying to authenticate user" logger=context userId=0 orgId=0 uname= error="LDAP Result Code 200 "Network Error": dial tcp 127.0.0.1:389: connect: connection refused" remote_addr=10.94.8.60
t=2022-12-08T22:45:54+0000 lvl=eror msg="Request Completed" logger=context userId=0 orgId=0 uname= method=POST path=/login status=500 remote_addr=10.94.8.60 time_ms=3 size=53 referer=http://harvest-app001:3000/login
[root@harvest-app001 current]#
That's from the docker logs.
And this is from prom-stack.yml:
grafana:
  container_name: grafana
  image: grafana/grafana:8.3.4
  depends_on:
    - prometheus
  ports:
    - 3000:3000
  volumes:
    - grafana_data:/var/lib/grafana
    - ./grafana:/etc/grafana/provisioning # import Harvest dashboards
    - ./docker/grafana/ldap.toml:/etc/grafana/ldap.toml
    - /etc/grafana/ldap.toml:/etc/grafana/ldap.toml
  networks:
    - backend
    - frontend
  restart: unless-stopped
  labels:
    kompose.service.type: nodeport
  environment:
    - GF_AUTH_LDAP_ENABLED=true
    - GF_AUTH_LDAP_CONFIG_FILE=/etc/grafana/ldap.toml
Hi @fresh epoch, maybe your ldap.toml has the wrong address? It looks like Grafana reads the file fine, but when it tries to connect to the LDAP server, the connection is refused. A firewall, or the wrong IP? t=2022-12-08T22:45:54+0000 lvl=eror msg="Error while trying to authenticate user" logger=context userId=0 orgId=0 uname= error="LDAP Result Code 200 "Network Error": dial tcp 127.0.0.1:389: connect: connection refused" remote_addr=10.94.8.60
Yeah, I noticed that error; it seems to be trying to send it to localhost. Can I DM/email you that file info?
yes, I'll take a look, but you might get better help from Grafana. https://github.com/NetApp/harvest/wiki/FAQ#how-do-i-share-sensitive-log-files-with-netapp
what version of Grafana?
fd221b0b9278 grafana/grafana:8.3.4 "/run.sh" 16 hours ago Up 16 hours 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp grafana
looks like 8.3.4 ?
yep
i know something we can check, let me form the command
i suspect the ldap.toml grafana is using is different than the one you setup. let's try to confirm that by looking at the one Grafana is using by running docker exec -it grafana less /etc/grafana/ldap.toml
Yep, it looks like the default config
rather than what we entered, so it may not be reading ours in correctly.
You can copy your file into the container or set up a volume mount so the container "sees" your version.
generally a volume mount is better
i can dig up the volume mount for you if needed
I believe we have that in prom-stack.yml:
volumes:
  - grafana_data:/var/lib/grafana
  - ./grafana:/etc/grafana/provisioning # import Harvest dashboards
  - /etc/grafana/ldap.toml:/etc/grafana/ldap.toml
environment:
  - GF_AUTH_LDAP_ENABLED=true
  - GF_AUTH_LDAP_CONFIG_FILE=/etc/grafana/ldap.toml
Although, I just saw that my co-worker commented that out yesterday, so I uncommented it again and stopped/started the containers.
Now when I run
docker exec -it grafana less /etc/grafana/ldap.toml
less: can't open '/etc/grafana/ldap.toml': Permission denied
progress!!! 😄
-rw-r----- 1 root grafana 2654 Dec 8 16:44 ldap.toml
unless it's taking it from here:
ls -al ./docker/grafana/ldap.toml
-rw-r----- 1 root root 2641 Dec 8 16:40 ./docker/grafana/ldap.toml
that's not owned by grafana
your volume mount says the file is on the host machine at /etc/grafana/ldap.toml
The permission issue should be on that file or, potentially, the parent directory /etc/grafana. I suppose you could rule those out with some permissive chmods.
Could it also be SELinux? I made the mapping locally and I see ldap.toml inside the container without issue. This SO thread has some things that look relevant: https://stackoverflow.com/questions/60175493/docker-container-cant-access-mapped-directory-from-host
drwxr-xr-x. 3 root root 84 Dec 8 16:44 grafana
that is /etc/grafana
ls -al /etc/grafana/ldap.toml
-rw-r----- 1 root grafana 2654 Dec 8 16:44 /etc/grafana/ldap.toml
[root@harvest-app001 current]# sestatus
SELinux status: disabled
those are the permissions seen inside the container on my side
I added the volume like so, just in my current Harvest directory
- ./ldap.toml:/etc/grafana/ldap.toml
strange
[root@harvest-app001 current]# docker exec -it grafana ls -la /etc/grafana/
total 48
drwxr-xr-x 3 root root 62 Jan 17 2022 .
drwxr-xr-x 1 root root 66 Dec 9 14:44 ..
-rw-r--r-- 1 root root 43461 Jan 17 2022 grafana.ini
-rw-r----- 1 root 984 2654 Dec 8 22:44 ldap.toml
drwxr-xr-x 5 root root 54 Dec 8 22:42 provisioning
It's lost its group.
I'll try it the way you have it.
Hmm, yeah, I reckon group ID 984 is the problem, and that was from yesterday. Maybe stop the grafana container, rm it, and bring it back up?
perhaps that was from yesterdays experiment
oooh
that seemed to help
for whatever reason
[root@harvest-app001 current]# docker exec -it grafana ls -al /etc/grafana/
total 48
drwxr-xr-x 3 root root 62 Jan 17 2022 .
drwxr-xr-x 1 root root 66 Dec 9 15:01 ..
-rw-r--r-- 1 root root 43461 Jan 17 2022 grafana.ini
-rw-r----- 1 root root 2654 Dec 9 15:00 ldap.toml
drwxr-xr-x 5 root root 54 Dec 8 22:42 provisioning
Was it the stop, rm, and re-up, or moving the file locally?
Now it's root:root.
I copied ldap.toml to the current version's directory, the same place where harvest.yml and prom-stack.yml are,
then I stopped the containers and started them again.
cool, probably caused by yesterday's experiments. we undid those and now hopefully you're all set
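Putting the pieces of this thread together, a sketch of the grafana service fragment with a single LDAP mount. The earlier paste had two mounts targeting the same container path (/etc/grafana/ldap.toml), which conflict, so only one should remain. This assumes ldap.toml sits next to prom-stack.yml, as in the setup that worked above; the :ro (read-only) suffix is an optional addition that was not used in the thread.

```yaml
services:
  grafana:
    image: grafana/grafana:8.3.4
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana:/etc/grafana/provisioning # import Harvest dashboards
      # Exactly one mount for ldap.toml, relative to the directory that holds
      # prom-stack.yml; read-only because Grafana only needs to read it.
      - ./ldap.toml:/etc/grafana/ldap.toml:ro
    environment:
      - GF_AUTH_LDAP_ENABLED=true
      - GF_AUTH_LDAP_CONFIG_FILE=/etc/grafana/ldap.toml
```

After changing mounts, stopping, removing, and re-upping the grafana container (as done above) avoids leftovers from earlier experiments; docker exec -it grafana ls -al /etc/grafana/ then shows which ldap.toml the container actually sees.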
It's still not letting me in, so maybe I have some other config problem in the LDAP file.
hopefully something new in the grafana log file?
not seeing much as far as errors
t=2022-12-09T15:06:39+0000 lvl=info msg="LDAP enabled, reading config file" logger=ldap file=/etc/grafana/ldap.toml
t=2022-12-09T15:06:39+0000 lvl=eror msg="Invalid username or password" logger=context userId=0 orgId=0 uname= error="invalid username or password" remote_addr=10.94.8.60
t=2022-12-09T15:06:39+0000 lvl=info msg="Request Completed" logger=context userId=0 orgId=0 uname= method=POST path=/login status=401 remote_addr=10.94.8.60 time_ms=17 size=42 referer=http://harvest-app001:3000/login
It looks like it's talking to the LDAP server now, so that's good, but it looks like the uname is blank and/or the username/password is wrong? t=2022-12-09T15:06:39+0000 lvl=eror msg="Invalid username or password" logger=context userId=0 orgId=0 uname= error="invalid username or password" remote_addr=10.94.8.60
I'm entering the username/password and it is correct. I'm comparing a sample to what I have.