#Nabox stopped collecting

north heron
#

A couple of weeks ago we noticed that our NAbox instance stopped actively reporting for all the clusters it monitors. I'm not sure where the best place is to start looking at logs, but we currently have 0 actively reporting clusters when we should have 5. When we restart NAbox, it all briefly works again for an hour or so. Not sure if this is a bug or if it would help to provide some version numbers. We're running NAbox 3.1.2.

We did recently add another cluster, running 9.12, to monitoring. I'm wondering whether this has pushed it over the edge in terms of collecting metrics, or whether the new cluster is causing the rest to crash completely. Just an observation. I've attached an image showing an alert we also get, strangely, even though it does collect metrics, albeit briefly, for an hour or so.

Tia

ruby cypress
#

Hi Tia, what’s the memory for the VM, and do you have available disk space?

north heron
#

8 GB RAM. What's the best command to check for space?
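
For anyone landing here with the same question: `df` reports usage per mounted filesystem. Below is a minimal sketch, assuming a standard Linux shell on the VM. The helper name `full_filesystems` and the 90% default threshold are illustrative and not part of NAbox.

```shell
#!/bin/sh
# Hypothetical helper: print mount points whose usage is at or above a threshold.
# It reads `df -P` formatted text on stdin, so it can be tried on sample output
# without touching real disks.
full_filesystems() {
    threshold="${1:-90}"
    # Skip the header line; in `df -P` output, column 5 is the capacity
    # (e.g. 100%) and column 6 is the mount point.
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= t) print $6 " " pct "%"
    }'
}

# Usage on a live system:
#   df -h                        # human-readable overview
#   df -P | full_filesystems 90  # only filesystems at 90% or more
```

Mount points containing spaces would confuse the column parsing, so treat this as a quick check rather than a monitoring tool.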

north heron
#

/lv_data is 100% used. Is that the problem?

north heron
#

What is the procedure for extending the disk?

exotic bough
#

@ruby cypress I don't see this documented on the nabox site. Can you document the steps to extend the disk there?

ruby cypress
#

Grow the VMDK and reboot 🙂

#

I should make it harder, so the paragraph would be more visible 😄

exotic bough
#

thanks. I missed that it was already documented. @north heron does that help?

north heron
#

Yeah, we extended the disk and it didn't auto-extend the partition. We managed to use resize2fs to get it expanded in the end. All good now.
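
The exact commands used weren't posted, but a manual grow of this kind could look roughly like the sketch below. It rests on several assumptions: the device names are the ones in the boot command quoted later in this thread, the filesystem is assumed to be ext2/3/4 (the only family resize2fs handles), and the `run` and `grow_data_volume` helpers are hypothetical. By default it is a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of a manual grow after the virtual disk has been enlarged.
# Assumed defaults: data disk /dev/sdc, logical volume /dev/vg_data/lv_data.
PV="${PV:-/dev/sdc}"
LV="${LV:-/dev/vg_data/lv_data}"

# run: print the command in dry-run mode (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

grow_data_volume() {
    run pvresize "$PV"               # make LVM notice the larger disk
    run lvextend -l +100%FREE "$LV"  # hand all free extents to the LV
    run resize2fs "$LV"              # grow the filesystem to fill the LV
}

# Preview the commands:  grow_data_volume
# Apply them as root:    DRY_RUN=0 grow_data_volume
```

If the logical volume was already extended and only the filesystem lagged behind, the last step alone would be enough.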

exotic bough
#

it might be worth including the need for resize2fs in the docs, I think this has been a stumbling block before

ruby cypress
#

ok that's weird

#

Not the first time indeed, but it should be fixed. When I tried it was working fine

north heron
#

Also worth noting: we didn't get any alerts for space running out. I'm not sure what alerting method it uses, but we do have alerts set up in Grafana for threshold limits on the actual clusters we monitor. For the NAbox system itself, I'm not sure how that's achieved. Apologies if I've missed this on the FAQ site somewhere.

ruby cypress
#

You didn't miss that, I don't think this is reported in NAbox stats. I'll open an issue for this and see what can be done

#

This is what NAbox does every time it boots up :

pvresize /dev/sdc && (pvdisplay /dev/sdc|grep -E -q "Free PE +0" || lvresize -r -l +"100%FREE" /dev/vg_data/lv_data)
#

lvresize -r practically does resize2fs so it should work
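
Spelled out step by step, the one-liner above reads as follows. This is a sketch for illustration only, not NAbox's actual boot script. The function names are hypothetical, and the device paths are copied from the command above.

```shell
#!/bin/sh
# no_free_extents: succeeds when pvdisplay text on stdin reports "Free PE  0",
# i.e. the volume group has nothing left to hand out.
no_free_extents() {
    grep -E -q "Free PE +0"
}

grow_on_boot() {
    # Step 1: let LVM pick up a larger underlying disk; stop if that fails.
    pvresize /dev/sdc || return 1

    # Step 2: if there are no free extents, there is nothing to grow.
    if pvdisplay /dev/sdc | no_free_extents; then
        echo "no free extents on /dev/sdc, skipping lvresize"
        return 0
    fi

    # Step 3: extend the LV and, because of -r, resize the filesystem too.
    lvresize -r -l +100%FREE /dev/vg_data/lv_data
}

# Run as root on the appliance:  grow_on_boot
```

Written this way, a failure in step 3 surfaces as the function's exit status, which makes it easier to log than the compact `&&`/`||` form.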

exotic bough
#

if the VMDK is increased but NABox is not restarted, it would cause the behavior twodot0h reported. Do you recall if you rebooted after increasing @north heron ?

north heron
#

Yeah, we did. There was something mentioned about the disk not being able to be resized, but it was via a console and we've since lost that output.

ruby cypress
#

Ok that's interesting. I'll keep an eye on it

exotic bough
#

@ruby cypress would those cmds be logged to dmesg or somewhere else we can check for errors?

ruby cypress
#

possibly /var/log/messages but not sure
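
Since the log location is uncertain, one option is to search whichever logs exist for the relevant command names. This is a sketch, assuming the messages land in a text log or the kernel ring buffer at all, which is not confirmed in this thread. `resize_messages` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: filter log text on stdin for LVM and filesystem
# resize messages (case-insensitive match on command and volume names).
resize_messages() {
    grep -iE 'pvresize|lvresize|lvextend|resize2fs|vg_data'
}

# Usage on a live system (as root; each source may or may not exist):
#   resize_messages < /var/log/messages
#   dmesg | resize_messages
```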

north heron
#

I'll check tomorrow and report back

exotic bough
#

thanks!

ruby cypress
#

So there are situations when lvresize -r fails if... there is not enough space 😄

north heron
#

Thanks Yann. Will take a look. 👍

north heron
#

Did the NAbox dashboard get upgraded after 3.1.2? All I see is process and CPU stats in my NAbox dashboard.

ruby cypress
#

Yes, that’s very possible. Btw, there is a bug in 3.3 where container stats aren’t collected, but you’ll have disk stats.