#Shelf fault
the box is being decommissioned soon, i am trying to understand whether our interim-but-low-competence hw support can handle this condition by swapping something out - i would fully expect to hold their hand during the operation. But if we can expect nothing bad to happen for another month or so.. then we will probably leave the cluster as it is - we already ordered Keystone from NetApp. ....and then we fired 1000 people to finance it. (bad joke)
You need to determine what failed, otherwise it's impossible to give you suggestions.
What's the output of this?
rows 0
run -node * "environment status"
sehanna04st-01> environment status chassis all
Sensor Name    State        Current   Critical  Warning  Warning  Critical
                            Reading   Low       Low      High     High
PSU2           GOOD
PSU1           GOOD
Fan3           GOOD
Fan2           GOOD
Fan1           GOOD
SP Status      IPMI_HB_OK
mSATA Status   OK
mSATA Pres     PRESENT
sehanna04st-01>
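A quick way to scan a paste like the one above for anything that is not healthy is sketched below. This is only an illustration: the function name `suspicious_rows` is made up, and the set of "healthy" states is an assumption taken purely from the values visible in this paste (`GOOD`, `OK`, `PRESENT`, `IPMI_HB_OK`), not from NetApp documentation.

```python
# Assumption: these are the only "healthy" state strings, taken from the paste above.
HEALTHY_STATES = {"GOOD", "OK", "PRESENT", "IPMI_HB_OK"}


def suspicious_rows(output: str) -> list[str]:
    """Return sensor rows that contain none of the assumed healthy states."""
    flagged = []
    for line in output.splitlines():
        row = line.strip()
        # Skip blank lines and CLI prompt lines such as "sehanna04st-01> ...".
        if not row or ">" in row:
            continue
        # Skip the two header lines of the sensor table.
        if row.startswith(("Sensor Name", "Reading")):
            continue
        # A row is suspicious if no token matches an assumed healthy state.
        if not HEALTHY_STATES.intersection(row.split()):
            flagged.append(row)
    return flagged


sample = """\
sehanna04st-01> environment status chassis all
Sensor Name State Current Critical Warning Warning Critical
Reading Low Low High High
PSU2 GOOD
PSU1 GOOD
Fan3 GOOD
Fan2 GOOD
Fan1 GOOD
SP Status IPMI_HB_OK
mSATA Status OK
mSATA Pres PRESENT
sehanna04st-01>
"""

print(suspicious_rows(sample))
```

For the chassis paste above this flags nothing, which fits the thread: the chassis sensors look fine, and the fault being discussed is on the shelf.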
full output coming up:
...am i interpreting this right: the circuit that monitors the voltage is broken, but the power supply is still doing its job?
Might be. But it might also be that the PSU is slowly failing. I'd try reseating it to see if it recovers. Wait at least 30s between pulling it out and plugging it back in.
yeah, i was considering that - but i'd rather have a "slow fail" than a provoked instant fail, if you understand what i mean.
we just need the time to evacuate data from there
cluster going to trashbin soon
maybe i let my boss make the call 🙂
Well, currently you have a single point of failure since PSU1 is not working. The reseat can only improve things.
Do you see the same output regarding this shelf from the other node? Or is this a single-node system?
it's a node pair, the output was taken in the clustershell and should show both?
one sec
$ cat env.txt | egrep 'Environmental failure on shelves on this channel?'
Environmental failure on shelves on this channel? yes
Environmental failure on shelves on this channel? no
Environmental failure on shelves on this channel? yes
Environmental failure on shelves on this channel? no
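The same check can be done without a regular expression, which sidesteps the fact that `?` is a quantifier in `egrep` patterns (the command above still matches, but only because the `?` makes the preceding `l` optional). A minimal sketch, assuming the saved output is plain text like `env.txt` above; the function name `count_answers` is made up for illustration:

```python
from collections import Counter

# The literal line prefix, copied from the grep output above.
QUESTION = "Environmental failure on shelves on this channel?"


def count_answers(text: str) -> Counter:
    """Count the yes/no answers that follow the literal question string."""
    counts = Counter()
    for line in text.splitlines():
        # partition() does a plain substring match, so "?" is not special here.
        _, found, rest = line.partition(QUESTION)
        if found:
            counts[rest.strip().lower()] += 1
    return counts


sample = """\
Environmental failure on shelves on this channel? yes
Environmental failure on shelves on this channel? no
Environmental failure on shelves on this channel? yes
Environmental failure on shelves on this channel? no
"""

counts = count_answers(sample)
print(counts["yes"], counts["no"])
```

With the four lines from the thread this counts two "yes" and two "no" answers, matching the "we get it twice" reading.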
so, yeah - we get it twice
see also first screenshots from syslog
does swapping out the PSU require takeover? i hope not, right?
no
then our interim support can handle it
Simply press the PSU switch to 0, then pull it out 5cm, wait 30s and push it back in
passed the instructions on to our data-center team
i am like one flight away from that place 🙂
you guys in raleigh?
Germany 😉
😄
Back when I was still young and beautiful, I worked as a TSE in Schiphol 🙂
I don't work at NetApp, I work for a partner
Oh, nice...
the golden NetApp logo is for partners
it was quite nice back then
reseating didn't work, we are now sourcing a spare from somewhere