#scripting the cli via ssh

wheat fossil
#

I've an odd need to collect stats from the cli by way of wafltop.

This is what I'm working with

date=`date +%y_%m_%d_%H"h"_%M"m"`
touch ~/logs/$date.log
( ssh user@host 'rows 0;date;qos statistics volume latency show;qos statistics volume resource cpu show -iterations 25 -node localnode;qos statistics volume resource cpu show -iterations 25 -node localnode2;security session kill-cli -vserver host -username user -application ssh -session-id *;date' >> ~/logs/$date.log )&
( ssh user@host 'rows 0;date;node run -node <local node>; priv set advanced;options stats.wafltop.config volume,message,process;wafltop start;wafltop show -v cpu -i 10 -n 10;wafltop stop;options stats.wafltop.config volume,process;security session kill-cli -vserver host -username user -application ssh -session-id *;date' >> ~/logs/$date.log )&
sleep 6000
for PID in $(pgrep -P $$) ; do echo die ; kill -HUP $PID ; done

The goal is to gather stats in one big dump to send to support for review. They want them collected at an ungodly hour, and frankly I don't want to be up at that time.

Getting past the "node run -node node1" step is where the command input breaks.
Additionally I can't feed these through the ssh pipeline via grepping a file into it or using the ssh user@node < input_file method.

In short: is there a way to automate this collection of wafltop stats?

cursive crag
#

what do you mean "node run is breaking the inputs"? you can definitely do ssh user@host 'priv set diag; node run ...' through a noninteractive SSH shell, I have a script which does exactly this. My guess is your commands are not properly quoted/escaped

wheat fossil
cursive crag
#

it's enough if you set it in the cluster shell, diag-mode translates down to nodeshell as well

wheat fossil
#

I'll give that a shot then come Tuesday.

candid hull
#

Here’s an option
Unlock diag user
systemshell -node x sudo bash
vi /mroot/cmdfile.txt (place your node shell commands here)
Set the file to +x
Now, when you're in the node shell, call that file

wheat fossil
#

TMAC coming at it with cheat codes 🙂

candid hull
#

Except I jumped the gun here. I don’t see any system shell. Just node shell.

candid hull
#

This might help

  1. When you "ssh" to the cluster and kick off the "node run" command, the ssh session ends on its own once the command is done
  2. You need to fix the locality of the commands:
    This works from SSH (not from a script yet, still testing). I changed to 1 iteration. Notice that "priv set advanced" is now inside a new pair of double quotes along with all the other "node run" commands. Hence, when the quoted command is over, it exits back to the ssh host

ssh admin@192.168.0.101 'rows 0;date;node run -node local "priv set advanced; options stats.wafltop.config volume,message,process;wafltop start;wafltop show -v cpu -i 1 -n 1;wafltop stop;options stats.wafltop.config volume,process;exit"'
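
The quoting pattern above can be sketched as a small helper that assembles the remote command string before it is handed to ssh. This is only an illustration: `join_semicolon` and `build_remote_cmd` are hypothetical names, not part of ONTAP or ssh, and the script prints the string rather than connecting anywhere.

```shell
#!/bin/sh
# Sketch: build the remote command so that all nodeshell commands sit inside
# ONE double-quoted argument to "node run".
# join_semicolon and build_remote_cmd are hypothetical helpers.

# Join all arguments with ";"
join_semicolon() {
  local IFS=';'
  printf '%s' "$*"
}

# $1 = node name; remaining arguments = nodeshell commands
build_remote_cmd() {
  local node=$1
  shift
  printf 'date;node run -node %s "%s"' "$node" "$(join_semicolon "$@")"
}

remote_cmd=$(build_remote_cmd local \
  'priv set advanced' \
  'wafltop start' \
  'wafltop show -v cpu -i 1 -n 1' \
  'wafltop stop')

# Print instead of running, so the quoting can be inspected first.
printf '%s\n' "$remote_cmd"

# To run it for real (host as used in this thread):
#   ssh admin@192.168.0.101 "$remote_cmd"
```

Passing `"$remote_cmd"` as a single argument keeps the inner double quotes intact until the cluster shell parses them, which is the same effect as the hand-written single-quoted line.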

#

This seems to have worked (changed a few typos):

date=`date +%y_%m_%d_%H"h"_%M"m"`
touch ~/logs/$date.log
( ssh admin@192.168.0.101 'rows 0 ; date; qos statistics volume latency show;qos statistics volume resource cpu show -iterations 1 -node local ; qos statistics volume resource cpu show -iterations 1 -node local ; security session kill-cli -vserver cluster1 -username admin -application ssh -session-id *;date' >> ~/logs/$date.log )&
( ssh admin@192.168.0.101 'date ; node run -node local "priv set advanced;options stats.wafltop.config volume,message,process;wafltop start ; wafltop show -v cpu -i 1 -n 1 ; wafltop stop;options stats.wafltop.config volume,process ; exit"' >> ~/logs/$date.log )&
sleep 3
for PID in $(pgrep -P $$) ; do echo die ; kill -HUP $PID ; done
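
A minimal sketch of the same background-and-hang-up pattern, assuming it is acceptable to track each job's PID through `$!` instead of `pgrep -P $$`. Here `sleep` stands in for the two ssh collectors so the sketch runs anywhere, and the 2-second deadline is a placeholder for the real `sleep` value.

```shell
#!/bin/sh
# Sketch: start collectors in the background, remember their PIDs via $!,
# and hang up any that are still running once the deadline passes.

deadline=2        # seconds; placeholder for the real wait time

sleep 1 &         # stand-in for a collector that finishes in time
pid_fast=$!
sleep 30 &        # stand-in for a collector that hangs
pid_slow=$!

sleep "$deadline"

killed=""
for pid in "$pid_fast" "$pid_slow"; do
  # kill -0 sends no signal; it only tests whether the process still exists
  if kill -0 "$pid" 2>/dev/null; then
    kill -HUP "$pid"
    killed="$killed $pid"
  fi
done

wait              # reap the background jobs before exiting
echo "hung up:$killed"
```

Starting each command directly with `&`, without surrounding parentheses, means `$!` is the PID of that command itself, so the HUP goes to it rather than to a wrapper subshell.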

Of course I also do the "security login create -user-or-group-name admin -authentication-method publickey -application ssh" and the follow-up 'security login publickey create -username admin -index 1 -publickey "from .ssh/id_rsa.pub"'

#

That way ssh works with publickey/no password
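
For an unattended run, it may help to make ssh fail fast instead of waiting at a prompt if key authentication ever breaks. A minimal sketch, assuming OpenSSH on the client side; `run_remote` and `SSH_OPTS` are hypothetical names, and the 10-second timeout is an arbitrary placeholder.

```shell
#!/bin/sh
# Sketch: ssh options that suit a cron-driven run.
# BatchMode=yes makes ssh fail instead of prompting for a password or passphrase.
# ConnectTimeout bounds how long a connection attempt may take.
SSH_OPTS="-o BatchMode=yes -o ConnectTimeout=10"

# $1 = user@host, $2 = remote command string
run_remote() {
  # SSH_OPTS is intentionally unquoted so it splits into separate arguments.
  ssh $SSH_OPTS "$1" "$2"
}

# Example call (host and command as used in this thread):
#   run_remote admin@192.168.0.101 'rows 0;date' >> ~/logs/$date.log
```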

#

"rows 0" not needed for the "node run" command, bypasses cluster shell

#

Ok @wheat fossil that might do it for you

wheat fossil
#

I have got all of it handled by keys // no pass for now as well, makes testing simpler, especially for a no-login account running the cron job once this is done.

That said, issuing everything from "" within the '' kickoff is genius and will help a bunch

valid galleon
#

And as one of the people who would want perf data, you'd be better off just sending perf archives.

wheat fossil
valid galleon
#

Oh. Just run a perfstat.

#

They probably need it for the sizing tool.

#

(yes it's not "supported" but that's not a technical limitation)

wheat fossil
#

Re-sizing tool more like it.

valid galleon
#

Just run it for many iterations of 3 minutes, like 60 iterations.

wheat fossil
#

😛

#

Had a P1.5 ticket in October crop up that required a node failover to resolve.

Gents on the horn told us our A400 pair was too busy to fail over and not choke.
Got the perf team on the line a day or so later and they said "eh it's all back-end workload just let it rip".
We raised this right after we got the A400s with FabricPool in place and were assured that, if it came down to P1 ticket time, everyone would be well versed and able to make that call during crunch time.

Needless to say my management have found a scab and are ripping it off with wild abandon.

valid galleon
#

Oof. Yeah FabricPool (tiering scanners, etc.) will use extra CPU so that's expected.

#

Failover won't be impacted.