#Monitoring FSx for ONTAP file systems us...


rocky dragon
#

Yes, I've heard from some customers who have. Are you hitting problems?

rugged wraith
#

I am. I've managed to get my instance deployed, but the deployment seems to have left out my secret, which Amazon is meant to store in AWS Secrets Manager. That left the pollers in a restarting state. Once I spotted that the secret hadn't been created, I updated the harvest.yml file, which it stores under /home/ec2-user/harvest_install, and I've now been able to get my pollers online:

CONTAINER ID   IMAGE                         COMMAND                  CREATED      STATUS              PORTS   NAMES
0c96a329cf31   cr.netapp.io/harvest:latest   "bin/poller --config…"   7 days ago   Up About a minute           harvest_cluster-2
9baad4ab7877   cr.netapp.io/harvest:latest   "bin/poller --config…"   7 days ago   Up About a minute           harvest_cluster-1

#

The only problem is that the poller is not grabbing any data.

#

Ah, it's started to log some error messages now, whereas before it was clean.

#

I'm getting a connection error now which I probably can fix

#

I'm intrigued as to why the secret never got created. Nothing ever reports an error in the CloudFormation logs.

rocky dragon
#

Agreed, that seems problematic, and hopefully it's not a harbinger of things to come. Looking at their fsx-ontap-harvest-grafana.template, it looks pretty straightforward. These are the only two references to FSxAdminPassword:

FSxAdminPassword=`aws secretsmanager get-secret-value --secret-id ${SecretName} --region ${AWS::Region} | jq --raw-output .SecretString | jq -r .password`
and then catting it into the YAML:
password: $FSxAdminPassword

#

Perhaps this command returned nothing?

#

aws secretsmanager get-secret-value --secret-id ${SecretName} --region ${AWS::Region} | jq --raw-output .SecretString | jq -r .password
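
To test that theory without touching the instance, the jq pipeline can be mirrored in a few lines of Python. This is only a sketch: `extract_password` is a hypothetical helper, and the input is assumed to be the JSON document that `aws secretsmanager get-secret-value` prints, with the secret itself JSON-encoded inside `SecretString`. It returns `None` in the cases where the shell pipeline would leave `FSxAdminPassword` empty or set to the literal string `null`.

```python
import json
from typing import Optional


def extract_password(cli_output: str) -> Optional[str]:
    """Mirror `jq --raw-output .SecretString | jq -r .password`.

    Returns None when the CLI printed nothing, the output is not the
    expected JSON, or the secret has no non-empty "password" field.
    """
    if not cli_output.strip():
        return None  # e.g. the CLI failed and wrote only to stderr
    try:
        response = json.loads(cli_output)
        secret = json.loads(response.get("SecretString") or "")
    except (json.JSONDecodeError, AttributeError, TypeError):
        return None
    if not isinstance(secret, dict):
        return None
    password = secret.get("password")
    return password if isinstance(password, str) and password else None


# Made-up CLI output for illustration; "example" is not a real password.
good = json.dumps(
    {"SecretString": json.dumps({"username": "fsxadmin", "password": "example"})}
)
print(extract_password(good))
print(extract_password(""))
```

An empty result here would be consistent with harvest.yml ending up with a blank `password:` line.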

rugged wraith
#

From what I can gather, it never had a secret to read, despite my giving it one.

#

You can even still see the defaults you give it in the CloudFormation stack.

#

Do you have any idea why they also run it on port 80? I'm half tempted to bin this instance and just build my own

rocky dragon
rugged wraith
#

Yeah, it's basically that repo with all the defaults.

#
username: admin
password: pass
#

I'm a little more used to NAbox and how it uses Traefik on top. Does the Ansible version just run nginx? I can probably configure Grafana to serve HTTPS and spin up the container again.

rocky dragon
#

it almost sounds like this isn't running for you?

rugged wraith
#

That's exactly what failed.

#

An error occurred (ResourceNotFoundException) when calling the GetSecretValue operation: Secrets Manager can't find the specified secret.
./harvest-grafana.sh: line 17: ansible-playbook: command not found
./harvest-grafana.sh: line 18: ansible-playbook: command not found
rocky dragon
#

At a high level, the Ansible script creates a container per poller, extracts the Prom ports from harvest.yml and uses them to create the scrape targets for prometheus.yml, starts the Prometheus and Grafana containers, and then starts each poller container.
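
The port-extraction step described above can be sketched roughly as follows. This is an illustration rather than the playbook's actual logic: `scrape_targets` is a hypothetical helper, the `prom_port` key name and the `localhost` host are assumptions, the ports are made up, and a regular expression stands in for a real YAML parser to keep the example dependency-free.

```python
import re
from typing import List


def scrape_targets(harvest_yml: str, port_key: str = "prom_port",
                   host: str = "localhost") -> List[str]:
    """Collect every `<port_key>: <number>` line as a host:port scrape target."""
    pattern = re.compile(
        rf"^\s*{re.escape(port_key)}:\s*(\d+)\s*$", re.MULTILINE
    )
    return [f"{host}:{port}" for port in pattern.findall(harvest_yml)]


# Made-up poller section; the key name and ports are assumptions.
example = """
Pollers:
  cluster-1:
    prom_port: 12991
  cluster-2:
    prom_port: 12992
"""
print(scrape_targets(example))
```

If a poller's port line is missing or uses a different key, it simply produces no target, which would match a Prometheus instance that is up but scraping nothing.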

#

ansible-playbook: command not found sounds like a more fundamental problem. That should have been installed as part of the runcmd: in their template

#

i'll do my best to help debug @rugged wraith, but since this is Amazon's script, it might be better to ask them too. Or as you said, bail and stand up your own instance. Are you trying to monitor FSx or deploy Harvest in AWS?

#

Maybe the password didn't work because of the required format they mention? Not sure if you saw it or not: Validate secret is stored in format {"username" : "fsxadmin", "password" : "<your password>"}

rugged wraith
#

Looks like Ansible was installed. I'm trying to monitor FSx.

```
Package      Version
ansible      6.5.0
ansible-core 2.13.5
certifi      2022.9.24
```

rocky dragon
#

ansible is installed but what about ansible-playbook?

rugged wraith
#

They don't actually specify it in their runcmd.

rocky dragon
#

yep yep I see that, my guess is ansible should install it?? but the error is command not found - is that what line 17 has?

rugged wraith
#
```
     2
     3    FSxAdminPassword=`aws secretsmanager get-secret-value --secret-id ! --region eu-west-1 | jq --raw-output .SecretString | jq -r .password`
     4
     5    cat <<__EOF__ >> /home/ec2-user/harvest_install/harvest/harvest.yml
     6      cluster-2:
     7        datacenter: eu-west-1
     8        addr: 
     9        auth_style: basic_auth
    10        username: fsxadmin
    11        password: $FSxAdminPassword
    12        ansible_port: 25002
    13    __EOF__
    14
    15    chmod 500 /home/ec2-user/harvest_install/harvest/harvest.yml
    16    cd /home/ec2-user/harvest_install
    17    ansible-playbook manage_harvest.yml
    18    ansible-playbook manage_harvest.yml --tags api
    19    sleep 10
    20    /opt/aws/bin/cfn-signal -e $? --stack aws-netapperf --resource Instance --region eu-west-1
```
rocky dragon
#

right. and what if you run ansible-playbook --version?

#

Seems like you will get command not found.

rugged wraith
#

So as the default ec2 user I get some permission errors

rocky dragon
#

ah!

rugged wraith
#
```
. Unable to create local directories(/home/ec2-user/.ansible/tmp): [Errno 13] Permission denied: b'/home/ec2-user/.ansible'
```
rocky dragon
#

sounds very promising as a root cause 😃
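
One way to confirm this kind of failure before rerunning the playbook is to check whether the current user could create the directory at all. A minimal sketch, assuming a POSIX system; `can_create` is a hypothetical helper and not part of Ansible.

```python
import os


def can_create(path: str) -> bool:
    """Return True if the current user could create `path` or write into it."""
    path = os.path.abspath(os.path.expanduser(path))
    # Walk up to the nearest ancestor that already exists.
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            return False
        path = parent
    # Creating entries needs write and search permission on that directory.
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


# The directory from the error message, resolved for whoever runs the script.
print(can_create("~/.ansible/tmp"))
```

Run as the same user that executes `ansible-playbook`, a `False` here would point at ownership or mode bits on an existing parent directory. Note that root passes this check regardless of mode bits.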

rugged wraith
#

Yeah I think so, one for AWS you think?

rocky dragon
#

for sure

#

hopefully once that's sorted, you'll be off to the races

rugged wraith
#

Thanks for the assistance Chris

rocky dragon
#

you bet - if you get a chance, let us know when you get unstuck

rugged wraith
#

Hey @rocky dragon, coming back to this: I got the AWS Harvest deployment working, but it appears to give you only a very limited ability to monitor. It covers only volumes and SVMs, with no cluster details, CPU, etc., and it logs errors such as:

3:29PM ERR collector/zapi.go:525 > Unable to get nodes. error="failure invoking zapi: system-node-get-iter API request rejected => Insufficient privileges: user 'x' does not have read access to this resource" Poller=cluster-2 collector=Zapi:Node
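
To see at a glance which collectors are being rejected, log lines like the one above can be summarized with a small script. This is a sketch based only on the single log line quoted here: the regular expression and the `rejected_apis` helper are assumptions, not part of Harvest, and other Harvest versions may format their logs differently.

```python
import re
from typing import Dict, Set

# Pattern derived from the one quoted log line; treat it as an assumption.
REJECTED = re.compile(
    r"failure invoking zapi: (?P<api>\S+) API request rejected"
    r".*?Insufficient privileges"
    r".*?collector=(?P<collector>\S+)"
)


def rejected_apis(log_text: str) -> Dict[str, Set[str]]:
    """Map each collector to the set of ZAPI calls it was denied."""
    denied: Dict[str, Set[str]] = {}
    for line in log_text.splitlines():
        match = REJECTED.search(line)
        if match:
            denied.setdefault(match["collector"], set()).add(match["api"])
    return denied


sample = (
    '3:29PM ERR collector/zapi.go:525 > Unable to get nodes. '
    'error="failure invoking zapi: system-node-get-iter API request rejected => '
    "Insufficient privileges: user 'x' does not have read access to this resource\" "
    "Poller=cluster-2 collector=Zapi:Node"
)
print(rejected_apis(sample))
```

The resulting list of denied collectors is the kind of detail that is useful to attach to a support case.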

#

When I try to assign a role or permissions to fix this, FSxN doesn't let you do that either. Is this expected behaviour?

#

```
Error: command failed: not authorized for that command
```
rocky dragon
#

hi @rugged wraith awesome! Glad you got it sorted. yes, that is a limitation of the permissions that FSx exposes. Let me see if I can dig up any documentation on that

rugged wraith
#

I tried my best to find anything, as I'd really like to see at least some of the normal dashboards. I'm just in the middle of raising a case with AWS about the same thing.

rocky dragon
#

right, with the limited set of permissions, there are only two Harvest dashboards with data

rugged wraith
#

volume and svm

#

actually security also works but is that the expected behaviour then?

#

Seems like such a waste 😦

rocky dragon
#

yes that is expected - it would be a good question to ask in #┊・hybrid-cloud - unfortunately, there is nothing Harvest can do about it, those are the only resources the fsxadmin has access to

rugged wraith
#

sure

rocky dragon
#

i take that back, the Harvest team can better document that so you know before you try setting it up too

rugged wraith
#

I mean, most of it is in AWS anyway

#

but i'd have expected the compatibility to be similar

#

I'll try to moan at AWS 😄

rocky dragon
#

"encourage" 😆

rugged wraith
#

something like that 😄

rocky dragon
#

let me know if you make any headway, in the meantime, I'll make sure our docs are updated

timid socket
#

@rugged wraith I'm the one to whom you can moan. Feel free to ping me.