#Monitoring FSx for ONTAP file systems us...
I am. I've managed to get my instance deployed, but the deployment seems to have left my secret behind, which Amazon is meant to store in AWS Secrets Manager. That left the pollers in a restarting state. Spotting that the secret wasn't created, I updated the harvest.yml file, which it stores in /home/ec2/harvest_install, and I've now been able to get my pollers online:
CONTAINER ID   IMAGE                         COMMAND                  CREATED      STATUS              PORTS   NAMES
0c96a329cf31   cr.netapp.io/harvest:latest   "bin/poller --config…"   7 days ago   Up About a minute           harvest_cluster-2
9baad4ab7877   cr.netapp.io/harvest:latest   "bin/poller --config…"   7 days ago   Up About a minute           harvest_cluster-1
Only the poller is not grabbing any data
Ah, it's started to log some error messages now, whereas before the log was clean
I'm getting a connection error now which I probably can fix
I'm intrigued as to why the secret never got created. It doesn't report any errors in the CloudFormation logs
agreed that seems problematic and hopefully is not a harbinger of things to come. Looking at their fsx-ontap-harvest-grafana.template it looks pretty straightforward, these are the only two references to FSxAdminPassword=
FSxAdminPassword=`aws secretsmanager get-secret-value --secret-id ${SecretName} --region ${AWS::Region} | jq --raw-output .SecretString | jq -r .password`
and then catting into yaml
password: $FSxAdminPassword
perhaps this cmd returned nothing?
aws secretsmanager get-secret-value --secret-id ${SecretName} --region ${AWS::Region} | jq --raw-output .SecretString | jq -r .password
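If that command did return nothing, the script would still go on to write an empty password into harvest.yml. A minimal sketch of a guard for that case, assuming the lookup output has already been captured in a variable. The `require_password` helper is a hypothetical name and not part of the AWS template; it treats an empty string, or the literal `null` that `jq -r` prints for a missing key, as a failure.

```shell
#!/bin/sh
# Hypothetical helper (not part of the AWS template): fail fast when the
# secret lookup produced nothing usable.
require_password() {
  case "$1" in
    ""|null)
      # Empty: the pipeline printed nothing (for example, the aws call failed).
      # "null": what jq -r prints when the .password key is missing.
      echo "error: no password returned from Secrets Manager" >&2
      return 1
      ;;
  esac
  return 0
}

# Usage sketch, with FSxAdminPassword set by the lookup shown above:
#   require_password "$FSxAdminPassword" || exit 1
```

Stopping here keeps a failed lookup from silently turning into a poller that cannot authenticate.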
from what I can gather it just never had the secret to even read it, despite giving it the secret
you can still even see the defaults you give it in the cloudformation stack
Do you have any idea why they also run it on port 80? I'm half tempted to bin this instance and just build my own
i don't. I helped the Amazon engineers that wrote the cloud formation script a few months ago. The issue then was Python version differences related to Ansible since their script uses this under the covers https://github.com/netapp-automation/harvest_install
yeah its basically that repo with all the defaults
username: admin
password: pass
I'm a little more used to NAbox and how that uses traefik on top. Does the Ansible version just run nginx? I can probably configure Grafana to run HTTPS and spin up the container
it almost sounds like this isn't running for you?
thats exactly what failed
An error occurred (ResourceNotFoundException) when calling the GetSecretValue operation: Secrets Manager can't find the specified secret.
./harvest-grafana.sh: line 17: ansible-playbook: command not found
./harvest-grafana.sh: line 18: ansible-playbook: command not found
at a high-level, the Ansible script creates a container per poller, extracts the Prom ports from harvest.yml and uses those to create the scrape targets for prometheus.yml, starts Prometheus and Grafana containers, and then each poller container
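The port-extraction step described above can be sketched in shell. This is not the playbook's actual implementation, only an illustration of the idea: the `scrape_targets` helper is hypothetical, and the key name (default `prom_port`) and host (default `localhost`) are assumptions passed as arguments so they can be changed to match the real harvest.yml.

```shell
#!/bin/sh
# Hypothetical helper: print one Prometheus target ("host:port") for every
# port line found in a Harvest config file.
#   $1  path to harvest.yml
#   $2  key that holds the port (assumed default: prom_port)
#   $3  host to prefix each port with (assumed default: localhost)
scrape_targets() {
  file=$1
  key=${2:-prom_port}
  host=${3:-localhost}
  # Match lines such as "    prom_port: 25001" and keep only the number.
  sed -n "s/^[[:space:]]*${key}:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*\$/${host}:\1/p" "$file"
}

# Usage sketch:
#   scrape_targets /path/to/harvest.yml
```

Each printed line is in the form Prometheus expects for a static scrape target, so the output could be templated into prometheus.yml.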
ansible-playbook: command not found sounds like a more fundamental problem. That should have been installed as part of the runcmd: in their template
i'll do my best to help debug @rugged wraith, but since this is Amazon's script, it might be better to ask them too. Or as you said, bail and stand up your own instance. Are you trying to monitor FSx or deploy Harvest in AWS?
maybe the password didn't work because of the required format they mention? Not sure if you saw that or not: "Validate secret is stored in format {"username" : "fsxadmin", "password" : "<your password>"}"
Looks like ansible was installed, trying to monitor FSX ```Package Version
ansible 6.5.0
ansible-core 2.13.5
certifi 2022.9.24```
ansible is installed but what about ansible-playbook?
they don't actually specify it in their runcmd
yep yep I see that, my guess is ansible should install it?? but the error is command not found - is that what line 17 has?
```
2
3 FSxAdminPassword=`aws secretsmanager get-secret-value --secret-id ! --region eu-west-1 | jq --raw-output .SecretString | jq -r .password`
4
5 cat <<__EOF__ >> /home/ec2-user/harvest_install/harvest/harvest.yml
6 cluster-2:
7   datacenter: eu-west-1
8   addr:
9   auth_style: basic_auth
10   username: fsxadmin
11   password: $FSxAdminPassword
12   ansible_port: 25002
13 __EOF__
14
15 chmod 500 /home/ec2-user/harvest_install/harvest/harvest.yml
16 cd /home/ec2-user/harvest_install
17 ansible-playbook manage_harvest.yml
18 ansible-playbook manage_harvest.yml --tags api
19 sleep 10
20 /opt/aws/bin/cfn-signal -e $? --stack aws-netapperf --resource Instance --region eu-west-1```
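One thing worth noting in the script above: `$?` on line 20 holds the exit status of the command that ran immediately before it, which is `sleep 10`, so a failed `ansible-playbook` run would still be signalled as success. That may be one reason the stack showed no errors. A minimal sketch of remembering the first failure instead; the `run_step` helper and the `status` variable are hypothetical names, not part of the AWS template.

```shell
#!/bin/sh
# Hypothetical pattern: remember the first non-zero exit status so that a
# later, successful command (such as sleep) cannot mask it.
status=0
run_step() {
  rc=0
  "$@" || rc=$?
  # Only record the status while no earlier step has failed.
  if [ "$status" -eq 0 ]; then
    status=$rc
  fi
}

# Usage sketch, mirroring lines 17 to 20 of the script above:
#   run_step ansible-playbook manage_harvest.yml
#   run_step ansible-playbook manage_harvest.yml --tags api
#   sleep 10
#   then pass "$status" to cfn-signal -e in place of $?
```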
right. and what if you run ansible-playbook --version?
seems like you will get command not found
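A quick way to confirm that from a script, as a sketch: `command -v` reports whether the shell can find a program on its `PATH`. The `require_cmd` helper is a hypothetical name. One possible cause, stated here as an assumption rather than a diagnosis, is that the `ansible-playbook` script was installed into a directory that is not on the `PATH` of the user running harvest-grafana.sh.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named command can be found.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  # Print the PATH that was searched to make a missing directory obvious.
  echo "error: $1 not found on PATH ($PATH)" >&2
  return 1
}

# Usage sketch:
#   require_cmd ansible-playbook || exit 1
```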
So as the default ec2 user I get some permission errors
ah!
```Unable to create local directories(/home/ec2-user/.ansible/tmp): [Errno 13] Permission denied: b'/home/ec2-user/.ansible'```
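A small check along those lines, as a sketch: it tests whether the current user can write to a directory such as the one named in the error. `check_writable` is a hypothetical helper name. If the directory turns out to be owned by another user (for example because an earlier step ran as root, which is an assumption here), two options to try are fixing its ownership or pointing Ansible's local temp directory elsewhere with the `ANSIBLE_LOCAL_TEMP` environment variable.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can write to a directory.
check_writable() {
  dir=$1
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "ok: $dir is writable by $(id -un)"
    return 0
  fi
  echo "not writable: $dir" >&2
  # If the path exists, show its owner and mode to help spot the mismatch.
  if [ -e "$dir" ]; then
    ls -ld "$dir" >&2
  fi
  return 1
}

# Usage sketch:
#   check_writable /home/ec2-user/.ansible
```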
sounds very promising as a root cause 😃
Yeah I think so, one for AWS you think?
Thanks for the assistance Chris
you bet - if you get a chance, let us know when you get unstuck
Hey @rocky dragon, coming back to this: I got the AWS Harvest deployment working, but it appears to give you a very limited ability to monitor. Only volumes and SVMs, with no cluster details, CPU, etc., and errors such as 3:29PM ERR collector/zapi.go:525 > Unable to get nodes. error="failure invoking zapi: system-node-get-iter API request rejected => Insufficient privileges: user 'x' does not have read access to this resource" Poller=cluster-2 collector=Zapi:Node
When I try to assign a role or permissions for this, FSxN does not allow that either. Is this expected behaviour?
```Error: command failed: not authorized for that command```
hi @rugged wraith awesome! Glad you got it sorted. yes, that is a limitation of the permissions that FSx exposes. Let me see if I can dig up any documentation on that
I tried my best to find anything, as I'd really like to see at least some of the normal dashboards. I'm just in the middle of raising a case with AWS about the same thing
right, with the limited set of permissions, there are only two Harvest dashboards with data
volume and svm
actually security also works but is that the expected behaviour then?
Seems like such a waste 😦
yes that is expected - it would be a good question to ask in #┊・hybrid-cloud - unfortunately, there is nothing Harvest can do about it, those are the only resources the fsxadmin has access to
sure
i take that back, the Harvest team can better document that so you know before you try setting it up too
I mean most of it is in AWS anyway
but i'd have expected the compatibility to be similar
I'll try to moan at AWS 😄
"encourage" 😆
something like that 😄
let me know if you make any headway, in the meantime, I'll make sure our docs are updated
@rugged wraith I’m the one to whom you can moan. Feel free to ping me