#ONTAP REST API via netapp_ontap on the Python Package Index (PyPI)


stable aurora
#

I'm a NetApp GTS covering Bloomberg and am posting this on behalf of my customer. My customer is having issues using the netapp_ontap package from the Python Package Index (PyPI): in a multi-cluster environment, they are seeing requests sent to the wrong cluster. Here's some feedback from the customer: After I turned on debug mode, we could see the following logs. The request was sent to nnapdh1-m, but the error message came back with a different host, nnapdh2, and the UUID in the URL does not exist there. Any idea?
Debug output: 2023-11-17 10:40:58.639
2023-11-17 15:40:58,639 INFO volume.py:116 - Getting request to validate volume bvs_i19 in svm nnapmsg3-svm device nnapdh1 platform NETAPP
2023-11-17 10:40:58.640
2023-11-17 15:40:58,640 INFO netapp_opmgr.py:961 - Checking if volume, bvs_i19, exists in svm nnapmsg3-svm
2023-11-17 10:40:58.640
2023-11-17 15:40:58,640 INFO netapp_opmgr.py:880 - Finding volume bvs_i19 svm nnapmsg3-svm in host nnapdh1-m
2023-11-17 10:40:58.694
2023-11-17 15:40:58,694 DEBUG connectionpool.py:547 - https://nnapdh1-m:443 "GET /api/storage/volumes?svm.name=nnapmsg3-svm&name=bvs_i19 HTTP/1.1" 200 215
2023-11-17 10:40:58.767
2023-11-17 15:40:58,767 ERROR device.py:134 - Failed when checking if volume, bvs_i19, exists in svm nnapmsg3-svm
2023-11-17 10:40:58.767
2023-11-17 15:40:58,767 ERROR volume_processor.py:34 - Internal error. Call to device-manager library returned: Failed when checking if volume, bvs_i19, exists in svm nnapmsg3-svm(" Caused by <class 'netapp_ontap.error.NetAppRestError'>", "Caused by HTTPError('404 Client Error: Not Found for url: https://nnapdh2-m:443/api/storage/volumes/3bbfa45a-44aa-11eb-acdc-00a098edf6bb'): entry doesn't exist")

When the same script is run in a single cluster environment, no issues like above.

Per NetApp Support, it was confirmed that the GET VOL request was being sent to the wrong cluster. NetApp Support guidance was to open a discussion here for further support.

hollow siren
#

As additional proof that the netapp_ontap library is sending the GET request to the wrong cluster, below is the APACHE-ACCESS.GZ log from cluster nnapdh2 node 01 showing the 404 response that the volume was not found. ONTAP is behaving properly, as the volume UUID it was sent does indeed NOT live on cluster nnapdh2, but rather on cluster nnapdh1 where the GET request was intended to be sent.

https://smartsolve.netapp.com/#cdv?refasup=2023111710060113&section=APACHE-ACCESS.GZ&view=vertical&asupids=2023111710060113

Node nnapdh2-01, ASUP section APACHE-ACCESS.GZ

10.32.32.10 pii_encrypt/MbCdoQVeI/5yabtuBQj2oJpSuHGhVtsHSiiPJN7jrXg=/pii_encrypt 4294967295 admin [Fri Nov 17 15:40:58.719168 2023 +0000] "GET /api/storage/volumes/3bbfa45a-44aa-11eb-acdc-00a098edf6bb HTTP/1.1" 404 96 46207 - 0 + 169.254.52.81 -

The UUID above (ending in df6bb) is associated with volume bvs_i19 that lives on node nnapdh1-09 (not on cluster nnapdh2).

stable aurora
#

Hello team. Still hoping for assistance on this issue. For more detail, here is a snippet of the code the customer is using:

#

It was good talking to you too! Below is an example of our code snippet calling the API. We've tried both netapp_ontap 9.7.3 and 9.13.1 with the same code snippet, but got the same error.

import concurrent.futures

from netapp_ontap import config, HostConnection
from netapp_ontap.resources import Volume, Snapshot

def test_method(params):
    # Excerpt from a larger class: self, mgmt_if, username, password and
    # logger are defined elsewhere in the customer's code.
    with HostConnection(
        mgmt_if, username=username, password=password, verify=False
    ) as self.conn:
        volume = Volume.find(**{"svm.name": self.svm_name}, name=self.volume_name)
        logger.info(volume)

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    print("start")
    executor.map(test_method, processed_vols)

And for the "processed_vols" above, it includes volumes from 2 different clusters. Sometimes the volumes from cluster A would be routed to cluster B's REST API, then returned 404.

hollow siren
#

Based on the "with HostConnection" usage above in a multithreaded environment (ThreadPoolExecutor), can someone confirm if this is the same issue identified in our netapp-ontap documentation?

https://pypi.org/project/netapp-ontap/
Host connections
• Use the connection object as a context manager with the with keyword. Note: This method is not recommended if you are using multiple threads connecting to different hosts. This issue is currently being worked on and should be fixed in future releases.

If so, the communities link states to use ".set_connection" as a workaround to the with statement.

Also, any idea when this is going to be fixed in the library?

#

Feedback from customer Bloomberg on trying the workaround:

We did try the .set_connection() workaround, but it made no difference in our case. We probably did not apply the workaround in the best way; could NetApp engineers provide a best-practice example of it?

maiden stratus
#

I’m fairly certain one of my colleagues observed and wrote something up concerning this same behavior. I’ve sent him a message.

#

Never mind, my colleague was already engaged through another channel. LoL

dusk plover
#

Ah then I just may chime in being the first one that hit the multithreading issue... The thing is, when considering multithreading rest api python client, do please steer away from the 'with restconn:' syntax. It just does not work for multi threaded python clients. Instead, create the host connection, and then, associate it with your object using .set_connection() or for example, as the customer above tried to find the volume: volume = Volume.find(name=volname, connection=restconn).
Also note the customer example is using self.conn which (without knowing the rest of the coded context) may be risky in multithreaded clients, but that is just usual multithreading fun, not ontap rest library related...

uneven lagoon
#

@stable aurora I'm an engineer on the team that owns this library so I can certainly try and help here. I think what Robert said above is important. I see that they're using self.conn. Is it possible that object is being referred to or changed by other threads that are running at the same time? That would cause errors like what they're seeing.

hollow siren
#

Thanks Robert Simac and Twesha. Here is the sample code that this customer attempted with the workaround of not using the "with":

We also noticed this. I tried the other approach, shown below, but got a similar error. This time, when multiple threads hit the API, instead of getting 404 errors, some of the requests got no response at all. For example, I sent 120 requests with 30 threads and got only 20 responses; the other requests got nothing back. Below is the code I used to replace the "with":

def test_method(params):
    try:
        config.CONNECTION = HostConnection(
            mgmt_if, username=username, password=password, verify=False
        )
        volume = Volume.find(**{"svm.name": params[2]}, name=params[3])
        if volume:
            counter.append(0)
            print(f"volume: {params} has response: 200")
        else:
            counter.append(0)
            print(f"volume: {params} has response: 204")
    except Exception as err:
        print(err)
        counter.append(0)

print("--------------------new connection------------------")
with concurrent.futures.ThreadPoolExecutor(max_workers=30) as executor:
    print("start")
    executor.map(test_method, processed_vols)

If this is a known issue, what replacement or remediation implementation should we use to avoid the 404 or no response issue?

dusk plover
#

Thanks @hollow siren for the sample. It is very likely the config.CONNECTION is to blame for repeated multithreading problem. The thing is... the config.CONNECTION is the 'global' and 'static' variable and each and every thread executing the test_method() is overwriting the very same memory address holding the config.CONNECTION. Our python netapp-ontap documentation is partially to blame here because it suggests the use of config.CONNECTION as preferred way of setting the connection. Instead, each and every HostConnection should be associated to the proper local variable and then associated with the Volume or other objects by means of set_connection() thus ensuring each thread has its own separate connection and no globals are used whatsoever. I will follow up with functional multithreading code example, within this thread, most likely early next week...

uneven lagoon
#

If you set config.CONNECTION, that connection will be used for all actions on resources that do not set their own using set_connection. This is fine in a single cluster context. But, if you have multiple clusters and threads that are setting config.CONNECTION, it could lead to indeterminate behavior.
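
The overwrite described above can be reproduced without a cluster. The following is a minimal, stdlib-only sketch and not netapp_ontap code: CURRENT_HOST is a hypothetical stand-in for a process-wide default such as config.CONNECTION, the host names are made up, and a barrier forces the interleaving so the outcome is repeatable.

```python
import threading

# Hypothetical stand-in for a process-wide default such as config.CONNECTION.
CURRENT_HOST = None


def run(pattern, hosts=("cluster-a", "cluster-b")):
    """Return {intended_host: host_actually_used} for one thread per host."""
    barrier = threading.Barrier(len(hosts))
    used = {}

    def shared_default(host):
        global CURRENT_HOST
        CURRENT_HOST = host          # every thread writes the same global
        barrier.wait()               # all writes are done before any read
        used[host] = CURRENT_HOST    # every thread reads the last writer's value

    def explicit(host):
        conn = host                  # local variable, private to this thread
        barrier.wait()
        used[host] = conn            # always the host this thread asked for

    worker = shared_default if pattern == "shared" else explicit
    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return used


def misrouted(used):
    """Count threads that ended up using a host other than their own."""
    return sum(1 for intended, actual in used.items() if intended != actual)


shared = run("shared")
explicit = run("explicit")
print("shared default:", shared, "misrouted:", misrouted(shared))
print("explicit:", explicit, "misrouted:", misrouted(explicit))
```

With two threads and the barrier in place, the shared-default variant always sends exactly one thread to the other thread's host, while the explicit variant never does. In real code the interleaving is not forced, so the misrouting would show up only intermittently.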

dusk plover
#

Below is the simple python program demonstrating the multithreaded use of ontap rest python library, and the use of set_connection() and connection as argument. This code is also emphasizing avoidance of global and config variables while multithreading. This code is intended only for learning and demonstration and should not be used in production as such. I am attaching it as file to avoid formatting issues.
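
For readers without the attachment, here is a minimal sketch of the pattern described in this thread; it is not the attached file. It assumes the netapp_ontap package is installed and relies only on HostConnection and the connection= argument to Volume.find mentioned earlier in the thread. The names find_volume and run_lookups, and the (host, svm, volume) tuple layout, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def find_volume(target, username, password):
    """Look up one volume using a connection that is private to this call."""
    # Imported here so the thread-pool helper below can be exercised
    # without the netapp_ontap package installed.
    from netapp_ontap import HostConnection
    from netapp_ontap.resources import Volume

    host, svm_name, volume_name = target
    # Local variable: no other thread can replace this connection.
    conn = HostConnection(host, username=username, password=password, verify=False)
    # connection= binds the request to conn; config.CONNECTION is never set.
    return Volume.find(connection=conn, name=volume_name, **{"svm.name": svm_name})


def run_lookups(targets, lookup, max_workers=20):
    """Run lookup(target) for every target on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() consumes the lazy iterator, so an exception raised in a
        # worker is re-raised here instead of being silently dropped.
        return list(executor.map(lookup, targets))
```

A call might look like run_lookups(targets, lambda t: find_volume(t, "admin", "secret")), where targets mixes volumes from several clusters. Consuming the result of executor.map matters: if the iterator is never read, exceptions raised inside the workers are never reported.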

uneven lagoon
#

Thanks Robert. I think we can enhance the netapp_ontap documentation to talk about multithreading and include some examples, maybe even this one you shared.

hollow siren
#

Hello. Customer Bloomberg has implemented the suggested workaround and stated that although it works, it introduces a pause of up to 10 seconds when it encounters a volume that truly does not exist.

Due to this, they would like an ETA for a permanent fix in this library for the multi-threaded host connection issue. Thanks!

dusk plover
#

Hmm I don't see how the use of with connection syntax can improve or prevent such delays on particular api call... I mean, this '10 sec delay' issue looks somewhat unrelated... Maybe they can post their code snippets here for review and comments, it may be something else they are now encountering...

stable aurora
#

@uneven lagoon Hello Twesha. Customer would like to know when the fix for this will be available, outside of the workaround that has been provided. We're speaking with this team tomorrow, so I'd like to provide an update then. Can you update us on when the fix will be available?

stable aurora
#

@uneven lagoon Hi Twesha. I'm speaking with my customer shortly. Can you provide an update?

stable aurora
#

@uneven lagoon Hello Twesha. We met with the customer yesterday and they are still asking for the ETA for the permanent fix for this. We're meeting with them again today. Please provide an update ASAP. Thank you.

stable aurora
#

Hello team. Can someone provide an update for this, please?

uneven lagoon
#

@stable aurora, I'm going to discuss this with the team internally and get back to you tomorrow AM (01/30). Additionally, please feel free to reach out to ng-ontap-rest-python-lib@netapp.com for anything netapp_ontap related, as NetApp engineers are more active there and we will get back to you faster than on Discord.

uneven lagoon
#

Could we get more details about what the customer's workaround was? Maybe the code snippet? I'm not sure where the 10 second delay is coming from, but it would likely not be related to the library itself.

uneven lagoon
#

@stable aurora We do have a work item to track this issue internally and will prioritize it when possible. The netapp_ontap library releases are aligned with ONTAP releases, and the timeline for a fix will depend on other release priorities. We will share a defect ID with a public report that can be shared with the customer for tracking purposes.

stable aurora
#

Hello Twesha. Thanks for the response. The customer offered some further detail to our GCE... See below: Re. this particular issue – I want to clarify one thing. The “10 Second Delay” was put into the workflow by Bloomberg on purpose – for the following reason: they were still getting some number of “false negatives” when doing a large number of volume lookups. The 10 sec delay and then running through the script again seems to resolve some (or maybe all) of the false negatives, on the second cycle. Just wanted to clarify that the “workaround” did not cause the 10 sec delay – it seems more like the workaround still sometimes returns false negatives on vol lookups and the 10 sec delay seems to resolve it. Of course, this is not the ideal end state.

#

Hope that helps clarify. Let us know if you need any more detail. We can work with the customer to see if they can provide a snippet, as you requested.

uneven lagoon
#

@stable aurora it would definitely help to see what they're doing or what work around they implemented. I would hate to share a defect and public report with them that doesn't actually fix their problem.

stable aurora
#

@uneven lagoon Hi Twesha. We've received the customer's code snippet. I'll forward it to the NG you provided above and reference this convo. Thanks!

hollow siren
#

To close off this thread: Engineering has fixed this issue under defect CPCL-118, and the fix will be available in netapp-ontap 9.15.1, which will be released near the ONTAP 9.15.1RC1 release. Thanks to all who worked on the workaround and fix!

latent cloak
#

Hello all, I am having a rough time attempting to mock FileInfo methods. Are there any examples of mocking the methods properly within the documentation?

For the import, we use: from netapp_ontap.resources import Volume, QuotaReport, NfsClients, FileInfo
Adding a print statement to confirm a successful mock gives:
In fileInfo: <class 'netapp_ontap.resources.file_info.FileInfo'>

I am unfamiliar with how this is working considering the import does not explicitly call file_info.py

I'm guessing there's some magic in the resources.__init__.py file. I am used to leaving this file empty, but this one has 1062 lines in it. I can only guess it's doing something that makes resources.file_info.FileInfo available as resources.FileInfo…

#

Once I am able to mock the instantiation file = FileInfo(volUuid, path=filePath) this will likely make a whole lot more sense. 🙂
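
The behavior asked about here is ordinary package re-exporting, and the standard library shows the same thing. The sketch below uses json purely as an analogy; that netapp_ontap.resources imports FileInfo from its file_info submodule in the same way is an assumption, not something confirmed in this thread. It also shows the usual mocking rule: patch the name in the namespace where the code under test looks it up. The parse function is hypothetical.

```python
import json
import json.decoder
from unittest import mock

# json/__init__.py imports JSONDecoder from json.decoder, so the package-level
# name and the submodule-level name refer to the same class object.
same_object = json.JSONDecoder is json.decoder.JSONDecoder
defining_module = json.JSONDecoder.__module__  # module where the class is defined


def parse(text):
    """Hypothetical code under test: looks up json.JSONDecoder at call time."""
    return json.JSONDecoder().decode(text)


# Patch the attribute on the package, because that is where parse() looks it up.
with mock.patch("json.JSONDecoder") as fake_cls:
    # fake_cls() returns fake_cls.return_value, i.e. the mocked instance.
    fake_cls.return_value.decode.return_value = {"mocked": True}
    patched_result = parse("[]")

# Outside the with block the real class is restored.
real_result = parse("[1, 2]")

print(same_object, defining_module, patched_result, real_result)
```

If the code under test instead does a from-import of the class into its own module, the patch target would be that module's name for the class, since that is the reference the code actually uses.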

maiden stratus
#

@latent cloak - Please create a new post instead of reviving this old thread with unrelated content.

latent cloak
#

My apologies... I am out of practice with discord.