#AFF A800 move network card to new PCIe slot


magic trench

Hi

We have a two-node switchless AFF A800 cluster running ONTAP 9.14.1P8.
PCIe slots 2 and 3 are populated with 25GbE network cards.

We are planning to expand the cluster with an additional NS224 disk shelf with NVMe drives. According to Hardware Universe, the preferred PCIe slots for NVMe adapters (storage expansion) are 5 and 3. To add the new cards, we’ll have to move the network adapter from slot 3 to slot 4 (which Hardware Universe lists as supported).

The network ports on the cards in slots 2 and 3 all belong to a single ifgrp.
Current ifgrp “a0a” : e2a, e2b, e3a, e3b

Can someone please review the procedure below for this activity?

  1. Set auto-revert to false on the LIFs on nodeA
  2. Set storage failover auto-giveback to false on nodeA and nodeB
  3. Migrate all LIFs from nodeA to nodeB
  4. Remove ports e3a & e3b from ifgrp - nodeA
  5. Storage failover takeover of nodeA from nodeB
  6. Halt nodeA
  7. Remove power supply/network/HA interconnect cables - nodeA
  8. Open nodeA enclosure
  9. Move network card from slot 3 to slot 4
  10. Add NVMe adapters in slot 3 & 5
  11. Close nodeA enclosure
  12. Reconnect HA interconnect/network cables - nodeA
  13. Connect network cables to slot 4 (previously connected to slot 3)
  14. Power up nodeA and give back only the CFO aggregate from nodeB
  15. Ensure nodeA joins back to the cluster
  16. Add network ports e4a, e4b to ifgrp “a0a” on nodeA (MAC address should remain the same)
    Do we need to do anything on the network switches?
  17. Ensure ifgrp is healthy
  18. Give back the SFO aggregates to nodeA
  19. Move LIFs back to nodeA

Run the same activity on nodeB.
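
For reference, the ONTAP side of the steps above could look roughly like the sketch below. It only covers the CLI steps (1-5 and 14-19), not the hardware work. nodeA, nodeB and a0a are the names used in this post; `<svm>` and `<lif>` are placeholders, and the lines starting with `#` are annotations for the reader, not commands. Please check the syntax against the 9.14.1 command reference before running anything.

```
# Steps 1-2: disable auto-revert on the LIFs and auto-giveback on both nodes
network interface modify -vserver <svm> -lif <lif> -auto-revert false
storage failover modify -node nodeA -auto-giveback false
storage failover modify -node nodeB -auto-giveback false

# Step 3: move the LIFs off nodeA
network interface migrate-all -node nodeA

# Step 4: take the slot 3 ports out of the ifgrp
network port ifgrp remove-port -node nodeA -ifgrp a0a -port e3a
network port ifgrp remove-port -node nodeA -ifgrp a0a -port e3b

# Step 5: nodeB takes over nodeA
storage failover takeover -ofnode nodeA

# Step 14: after the hardware work, give back only the CFO (root) aggregate
storage failover giveback -ofnode nodeA -only-cfo-aggregates true

# Step 15: confirm nodeA is back in the cluster
cluster show
storage failover show

# Steps 16-17: add the slot 4 ports and check the ifgrp
network port ifgrp add-port -node nodeA -ifgrp a0a -port e4a
network port ifgrp add-port -node nodeA -ifgrp a0a -port e4b
network port ifgrp show -node nodeA -ifgrp a0a

# Step 18: give back the remaining (SFO) aggregates
storage failover giveback -ofnode nodeA

# Step 19: send the LIFs home
network interface revert -vserver <svm> -lif <lif>
```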

livid kernel

Looks good to me. Maybe after step 15 ensure ports e4a & e4b still have the correct MTU. But I guess ONTAP would warn you.
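
A quick way to check that, as a sketch (nodeA is the node name from the original post; the field list is just what I would look at):

```
network port show -node nodeA -fields mtu,link,broadcast-domain
```

Compare the MTU and broadcast domain of e4a, e4b and a0a with what e2a and e2b report.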

olive whale

This was already posted on Reddit.

I would modify it a bit, since there are a few known elements to account for:

  1. Do not disable auto-revert on LIFs
  2. Do not disable auto-giveback
  3. Remove ports e3a & e3b from ifgrp - nodeA
  4. ssh admin@service-processor-node-a -> system console (so you can break the boot)
  5. storage failover takeover -ofnode nodeA (LIFs will migrate)
  6. As nodeA reboots, press Ctrl-C to stop the boot process (in the SSH session to the SP)
  7. Remove power cords from the top node (should be nodeA)
  8. Remove network/HA interconnect cables - nodeA
  9. Open nodeA enclosure
  10. Move network card from slot 3 to slot 4
  11. Add 100G CX6 cards into slots 3 & 5
  12. Close nodeA enclosure (note: it will auto-power on even with two power supplies!)
  13. Reconnect HA power/interconnect/network cables - nodeA
  14. Exception: do NOT connect the network cables to slot 4 (previously connected to slot 3) yet
  15. Ensure nodeA joins back to the cluster
  16. Add network ports e4a, e4b to ifgrp “a0a” on nodeA (MAC address should remain the same)
  17. Plug cables into e4a/e4b now (otherwise you will need to fiddle with broadcast domains)
  18. Wait up to a minute and ensure the ifgrp is healthy
  19. Run the same activity on nodeB

I do it this way for a few reasons. Let the controller do its thing (auto-giveback, auto-revert of LIFs). Remove the ports from the ifgrp before doing the takeover. If you plug in the network cables before or during boot, ONTAP will auto-create broadcast domains Default-1 and Default-2, and you will need to delete those before you can add the ports to the ifgrp. You are supposed to remove the power cords from the node you are working on. The X800 platform has four power cords: two are needed for operation, and all four give full redundancy.
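
If the cables do get plugged in too early, the cleanup could look something like the sketch below. It assumes the auto-created domains are named Default-1 and Default-2 and sit in the Default IPspace, as described above, and that nothing else is using them. Check the output of the first command before deleting anything. The lines starting with `#` are annotations, not commands.

```
# See which broadcast domains exist and which ports landed in them
network port broadcast-domain show -ipspace Default

# Delete the auto-created domains (names assumed; confirm them in the output above)
network port broadcast-domain delete -ipspace Default -broadcast-domain Default-1
network port broadcast-domain delete -ipspace Default -broadcast-domain Default-2

# Then add the ports to the ifgrp and verify it
network port ifgrp add-port -node nodeA -ifgrp a0a -port e4a
network port ifgrp add-port -node nodeA -ifgrp a0a -port e4b
network port ifgrp show -node nodeA -ifgrp a0a
```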