#Expanding with new SG nodes?


tawny vine
#

We have a customer with a 3 node SG with 3 x SG6060 (E-Series in Duplex + Compute Node) (8TB disks) EC 2+1 . The customer would like to expand with two nodes and convert to EC 4+1.
Despite the EOS date being December 2024, we can no longer quote SG6060 nodes that match their existing setup, so we have to use the newer nodes: either the SG6160, which is similar to the SG6060, or the SG5860, which has the compute node built in, with the disadvantage of a simplex E-Series setup... This setup is used mainly for backup workloads. Would it make sense to go for the SG5860 and save a bit of cache? Or maybe even just expand the 3 existing nodes with an additional disk shelf 🙂 Suggestions are very welcome...

pseudo kelp
#

I would simply go ahead and add the new SG6160 systems into the existing grid.

river mantle
#

Changing the EC Scheme will take time. It can take a long time. I've got a customer doing that now and this one will take over 200 days (many, many PBs of data).

Simplest expansion, if capacity is all that is needed and performance is fine right now, is to add expansion shelves. The capacity is available to the nodes in minutes and you don't need to touch existing object data.

But if they want to switch to 4+1 because of the reduced overhead then I guess that makes sense. It will take a lot of time and compute/storage resources whilst the grid is doing it, though.

Also note that best practice for the number of nodes for EC is your scheme, plus 1 more node. With 5 nodes and using 4+1 you can't satisfy the ILM placement for new writes if a node is down, so everything will fall back to dual commit until it can be placed later. And dual commit is 100% overhead instead of 50% with 2+1 or 25% with 4+1.
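To make the overhead numbers above concrete, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration and are not part of any StorageGRID tooling; the "scheme plus one node" rule is simply the best practice quoted above.

```python
from fractions import Fraction


def ec_overhead(data: int, parity: int) -> Fraction:
    """Extra raw capacity per unit of user data for a data+parity EC scheme."""
    return Fraction(parity, data)


def replication_overhead(copies: int) -> Fraction:
    """Extra raw capacity per unit of user data when keeping N full copies."""
    return Fraction(copies - 1, 1)


def recommended_min_nodes(data: int, parity: int) -> int:
    """Best practice quoted in the thread: the scheme width plus one more node."""
    return data + parity + 1


for label, k, m in [("2+1", 2, 1), ("4+1", 4, 1)]:
    pct = float(ec_overhead(k, m)) * 100
    print(f"EC {label}: {pct:.0f}% overhead, {recommended_min_nodes(k, m)} nodes recommended")

print(f"Dual commit (2 copies): {float(replication_overhead(2)) * 100:.0f}% overhead")
```

By this rule a 4+1 scheme wants 6 nodes, which is why 5 nodes leaves no headroom when one node is down.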

tawny vine
#

Is there an estimator for EC conversion? Because if converting the existing 7-800GB takes half a year, I think expanding with shelves is the option 😉

somber pulsar
#

especially the in-memory cassandra loves DRAM and SSD Read Cache, so don't go with the smaller SG5860 nodes

#

pricing is mainly capacity-oriented, so always go for more nodes

#

you can take a look at how long your ILM scanner takes to get a rough estimate of how long converting all the data might take
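A back-of-the-envelope way to turn an observed rate into a duration, as a rough Python sketch. Both inputs are assumptions you would have to measure on your own grid (for example from how much data ILM actually moves per day); `estimate_conversion_days` is a hypothetical helper, not a StorageGRID feature, and the numbers in the example are invented.

```python
def estimate_conversion_days(data_tb: float, rate_tb_per_day: float) -> float:
    """Days needed to re-place `data_tb` of objects at a sustained daily rate."""
    if rate_tb_per_day <= 0:
        raise ValueError("rate_tb_per_day must be positive")
    return data_tb / rate_tb_per_day


# Invented example values: 800 TB of existing data, 4 TB re-placed per day.
print(estimate_conversion_days(800, 4))
```

The real rate will vary with client load and with any ILM throttling, so treat the result as an order-of-magnitude figure only.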

somber pulsar
#

HWU has all the gory details, obviously (if you need the exact CPU type on SG6060, I can look it up on Monday):
SG6060 - Xeon Skylake (2 sockets * 20 cores @2.4GHz) and 192GB DRAM
SG5860 - Xeon D1718T (4 cores @2.6GHz) and 64GB DRAM
SG6160 - Xeon Gold 5318Y (2 sockets * 24 cores @2.1GHz) and 256GB DRAM

Long story short: You will not be happy if you mix SG5860 with your existing SG6060 nodes

tawny vine
# somber pulsar HWU has all the gory details, obviously (if you need the exact CPU type on SG606...

Yeah, I agree that the 5860 is off the table. We are now waiting for price quotes on the SG6160 as well as three DE460C shelves to be added to the existing setup. And I think it will end up being the latter option, because you actually get more space out of it: you go from 1.1PB to 2.2PB, whereas with the two additional nodes, even with the EC4+2, we do not get above 1.8PB out of it.. and we need to convert everything 😉 It also looks like 3 shelves are cheaper than two new SG6160 nodes.. but let's wait and see... we have been wrong about NetApp pricing before 😉

somber pulsar
#

Would be great if you told us all what your final upgrade decision will be 👍

river mantle
#

I just deployed 12 x SG6160 nodes (6 per site) for an existing customer that was running bare-metal Grid Nodes, moving everything from EC8+2 to EC4+1 per site. The SG6160s don't even break a sweat. Their CPU usage and Disk Read/Write Latency are phenomenal. I love that the SSDs for caching have been moved to the internal drive bays of the Compute Server, so you actually get all 60 drive bays in the E4000 Shelf usable for object storage.

past hamlet
# river mantle I just deployed 12 x SG6160 nodes (6 per site) for an existing customer that was...

I envy you
I am decommissioning some VM nodes that are being replaced by hardware (still bare metal), and the last VM node is stuck because some data is still being managed by an EC profile that is not in use anymore; I cannot deactivate the profile because of it.
So support is having me run a script which has been running for a few months now on one node.
At this rate, assuming the script fixes the issue, I won't be able to decommission the last node until late next year (also assuming it will take as long on the rest of the nodes).

Currently I'm migrating the last VM over to SSDs but it doesn't look like it's helping so far (most of the storage devices have been migrated to SSD but a handful are left).

river mantle
#

I assume you're running the ec_leak script? I've had to run that on most decommissions unfortunately.

past hamlet
#

yep

river mantle
#

VM nodes max out, in a supported config, at 16 x 39TB VMDKs. How much data is remaining on those nodes? I've recently started using new Storage Grades, Storage Pools, and ILM Policy updates to evacuate data from storage nodes before initiating a decommission. This doesn't really help now, but I'm definitely using this method wherever possible, as ILM has a higher priority within the Grid than a Decommission and completes faster.

We had success speeding up the ec_leak script by adjusting the ILM scan rates and deleteQueueWorkers in the API Advanced settings (best to get assistance from Support before touching these). We found that unless a Virtual Storage Node has 192GB of RAM, the default for deleteQueueWorkers is 1, which is way too low. The setting stays at 1 until the virtual node has 192GB RAM, when it jumps to 1000 (the same value as an SG6060 appliance; the SG6160 is higher again). Raising it made a huge difference to the ec_leak rate. We increased all of the Virtual Storage Nodes to 64GB RAM and found they could run deleteQueueWorkers at 1000 without a performance drop. We kept a close eye on the CPU IOWait and Disk Read Latency metrics and kept tweaking until we saw the metrics rise to unacceptable levels, then backed off to find the balance.
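The "raise it, watch the metrics, back off" approach described above could be sketched roughly like this in Python. Everything here is hypothetical: `apply_and_measure` stands in for whatever you (or Support) use to change the setting and read CPU IOWait and disk read latency afterwards, and the thresholds and sample numbers are invented for the example. It does not call any real StorageGRID API.

```python
from typing import Callable, Optional, Sequence, Tuple

# (cpu_iowait_percent, disk_read_latency_ms) observed after applying a value
Metrics = Tuple[float, float]


def pick_worker_count(
    candidates: Sequence[int],
    apply_and_measure: Callable[[int], Metrics],
    max_iowait_pct: float,
    max_read_latency_ms: float,
) -> Optional[int]:
    """Try candidate values in ascending order and return the highest one
    whose metrics stayed within both limits, or None if even the lowest
    candidate exceeded them. The caller is expected to set the returned
    value afterwards (the 'back off' step)."""
    best: Optional[int] = None
    for value in sorted(candidates):
        iowait, latency = apply_and_measure(value)
        if iowait > max_iowait_pct or latency > max_read_latency_ms:
            break  # metrics became unacceptable, stop stepping up
        best = value
    return best


# Invented measurements for illustration only.
fake_metrics = {1: (5.0, 10.0), 100: (10.0, 12.0), 500: (20.0, 18.0), 1000: (45.0, 40.0)}
chosen = pick_worker_count([1, 100, 500, 1000], fake_metrics.__getitem__, 30.0, 25.0)
print(chosen)
```

In practice each measurement would need to run long enough for the metrics to settle before stepping up again.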

past hamlet
# river mantle VM nodes max out, supported config, at 16 x 39TB VMDKs. How much data is remaini...

Interesting! This node has 14 x 8TB disks; 8TB was the max size for disks the hypervisor could create.
CPU is very high due to iowait, which is one of the reasons I'm moving it to SSD. But so far I'm not seeing any perceivable difference for the rangedbs that have been migrated.

I might try setting the RAM to 192GB to see if that helps; this is the last VM on this host so there should not be any resource contention.

Can you elaborate on using storage grades, storage pools to evacuate data?

I was thinking this morning of adding a smaller storage node to the cluster with the minimum required storage, to satisfy the "unused" EC policy and decommission the older node. I'd probably still need to run the script afterwards but at least we could get it off the old hardware.
But I don't know if that would work?

river mantle
#

By default all storage nodes are assigned the "default" storage grade, and the default storage pools use "all storage grades" as their filter. So any new node added to a grid site is added to that pool, so is immediately used in ILM placement. You can't manually select the nodes in a pool, but what I do is this:

  • Edit the Storage Grade on the new nodes I want to move all data to (can be anything, only the grid admins see it, it just has to be something different). This doesn't break the existing storage pools using "all storage grades"
  • Create a new Storage Pool for each Grid Site, and select Storage Nodes and the new Storage Grade (only the new nodes will be added to this new storage pool)
  • So far nothing has changed ... ILM is working as before
  • Create new ILM Rules (can be the exact same as before for the most part) but just change the storage pools used to the new ones
  • Apply the ILM Policy

The ILM Scanner will now start picking up objects and placing them as per the new ILM Rules. It's faster than using a decommission and, unlike most of the 30+ decommissions I've run over the years, it doesn't need support calls. Most decommissions I've run need some level of involvement from Technical Support because they always fall over.
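The effect of the storage grade trick described above can be modelled in a few lines of Python. This is only a toy model of the selection logic as described in this thread, not StorageGRID code; the node names, site name, and the grade label `evacuation-target` are all made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StorageNode:
    name: str
    site: str
    grade: str = "default"  # every node starts with the default storage grade


def pool_members(nodes: List[StorageNode], site: str, grade: Optional[str] = None) -> List[str]:
    """Names of nodes a pool would select. grade=None models a pool that
    filters on 'all storage grades'."""
    return [n.name for n in nodes if n.site == site and (grade is None or n.grade == grade)]


nodes = [
    StorageNode("old-1", "site-a"),
    StorageNode("old-2", "site-a"),
    StorageNode("new-1", "site-a", grade="evacuation-target"),
    StorageNode("new-2", "site-a", grade="evacuation-target"),
]

# The existing pool still sees every node, so nothing breaks.
print(pool_members(nodes, "site-a"))
# The new pool only sees the re-graded nodes; ILM rules pointing at it
# will place objects on those nodes only.
print(pool_members(nodes, "site-a", "evacuation-target"))
```

Once the ILM policy uses rules that reference only the second pool, the scanner moves objects off the old nodes over time.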

river mantle
#

Yeah a lot of customers are coming up to that stage of the lifecycle for their grids, so anything we can do to improve that process of evacuating nodes before we run the decommission is great. You can also engage Technical Support and there are a few levers they can pull to adjust scanners and worker queues to improve things. Always worth asking the question as we've seen some great improvements moving data around using various settings/configs depending on the customer's Grid environment.

tawny vine
#

So just the last update on this... we just added the new shelves to the storage grid... finally the packaging of the drives is now better, so it takes "no time" to load the 60 drive slots... (on other systems I have installed, each disk was packaged separately, which made the installation a pain...) Most of the time on this installation was spent waiting for the nodes to boot into maintenance mode and expanding the storage... an online expansion option would be nice here... but the process was very smooth... the node picked up the new shelf and gives you the option to expand the node storage... then after expansion it reboots as a normal node... very nice 🙂

river mantle
#

@tawny vine awesome update, thanks for the feedback.

I remember installing over 8,000 HDDs for a supercomputer back in 2019, and every drive for the DE460C shelves was individually sealed in an anti-static bag. That broke me hahaha