#Gateway or not?

lone sand
#

As you know, it is possible to write data directly to any of the storage nodes, and/or you can set up gateway nodes. In a local setup, I would think that DNS round robin, writing directly to the nodes, would make the most sense, rather than pushing data through a gateway? Am I right in my assumptions, or would a gateway node optimize anything? This is still a small setup with 3 nodes and EC 2+1. The nodes are SG5760, so 58 disks per node (and a few SSDs for cache)...

amber garden
#

I think you will get into trouble if one of your appliances is down, because then S3 access through that node will not be possible and client access will fail... I would always use gateway nodes (or any other form of load balancer) in front of the grid nodes.

#

gateway nodes have a floating IP that's always available

lone sand
#

But DNS-RR is really much like a load balancer; the only difference is that the traffic reaches a node directly, which must be better... I am of course aware that this technically has nothing to do with load balancing, because you just pick one of the nodes more or less at random... but because we only have a few backup servers dumping data into the grid, I would think it would be more efficient this way, instead of a gateway which can become a bottleneck? And if a node is down, I guess you could just remove it from the DNS record... but to be honest I don't even know how often a backup application like Veeam will actually pick another IP from the DNS. Maybe it just picks one at the first DNS lookup and uses that until the server is rebooted... Maybe I'm not giving the gateway nodes enough credit? Should you not worry about congestion of the gateway node? Does it just pass the traffic along without processing anything?

red mango
# lone sand But DNS-RR is really much like a load balancer, only thing is that the traffic r...

DNS entries have a TTL, often several minutes or more, so it's likely a lot of connections would fail while a storage node is down.
Load balancers - be it StorageGRID gateway nodes or external 3rd-party load balancers - will do the job properly: they detect an outage pretty much immediately and stop sending traffic to a down node.
The most common way of implementing load balancers is to set them up as gateways. Some can do "Direct Server Return", but that is not supported by StorageGRID. Also, that is only really efficient for GETs (like if you are a webserver). For a PUT, only the ACK back after writing would be a direct return.
One thing StorageGRID gateways bring is high availability - you use a virtual IP that can fail over between gateway nodes.
StorageGRID gateway nodes are also relatively inexpensive to try out if you can run them virtualized. Just the cost of running a couple of extra VMs.
Our HW gateway appliances are very performant and can front a lot of storage nodes before you have to add more gateways.
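
To make the TTL point concrete, here is a rough back-of-the-envelope simulation. All of it is assumed for illustration: the node names, one request per second spread round-robin over three nodes, a 10-minute outage, a 300-second window in which clients still use the stale DNS answer, and a 2-second health-check reaction time. It is a toy model, not a measurement of StorageGRID or any real client.

```python
import itertools

def failed_requests(nodes, down_node, outage_start, outage_end,
                    total_seconds, reaction_seconds):
    """Count failed requests when one request per second is spread
    round-robin over `nodes` and `down_node` is offline during
    [outage_start, outage_end).

    `reaction_seconds` is how long after the outage starts traffic is
    still sent to the dead node (stale DNS answer, or the time a load
    balancer needs to notice a failed health check).
    """
    failures = 0
    rotation = itertools.cycle(nodes)
    for t in range(total_seconds):
        node = next(rotation)
        node_is_down = node == down_node and outage_start <= t < outage_end
        still_targeted = t < outage_start + reaction_seconds
        if node_is_down and still_targeted:
            failures += 1
        # After the reaction window the request is assumed to be
        # redirected to a healthy node and to succeed.
    return failures

nodes = ["sn1", "sn2", "sn3"]  # hypothetical storage node names
dns_rr = failed_requests(nodes, "sn2", 60, 660, 900, reaction_seconds=300)
lb = failed_requests(nodes, "sn2", 60, 660, 900, reaction_seconds=2)
print(dns_rr, lb)  # 100 1
```

Under these assumed numbers, DNS round robin loses 100 requests and the health-checked setup loses 1. The real gap depends on your TTL, on how the client caches lookups, and on the health-check interval.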

amber garden
#

usually the OS will cache the first DNS name it gets and never expire it until a reboot. And since the data in the LB never reaches the disk it won't be a bottleneck unless you only connect it with 1 GbE link or something like that

red mango
amber garden
#

If you don't want to spend the money on a dedicated SG gateway node, at least use a proper load balancer like HAProxy or similar. It's simple to set up, and even as a VM it can handle a couple hundred megs per second of throughput, which shouldn't be a problem (depending on your backup volume, of course).
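
For a feel of what the health-check part of a load balancer does, here is a minimal sketch of a plain TCP connect check. This is not HAProxy's implementation. The function names, the 192.0.2.x documentation addresses and port 443 are placeholders, and the demo uses a fake probe so it runs without touching the network.

```python
import socket

def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False

def healthy_backends(backends, probe=is_reachable):
    """Keep only the (host, port) pairs that currently pass the probe."""
    return [backend for backend in backends if probe(*backend)]

# Hypothetical storage node endpoints (documentation IP range).
backends = [("192.0.2.11", 443), ("192.0.2.12", 443), ("192.0.2.13", 443)]

# Fake probe that pretends the second node is down, for demonstration only.
pretend_probe = lambda host, port: host != "192.0.2.12"
print(healthy_backends(backends, probe=pretend_probe))
```

A real load balancer repeats a check like this every few seconds and usually wants several consecutive failures or successes before it changes a backend's state.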

amber garden
red mango
#

ONTAP FabricPool can actually use all IP addresses in a DNS response and load balance between them - that won't help you with a down storage node without load balancers, but with multiple virtual IPs on your load balancers it's pretty cool.
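
Roughly what "use all IP addresses in a DNS response" looks like from a client's point of view, as a sketch. This is not FabricPool's actual code. The hostname, the addresses and the `fake_resolver` stand-in are made up so the example runs offline; with the default `resolver` it would call the real `socket.getaddrinfo`.

```python
import itertools
import socket

def all_addresses(hostname, port, resolver=socket.getaddrinfo):
    """Return every distinct IP the resolver reports for hostname, in order."""
    seen = []
    for _family, _type, _proto, _canon, sockaddr in resolver(
            hostname, port, type=socket.SOCK_STREAM):
        ip = sockaddr[0]
        if ip not in seen:
            seen.append(ip)
    return seen

def fake_resolver(hostname, port, type=0):
    # Stand-in for a DNS answer holding two load balancer VIPs.
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", (ip, port))
            for ip in ("192.0.2.10", "192.0.2.11")]

vips = all_addresses("s3.example.internal", 443, resolver=fake_resolver)
targets = itertools.cycle(vips)  # rotate over all returned addresses
picked = [next(targets) for _ in range(4)]
print(picked)
```

A client that only ever takes the first address from the answer gets none of this spreading, which is the usual situation with applications that treat DNS as static.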

amber garden
#

Most applications are written under the (obviously wrong) assumption that DNS is static 🤷‍♂️

left swallow
#

HAProxy is a really simple solution to set up and it doesn't require a lot of resources. A Linux VM with 10G connectivity, 4 cores and a few GB of memory will push 10 Gbit/s.
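
To put the throughput figures from this thread side by side, here is a quick calculation. The 10 TB backup size is a made-up example, and both rates are treated as sustained with no protocol overhead, so real transfers will take longer.

```python
def transfer_hours(data_bytes, bytes_per_second):
    """Hours needed to move data_bytes at a sustained rate."""
    return data_bytes / bytes_per_second / 3600

backup_bytes = 10 * 10**12      # hypothetical 10 TB backup
ten_gbit = 10 * 10**9 / 8       # 10 Gbit/s = 1.25e9 bytes/s
two_hundred_mb = 200 * 10**6    # "a couple hundred megs per second"

fast = round(transfer_hours(backup_bytes, ten_gbit), 1)
slow = round(transfer_hours(backup_bytes, two_hundred_mb), 1)
print(fast, slow)  # 2.2 13.9
```

Whether the slower figure is fine depends on your backup volume and window.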

lone sand