How does Backstage work with horizontal pod scaling?

manic wave Aug 22, 2025, 1:47 PM

Hi all, I've been reading through the documentation and wanted to know more about how Backstage handles entity ingestion and processing when scaled horizontally on Kubernetes.

I ran a very simple test where I triggered ingestion through the app, and both of my existing pods seemed to be ingesting. One pod stopped ingesting when the other one finished. I could only tell from the logs, where the two pods show bursts of different durations and different timestamps. If my conclusion is wrong, let me know.

I'd like to know how I should approach this kind of setup, its best practices, and anything else I need to know.

river void Aug 22, 2025, 1:52 PM

All of the built-in things in Backstage are designed with horizontal scaling in mind. We leverage that heavily ourselves. All of the nodes collaborate to produce results.

How, precisely, they do that depends on the plugin/module at hand.

The catalog backend itself has a highly efficient processing queue implementation that dynamically spreads load among workers to handle the processing of entities. They do not overlap, and scaling up does not cause issues; it just makes things go faster
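
A rough sketch of how that kind of non-overlapping work distribution can look. This is an in-memory illustration only, not the catalog backend's actual code: `QueueItem`, `claimBatch`, the batch size and the refresh interval are all made-up names and numbers, and in a real multi-pod deployment the claim step would have to be an atomic operation against the shared database.

```typescript
// Hypothetical in-memory model of a shared processing queue (not Backstage's real code).
interface QueueItem {
  entityRef: string;
  nextUpdateAt: number; // epoch millis at which the entity is due for processing again
}

// Claims up to `batchSize` due items for the calling worker and pushes their
// due time forward, so a worker asking right afterwards no longer sees them as due.
// Assumption: against a real database this would be one atomic step.
function claimBatch(
  queue: QueueItem[],
  now: number,
  batchSize: number,
  refreshIntervalMs: number,
): string[] {
  const claimed = queue
    .filter(item => item.nextUpdateAt <= now)
    .sort((a, b) => a.nextUpdateAt - b.nextUpdateAt)
    .slice(0, batchSize);
  claimed.forEach(item => {
    item.nextUpdateAt = now + refreshIntervalMs;
  });
  return claimed.map(item => item.entityRef);
}

// Two pods pulling from the same queue at the same moment.
const sharedQueue: QueueItem[] = [
  { entityRef: 'component:default/a', nextUpdateAt: 0 },
  { entityRef: 'component:default/b', nextUpdateAt: 1 },
  { entityRef: 'component:default/c', nextUpdateAt: 2 },
  { entityRef: 'component:default/d', nextUpdateAt: 3 },
];
const podA = claimBatch(sharedQueue, 1000, 2, 60000); // claims a and b
const podB = claimBatch(sharedQueue, 1000, 2, 60000); // claims c and d
const podC = claimBatch(sharedQueue, 1000, 2, 60000); // nothing left to claim
```

Because every claim moves the due time forward before returning, the second pod simply picks up the next due items. Adding pods adds throughput rather than duplicate work, which matches seeing both pods busy in the logs.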

Individual providers may choose to use the scheduler service, which likewise is a globally collaborative one that makes sure that hosts take turns running tasks without overlap
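
The "take turns without overlap" idea can be pictured as hosts competing for a per-task lease in a shared store. The sketch below is an in-memory illustration of that general technique, not the scheduler service's real implementation; `TaskRecord`, `tryClaimTask` and the timings are hypothetical.

```typescript
// Hypothetical shared record for one scheduled task, as every host would see it
// in a shared store. Not the scheduler service's real data model.
interface TaskRecord {
  nextRunAt: number; // epoch millis at which the task is due next
  lockedUntil: number; // lease expiry in epoch millis; 0 means never locked
  lockedBy: string | null;
}

// Returns true if `host` won the right to run the task at time `now`.
// A host only wins when the task is due and no unexpired lease exists.
// Assumption: in a real system this check-and-set must be atomic.
function tryClaimTask(
  record: TaskRecord,
  host: string,
  now: number,
  frequencyMs: number,
  timeoutMs: number,
): boolean {
  const isDue = record.nextRunAt <= now;
  const isFree = record.lockedUntil <= now;
  if (!isDue || !isFree) {
    return false;
  }
  record.lockedBy = host;
  record.lockedUntil = now + timeoutMs; // lease expires even if the host crashes
  record.nextRunAt = now + frequencyMs;
  return true;
}

const refreshTask: TaskRecord = { nextRunAt: 0, lockedUntil: 0, lockedBy: null };
const tenMinutes = 600000;
const fiveMinutes = 300000;

// Both hosts wake up at t=0; only the first one to claim gets to run.
const firstRoundHost1 = tryClaimTask(refreshTask, 'host-1', 0, tenMinutes, fiveMinutes);
const firstRoundHost2 = tryClaimTask(refreshTask, 'host-2', 0, tenMinutes, fiveMinutes);

// Ten minutes later the task is due again, and this time host-2 gets there first.
const secondRoundHost2 = tryClaimTask(refreshTask, 'host-2', tenMinutes, tenMinutes, fiveMinutes);
const secondRoundHost1 = tryClaimTask(refreshTask, 'host-1', tenMinutes, tenMinutes, fiveMinutes);
```

Which host runs a given round is whichever claims first, so the work can move between pods over time while each round still runs exactly once.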

We could never use Backstage ourselves at Spotify unless this was solidly in place

manic wave Aug 22, 2025, 2:01 PM

I see, that's good confirmation. Thank you!
I was worried about what I found. Maybe that also explains the consistent resource usage across the pods. I thought ingestion and processing were supposed to run on only one pod, with more pods created when requests start coming in, so that the load balancer could do its job while that one pod handled all of the ingestion and processing.

river void Aug 22, 2025, 3:06 PM

Once you reach larger scales of the catalog you may be interested in https://github.com/backstage/backstage/tree/master/contrib/catalog/read-write-split

We do that. It is a huge benefit on big installations but not something to worry about when starting out.

It lets your API-serving nodes be freed up to do nothing but serve the API, leading to lower latency and better predictability.

manic wave Aug 22, 2025, 3:21 PM

oh wow, I wasn't aware of this

this is something for us to consider, since our catalog is at a large scale now

river void Aug 22, 2025, 3:25 PM

Alright! As the article mentions, this is a little cumbersome right now, we wish it were fewer steps. So we'll want to frameworkify that a bit some time down the line

We have 2-3 writer nodes (depending on autoscaling) in general, in one region close to the database master. And then a couple of reader nodes in every region globally, obviously fluctuating by load

I spoke a bit about this at KubeCon if you want to hear about our journey https://youtu.be/anqWhSnN7sA?si=nddeUzVZa4t_Yy1P

YouTube: The State of Backstage in 2025, CNCF [Cloud Native Computing Foundation]

20-something minutes in

manic wave Aug 22, 2025, 3:32 PM

river void I spoke a bit about this at kubecon if you want to hear about our journey https:...

this is a nice addition! we'll definitely look into this
we're having issues handling read requests for the catalog
and entity processing is one of the factors affecting us

river void Aug 22, 2025, 3:40 PM

Are you on a recent version of it? We also made a bunch of speedups lately. Including in this very last release!

manic wave Aug 22, 2025, 3:57 PM

river void Are you on a recent version of it? We also made a bunch of speedups lately. Incl...

on v1.41.1
there's just one thing we haven't adopted: we still have the enableRelationsCompatibility: true setting
we had to keep it true because there was an event emitter leak issue
but the fix seems to be merged now, just not in 1.41

river void Aug 22, 2025, 4:02 PM

Ah right. Yeah I fixed that. Weird compression middleware ☠️

It REALLY helped our latency metrics to have that streaming. It takes a shocking amount of time to JSON-serialize large payloads
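
To illustrate why streaming helps: instead of building one giant string with a single `JSON.stringify` call, a response can be written out piece by piece. A minimal sketch of that general idea, not the catalog's actual implementation; `writeJsonArray` and the `write` callback are hypothetical stand-ins for whatever writes to the HTTP response.

```typescript
// Hypothetical helper: serializes an array one element at a time and hands each
// piece to `write`, so the full payload never has to exist as one string.
function writeJsonArray<T>(items: T[], write: (chunk: string) => void): void {
  write('[');
  items.forEach((item, index) => {
    if (index > 0) {
      write(',');
    }
    write(JSON.stringify(item));
  });
  write(']');
}

// Stand-in for an HTTP response stream: here we just collect the chunks.
const chunks: string[] = [];
const entities = [
  { kind: 'Component', name: 'a' },
  { kind: 'Component', name: 'b' },
  { kind: 'API', name: 'c' },
];
writeJsonArray(entities, chunk => {
  chunks.push(chunk);
});
```

The concatenated chunks are the same JSON a single `JSON.stringify(entities)` would produce, but each piece can be flushed to the client as soon as it is ready, and no single string has to hold the whole response.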

And you start hitting the 750 MB limit after a while

pure vault Aug 25, 2025, 7:38 AM

river void The catalog backend itself has a highly efficient processing queue implementatio...

Hello @river void, is it possible to monitor the status of the queue?

river void Aug 25, 2025, 7:41 AM

Sure, do explore all of the OpenTelemetry metrics that it exports

for this one you may want to look at catalog_processing_queue_delay* and catalog_stitching_queue_delay*

river void Aug 25, 2025, 8:12 AM

Those should both be at fractions of a second, except in exceptional circumstances

If they ever go high and stay high, you're in trouble because your system is underpowered and can't keep up

It can be OK for short bursts (like when ingesting very large new datasets etc.) but not under normal operation
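
The difference between a harmless burst and a system that can't keep up can be expressed as a simple check over recent samples of a delay metric. A sketch under assumptions: the one-second threshold, the window of five samples and the function name are arbitrary choices for illustration, not recommended values, and the sample numbers are made up.

```typescript
// Returns true when the most recent `minConsecutive` samples are all above the
// threshold, i.e. the delay went high and stayed high rather than spiking briefly.
function isSustainedHigh(
  delaySeconds: number[],
  thresholdSeconds: number,
  minConsecutive: number,
): boolean {
  if (minConsecutive <= 0 || delaySeconds.length < minConsecutive) {
    return false;
  }
  const recent = delaySeconds.slice(delaySeconds.length - minConsecutive);
  return recent.every(sample => sample > thresholdSeconds);
}

// Made-up samples of a queue delay metric, one per scrape interval.
const shortBurst = [0.2, 0.3, 12.0, 9.5, 0.4, 0.2, 0.3];
const fallingBehind = [0.2, 0.3, 12.0, 15.0, 21.0, 30.0, 42.0];

const burstIsProblem = isSustainedHigh(shortBurst, 1, 5); // spike recovered on its own
const behindIsProblem = isSustainedHigh(fallingBehind, 1, 5); // delay keeps growing
```

In practice this kind of rule would more likely live in an alerting system watching the exported metrics than in application code. The point is only that the alert should fire on duration, not on a single high reading.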
