We have done a basic setup with a test keycloak instance. Few testusers and groups have been created.
When starting I see the keycloak provider logs, it shows the users and groups.
In the refresh_state table I see these users and groups as well. They have a value for unprocessed_entity however the processed_entity column stays empty. Nothing in the error column either. This happens for all rows in refresh_state table.
The final_entities table is empty. There are no errors. With knex on debug loglevel I don't see any queries doing an update on processed_entity column.
I'm trying to get a better understanding how this refresh_state table flow works and what could explain why they never get processed. And obviously ultimately find what the issue is.
#processed_entity column stays empty for all rows, none in final_entities, no errors in logs
54 messages · Page 1 of 1 (latest)
That's odd. Never heard anything like it. Sounds like the processing loop isn't doing its thing at all.
Are the entries having a next_update_at column that is some time near now or in the past?
Only thing that comes to mind is, what if there's some weird time zone thing where the update column is set far in the future so the system thinks there's nothing to do yet
You could try setting them all to like 2024 or so to see if that kickstarts things
Yes the entries do have a recent next_update_at time. These times also seem to be updated which, if I understand correctly, happens once the row is looped over.
the next_stitch_at and next_stitch_ticket are also null.
Can you set the result_hash column to the empty string? I just want to check if that's what actually happens - that the input DID change, but it's being rejected because the engine doesn't think its hash changed
the result hash is already empty.
Oh. Hmmm
That doesn't sound like it succeeded at all then
This is a bit of a mystery I must say
Never seen anything like it
But it does advance next_update_at regularly, and doesn't log any errors while doing its processing
Correct
Do you have any custom processors?
I thought maybe there's a processor that actually never finishes - just hangs
If that happens, it'll probably keep retrying over and over while leaking more and more memory
How would I be able to figure that out? Note that I'm on developerhub. Not sure if that has custom processors.
Is there a way to get a full query that is executed on the database? My guess is some query is retrieving all entities and then loops over them but the query returns an empty resultset (without throwing an error).
In the debug output I do see the user being processed:
Processing user:default/testuser ...
Followed by the next entity... without anything else.
I mean that does seem like it just gets stuck. Starts - but doesn't conclude
The log line gets emitted in here https://github.com/backstage/backstage/blob/58b6b933feafe27505617dbcebea724a13de1321/plugins/catalog-backend/src/processing/DefaultCatalogProcessingEngine.ts#L155
then immediately goes into a try/catch whose catch clause does a logger.warn https://github.com/backstage/backstage/blob/58b6b933feafe27505617dbcebea724a13de1321/plugins/catalog-backend/src/processing/DefaultCatalogProcessingEngine.ts#L494
Isn't this something that you can reproduce when running locally?
what do you mean with locally?
I don't know what your setup is like, but typically to get started, people run our create-app script to get a repo created that you then develop within. I meant being in that repo and starting the backend there from code
on a developer machine
you mentioned developer hub - i don't know how that is set up
Developerhub is setup with an Operator in Openshift.
What we did notice is that this setup does seem to work with a "local" database.
so when we connect to postgres locally it works. when we do the same with a "remote" postgres (EDB) it fails. No errors are shown on the db side either. Really bizar issue
I was expecting when I set the logging to trace that I would see a SQL update for the processed_entity column. However even in the setup that works there's no such insert/update sql to be found in the logs for processed_entity column. So can't make any conclusions out of that. The insert is visible when I add a user to keycloak for the refresh_state. The select for processing based on the next update time is showing. But nothing that updates the processed_entity column. (altough the user has been added in the working setup and is in the final_entities table)
I'd like to be able to see the exact query that's being executed to update that column.
It may be possible to pass the DEBUG env var ro the running container to get knex to noisily log everything it does https://knexjs.org/faq/#how-do-i-debug
but it may be a bit of a torrent
I already have that going on. It's from these logs that I see the select/update/insert into the refresh_state table. However at no point did I see a query that updates the processed entity column.
e.g.
knex:query update "refresh_state" set "next_update_at" = now() + interval '140.1608230426838 seconds' where "entity_ref" in ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) trx703```
It's just very odd that in a working setup I do see the processed_entity column has a value, however the actual query is still not visible in the logs. I don't know why this query would not be visible and the others are.
might be related to https://github.com/backstage/backstage/issues/31753. we face the same thing.
to be clear: we don't have the processed entities yet. So it's failing before the stitching.
So I found the root cause. For some reason the databases created have DateStyle Redwood by default. This returns NOW() and CURRENT_TIMESTAMP as 24-DEC-25... instead of 20251224...
I changed the DateStyle to ISO and it works
Seems to go wrong here: plugins/catalog-backend/src/processing/DefaultCatalogProcessingEngine.ts
logger.info(`DEBUG: nextUpdateAt ` + JSON.stringify(item));
const seconds = -item.nextUpdateAt.diffNow().as('seconds');
promProcessingQueueDelay.observe(seconds);
processingQueueDelay.record(seconds);
}```
The nextUpdateAt field was always "null" (note that it does go into the conditional and prints the item). Oddly at no point there was a visible error for this.
Wooow
That was pretty unexpected - thanks for going the extra mile and figuring that out
What db was this?
Do you happen to know what we might have done to circumvent this? Where is even that default specified
Database is EDB (basically postgres). SHOW DateStyle; shows the current style. No idea why it was set like that.
called datestyle in the postgres.conf config file
Here's the row that creates pg databases.
I wonder if there's another command (or options to that command) that would help. I have no experience with edb but it's unexpected to me that it comes back in string form in the first place honestly
Maybe we could follow up with a ALTER DATABASE "..." SET datestyle TO "ISO, DMY", if I'm reading the docs right
Pretty wild that the default isn't ISO honestly - I suspect you may have a centrally modified postgresql.conf that had this modded