# Order of catalog processors and database migrations in new backend system


sharp crypt

Hey y'all. After migrating to the new backend system, I have two issues I can't solve.

  1. I want to use addProcessor within a custom catalog extension; the code is in the attached picture (please ignore the red highlighting, it's only there for the post).
    In some places, for example, Validator2 needs data that Validator1 produces. When Validator1's data hasn't been processed yet, I get errors saying the data couldn't be found. Is there any way to run these processors in a specific order and wait for each one to finish, OR to rerun processing after a failure?

  2. Besides the processors, I have another module that runs the DB migrations:

```typescript
import { coreServices, createBackendModule } from '@backstage/backend-plugin-api';

import { applyDatabaseMigrations } from '@internal/plugin-plugin-catalog-backend-backend-module-dv-views-module-backend';

export default createBackendModule({
  pluginId: 'catalog',
  moduleId: 'migrations',
  register(env) {
    env.registerInit({
      deps: {
        database: coreServices.database,
      },
      async init({ database }) {
        // Knex client for the catalog plugin's database
        const client = await database.getClient();
        // Honor the database service's "skip migrations" setting
        if (!database.migrations?.skip) {
          // Run the migrations exported by the internal package imported above
          await applyDatabaseMigrations(client);
        }
      },
    });
  },
});
```

When I add these modules in the index.ts file, they are placed one under the other: the custom extension first and the migrations second. Still, the migration fails because the catalog items aren't ready yet. As above, is there any way to run these in a specific order?

knotty ocean

Plugins and modules are all started up concurrently for performance reasons; that way, a single misbehaving one cannot block the startup of everything else. They should be written to be resilient to that fact.
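A minimal sketch of what "resilient" could mean for the migrations module, assuming the migration simply fails when the data it needs isn't there yet: retry it a bounded number of times with a delay, instead of relying on the order of the `backend.add(...)` calls in index.ts. `retryUntilReady` and its options are hypothetical helpers, not part of `@backstage/backend-plugin-api`.

```typescript
// Hypothetical helper (not a Backstage API): retry an async operation with a
// fixed delay until it succeeds or the attempt budget (>= 1) is used up.
async function retryUntilReady<T>(
  operation: () => Promise<T>,
  options: { attempts: number; delayMs: number },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < options.attempts) {
        // Give concurrently starting plugins time to finish their own work.
        await new Promise<void>(resolve => setTimeout(resolve, options.delayMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Inside the module's `init`, such a call could be started without `await` (with a `.catch` that logs the final failure through the logger service), so the retries run in the background and don't hold up the rest of the startup. That said, the suggestion further down the thread, letting the backend that needs the migration run it itself, avoids the timing problem altogether.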


Regarding processors etc.: at catalog startup, all of the registered processors get built into one single chain. Every time an entity gets processed, ALL of those processors are run in the order that they were registered, without exception. Then, when they have all had their shot at doing what they need, the resulting entity is saved again.
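To make the chaining concrete, here is a deliberately simplified, self-contained model. It is not the real `CatalogProcessor` interface, whose hooks such as `preProcessEntity` and `postProcessEntity` also receive a location, an `emit` callback and a cache. It shows that a later processor sees what earlier processors did to the *same* entity, and that registration order matters. What it cannot give you is data from *other* entities, such as contracts, which are processed independently and may not exist yet. The `example.com/...` annotation keys are made up for the example.

```typescript
// Simplified, hypothetical model of one processing pass over one entity.
type Entity = {
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
};

interface Processor {
  name: string;
  process(entity: Entity): Entity;
}

// Every processor runs, in registration order, on the output of the previous one.
function processEntity(entity: Entity, chain: Processor[]): Entity {
  return chain.reduce((current, processor) => processor.process(current), entity);
}

// Validator1 records something on the entity it is processing.
const validator1: Processor = {
  name: 'Validator1',
  process: entity => ({
    ...entity,
    metadata: {
      ...entity.metadata,
      annotations: { ...entity.metadata.annotations, 'example.com/checked-by': 'Validator1' },
    },
  }),
};

// Validator2 reads what Validator1 wrote, but only if it runs after it.
const validator2: Processor = {
  name: 'Validator2',
  process: entity => {
    const checkedBy = entity.metadata.annotations?.['example.com/checked-by'];
    return {
      ...entity,
      metadata: {
        ...entity.metadata,
        annotations: {
          ...entity.metadata.annotations,
          'example.com/chain': `${checkedBy ?? 'none'} -> Validator2`,
        },
      },
    };
  },
};

const result = processEntity(
  { kind: 'Component', metadata: { name: 'my-service' } },
  [validator1, validator2],
);
// result.metadata.annotations['example.com/chain'] === 'Validator1 -> Validator2'
```

Swapping the two processors in the array yields `'none -> Validator2'`, which is the in-entity equivalent of the "data not found yet" error.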

sharp crypt

Thanks @knotty ocean! Are there any examples available of how to avoid such problems?

knotty ocean

It would all be case-specific. For your migration, it may not make much sense to have it done in a separate module. Why doesn't the backend that needs it do the work itself?

sharp crypt

I think the most concerning issue I'm facing now is the processors. What I mean is that I have defined multiple entities that are fetched. Most of them are components with annotations that create relations to contracts. In the validators, I check whether the contracts connected to components are valid, but usually they are not yet present in the app. That results in processing failures.

That's why I asked earlier whether there is some way to wait until everything is in place, so that I get all the data right and process it accordingly.


I might be missing something, as I'm new to Backstage.

knotty ocean

The catalog is eventually consistent: clusters of machines collaborate asynchronously to do the work. You basically cannot put strict validations on fields, not just for technical reasons but for human user-experience reasons as well.


My suggestion is: stop throwing in processors for things you deem to be bad relations.


Just leave them be


In the core catalog model, relations are meant to be "lazy" and intentionally permitted to be dangling
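A sketch of the "lazy" approach for the component-to-contract case: turn the annotation into relations without checking whether the target contract exists. The annotation key `example.com/contracts`, the relation type `usesContract`, and the plain-string form of the entity refs are assumptions for illustration. In a real processor, the relations would be emitted through the `emit` callback (for example with `processingResult.relation(...)` from `@backstage/plugin-catalog-node`, which takes structured entity refs rather than strings).

```typescript
// Hypothetical, simplified shapes; not the Backstage types.
type EntityRef = string; // e.g. "contract:default/payments-v1"
type Relation = { type: string; source: EntityRef; target: EntityRef };

// Hypothetical annotation holding a comma-separated list of contract refs.
const CONTRACTS_ANNOTATION = 'example.com/contracts';

function relationsFromAnnotation(
  sourceRef: EntityRef,
  annotations: Record<string, string> | undefined,
): Relation[] {
  const raw = annotations?.[CONTRACTS_ANNOTATION];
  if (!raw) return [];
  return raw
    .split(',')
    .map(ref => ref.trim())
    .filter(ref => ref.length > 0)
    // No lookup of the target: the relation may dangle until (or unless)
    // the contract entity shows up in the catalog.
    .map(target => ({ type: 'usesContract', source: sourceRef, target }));
}

const relations = relationsFromAnnotation('component:default/orders', {
  [CONTRACTS_ANNOTATION]: 'contract:default/payments-v1, contract:default/billing-v2',
});
```

Because nothing here depends on other entities being present, the order in which components and contracts are ingested stops mattering.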


But we still keep track of misconfigurations by external means, and we also let users know that they have issues with their metadata by showing popups on their entity pages, nudging them to fix their files.
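The thread doesn't say what those "external means" are. As one assumed approach, a periodic job could compare relation targets against the set of entity refs currently in the catalog and report the dangling ones (for example, to feed a notice on the entity page), without ever touching processing. `findDanglingRelations` is a hypothetical helper; how the relations and refs are fetched is left out.

```typescript
// Hypothetical, simplified shapes; not the Backstage types.
type Relation = { type: string; source: string; target: string };
type Issue = { entityRef: string; message: string };

// Runs outside the processing loop, against a snapshot of the catalog.
// It reports problems instead of throwing, so nothing is ever blocked.
function findDanglingRelations(
  relations: Relation[],
  knownRefs: ReadonlySet<string>,
): Issue[] {
  return relations
    .filter(relation => !knownRefs.has(relation.target))
    .map(relation => ({
      entityRef: relation.source,
      message: `${relation.type} target ${relation.target} does not exist (yet)`,
    }));
}

const issues = findDanglingRelations(
  [
    { type: 'usesContract', source: 'component:default/orders', target: 'contract:default/payments-v1' },
    { type: 'usesContract', source: 'component:default/orders', target: 'contract:default/refunds-v1' },
  ],
  new Set(['component:default/orders', 'contract:default/payments-v1']),
);
// issues contains one entry, attributed to component:default/orders
```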


This may sound odd, but it's a hard learned lesson from reality


If you put hard stops on validation, things will grind to a halt and you will have constantly frustrated and confused users


Especially since a thrown exception HALTS processing, and the type of error you are talking about often arises for reasons that are NOT the entity author's fault.


Let's picture a hypothetical scenario. I make an entity. It has a relation to a domain. My entity is super important, and I build systems that query the catalog and expect that entity to have up-to-date information in it. Everything is fine, but a year down the line someone else at the company slightly rearranges the domains behind my back. Suddenly, validation of my entity starts exploding, again behind my back. Changes to my YAML file are no longer reflected in the catalog because they never make it past validation, but I never notice. After several more months, systems start failing as a result.
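The scenario can be reduced to a small model, assuming a validator that throws on unknown domains: once the referenced domain is renamed, every refresh of the edited entity fails, and the catalog keeps serving the last version that passed. All names here are hypothetical, and the loop is far simpler than the real catalog's.

```typescript
type Entity = { name: string; domain: string; description: string };
type Step = (entity: Entity) => Entity;

// A "hard stop" validator: throws when the referenced domain is unknown.
function strictDomainCheck(knownDomains: ReadonlySet<string>): Step {
  return entity => {
    if (!knownDomains.has(entity.domain)) {
      throw new Error(`unknown domain "${entity.domain}" on ${entity.name}`);
    }
    return entity;
  };
}

// One refresh: the incoming version (e.g. the freshly read YAML) goes through
// the chain; if any step throws, the previously stored version is kept and
// only an error is recorded.
function refresh(
  stored: Entity | undefined,
  incoming: Entity,
  chain: Step[],
): { stored: Entity | undefined; error?: string } {
  try {
    return { stored: chain.reduce((current, step) => step(current), incoming) };
  } catch (err) {
    return { stored, error: err instanceof Error ? err.message : String(err) };
  }
}

// Someone else renamed the "payments" domain to "billing".
const knownDomains = new Set(['billing']);
const before: Entity = { name: 'my-service', domain: 'payments', description: 'v1' };
const edited: Entity = { ...before, description: 'v2' };

const outcome = refresh(before, edited, [strictDomainCheck(knownDomains)]);
// outcome.stored is still the "v1" version: the edit never reaches the catalog.
```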


So the insight to gain here is that we only throw on HARD errors: errors so catastrophic that the data is completely garbage and would probably leave readers of the data unable to understand it at all. Those cases are much, much rarer than the soft errors that an operator could go and fix in their own time later on.
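One way to express that split in code, as a sketch: treat only structurally unusable data (here, a missing `kind` or `metadata.name`) as a hard error, and turn everything else, such as an unknown contract reference, into a warning that processing carries on past. Where exactly the hard/soft line falls is an assumption to adjust for your own model; `validate`, `HardValidationError`, and the `example.com/contracts` annotation key are hypothetical.

```typescript
type Entity = {
  kind?: string;
  metadata?: { name?: string; annotations?: Record<string, string> };
};

class HardValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'HardValidationError';
  }
}

// Hypothetical annotation listing referenced contract entity refs.
const CONTRACTS_ANNOTATION = 'example.com/contracts';

// Throws only when the entity is unusable; everything else becomes a warning.
function validate(entity: Entity, knownContracts: ReadonlySet<string>): string[] {
  if (!entity.kind || !entity.metadata?.name) {
    // Hard error: without kind and name the entity cannot even be identified.
    throw new HardValidationError('entity is missing kind or metadata.name');
  }
  const warnings: string[] = [];
  const raw = entity.metadata?.annotations?.[CONTRACTS_ANNOTATION] ?? '';
  for (const ref of raw.split(',').map(s => s.trim()).filter(Boolean)) {
    if (!knownContracts.has(ref)) {
      // Soft error: the contract may not be ingested yet, or may have been
      // renamed by someone else. Report it; don't block processing.
      warnings.push(`referenced contract ${ref} not found`);
    }
  }
  return warnings;
}
```

The warnings can then be surfaced to the entity's owners (as in the "external means" described earlier in the thread), while the entity itself keeps flowing through the catalog.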