#Event publishing catalog


steel star
#

Hi all,

The event approach in Backstage has been redesigned from version 1.24.0 onwards, so is the documentation for the event approach only correct for legacy systems? I am trying to create catalog entities from webhook events using my custom plugin, but nothing gets created when I call events.publish(). @clear pike could you please help me with this?

clear pike
#

Can you share more info about your setup?

#

the new implementation is backward compatible.

There are two options:

  1. you only use EventBroker everywhere.
  2. you have a mixed setup

(2.) is more likely. As with the old setup, you need to have the shared context.
DefaultEventBroker will create a DefaultEventsService instance internally.

However, you can also create an instance externally and pass it as option when you create the DefaultEventBroker instance.

For the new backend system, you don't really need to worry about it.

For your own custom plugins that still require EventBroker, you can declare a dependency on the eventsService and create a DefaultEventBroker as part of your module/plugin initialization code.

For the old backend system, you should create the shared instances of EventBroker (if still needed) and EventsService for the plugin env.

#

without more details, I can't make more specific recommendations though

steel star
#

Hi @clear pike, thanks for the quick response. I didn't use the EventBroker because it shows as deprecated when I call it.

#

I will share my plugin code and the steps I followed till now

steel star
#

app config.yaml file

#

events:
http:
topics:
- github
modules:
github:
AzureSbq:
azureSbqConsumingEventPublisher:
connectionString: ${SERVICEBUS_CONNECTION_STRING}

#

catalog:
providers:
github:
# the provider ID can be any camelCase string
providerId:
organization: 'consumer-tech' # string
catalogPath: '/catalog-info.yaml' # string
filters:
branch: 'main' # string
repository: '.*' # Regex
schedule: # same options as in TaskScheduleDefinition
# supports cron, ISO duration, "human duration" as used in code
frequency: { minutes: 30 }
# supports ISO duration, "human duration" as used in code
timeout: { minutes: 3 }

#

In my plugin I am printing the message, and it shows the payload in the terminal

#

currently I am using the new Backstage backend system

#

import { createBackend } from '@backstage/backend-defaults';
import { eventsModuleGithubEventRouter } from '@backstage/plugin-events-backend-module-github/alpha';

const backend = createBackend();
backend.add(import('@backstage/plugin-app-backend/alpha'));
backend.add(import('@backstage/plugin-catalog-backend/alpha'));
backend.add(
  import('@backstage/plugin-catalog-backend-module-scaffolder-entity-model'),
);
backend.add(import('@backstage/plugin-catalog-backend-module-github/alpha'));
backend.add(import('@backstage/plugin-catalog-backend-module-msgraph/alpha'));
backend.add(import('@backstage/plugin-events-backend/alpha'));
backend.add(import('@backstage/plugin-scaffolder-backend/alpha'));
backend.add(import('@backstage/plugin-auth-backend'));
backend.add(import('@backstage/plugin-auth-backend-module-microsoft-provider'));
backend.add(import('@backstage/plugin-auth-backend-module-github-provider'));
backend.add(import('@backstage/plugin-techdocs-backend/alpha'));
backend.add(import('@internal/plugin-events-backend-module-azure-sbq/alpha'));
backend.add(eventsModuleGithubEventRouter());

backend.start();

steel star
#

Hi @clear pike, could you please review and suggest changes?

clear pike
#

I'm having a look

#

btw: you can use code formatting e.g. using three ` at the beginning and end.

#

I will assume that the GithubEntityProvider setup and config work and that you have full refreshes from time to time leading to ingested entities.

#

the formatting and the indentation of your config is a bit off. Code formatting would help to check it better.

#

What I see is that you use events.http.topics with one item github. This will activate the HTTP-based ingestion, creating an endpoint that could be used as destination at GitHub webhook subscriptions. Received events would be published using the topic github.

#

Not sure whether you actually need and want that. Just for your awareness in case this was not clear.

We have this disabled. However, it depends on your circumstances and use cases.

#

Also, I see that you use events.modules.AzureSbq.azureSbqConsumingEventPublisher as config root for your module configuration.

There are no issues with that. However, I would recommend using the pattern events.modules.{module-name}, which is azureSbqConsumingEventPublisher in your case.

Again, it does not cause any harm if you do it differently, though.

#

Due to the formatting, it is not very clear whether events.modules.github is configured properly. If you have no value for it, you can remove it.

It is used by the eventsModuleGithubWebhook module, which you don't use according to the code snippet. The event router module does not need any config.

I would suggest removing it completely so that it does not cause any issues with the YAML file or cause confusion.

#
  import { eventsModuleGithubEventRouter } from '@backstage/plugin-events-backend-module-github/alpha';

- backend.add(eventsModuleGithubEventRouter());
+ backend.add(eventsModuleGithubEventRouter);

Not sure if this is already doing the trick for you.

If the event router is not set up properly, it will not route the events from the generic topic github to the more specific topics like github.push that are then used by subscribers like the GithubEntityProvider.

The rest of your index.ts seems alright.

#

You have the logger available at your publisher.


- console.log(config.connectionString);
+ logger.debug(config.connectionString);

// ...

- console.log('eventPayload', eventPayload);
+ logger.debug(...);

Maybe, with a different log level.

#
- metadata['X-GitHub-Event'] = 'push';
+ metadata['x-github-event'] = 'push';

The event router reads the metadata in lowercase.

This could result in events not being routed to the sub-topics.
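
To illustrate why the casing matters, here is a minimal sketch of a router that only looks up the lowercase key. The types and function names (`SketchEvent`, `normalizeMetadata`, `resolveSubTopic`) are hypothetical and are not the actual Backstage implementation; they only mimic the behaviour described above.

```typescript
// Hypothetical, simplified event shape; not the actual Backstage types.
type EventMetadata = Record<string, string | undefined>;

interface SketchEvent {
  topic: string;
  eventPayload: unknown;
  metadata?: EventMetadata;
}

// Lowercase all metadata keys so that 'X-GitHub-Event' and
// 'x-github-event' end up under the same key.
function normalizeMetadata(metadata: EventMetadata = {}): EventMetadata {
  const normalized: EventMetadata = {};
  for (const [key, value] of Object.entries(metadata)) {
    normalized[key.toLowerCase()] = value;
  }
  return normalized;
}

// Mimics a router that reads only the lowercase key.
// Returns the sub-topic, or undefined if the event cannot be routed.
function resolveSubTopic(event: SketchEvent): string | undefined {
  const eventType = event.metadata?.['x-github-event'];
  return eventType ? `${event.topic}.${eventType}` : undefined;
}
```

With this sketch, an event carrying `'X-GitHub-Event'` stays unrouted unless the publisher normalizes the metadata keys first.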

#

The AwsSqsConsumingEventPublisher implementation is a bit more generic compared to your AzureSbqConsumingEventPublisher. You might still plan to change this, or not. The name does not really suggest that it is tied to GitHub webhook events.

At the AWS SQS implementation, we use message attributes and pass them on as metadata fields of the event params.

Not sure which options you have with Azure SBQ.

You could consider adding a wrapper to the event payload if there is no other option.
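
Such a wrapper could look roughly like the following sketch. The envelope format and the names `EventEnvelope`, `wrapEvent`, and `unwrapEvent` are assumptions made for illustration only; nothing here is an existing Backstage or Azure API.

```typescript
// Hypothetical envelope format; the field names are illustrative only.
interface EventEnvelope {
  metadata: Record<string, string>;
  payload: unknown;
}

// Sender side: put the metadata next to the payload in the message body.
function wrapEvent(payload: unknown, metadata: Record<string, string>): string {
  const envelope: EventEnvelope = { metadata, payload };
  return JSON.stringify(envelope);
}

// Consumer side: split the message body back into metadata and payload
// before handing both to the events service.
function unwrapEvent(body: string): EventEnvelope {
  const parsed: unknown = JSON.parse(body);
  if (typeof parsed !== 'object' || parsed === null || !('payload' in parsed)) {
    throw new Error('Message body is not a wrapped event');
  }
  const { metadata, payload } = parsed as Partial<EventEnvelope>;
  return { metadata: metadata ?? {}, payload };
}
```

The consumer would then publish `payload` as the event payload and `metadata` as the event metadata, so the publisher itself stays independent of GitHub.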

#

Besides it creating a single instance vs potentially multiple at AWS SQS, your implementation looks fairly similar.

#
              /**
               * (Required) Queue-related configuration.
               */
              queue: {
                // [...]
                /**
                 * (Optional) Wait time when polling for available messages.
                 * Default: 20 seconds.
                 */
                waitTime: HumanDuration;
              };
              /**
               * (Optional) Timeout for the task execution which includes polling for messages
               * and publishing the events to the event broker
               * and the wait time after empty receives.
               *
               * Must be greater than `queue.waitTime` + `waitTimeAfterEmptyReceive`.
               */
              timeout: HumanDuration;
              /**
               * (Optional) Wait time before polling again if no message was received.
               * Default: 1 minute.
               */
              waitTimeAfterEmptyReceive: HumanDuration;
            };

from the config.d.ts of the AWS SQS implementation.

The timeout value is also crucial and needs to be big enough.

Currently, you have it hardcoded to timeout=300 seconds, waitAfterEmptyReceive=1 minute=60 seconds. There is no queue.waitTime configurable. Likely, there is still such a value at Azure SBQ.
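
The constraint from the config.d.ts comment can be checked with a few lines of arithmetic. This is a minimal sketch; `DurationSketch` and the function names are made up for illustration, and only a subset of duration fields is handled.

```typescript
// Subset of a "human duration" object as used in the config above.
interface DurationSketch {
  hours?: number;
  minutes?: number;
  seconds?: number;
}

function toSeconds(duration: DurationSketch): number {
  return (
    (duration.hours ?? 0) * 3600 +
    (duration.minutes ?? 0) * 60 +
    (duration.seconds ?? 0)
  );
}

// The task timeout must be strictly greater than
// queue.waitTime + waitTimeAfterEmptyReceive.
function isTimeoutSufficient(
  timeout: DurationSketch,
  queueWaitTime: DurationSketch,
  waitTimeAfterEmptyReceive: DurationSketch,
): boolean {
  return (
    toSeconds(timeout) >
    toSeconds(queueWaitTime) + toSeconds(waitTimeAfterEmptyReceive)
  );
}
```

For example, a timeout of 300 seconds passes the check with a 20 second queue wait time (the AWS SQS default quoted above, assumed here because the Azure value is not known) and a 1 minute wait after empty receives.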

#

Let me know, if the suggested changes helped you.

steel star
#

Ok I will update my code with your suggestion & test. I will get back to you about the result. Thank you so much

steel star
#

@clear pike from my module I am calling this.events.publish() with the GitHub webhook payload (push event), so it should update the catalog listing immediately, right? In my plugin I enabled the logger; it logs the payload and calls this.events.publish()

#

but not updating anything in catalog listing

#

I also noticed that after the core upgrade and following the configuration, a rate limit error is now occurring

#

I am trying to build the webhook approach to overcome the rate limit error

clear pike
#

a Backstage upgrade itself shouldn't really cause rate limits to show up, I guess. However, I lack details about your setup to really be able to form an opinion here. At least, I'm not aware of a change that would cause higher API rates. 🤔

#

overall, it sounds like the webhook event to event publishing in Backstage using Azure SBQ works, right?

#

What you see is the event being published, however, you miss the impact it is supposed to have at the GithubEntityProvider, right?

#

At the config for the provider you have shared, I see a few filters. These would also apply to event-based updates.

#

also, could you share how you send the event? What do you see in the logs?

#

in case you didn't do the adjustments suggested above, please try first with them applied. Might be that these were causing the events not to end up at the right topic, or similar.

steel star
#

Sure, I will send you the detailed plugin and approach tomorrow

steel star
#

Hi @clear pike, good morning. I noticed some progress with the ingress, but I have a small query. With this event approach (AWS SQS plugin), will the catalog entity be updated in the list immediately, or does it have to wait for the next scheduled run? This is the response in my terminal

#

[1] head_commit: {
[1] id: '5b0be01d4b04ab07214379836d34f4b80f1373e9',
[1] tree_id: '46e7abd8fa3d8e343f75cbd817ecba784236a298',
[1] distinct: true,
[1] message: 'Update catalog-info.yaml',
[1] timestamp: '2024-05-15T13:41:18+05:30',
[1] url: 'https://github.com/consumer-tech/bkstg-event-webhook/commit/5b0be01d4b04ab07214379836d34f4b80f1373e9',
[1] author: {
[1] name: 'Jiyo Mathew',
[1] email: '[email protected]',
[1] username: 'jiyo-x-mathew_corpnet4'
[1] },
[1] committer: {
[1] name: 'GitHub',
[1] email: '[email protected]',
[1] username: 'web-flow'
[1] },
[1] added: [],
[1] removed: [],
[1] modified: [ 'catalog-info.yaml' ]
[1] }
[1] }
[1] 2024-05-15T08:11:26.711Z catalog info Processed Github push event: added 0 - removed 0 - modified 1 t

#

2024-05-15T08:11:26.711Z catalog info Processed Github push event: added 0 - removed 0 - modified 1 target=github-provider:providerId

So it was passed on to the next target. Is this correct, or is something wrong in my configuration?

clear pike
#

based on the log messages, it seems like the event-based update worked fine

#

modified 1

#

basically, it executes
this.connection.refresh({ keys: [...] })
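
As a rough illustration of how a push event can be turned into refresh keys, here is a minimal sketch. It only covers the "modified" case from the log above, the `PushEventSketch` type is a reduced version of the GitHub push payload, and the key format is a made-up placeholder, not necessarily what the GithubEntityProvider uses internally.

```typescript
// Reduced version of the GitHub push webhook payload.
interface PushCommitSketch {
  added: string[];
  removed: string[];
  modified: string[];
}

interface PushEventSketch {
  ref: string; // e.g. 'refs/heads/main'
  repository: { html_url: string };
  commits: PushCommitSketch[];
}

// Returns the keys to refresh for a push event, or an empty list if the
// event is for another branch or did not modify the catalog file.
function refreshKeysForPush(
  event: PushEventSketch,
  catalogPath: string,
  branch: string,
): string[] {
  if (event.ref !== `refs/heads/${branch}`) {
    return [];
  }
  const file = catalogPath.replace(/^\//, '');
  const modified = event.commits.some(commit => commit.modified.includes(file));
  // Hypothetical key format, for illustration only.
  return modified
    ? [`url:${event.repository.html_url}/blob/${branch}/${file}`]
    : [];
}
```

The branch check mirrors the provider filters: an event for a branch other than the configured one yields no keys, so nothing is refreshed.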

steel star
#

but it is not updating immediately in the catalog. I felt it updates only after the next scheduled run of the GitHub provider. Does it call all GitHub repos or only the ones from the webhook?

clear pike
#

The event-based handling only calls refresh for the updated ones identified through the event.

#

The refresh uses keys to identify the changes. You could verify in your DB that such keys are used. However, as the origin is the same entity provider, I would assume that is the case.

#

Based on the previous description, the issue seems not to be within the event setup.
Also, it seems that refresh was executed for related keys.

#

refresh schedules the next refresh of the Location entity to be "now". Hence, it is supposed to happen nearly immediately.

#

I would suggest to increase the time between the general full refresh runs to see whether the update is caused by that or by the event-triggered refresh.

#

also, you could increase the logging for the GithubUrlReader. However, not sure how much it logs and if it can really help you. Otherwise, (local) debugging might help as well.

#

Maybe, there are others with more insights on the GithubEntityProvider or the internal refresh loop, etc.

#

I cannot really see an issue based on the information I have.

steel star
#

OK, thank you. I will test and confirm

steel star
#

Hi @clear pike , actually app config has this configuration providers:
catalog:
github:
# the provider ID can be any camelCase string
providerId:
organization: 'consumer-tech' # string
catalogPath: '/catalog-info.yaml' # string
filters:
branch: 'main' # string
repository: '.*' # Regex
schedule: # same options as in TaskScheduleDefinition
# supports cron, ISO duration, "human duration" as used in code
frequency: { minutes: 60 }
# supports ISO duration, "human duration" as used in code
timeout: { minutes: 3 }

clear pike
#

I assume you refer to the frequency of 60 minutes, right?

steel star
#

Yes

#

so this will import all repos under this org without any event trigger right

clear pike
#

be aware that this is not the config for the general processing interval that refreshes Location entities (refreshing loop)

#
catalog:
  processingInterval: ...
#

e.g.,

catalog:
  processingInterval: { hours: 2 }
steel star
#

so no need of schedule: frequency

clear pike
#

you still need it. Those are two different things that get affected here

#

however, both have an impact on what you want to test

steel star
#

My first question: I don't want to import all repos based on the default refresh. I want to do it only using webhooks

#

to avoid the rate limit

#

can we have a quick call to understand

clear pike
#

sure, that's why we use the event-based updates.

#

However, usually you still need the initial state, the initial full refresh

steel star
#

Now I observed that it created catalog entries in my dev instance without me sending the event

clear pike
#

currently, there is no way to disable the full refresh entirely. You can only increase the timespans between the runs.
It may also be good to have from time to time to auto-heal on missed events, etc.

#

e.g., you could consider something like

          frequency: { cron: '0 10 * * SUN' }

for a weekly full refresh.
Or you do it monthly or even less often.

steel star
#

OK, so you mean the schedule frequency will be something like once a day to ease the rate limit, and everything else will be triggered by the webhook?

clear pike
#

maybe some more context:

steel star
#

frequency: { cron: '0 10 * * SUN' } I only have to add this in app-config.yml, and no other settings on the server, right?

#

I set the timeout to 3 minutes. Is that OK, or do I need to change that as well?

clear pike
#

The GithubEntityProvider and similar ones take care of discovery: they use the APIs of the related system to find catalog files and register them.

For that purpose, they create Location entities with the target URL pointing to one (or more) catalog file(s).

Depending on what is possible with these, they usually follow one of the following practices:

  • create Location entities only for existing catalog files
  • create Location entities for potential catalog files marked as "optional"

Additionally, they schedule a refresh for these immediately (process the Location entities).

The frequency of the full discovery (checking all repos, etc.) is managed by

catalog:
  providers:
    theProvider:
      providerId:
        schedule:
          frequency: ...
          ...

Additionally, you have the processing loop that will process Location entities. As part of that, the catalog files get fetched and processed, and the contained entities get created.

The frequency of the processing of Location entities can be controlled using

catalog:
  processingInterval: ...

(default is 30 mins I think)

The (re-)processing of individual Location entities can be scheduled, which is what the event-based approach uses, e.g., if we identify that an existing catalog file was modified (or potentially modified).

Also, you can trigger the refresh manually via the UI at the entity pages (refresh button). This only works for entities that originated from a Location entity.

Both parts will cause API requests and will consume budget from the rate limit.
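
To get a feeling for how the two settings scale, here is a back-of-the-envelope sketch. It is a simplified model with hypothetical function names and numbers, not a measurement: it counts discovery runs and Location processing passes per day and says nothing about how many API requests each of them costs.

```typescript
const MINUTES_PER_DAY = 24 * 60;

// Number of runs per day for a task that runs every `intervalMinutes`.
function runsPerDay(intervalMinutes: number): number {
  return MINUTES_PER_DAY / intervalMinutes;
}

// Upper bound on Location processing passes per day across all Locations,
// assuming every Location is processed once per processing interval.
function locationPassesPerDay(
  locationCount: number,
  processingIntervalMinutes: number,
): number {
  return locationCount * runsPerDay(processingIntervalMinutes);
}

// Full discovery every 30 minutes vs. once a week.
const discoveryEvery30Min = runsPerDay(30); // 48 runs per day
const discoveryWeekly = runsPerDay(7 * MINUTES_PER_DAY); // 1/7 run per day

// 200 hypothetical Locations with a 2 hour processing interval.
const passesWith2Hours = locationPassesPerDay(200, 120); // 2400 passes per day
```

In this model, moving the full discovery from every 30 minutes to weekly cuts the discovery runs by a factor of 336, while the processing interval scales the per-Location work linearly.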

#

re timeout:
The timeout is for the task itself. It should finish within this timespan or it will be stopped/cancelled. 3 mins should be fine usually.

However, you have the flexibility to adjust this as of your needs. E.g., if you have a lot of data to process, you might need to increase it, etc.

#

PS: Usually, the processing of Location entities includes a check for whether the file changed or not (e.g., via the ETag). API requests should be minimal.

However, for big setups this may still be an issue.

#

Neither of these you can disable fully. You can only execute it less often.

clear pike
#

for the test, I would suggest setting really high values for both, creating a change that causes an event, and seeing whether the change gets reflected in a timely manner

dull marlin
#

Hi! You reference the "documentation for event approach". Where can I find that documentation?

clear pike
#

@dull marlin usually, each entity provider has a section "with event support" and "without event support"

#

a more holistic/overview documentation still has to be created. I will take care of that.

copper timber
#

I have the same problem with Bitbucket events: the catalog (which is not very large) gives a 429 error for too many requests.

Although we still don't have documentation.
@clear pike Could you guide me on how to configure it in a backend in a new system and latest version?
Thank you so much

#

In my company this error is causing a holocaust

clear pike
#

Bitbucket Server or Bitbucket Cloud?

copper timber
#

Bitbucket Cloud

clear pike
#

For simplicity, I assume you use the HTTP ingress to receive webhook events. (There are alternatives like AWS SQS.)

Events will be published under a topic like github or bitbucketCloud.

An event router is used to put them to more specialized topics like github.push. See the docs of the related event router implementations.

For GitHub, there is also support for webhook secret verification. So far, only for the HTTP ingress, not for AWS SQS.

The entity provider implementations use the events service to subscribe to the topics they are interested in like e.g. github.push.

They will take care to handle the incoming events and react accordingly.

#

You can have other subscribers, too, of course.

#

Currently, the events service is in-memory only. This means that publisher and subscriber have to be in the same cluster instance. This is a known limitation and there are plans to lift it.

This could be an issue if you have multiple specialized clusters.
In that case, you would need to put the webhook event receivers at the same cluster (instance) as the entity provider (or other subscribers).

#

Besides that, if you want to tackle rate limit issues, you need to reduce the load of non-event-based activity.

The latest discussion in this thread was actually about that part (full refresh and processing loop).

dull marlin
#

OK... so if I understood correctly, let's say an event is published under a topic bitbucketCloud.repo:updated. Then, under the hood, the Bitbucket Cloud implementation handles the catalog refresh by itself? With no more configuration than this?

  providers:
    bitbucketCloud:
      yourProviderId: # identifies your ingested dataset
        catalogPath: /catalog-info.yaml # default value
        filters: # optional
          projectKey: '^apis-.*$' # optional; RegExp
          repoSlug: '^service-.*$' # optional; RegExp
        schedule: # same options as in TaskScheduleDefinition
          # supports cron, ISO duration, "human duration" as used in code
          frequency: { minutes: 30 }
          # supports ISO duration, "human duration" as used in code
          timeout: { minutes: 3 }
        workspace: workspace-name
clear pike
#

as long as the specific event is supported by it, yes

#

and it depends whether you use the old or new/current backend system

#

BitbucketCloudEntityProvider supports bitbucketCloud.repo:push currently

#

however, more topics could be added

#

I use the Bitbucket Cloud one actively as well, btw

clear pike
#

btw: I would recommend to change the schedule for the full refresh and the processing interval if you have events enabled (e.g., the webhook subscription, etc.)

#

e.g., we opted in for a weekly full refresh (be aware: this is discovery of catalog files) and a higher value for the processing interval (re-processes Locations/catalog files; apply globally).

Due to our mixed setup and the global impact, we couldn't use a similarly high config for it. Kind of a compromise 🙂

copper timber
#

OK, sorry for my ignorance, but then you only need to configure a webhook that invokes a Backstage HTTP endpoint or a queue in AWS, and internally @backstage/plugin-catalog-backend-module-bitbucket-cloud is responsible for refreshing those catalog elements?

And of course stretch the full refresh interval to weeks, maybe months...

clear pike
#

you could decide to run the full discovery even less frequently

clear pike
#

If you use the old backend system still, you would need to do some manual wiring as well. These steps are explained in the respective docs, though.

copper timber
#

Yes, in my case (I think) I have it correctly configured in the app-config.yml

events: http: topics: -bitbucketCloud

clear pike
#

you can try to send an event manually. If it succeeds you know for sure 😁

#

I've added SQS as we didn't want to open it up to the "public".
Also, you can use it as a buffer and consume as fast as your cluster can manage and potentially scale based on the amount of messages, etc.

HTTP ingress is the default and most straightforward option, though.

gleaming terrace
#

Assuming it's a typo, but just noting that there needs to be a space between - and bitbucketCloud in that config

verbal turtle
#

Hello, I might have missed examples in the documentation for events, but I still don't really see how to use an event in order to perform some actions.

For example, if after changing a component in my GitLab repository I wanted Backstage to push some content to a certain GitLab project, how would I do that? Is it similar to actions in the Template kind?

clear pike
#

@verbal turtle you can use the EventsService to subscribe to the topic you are interested in (like events from GitLab) and then perform the action you want.

Currently, we have Entity Provider implementations that perform a refresh for affected entities on such events.
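
The subscribe-then-act pattern can be sketched with a tiny in-memory stand-in. Note the assumptions: `InMemoryEventsSketch` is a hypothetical, synchronous simplification and not the real EventsService (which may differ in signatures and is not used here), and the topic names `gitlab.push` and `gitlab.merge_request` are only illustrative.

```typescript
// Hypothetical, simplified stand-in for an events service.
interface SketchEventParams {
  topic: string;
  eventPayload: unknown;
  metadata?: Record<string, string>;
}

type Handler = (params: SketchEventParams) => void;

class InMemoryEventsSketch {
  private readonly handlers = new Map<string, Handler[]>();

  // `id` identifies the subscriber; it is unused in this sketch.
  subscribe(options: { id: string; topics: string[]; onEvent: Handler }): void {
    for (const topic of options.topics) {
      const list = this.handlers.get(topic) ?? [];
      list.push(options.onEvent);
      this.handlers.set(topic, list);
    }
  }

  publish(params: SketchEventParams): void {
    for (const handler of this.handlers.get(params.topic) ?? []) {
      handler(params);
    }
  }
}

// A custom subscriber that records which events would trigger its action.
const events = new InMemoryEventsSketch();
const triggered: string[] = [];

events.subscribe({
  id: 'my-gitlab-action',
  topics: ['gitlab.push'],
  onEvent: params => {
    // The custom action (e.g. calling the GitLab API) would go here.
    triggered.push(params.topic);
  },
});

events.publish({ topic: 'gitlab.push', eventPayload: { project: 'demo' } });
events.publish({ topic: 'gitlab.merge_request', eventPayload: {} });
```

Only the first publish reaches the subscriber, because it subscribed to a single topic. In a real plugin or module, the handler body is where the custom action would live.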

dull marlin
# clear pike BitbucketCloudEntityProvider supports `bitbucketCloud.repo:push` currently

Hello again, @clear pike. The requirements of my project changed and I'm looking at this again. pullrequest:fulfilled is not currently implemented, right? I see that when I configure the repo webhook to trigger on this, I receive the event in the backend logs, but no action is performed on it.
How difficult would it be for me to implement this new subtopic?

tropic horizon
#

I am trying to access the catalog-info.yaml file from the develop branch of my repositories in Bitbucket