Automatic discovery of service dependency | Backstage

south chasm Sep 7, 2022, 6:22 PM

Is there any option to achieve automatic discovery of services and related metadata? Currently we need to add catalogue info in a YAML file in the repo, and I was wondering whether it would be possible to add the same metadata to the deployed service so that Backstage could build the service catalogue from there?

abstract bear Sep 7, 2022, 8:38 PM

@south chasm this is possible by creating an entity provider. The entity provider would connect to some service that gives you information about deployed services, including their metadata. You can read about it here: https://backstage.io/docs/features/software-catalog/external-integrations
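A minimal sketch of such an entity provider, assuming a hypothetical `fetchServices()` client for your deployment platform that returns each service's name, owner, lifecycle, and metadata. The catalog types are declared locally so the sketch runs on its own; in a real backend the `EntityProvider` and `EntityProviderConnection` interfaces come from the catalog backend packages (in current releases, `@backstage/plugin-catalog-node`), and the provider is registered with the catalog builder and run on a schedule. The provider name and the `deployments.example.com` location URL are made up.

```typescript
// Local stand-ins for the catalog types so this sketch runs on its own.
// In a real backend, import EntityProvider / EntityProviderConnection from
// the catalog backend packages instead.
interface Entity {
  apiVersion: string;
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
  spec?: Record<string, unknown>;
}

type FullMutation = {
  type: 'full';
  entities: { entity: Entity; locationKey?: string }[];
};

interface EntityProviderConnection {
  applyMutation(mutation: FullMutation): Promise<void>;
}

// Hypothetical shape of what the deployment platform reports per service.
interface DeployedService {
  name: string;
  owner: string;
  lifecycle: string;
  // Assumed to already use valid Backstage annotation keys.
  metadata: Record<string, string>;
}

class DeployedServiceProvider {
  private connection?: EntityProviderConnection;

  // fetchServices is a hypothetical client for your deployment platform.
  constructor(private readonly fetchServices: () => Promise<DeployedService[]>) {}

  getProviderName(): string {
    return 'deployed-service-provider';
  }

  // The catalog calls connect() once at startup.
  async connect(connection: EntityProviderConnection): Promise<void> {
    this.connection = connection;
  }

  // Run on a schedule: read the deployment platform and resync the catalog.
  async run(): Promise<void> {
    if (!this.connection) throw new Error('Provider is not connected');
    const services = await this.fetchServices();
    await this.connection.applyMutation({
      type: 'full', // replaces everything this provider emitted previously
      entities: services.map(svc => {
        // Hypothetical URL identifying where this entity came from.
        const location = `url:https://deployments.example.com/services/${svc.name}`;
        return {
          locationKey: this.getProviderName(),
          entity: {
            apiVersion: 'backstage.io/v1alpha1',
            kind: 'Component',
            metadata: {
              name: svc.name,
              annotations: {
                ...svc.metadata,
                'backstage.io/managed-by-location': location,
                'backstage.io/managed-by-origin-location': location,
              },
            },
            spec: { type: 'service', owner: svc.owner, lifecycle: svc.lifecycle },
          },
        };
      }),
    });
  }
}
```

A `full` mutation replaces everything this provider emitted before, so services that disappear from the deployment platform also disappear from the catalog; a `delta` mutation (with `added` and `removed` lists) is the alternative when the source can report individual changes.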

abstract bear Sep 7, 2022, 10:25 PM

Are you able to mark this topic as answered?

south chasm Sep 7, 2022, 11:32 PM

Trying to figure out how to do that

rich vault Sep 8, 2022, 12:56 PM

In the three-dots menu at the top, you can choose "edit tags".

neon sentinel Sep 12, 2022, 2:02 PM

Sorry if this is an unwelcome necro: is there any prior art or a reusable plugin here? It seems like a common need to get data from a GitHub repository and use it to attach labels, annotations, or even relations to some related component(s).

rich vault Sep 12, 2022, 2:09 PM

Not that I know of, but it's pretty easy to grab an Octokit client and start making requests.

Do bear in mind that you'll be eating into your GitHub API rate limits as you start polling for this additional data.
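One way to stretch the rate limit when polling for extra per-repo data is conditional requests: GitHub returns an `ETag` header, and a later request that sends it back in `If-None-Match` gets `304 Not Modified` when nothing changed, which GitHub's REST API docs say does not count against the primary rate limit. This sketch calls the REST endpoint through an injected `fetch`-like function (so it stays dependency-free and testable) rather than Octokit; with Octokit the equivalent read would be `octokit.rest.repos.getContent({ owner, repo, path })`. The `ConditionalGitHubClient` name and its in-memory cache are illustrative assumptions.

```typescript
// Just the parts of a fetch Response this sketch uses, so a fake can be injected.
interface MiniResponse {
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<MiniResponse>;

class ConditionalGitHubClient {
  // URL -> last ETag and body seen for it (in memory only).
  private readonly cache = new Map<string, { etag: string; body: unknown }>();

  constructor(private readonly fetchFn: FetchLike, private readonly token: string) {}

  async get(path: string): Promise<unknown> {
    const url = `https://api.github.com${path}`;
    const cached = this.cache.get(url);
    const headers: Record<string, string> = {
      Accept: 'application/vnd.github+json',
      Authorization: `Bearer ${this.token}`,
    };
    // Ask GitHub to answer 304 if the resource hasn't changed since last time.
    if (cached) headers['If-None-Match'] = cached.etag;

    const res = await this.fetchFn(url, { headers });
    if (res.status === 304 && cached) return cached.body; // unchanged: reuse cached body
    if (res.status !== 200) throw new Error(`GitHub returned ${res.status} for ${path}`);

    const body = await res.json();
    const etag = res.headers.get('etag');
    if (etag) this.cache.set(url, { etag, body });
    return body;
  }
}
```

On Node 18+ the global `fetch` can be passed in directly. The cache here resets on restart, so a long-running poller would want to persist ETags.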

abstract bear Sep 12, 2022, 3:26 PM

neon sentinel sorry if this is unwelcome necro: is there any prior art or reusable plugin here...

Can you say more about this? Do you mean you want to read files from the file system and emit annotations, labels, and relationships? For example, from package.json or something like that?
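If that's the idea, the core of such a reader is a mapping from a file's contents to catalog metadata. A minimal sketch for package.json, assuming internal packages live under a hypothetical `@acme/` npm scope and are registered in the catalog as Components under their unscoped names; the `acme.example.com/node-engine` annotation key is likewise made up. Internal dependencies become `spec.dependsOn` entity references, which the catalog turns into dependency relations.

```typescript
interface PackageJson {
  name?: string;
  engines?: Record<string, string>;
  dependencies?: Record<string, string>;
}

interface DerivedMetadata {
  annotations: Record<string, string>;
  dependsOn: string[]; // entity refs for a Component's spec.dependsOn
}

// Derive catalog metadata from package.json text. The '@acme/' scope and the
// 'acme.example.com/...' annotation key are hypothetical placeholders.
function deriveFromPackageJson(text: string, internalScope = '@acme/'): DerivedMetadata {
  const pkg = JSON.parse(text) as PackageJson;
  const annotations: Record<string, string> = {};
  if (pkg.engines?.node) {
    annotations['acme.example.com/node-engine'] = pkg.engines.node;
  }
  // Internal dependencies become references to other Components, assuming
  // each internal package is registered under its unscoped name.
  const dependsOn = Object.keys(pkg.dependencies ?? {})
    .filter(dep => dep.startsWith(internalScope))
    .map(dep => `component:default/${dep.slice(internalScope.length)}`)
    .sort();
  return { annotations, dependsOn };
}
```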

neon sentinel Sep 12, 2022, 5:49 PM

abstract bear Can you say more about this? Do you mean, you want to read files on file system ...

One thing I'm thinking of is the Tech Insights plugin. IIUC, it would ideally be driven by data in the repo, pulled out of the repo by the build pipeline, and/or pulled out of wherever the thing is deployed, so that it stays up to date instead of becoming a data-entry task that either isn't done regularly or is done in bad faith if you force people to do it.

In general, I feel that most of the data in Backstage shouldn't be maintained separately, because deriving it from authoritative sources is how I can enforce its accuracy... that, or I at least need to be able to audit and report discrepancies.

IIUC, the approach without a component provider would be something like an org-wide GitHub App that owns the part of CI/CD where I force the component labels/annotations in the repo's YAML to be consistent with reality, according to whatever definitions I'm responsible for.
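The "audit and report the discrepancy" half of this could be a pure comparison that a CI step or GitHub App check runs on every push: take the annotations declared in the repo's YAML and the values derived from authoritative sources, and report every key where they disagree. A minimal sketch; the `auditAnnotations` name and the example keys are hypothetical, and how the derived values are produced (and whether a non-empty result fails the check or just posts a comment) is left open.

```typescript
interface Discrepancy {
  key: string;
  declared: string | undefined; // undefined when the YAML omits the key
  derived: string;
}

// Compare annotations declared in catalog-info.yaml with values derived from
// authoritative sources. Only keys we know how to derive are checked;
// anything else in the YAML is left alone.
function auditAnnotations(
  declared: Record<string, string>,
  derived: Record<string, string>,
): Discrepancy[] {
  return Object.entries(derived)
    .filter(([key, value]) => declared[key] !== value)
    .map(([key, value]) => ({ key, declared: declared[key], derived: value }))
    .sort((a, b) => a.key.localeCompare(b.key));
}
```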

abstract bear Sep 12, 2022, 6:50 PM

> In general, I feel that most of the data in Backstage shouldn't be maintained separately, because deriving it from authoritative sources is how I can enforce its accuracy... that, or I at least need to be able to audit and report discrepancies.
I agree with you.

> IIUC, the approach without a component provider would be something like an org-wide GitHub App that owns the part of CI/CD where I force the component labels/annotations in the repo's YAML to be consistent with reality, according to whatever definitions I'm responsible for.
I want to make sure I understand. A solution to keeping YAML files up to date could be a GitHub App that ensures all YAML files have correct annotations and labels?

> One thing I'm thinking of is the Tech Insights plugin. IIUC, it would ideally be driven by data in the repo, pulled out of the repo by the build pipeline, and/or pulled out of wherever the thing is deployed, so that it stays up to date instead of becoming a data-entry task that either isn't done regularly or is done in bad faith if you force people to do it.
I agree with you. Do fact retrievers do what you're suggesting here?

neon sentinel Sep 12, 2022, 8:06 PM

> I want to make sure I understand. A solution to keeping YAML files up to date could be a GitHub App that ensures all YAML files have correct annotations and labels?
Yeah. If some data in the component YAML files is derived from files in the repo (e.g. for Tech Insights, the version of some Python library in requirements.txt or a lock file), then one way I could ensure correctness is to populate that data by hooking into updates to the repo: in this case, a GitHub App that receives the webhooks and pulls down the content at the HEAD of the branch that was pushed. I imagine that's what the Backstage GitHub discovery integration does when it finds component YAML... I just want to achieve the equivalent of looking at more data at that step and using that information to populate more of the component YAML.

... So I guess I either:

- fork that GitHub provider code, maybe sending back PRs,
- or don't use it at all and do my own thing to get repo data into component YAMLs on push,
- or figure out how the GitHub provider and my thing can both update the same components,
- or have discrete subgraphs of components that I manage vs. what the GitHub provider manages?
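For the third option, one approach that avoids forking the GitHub provider is enrichment at processing time: the catalog runs every entity, whichever provider emitted it, through its processors, and a custom processor's `preProcessEntity` hook can add annotations. The sketch below keeps only the merge step as a pure function so it runs standalone. It finds the component's repository through the well-known `github.com/project-slug` annotation and looks it up in a hypothetical store of derived annotations; derived values overwrite declared ones for the keys the store provides, on the assumption that they come from the authoritative source.

```typescript
interface Entity {
  apiVersion: string;
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
  spec?: Record<string, unknown>;
}

// Hypothetical store of annotations derived per repository ("org/repo").
type DerivedStore = Map<string, Record<string, string>>;

// Merge derived annotations into an entity emitted by any provider.
// Returns the input unchanged when there is nothing to add; otherwise
// returns a new object and leaves the input untouched.
function enrichEntity(entity: Entity, store: DerivedStore): Entity {
  const slug = entity.metadata.annotations?.['github.com/project-slug'];
  const derived = slug ? store.get(slug) : undefined;
  if (entity.kind !== 'Component' || !derived) return entity;
  return {
    ...entity,
    metadata: {
      ...entity.metadata,
      // Derived (authoritative) values win over what the YAML declares.
      annotations: { ...entity.metadata.annotations, ...derived },
    },
  };
}
```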

> I agree with you. Do fact retrievers do what you're suggesting here?
I suppose... the overall question, I guess, comes down to:

If an entity provider is already looking at the data that a fact retriever would look at, should I ideally enhance that provider to just stick that data in a label? Should I aspire to have fact retrievers that only need to consult labels/annotations/relations on components? Is this overly idealistic/impractical?
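The idealized version is a retriever whose handler reads nothing but catalog data. A minimal sketch shaped loosely like a Tech Insights fact retriever (an `id`, a `version`, a `schema` describing each fact, and a `handler` that returns facts per entity). The types are simplified local stand-ins, the handler takes entities directly instead of the plugin's context object (a real retriever would typically fetch entities with a catalog client), and the `acme.example.com/pydep-` annotation prefix is hypothetical.

```typescript
interface Entity {
  kind: string;
  metadata: { name: string; namespace?: string; annotations?: Record<string, string> };
}

// Simplified stand-in for a Tech Insights fact about one entity.
interface Fact {
  entity: { namespace: string; kind: string; name: string };
  facts: Record<string, string | boolean>;
}

const PYTHON_PIN_PREFIX = 'acme.example.com/pydep-'; // hypothetical annotation prefix

const annotationFactRetriever = {
  id: 'annotation-python-deps',
  version: '0.1.0',
  schema: {
    requestsVersion: { type: 'string', description: 'Pinned version of requests, or "none"' },
    hasPinnedDeps: { type: 'boolean', description: 'Whether any Python pins are annotated' },
  },
  // Reads only catalog annotations; no repository access happens here.
  handler: async (entities: Entity[]): Promise<Fact[]> =>
    entities.map(e => {
      const ann = e.metadata.annotations ?? {};
      const pinKeys = Object.keys(ann).filter(k => k.startsWith(PYTHON_PIN_PREFIX));
      return {
        entity: { namespace: e.metadata.namespace ?? 'default', kind: e.kind, name: e.metadata.name },
        facts: {
          requestsVersion: ann[`${PYTHON_PIN_PREFIX}requests`] ?? 'none',
          hasPinnedDeps: pinKeys.length > 0,
        },
      };
    }),
};
```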

Thank you for your attention here; I realize I'm in a bit of a rabbit hole.

rich vault Sep 13, 2022, 6:58 AM

Wouldn't it be optimal if this per-repo knowledge was maintained and cached by an external party with an API? Then it would benefit several Backstage plugins as well as, potentially, the rest of the org.

Then this knowledge base could easily be driven by merges instead of endlessly polling repos that, in practice, rarely if ever change.
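A minimal sketch of that idea, assuming a hypothetical `extract(repo, sha)` function that reads whatever files matter at a given commit: the knowledge base recomputes a repo's facts only when a merge (a push to the default branch) arrives, ignores redelivered webhooks for a SHA it already has, and serves every reader from its cache. The `RepoKnowledgeBase` name and method names are illustrative.

```typescript
interface RepoFacts {
  sha: string; // commit the facts were extracted from
  facts: Record<string, string>;
}

class RepoKnowledgeBase {
  private readonly byRepo = new Map<string, RepoFacts>();

  // `extract` stands in for whatever reads files at a given commit and
  // derives facts from them (hypothetical).
  constructor(
    private readonly extract: (repo: string, sha: string) => Promise<Record<string, string>>,
  ) {}

  // Called from the merge/push webhook handler.
  async onMerge(repo: string, sha: string): Promise<void> {
    if (this.byRepo.get(repo)?.sha === sha) return; // redelivered webhook
    const facts = await this.extract(repo, sha);
    this.byRepo.set(repo, { sha, facts });
  }

  // Readers (Backstage plugins or anything else) only ever hit the cache.
  get(repo: string): RepoFacts | undefined {
    return this.byRepo.get(repo);
  }
}
```

Being in-memory, this loses everything on restart, and two deliveries for the same repo processed concurrently could finish out of order; a real service would persist entries and check commit order before overwriting.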

neon sentinel Sep 13, 2022, 3:19 PM

I guess it depends on the degree to which Backstage aspires to have an API. With a first-class API, I'd be happy to just extend the Backstage model as needed to capture the data I'm interested in, gather it all into Backstage, and consume it from there. The details of how the GitHub provider works seem kind of orthogonal.
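For what it's worth, the catalog backend already exposes a REST API that plugins and external tools can read from. A minimal sketch that builds a query against its `/api/catalog/entities` endpoint using the `filter` parameter (conditions joined by commas within one `filter` are ANDed) and the `fields` parameter to trim the response. The base URL is a placeholder, requests may need a Backstage token depending on your setup, and from TypeScript, `@backstage/catalog-client`'s `CatalogClient.getEntities({ filter })` offers a typed wrapper for this kind of query. The `catalogQueryUrl` helper itself is hypothetical.

```typescript
// Build a query URL for the catalog backend's entities endpoint.
// Conditions within one `filter` parameter are ANDed together; `fields`
// limits which parts of each entity are returned.
function catalogQueryUrl(
  baseUrl: string,
  filter: Record<string, string>,
  fields: string[] = [],
): string {
  const condition = Object.entries(filter)
    .map(([key, value]) => `${key}=${value}`)
    .join(',');
  const params = [`filter=${encodeURIComponent(condition)}`];
  if (fields.length > 0) params.push(`fields=${encodeURIComponent(fields.join(','))}`);
  // Strip trailing slashes so the path joins cleanly.
  return `${baseUrl.replace(/\/+$/, '')}/api/catalog/entities?${params.join('&')}`;
}
```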

abstract bear Sep 13, 2022, 5:02 PM

> Wouldn't it be optimal if this per-repo knowledge was maintained and cached by an external party with an API?
If it's a common enough use case, it might make sense for a Backstage plugin to provide this functionality so companies don't need to create it from scratch. It doesn't necessarily need to be part of Backstage core, but it might make sense for something to exist that's reusable.

> If an entity provider is already looking at the data that a fact retriever would look at, should I ideally enhance that provider to just stick that data in a label?
The pattern of reading files from repositories and annotating entities in the catalog is currently pretty much limited to reading catalog-info.yaml files. You can build something from scratch yourself, but it would have to be pretty custom.

ornate sequoia Sep 14, 2022, 7:41 AM

I read the thread, and I think what I'm doing is sort of related: https://discord.com/channels/687207715902193673/1016996099648671746
So far in my plugin, I've taken the same approach used by @backstage/plugin-todo, except that instead of reading TODOs from the code, I retrieve info from package.json.
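In the same spirit as plugin-todo, which scans the files at an entity's source location for TODO comments, a package.json scan can walk every file in a repository tree and summarize each manifest it finds. A minimal sketch over a hypothetical in-memory file list (each entry with a `path` and an async `content()`); in a Backstage backend the file list would come from reading the repository tree. `node_modules` is skipped, and malformed manifests are ignored rather than failing the whole scan.

```typescript
// Hypothetical stand-in for one file from a repository tree read.
interface RepoFile {
  path: string;
  content(): Promise<string>;
}

interface PackageInfo {
  path: string;
  name: string;
  version: string;
  dependencyCount: number;
}

// Find every package.json outside node_modules and summarize it.
async function collectPackages(files: RepoFile[]): Promise<PackageInfo[]> {
  const result: PackageInfo[] = [];
  for (const file of files) {
    const segments = file.path.split('/');
    if (segments[segments.length - 1] !== 'package.json') continue;
    if (segments.includes('node_modules')) continue;
    try {
      const pkg = JSON.parse(await file.content());
      result.push({
        path: file.path,
        name: typeof pkg.name === 'string' ? pkg.name : '(unnamed)',
        version: typeof pkg.version === 'string' ? pkg.version : 'unknown',
        dependencyCount: Object.keys(pkg.dependencies ?? {}).length,
      });
    } catch {
      // Malformed JSON: skip this manifest and keep scanning.
    }
  }
  return result;
}
```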
