#Automatic discovery of service dependency
@south chasm this is possible by creating an entity provider. The entity provider would connect to some service that would give you information about deployed services including their metadata. You can read about it here https://backstage.io/docs/features/software-catalog/external-integrations
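For anyone landing here later, a minimal sketch of what such an entity provider could look like. The interfaces are local stand-ins that mirror the `getProviderName` / `connect` / `applyMutation` shape described in the linked docs; in a real backend they would come from the Backstage catalog packages. `DeployedService`, `listServices`, and the `deployed-services` name are hypothetical, made up for illustration.

```typescript
// Local stand-ins for the Backstage types, so the sketch is self-contained.
interface CatalogEntity {
  apiVersion: string;
  kind: string;
  metadata: { name: string; annotations: Record<string, string> };
  spec: { type: string; lifecycle: string; owner: string; dependsOn: string[] };
}

interface DeferredEntity {
  entity: CatalogEntity;
  locationKey: string;
}

interface ProviderConnection {
  applyMutation(mutation: { type: 'full'; entities: DeferredEntity[] }): Promise<void>;
}

// Hypothetical record returned by whatever service knows what is deployed.
interface DeployedService {
  name: string;
  owner: string;
  dependsOn: string[];
}

const PROVIDER_NAME = 'deployed-services'; // assumption: any stable name works

// Turn one deployed service into a Component entity with dependsOn relations.
function toComponent(svc: DeployedService): CatalogEntity {
  const location = `${PROVIDER_NAME}:default`;
  return {
    apiVersion: 'backstage.io/v1alpha1',
    kind: 'Component',
    metadata: {
      name: svc.name,
      annotations: {
        'backstage.io/managed-by-location': location,
        'backstage.io/managed-by-origin-location': location,
      },
    },
    spec: {
      type: 'service',
      lifecycle: 'production', // assumption: everything reported is live
      owner: svc.owner,
      dependsOn: svc.dependsOn.map(name => `component:default/${name}`),
    },
  };
}

class DeployedServicesProvider {
  private connection: ProviderConnection | undefined;
  private readonly listServices: () => Promise<DeployedService[]>;

  constructor(listServices: () => Promise<DeployedService[]>) {
    this.listServices = listServices;
  }

  getProviderName(): string {
    return PROVIDER_NAME;
  }

  // The catalog calls this once and hands over the connection.
  async connect(connection: ProviderConnection): Promise<void> {
    this.connection = connection;
  }

  // Call on a schedule or from a webhook; replaces everything this provider owns.
  async refresh(): Promise<void> {
    if (!this.connection) throw new Error('provider is not connected yet');
    const services = await this.listServices();
    await this.connection.applyMutation({
      type: 'full',
      entities: services.map(svc => ({
        entity: toComponent(svc),
        locationKey: `${PROVIDER_NAME}:default`,
      })),
    });
  }
}
```

The `full` mutation means the provider is the single owner of the entities it emits, which matters later in the thread when two sources want to write to the same component.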
Are you able to mark this topic as answered?
trying to figure out how to do that
in the three-dots menu at the top, you can choose "edit tags"
sorry if this is an unwelcome necro: is there any prior art or reusable plugin here? it seems like a common need to get data from a github repository and use that to stick labels or annotations or even relations on some related component(s)
Not that I know of. But it's pretty easy to grab an octokit client and start making requests
Do bear in mind that you'll be making advancements toward your rate limits as you start polling for this additional data
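One way to keep an eye on that, as a rough sketch: GitHub's REST API returns `x-ratelimit-remaining` and `x-ratelimit-reset` (epoch seconds) on responses, so a poller can back off before it runs dry. The helper name `pollDelayMs` and the `minRemaining` threshold are made up here, and the sketch assumes the headers arrive as a plain object with lowercase keys.

```typescript
// Decide how long to wait before the next GitHub request, based on the
// rate limit headers of the previous response.
// Assumption: `headers` is a plain object with lowercase header names.
function pollDelayMs(
  headers: Record<string, string | undefined>,
  nowMs: number,
  minRemaining = 100, // hypothetical safety margin, tune to taste
): number {
  const remaining = Number(headers['x-ratelimit-remaining']);
  const resetSec = Number(headers['x-ratelimit-reset']);

  // Headers missing or unparseable: nothing to go on, so do not wait.
  if (!Number.isFinite(remaining) || !Number.isFinite(resetSec)) return 0;

  // Plenty of budget left: keep polling.
  if (remaining > minRemaining) return 0;

  // Budget nearly spent: wait until the window resets.
  return Math.max(0, resetSec * 1000 - nowMs);
}
```

Usage would be something like `await sleep(pollDelayMs(response.headers, Date.now()))` between requests, where `sleep` is whatever delay helper you already have.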
Can you say more about this? Do you mean, you want to read files on file system and emit annotations, labels and relationships? For example, package.json or something like that?
one thing I'm thinking of is the tech insights plugin -- IIUC, it would ideally be driven by data in the repo and/or pulled out of the repo by the build pipeline and/or pulled out of wherever the thing is deployed, so that it stays up to date instead of creating a data entry task that either isn't done regularly or is done in bad faith if you force people to do it.
In general, I feel like most of the data in backstage shouldn't be separately maintained because I think deriving from authoritative sources is how I can force accuracy of the data ... that or I at least need to be able to audit and report the discrepancy.
IIUC, the approach without a component provider would be to have something like an org-wide github app that owns the part of CI/CD where I want to force whatever component labels/annotations on the yaml in the repo to be consistent with reality according to whatever definitions I'm responsible for
In general, I feel like most of the data in backstage shouldn't be separately maintained because I think deriving from authoritative sources is how I can force accuracy of the data ... that or I at least need to be able to audit and report the discrepancy.
I agree with you.
IIUC, the approach without a component provider would be to have something like an org-wide github app that owns the part of CI/CD where I want to force whatever component labels/annotations on the yaml in the repo to be consistent with reality according to whatever definitions I'm responsible for
I want to make sure I understand. A solution to keeping YAML files up to date could be to have a GitHub App that ensures that all yaml files have correct annotations and labels?
one thing I'm thinking of is the tech insights plugin -- IIUC, it would ideally be driven by data in the repo and/or pulled out of the repo by the build pipeline and/or pulled out of wherever the thing is deployed, so that it stays up to date instead of creating a data entry task that either isn't done regularly or is done in bad faith if you force people to do it.
I agree with you. Do fact retrievers do what you're suggesting here?
I want to make sure I understand. A solution to keeping YAML files up to date could be to have a GitHub App that ensures that all yaml files have correct annotations and labels?
yeah, if some data in the component yaml files is derived from the files in the repo (eg for tech insights, what version of whatever python library is in requirements.txt or a lock file or whatever) then one way I could ensure correctness is to populate that data by hooking into updates to the repo, in this case with a github app that receives the webhooks and pulls down the content for the HEAD for the branch that got pushed. I imagine that's what the backstage github discovery integration does when it finds component yaml ... I just want to effectively achieve the equivalent of looking at more data at that step and using that information to populate more of the component yaml.
... So I guess I either
- fork that github provider code, maybe send back PR's
- or don't use it at all and just sort of do my own thing to get repo data into component yamls on push
- or figure out how the github provider and my thing can both update the same components,
- or have discrete subgraphs of components that I manage vs what the github provider manages?
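Whichever of those options wins, the "derive it from the repo on push" step itself could be sketched roughly like this. It assumes the GitHub push payload fields `ref` and `repository.default_branch`, handles only `name==version` pins in requirements.txt, and uses a made-up annotation prefix `example.com/python-dep.`; all function names are hypothetical.

```typescript
// Subset of the GitHub push webhook payload this sketch relies on.
interface PushPayload {
  ref: string; // e.g. "refs/heads/main"
  repository: { full_name: string; default_branch: string };
}

// Only react to pushes that move the default branch.
function isDefaultBranchPush(payload: PushPayload): boolean {
  return payload.ref === `refs/heads/${payload.repository.default_branch}`;
}

// Extract "name==version" pins; comments, blank lines, and unpinned
// specifiers such as "flask>=2.0" are skipped on purpose.
function parsePinnedRequirements(text: string): Record<string, string> {
  const pins: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = (raw.split('#')[0] ?? '').trim();
    const match = /^([A-Za-z0-9._-]+)\s*==\s*([A-Za-z0-9.*+!_-]+)$/.exec(line);
    if (!match) continue;
    const [, name, version] = match;
    if (name && version) pins[name.toLowerCase()] = version;
  }
  return pins;
}

// Turn pins into annotations that could be merged into the component.
// Assumption: the prefix is hypothetical, pick one your org owns.
function annotationsFromPins(
  pins: Record<string, string>,
  prefix = 'example.com/python-dep.',
): Record<string, string> {
  const annotations: Record<string, string> = {};
  for (const [name, version] of Object.entries(pins)) {
    annotations[`${prefix}${name}`] = version;
  }
  return annotations;
}
```

Annotations rather than labels are used here because label values are more tightly restricted, and version strings can contain characters that would not fit.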
I agree with you. Do fact retrievers do what you're suggesting here?
I suppose ... the overall question I guess comes down to:
if an entity provider is already looking at the data that a fact retriever would look at, then would I ideally enhance that provider to just stick that data in a label? Should I aspire to have fact retrievers that only need to consult labels/annotations/relations on components? Is this overly idealistic/non-pragmatic?
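To make the "fact retrievers that only consult annotations" idea concrete, here is a small sketch. It is not the Tech Insights fact retriever API, just a plain function over an entity; the annotation prefix `example.com/python-dep.`, the function names, and the fact shape are all hypothetical, and the version comparison assumes purely numeric dotted versions.

```typescript
// Minimal entity shape: only the part this sketch reads.
interface AnnotatedEntity {
  metadata: { name: string; annotations?: Record<string, string> };
}

// Compare dotted numeric versions, e.g. "2.31.0" vs "2.4.0".
// Assumption: every segment is a plain number (no "rc1", no "post2").
function compareVersions(a: string, b: string): number {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Derive a fact purely from what is already in the catalog, with no
// extra call to the repo.
function minVersionFact(
  entity: AnnotatedEntity,
  pkg: string,
  minVersion: string,
  prefix = 'example.com/python-dep.', // hypothetical annotation prefix
): { pinned: boolean; meetsMinimum: boolean } {
  const version = entity.metadata.annotations?.[`${prefix}${pkg}`];
  if (version === undefined) return { pinned: false, meetsMinimum: false };
  return { pinned: true, meetsMinimum: compareVersions(version, minVersion) >= 0 };
}
```

The trade-off is the one raised above: the provider has to put the data on the entity first, and in exchange every check reads from the catalog instead of hitting the repo again.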
thank you for your attention here, I realize I'm in a bit of a rabbit hole
Wouldn't it be optimal if this per-repo knowledge was maintained and cached by an external party with an API? Then it would benefit several backstage plugins as well as potentially the rest of the org.
Then this knowledge base could easily be shifted to be driven by merges instead of endlessly polling repos that rarely if ever change in practice
I guess it depends on to what degree backstage aspires to have an API. With a first class API, I'd be happy to just extend the backstage model as needed to capture the data I'm interested in and just gather it all up into backstage and then consume it from there. The details of how the github provider works seem kind of orthogonal.
Wouldn't it be optimal if this per-repo knowledge was maintained and cached by an external party with an API?
If it's a common enough use case, it might make sense for a Backstage plugin to provide this functionality so companies don't need to create it from scratch. It doesn't necessarily need to be part of Backstage core, but it might make sense for something to exist that's reusable.
if an entity provider is already looking at the data that a fact retriever would look at, then would I ideally enhance that provider to just stick that data in a label?
The pattern of reading files from a repository and annotating entities in the catalog is currently pretty much limited to reading catalog-info.yaml files. You can make something from scratch yourself, but it'd have to be pretty custom.
I read the thread and I think that what I'm doing is sort of related. https://discord.com/channels/687207715902193673/1016996099648671746
So far in my plugin I've taken the same approach used by @backstage/plugin-todo, and of course instead of reading TODOs from the code I retrieve info from package.json
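For the original question about dependencies, the package.json side could look roughly like this. It is only a sketch: `dependsOnFromPackageJson` is a made-up name, and the lookup from npm package name to catalog entity ref is assumed to exist already (npm names like `@acme/foo` are not valid entity names, so some mapping is needed either way).

```typescript
// Shape of the package.json fields this sketch reads.
interface PackageJson {
  dependencies?: Record<string, string>;
}

// Map npm dependencies to catalog entity refs suitable for spec.dependsOn.
// Assumption: `knownComponents` maps npm package name -> entity ref and is
// maintained elsewhere; packages that are not in it are ignored.
function dependsOnFromPackageJson(
  packageJsonText: string,
  knownComponents: Map<string, string>,
): string[] {
  const pkg = JSON.parse(packageJsonText) as PackageJson;
  const refs = new Set<string>();
  for (const name of Object.keys(pkg.dependencies ?? {})) {
    const ref = knownComponents.get(name);
    if (ref !== undefined) refs.add(ref);
  }
  // Sorted so repeated runs produce a stable result.
  return [...refs].sort();
}
```

Third-party packages fall out naturally because they have no entry in the lookup, so only dependencies on things that are themselves in the catalog become relations.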