The GithubEntityProvider and similar providers take care of discovery: they use the APIs of the related system to find catalog files and register them.
For that purpose, they create Location entities with the target URL pointing to one (or more) catalog file(s).
Depending on what those APIs make possible, they usually follow one of these practices:
- create Location entities only for existing catalog files
- create Location entities for potential catalog files marked as "optional"
Additionally, they immediately schedule a refresh for these Location entities, so that they get processed right away.
The frequency of the full discovery (checking all repos, etc.) is managed by
catalog:
  providers:
    theProvider:
      providerId:
        schedule:
          frequency: ...
          ...
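As a concrete illustration, here is a minimal sketch of what this could look like for the GitHub provider. The provider ID `myOrgProvider`, the organization `my-org`, and the duration values are hypothetical placeholders, not recommendations; pick values that fit your rate limit budget.

```yaml
catalog:
  providers:
    github:
      # "myOrgProvider" is a hypothetical provider ID; any name works
      myOrgProvider:
        organization: 'my-org' # hypothetical organization name
        schedule:
          # how often the full discovery (checking all repos) runs
          frequency: { minutes: 30 }
          # how long a single discovery run may take before it is aborted
          timeout: { minutes: 3 }
```

A lower frequency value means fresher discovery results but more API requests per hour.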
Separately, there is the processing loop that processes the Location entities. As part of that, the catalog files get fetched and processed, and the entities they contain get created.
The frequency at which Location entities are processed can be controlled using
catalog:
  processingInterval: ...
(I think the default is 30 minutes, but please check the docs for the exact value.)
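For example, a sketch that slows the processing loop down to reduce the number of API requests. The value of 45 minutes is only an illustrative assumption:

```yaml
catalog:
  # each entity (including Location entities) is re-processed
  # roughly once per interval; 45 minutes is a hypothetical value
  processingInterval: { minutes: 45 }
```

A longer interval consumes less of the rate limit, at the cost of changes to catalog files showing up later unless a refresh is triggered some other way.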
The (re-)processing of individual Location entities can also be scheduled explicitly, which is what the event-based approach does, e.g., when we detect that an existing catalog file was modified (or potentially modified).
You can also trigger the refresh manually in the UI via the refresh button on the entity pages. This only works for entities that originated from a Location entity.
Both parts (the discovery and the processing of Location entities) cause API requests and consume budget from the rate limit.