#OTel live progress
Yeah, it's a custom implementation by @fierce minnow. We call OnEnd from within OnStart: https://github.com/dagger/dagger/pull/9718/files#diff-9b42608174681163d57b055b73549d2a7cfdc631586f90744087baf751aa2153R116-R130
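A minimal sketch of the "call on end in on start" idea, for anyone following along. This is not the code from the linked PR: `Span`, `LiveSpanProcessor`, and the `export` callback are hypothetical stand-ins written with the standard library only, and the real SDK types look different.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for an SDK span; not the real OTel type."""
    name: str
    start_time_ns: int
    end_time_ns: Optional[int] = None  # stays None until the span finishes


class LiveSpanProcessor:
    """Forwards each span twice: once when it starts, once when it ends."""

    def __init__(self, export: Callable[[Span], None]) -> None:
        self._export = export

    def on_start(self, span: Span) -> None:
        # Treat the start as a provisional "end" so the span is sent right away.
        self.on_end(span)

    def on_end(self, span: Span) -> None:
        self._export(span)


# Record what the exporter sees, as (name, end time) pairs.
seen: List[Tuple[str, Optional[int]]] = []
processor = LiveSpanProcessor(lambda s: seen.append((s.name, s.end_time_ns)))

span = Span(name="build", start_time_ns=1_000)
processor.on_start(span)   # exported while still running, no end time yet
span.end_time_ns = 5_000
processor.on_end(span)     # exported again with the final end time
```

The receiving side then has to treat the second export as an update to the first, which is where the pitfalls come in.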
Yep you guessed it. There are a few pitfalls, but it's holding so far:
- For various reasons you'll end up with different timestamps in place of EndTime: sometimes you get year 1764 or something, sometimes the Unix epoch, sometimes year 0001, etc. So your best bet is to always just test "if endTime < startTime"
- We also support sending data to traditional services/tools like Jaeger and Honeycomb (via standard OTEL_* env vars), which understandably get very confused by this, so we have an exporter in the pipeline that strips out any unfinished spans before passing the payload downstream
- For a while we also supported live updates to span attributes/events/etc. but that was much too fiddly (have to wrap the entire span API) especially since each SDK would have to implement it, so now we just send on start/end. If we ever need incremental updates to spans we'll just use OTel logs instead.
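A rough sketch of the first two points above: the "end before start" test and a filter that drops unfinished spans before handing a batch to a downstream tool. `Span`, `is_unfinished`, and `strip_unfinished` are hypothetical names for illustration, not Dagger's exporter or any OTel SDK API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class Span:
    """Hypothetical stand-in for an exported span; not the real OTel type."""
    name: str
    start_time: datetime
    end_time: datetime


def is_unfinished(span: Span) -> bool:
    # The placeholder end time varies (year 0001, Unix epoch, ...), so
    # compare against the start time instead of matching one sentinel value.
    return span.end_time < span.start_time


def strip_unfinished(spans: Iterable[Span]) -> List[Span]:
    # Keep only the spans that are safe to pass to a traditional backend.
    return [s for s in spans if not is_unfinished(s)]


start = datetime(2025, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
batch = [
    Span("done", start, start + timedelta(seconds=2)),
    Span("zero-time", start, datetime(1, 1, 1, tzinfo=timezone.utc)),
    Span("epoch", start, datetime(1970, 1, 1, tzinfo=timezone.utc)),
]
finished = strip_unfinished(batch)
```

Comparing against the start time means the filter doesn't need to know which placeholder a given SDK or encoding happens to produce.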
I see, thanks for the view under the hood, was just curious while working on some live/real-time stuff and thought of Dagger. Would love if OTEL backends actually supported something like this.
Yeah I've been pretty shocked since the beginning that OTel didn't address this 😅 It was pretty whack seeing orphaned data in Jaeger until the root span finally arrived and told you what any of it was
@fierce minnow when Dagger runs a tool in a container, and that tool emits OTel traces, will those traces also be shown in real time?
if they're just using off-the-shelf OTel SDK running in a regular non-module container: no
if they're in a module whose SDK supports live streaming (Go, Python): yes
if they're using the telemetry package exported by our SDKs to auto-write OTel (Go, and I think Python): yes
here's what it looked like for Python to add support: https://github.com/dagger/dagger/commit/976cd0bf4be8d1cacbc3ee23a7ab057e8868ac2d
samesies, and it's far from just a CI or Dagger problem, in fintech at least we'd have awful trace gore all the time because of long-running batch jobs running on Temporal
"who's calling this function if the batch job isn't running?!!"