#Telemetry logs

1 messages · Page 1 of 1 (latest)

lime estuary
#

@ebon shoal is there a distinction in cloud between a log exported through otlp and just stderr/stdout output?

#

It seems to me that the logs are using a ConsoleHandler that's printing them to stderr, is there an easy check to differentiate?

#

Maybe on honeycomb?

ebon shoal
#

not 100% sure what you mean - do you mean process stdout/stderr vs. a project explicitly using OTel APIs to emit logs?

#

we use special log attributes to distinguish stdout/stderr and EOF (process exited; no more logs), but we also use those internally in places where we want a 'virtual' stdout/stderr (telemetry.SpanStdio)

#

those attributes are stdio.stream (1 = stdout, 2 = stderr) and stdio.eof (true, sent with an empty body)

ebon shoal
lime estuary
#

Yeah, it's the Python SDK. I'm trying to improve the default logging configuration as they're not showing by default. The OTel SDK hooks into Python's logging stdlib, so no special handling should be needed, like calling an OTel specific API as a user/SDK. Logs only show if I call logging.basicConfig() which sets up a StreamHandler which writes to sys.stderr by default. Same log records should be sent through the OTel exporter though.

#

Wondering about this part in dagger.gen.go:

    // Direct slog to the new stderr. This is only for dev time debugging, and
    // runtime errors/warnings.
    slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
        Level: slog.LevelWarn,
    })))

So is this writing logs to stderr, and without it logs don't show as well?

ebon shoal
#

that comment might be misleading - looks like copypasta from here in the same commit (this goes way back to the switch to OTel):

               // Direct slog to the new stderr. This is only for dev time debugging, and
               // runtime errors/warnings.
               slog.SetDefault(telemetry.PrettyLogger(errWriter, slog.LevelWarn))
#

(which makes sense, "new stderr" is weird since it's using raw os.Stderr below)

#

doing some code anthropology to see if it's relevant at all

#

seems like a red herring - just sane defaults for slog thinkies - by default it delegates to the log package and I think at one point we had to point that to io.Discard because of some noisy package we couldn't control, so maybe this was to work around that

#

I would expect writes to stdout/stderr to make it to the function's logs in Cloud/OTel

lime estuary
#

Yeah, stdout/stderr do show, but what if the only handler is otel and not streaming to stderr?

ebon shoal
#

As long as it gets sent to the configured exporter, those should make it to the UI too

lime estuary
#

But is there a visual difference between a log sent through the otel exporter and just print("something")?

ebon shoal
#

not at the moment

#

so if you want to keep things simple I think you could just send everything to stdout/stderr like normal

lime estuary
#

I want to make sure otel logs are working. I'll double check where the logs are being sent.

lime estuary
#

@ebon shoal, can HoneyComb show OTel logs or just spans?

ebon shoal
#

I think they shipped a new logs feature but we "intelligently" avoid sending logs unless it's explicitly configured, since at the time most OTel providers didn't support logs

lime estuary
#

Do you have a suggestion for checking if the log exporter is actually sending anything?

ebon shoal
lime estuary
#

Don't see them anywhere, but the OTel SDK detects the OTEL_ vars and sets it up. It sets the handler:

Root Logger:
  Level: WARNING
  Handlers:
    - opentelemetry.sdk._logs._internal.LoggingHandler
      Level: NOTSET
      Formatter: [CUSTOM] %(asctime)s - %(name)s - %(levelname)s - %(message)s

But unless I also have a StreamHandler that writes to stderr I don't see anything anywhere:
Root Logger:

  Level: WARNING
  Handlers:
    - logging.StreamHandler
      Level: NOTSET
      Formatter: [SIMPLE] %(asctime)s - %(name)s - %(levelname)s - %(message)s
    - opentelemetry.sdk._logs._internal.LoggingHandler
      Level: NOTSET
      Formatter: [CUSTOM] %(asctime)s - %(name)s - %(levelname)s - %(message)s
ebon shoal
ebon shoal
lime estuary
#

A debugger would come in handy now.

ebon shoal
#

and you're testing with a warning-level log?

#

might be nothing but it seems weird to refer to _logs._internal thinkies - seems like that's not meant to be directly touched?

lime estuary
#

It's just because it's experimental, but that's not imported by me. It's the opentelemetry dependency that's adding its handler, I just printed it's module path to debug.

lime estuary
#

Hey @ebon shoal, I got it working. I was missing the trace id propagation. It shows in UI, problem is it's being turned into a stdio.stream, losing all the attributes that Python is setting, including not correctly showing inside a custom span. I know you're well aware, just going to share what the Python OTel SDK is doing.

#

I made a simple test, adapted from OTel’s logging example:

import logging
import os

from opentelemetry import trace

from dagger import telemetry

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4318/v1/traces"
os.environ["OTEL_EXPORTER_OTLP_LOGS_ENDPOINT"] = "http://localhost:4318/v1/logs"

telemetry.initialize()

logger = logging.getLogger("myapp.area1")
logger.error("The five boxing wizards jump quickly.")

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("foo"):
    user_id = "user-123"
    logger.error("I have custom attributes.", extra={"user_id": user_id})

telemetry.shutdown()
#

I get the following from the collector:

LogRecord #1
ObservedTimestamp: 2025-06-06 19:11:41.891551 +0000 UTC
Timestamp: 2025-06-06 19:11:41.891517952 +0000 UTC
SeverityText: ERROR
SeverityNumber: Error(17)
Body: Str(I have custom attributes.)
Attributes:
     -> otelSpanID: Str(6f16d4255748e857)
     -> otelTraceID: Str(8bd06b56b772be51c1b3de1e86df9365)
     -> otelTraceSampled: Bool(true)
     -> otelServiceName: Str(dagger-python-sdk)
     -> user_id: Str(user-123)
     -> code.file.path: Str(/Users/helder/Projects/dagger/sdk/python/test.py)
     -> code.function.name: Str(<module>)
     -> code.line.number: Int(20)
Trace ID: 8bd06b56b772be51c1b3de1e86df9365
Span ID: 6f16d4255748e857
Flags: 1
        {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "logs"}
#

The span for the log matches the custom span:

InstrumentationScope __main__                                                                                                        Span #0
    Trace ID       : 8bd06b56b772be51c1b3de1e86df9365
    Parent ID      : 02babf9e9b2db90e                                                                                                    ID             : 6f16d4255748e857
    Name           : foo                                                                                                                 Kind           : Internal
    Start time     : 2025-06-06 19:11:41.891502 +0000 UTC
    End time       : 2025-06-06 19:11:41.891573 +0000 UTC
    Status code    : Unset
    Status message :
#

That log has useful information. If I do the same in a dagger function, in a custom span:

    @function
    def simple(self) -> str:
        with trace.get_tracer(__name__).start_as_current_span("simple test"):
            logger.error("this is an error!")
        return "ok"

I lose all the attributes, including the right span id:

2025-06-06T18:53:43.796Z    info    ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.26.0
Resource attributes:
     -> dagger.io/engine.name: Str(bfd00992b7c6)
     -> host.name: Str(bfd00992b7c6)
     -> service.name: Str(dagger-engine)
     -> service.version: Str(v0.18.9)
ScopeLogs #0
ScopeLogs SchemaURL: 
InstrumentationScope dagger.io/engine.buildkit 
LogRecord #0
ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2025-06-06 18:53:43.514528368 +0000 UTC
SeverityText: 
SeverityNumber: Unspecified(0)
Body: Str(2025-06-06 18:53:43,511 ERROR [error_py.main] - this is an error!
)
Attributes:
     -> stdio.stream: Int(2)
Trace ID: dcf5dcb0d7a25292799e4ab9dc860dc0
Span ID: 452f972e97246be7
Flags: 0
    {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "logs"}
#

Let’s say the user has a syntax error. In that case it’ll fail very early before even dagger.connection() so I can’t use dag to report the error. Would be best to be able to leverage normal Python logging through OTel.

lime estuary
#

Hey @ebon shoal 👋 I've come back to this, do you mind reviewing https://github.com/dagger/dagger/pull/10532, especially around the handling of dagger.Error and OTel? I need to get this merged asap before I go on PTO next week since it's the base for the Python changes for self-calls that I need to leave Yves with. This way he can rebase and I can add Python to his PR.

#

I know I need to update the telemetry golden tests. What command do you use?

ebon shoal
#

dagger call test telemetry --update -o . i believe

ebon shoal
#

maybe it's more helpful in other scenarios, or maybe i'm just not used to python? but that feels like a lot of noise, and mostly internal plumbing being shown

lime estuary
# ebon shoal hmm this looks a bit noisy - https://dagger.cloud/dagger/traces/20dc9f4be7d48c76...

This particular traceback is for an api error. You should see a part of the traceback referencing your module's code. It tells you which lines triggered it. You could have any number of API queries in a function, even calling out other methods in the same object, and nesting too. That's why I kept it, because it can help pinpoint which part of your code triggered it.

It will get better at a later date because I want to suppress frames in the traceback from the dagger-io library. Ideally you'd see only the part of the traceback that relates to you. The problem is it's not trivial so I either need to fork some stdlib code or use a third-party library, although they format things a bit differently and I'd prefer to stick to the standard format. Could also reduce a bit by removing the chained exception, basically not showing above that part that says "the exception was a direct cause ...", just kept it for now because you can see the error location (path) and that may be also helpful. At some point I may add that to the error message or a small note.

lime estuary
ebon shoal
#

got it

lime estuary
#

@ebon shoal, pushed fixes for the tests. Reduced noise a bit more in the telemetry test.

ebon shoal
#

ah nice - yea that looks nicer

lime estuary
#

First line of traceback is now user code.

ebon shoal
#

approved!