#Proposal for PubSub metadata propagation - requires community feedback!

jagged crow
#

I've had this on my mind for some time, and it keeps cropping up in GitHub issues and Discord.

So... metadata propagation within PubSub is confusing because it intersects with built-in metadata behaviours, which creates ambiguity across the board.

I think there are 3 use-cases (possibly more, please let me know) that require 3 slightly different solutions, but all need to be considered and addressed as a group to make the whole thing worthwhile and consistent, leading to a better, more predictable developer experience across all PubSub components.

See the attached Miro image for the use-cases and proposed solutions.

The 3 use-cases are on the left. Read the yellow blocks first to understand the requirement, then move to the right-hand column to see the proposed solution in the green block.

If I get enough positive feedback/support for this, I'll turn it into a proper proposal in the dapr/proposals repo.

Thanks folks 🙂

jagged crow
#

Btw, I'm sure there are other use-cases that I've not covered, please don't hesitate to let me know your thoughts! 🙂

sinful lodge
#

This particular request has been open for a very long time. It appears to be covered in the use cases, but what I'd need to gain full-scale adoption at my organisation is the propagation of all Kafka headers as HTTP/gRPC headers to my app. We have many teams pushing many different streams of data to Kafka, all with various headers. Consumers of these events depend on the headers, based on their own use cases, in order to do what they need. Right now, this is a blocker for the majority of teams in adopting Dapr PubSub, despite lots of interest being expressed in it.

To be a bit more specific, we pass things such as:

X-CorrelationId
X-IdempotencyId
merry skiff
#

Metadata is already propagated actually - unless I'm not understanding this issue

#

If you manually provide the cloud event and define custom properties in there.. those are retained and sent to the subscriber

#

these custom properties are called cloud event extensions if I understand the spec correctly

#

In the case of Python I implemented support for this in dapr-ext-grpc package (the app helper SDK for gRPC)

#

https://github.com/dapr/python-sdk/blob/22840403ef167c15465dd6caae0ee8c6fca1e49c/ext/dapr-ext-grpc/dapr/ext/grpc/_servicier.py#L187-L192

To avoid any clash between cloud event extensions, and things you send as arbitrary metadata headers, I gave all the arbitrary metadata headers a prefix of _metadata_

These properties are exposed via the Extensions dictionary of the actual Cloud Event object (provided by the official cloud event SDK)
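A minimal sketch of how a subscriber could separate the two kinds of entries, assuming the extensions arrive as a plain dict and that the prefix is exactly `_metadata_` as described above. The function name, keys and values are made up for illustration and are not part of the SDK.

```python
from typing import Any, Dict, Tuple

# Prefix taken from the message above; everything else here is illustrative.
METADATA_PREFIX = "_metadata_"


def split_extensions(extensions: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split an extensions dict into (arbitrary metadata, real CloudEvent extensions)."""
    metadata: Dict[str, Any] = {}
    cloud_event_extensions: Dict[str, Any] = {}
    for key, value in extensions.items():
        if key.startswith(METADATA_PREFIX):
            # Drop the prefix so the caller sees the original metadata key.
            metadata[key[len(METADATA_PREFIX):]] = value
        else:
            cloud_event_extensions[key] = value
    return metadata, cloud_event_extensions


# Hypothetical extensions dict as a subscriber might see it.
received = {"_metadata_x-correlationid": "abc-123", "tenant": "acme"}
metadata, extensions = split_extensions(received)
print(metadata)    # {'x-correlationid': 'abc-123'}
print(extensions)  # {'tenant': 'acme'}
```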

#

I think I'm the only SDK maintainer who implemented this.. but the runtime really does propagate custom cloud event properties, and also custom metadata attributes.

#

You don't have to use the SDKs - the data exists in the gRPC proto as you can see from the code I linked.

#

Anyways - I'm not looking further into this issue myself. Just wanted to state what I believe exists in runtime and what I added to the Python SDK for receiving events via gRPC.

jagged crow
# sinful lodge This particular request has been open for a very long time. It appears to be cov...

So am I right in saying that if Dapr allowed you to pass x-correlationId / x-idempotencyId via metadata when publishing the message, and then receive those headers on HTTP at the subscriber side, that would meet your requirements? (That would be the 3rd use-case in my illustration)

OR

are you looking to keep your publishers non-Daprized (you put x-correlationId directly in the Kafka message headers), but on your Daprized subscriber you wish to receive these as HTTP headers?

sinful lodge
jagged crow
#

Yep I fully understand.

You need to be able to have Dapr subscribers interop with your system as it exists today (raw messages, originating from non-Dapr publishers), and also iterate over all the existing Kafka headers (regardless of if those headers are Kafka built-in headers or your own custom ones)

this extends beyond Kafka, to all brokers

jagged crow
# merry skiff Metadata is already propagated actually - unless I'm not understanding this issu...

Depends on the component as far as I can tell. For example, with Kafka, publishing a message with user-provided arbitrary metadata (via Dapr publish) will propagate that metadata, and it arrives as HTTP headers at the subscriber. Which is great.

The same is not true when using RabbitMQ, unfortunately. And here lies one problem, this behaviour should be consistent.

However, even if it did, there still remains the issue that propagating headers without any indication of the source of those headers is ambiguous - Hence why I’m proposing the component prefix and the passthrough prefix as I described in my illustration.

Btw, this proposal is not related to the passing of custom cloud event fields or the propagation of those custom cloud event fields.

merry skiff
#

Metadata in PubSub was always designed to control component behavior - it was never intended to be a mechanism to pass through data.

In Dapr, Cloud Event is the main transport mechanism - here it is possible to create a cloud event manually and add arbitrary properties, which will then be treated as cloud event extensions. That's the way you can pass through custom metadata to your applications.
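As a rough illustration of that approach, the sketch below builds a CloudEvents 1.0 JSON envelope by hand and adds custom extension attributes at the top level. The `source`, `type` and extension values are invented. The name check reflects the CloudEvents rule that attribute names use only lowercase ASCII letters and digits, which means a header-style name such as `X-CorrelationId` would need renaming. As far as I know, such a body is published with the content type `application/cloudevents+json` so the runtime treats it as a ready-made cloud event, but check the Dapr docs for your version.

```python
import json
import uuid
from typing import Any, Dict


def build_cloud_event(data: Any, extensions: Dict[str, str]) -> Dict[str, Any]:
    """Build a CloudEvents 1.0 envelope with custom extension attributes."""
    event: Dict[str, Any] = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": "orders-service",           # hypothetical source
        "type": "com.example.order.created",  # hypothetical event type
        "datacontenttype": "application/json",
        "data": data,
    }
    for name, value in extensions.items():
        # CloudEvents attribute names: lowercase ASCII letters and digits only.
        if not (name.isascii() and name.isalnum() and name == name.lower()):
            raise ValueError(f"invalid extension attribute name: {name!r}")
        if name in event:
            raise ValueError(f"extension would overwrite attribute: {name!r}")
        event[name] = value
    return event


event = build_cloud_event({"orderId": 42}, {"correlationid": "abc-123"})
body = json.dumps(event)  # request body for the publish call
```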

#

Raw events are not a primary use case, and passing through metadata specifically for raw events is unsupported. It could be added, but this needs to be designed as a new feature.

I don't support creating component specific approaches here.

#

It's also important to be very very clear what we are talking about:

Propagation from Dapr -> Dapr is very different than Dapr -> external consumer

merry skiff
#

So in the case of Azure Service Bus there is a property called ApplicationProperties on each pubsub message which can be used for custom properties to be stored.

What does Kafka offer in this regard?

#

Does Kafka offer a standardized way to send custom metadata / attributes which themselves don't reside on the main message body / payload?

jagged crow
# merry skiff Raw events are not a primary use case, and passing through metadata specifically...

I agree it may not have been considered a prime use-case, however as @sinful lodge has described, it's essential when trying to integrate existing systems which are not publishing CloudEvents into newer systems that want to use dapr. If the headers can't be copied verbatim, this makes adoption of Dapr significantly harder.

I can sympathise fully with Simon here, and I think it's super important to make the adoption of Dapr as frictionless as possible, particularly when interoperating with existing non-Dapr systems.

jagged crow
sinful lodge
# jagged crow I agree it may not have been considered a prime use-case, however as <@449612585...

In light of this, when my company started looking at Dapr, we just presumed any headers on any broker would come through. That felt like the most synonymous thing given Dapr’s idea of “it just works.”

We were quite surprised when we found not all headers were published as HTTP headers. Felt like an oversight.

For now, my team has worked around the problem, but as mentioned, for wider adoption, n-many headers need to be supported regardless of the broker.

merry skiff
#

Nothing "just works" in any product 🙂 It all depends on how complex your use case is. That being said, if you can figure out how one needs to send these key value pairs and how they can be retrieved using the sarama library, then we can definitely make it work for Kafka.

The raw event type is a deviation from Dapr's main transport mechanism. With raw events, Dapr has no way to create a standardized abstraction across all pubsub components today. So here it would require component specific changes - and those are new feature requests specifically to enable the raw event use case passthrough of metadata.

jagged crow
#

I don’t think it was an oversight. Just a need that has emerged 🙂

merry skiff
#

I don't think we will be able to provide a general solution for the raw events situation

#

it will depend on the individual pub sub component's capabilities -- for Kafka it sounds like we might be able to add this.

#

If someone hurries up and looks up the Kafka documentation for this and opens an issue in github.com/dapr/components-contrib.. maybe just maybe we can even get this into Dapr 1.14 still. No promises.. but it wouldn't be too hard. But it needs to also exist in the sarama library

jagged crow
#

"So here it would require component specific changes - and those are new feature requests specifically to enable the raw event use case passthrough of metadata."

Maybe I’m oversimplifying things but raw is related to just the message payload right?

If custom attributes can be retrieved from the published metadata dictionary and supplied directly to the underlying broker's ability to hold custom data per message (like we see with Kafka headers, Azure Service Bus message properties, AWS SQS/SNS ‘message attributes’), that simplifies things, right? Rather than trying to create a new wrapper type which contains both custom data and the raw payload?

dire geyser
#

Maybe my Discord-fu isn't up to par, but do you have a link to a higher resolution picture? I can't read anything in it.

jagged crow
dire geyser
#

Substantially better - thank you

small saddle
#

+1 for this feature. Many of our existing messaging services use Kafka and we've implemented custom headers to propagate additional information, including the OpenTelemetry trace fields (traceid/traceparent). While Dapr works well subscribing to these topics using the raw payload feature, it will be years before all these topics transition to true cloud events so right now we're losing that trace propagation. Instead if the Kafka headers on the subscribed incoming message were translated to HTTP headers and passed to the subscribing application (ASP.NET Core) we could implement middleware that would extract these headers and propagate traces and other information. Would be a huge win.
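To make the middleware idea concrete, here is a small sketch of extracting a W3C `traceparent` value from incoming HTTP headers once they are passed through. It only handles the version-00 layout (`version-traceid-parentid-flags`), and the function name and returned dict shape are assumptions for illustration. A real service would more likely lean on its OpenTelemetry SDK's propagator.

```python
import re
from typing import Dict, Optional

# Version-00 traceparent layout: 2 hex, 32 hex, 16 hex, 2 hex, dash separated.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(headers: Dict[str, str]) -> Optional[Dict[str, object]]:
    """Return the trace context carried in the headers, or None if absent/invalid."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("traceparent")
    if value is None:
        return None
    match = TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are invalid per the Trace Context spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


incoming = {"Traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
context = parse_traceparent(incoming)
```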

jagged crow
#

Thanks for the feedback @small saddle

Can I ask, are your publishers dapr-ised? as in, are they using dapr directly to publish messages to a topic (via the Dapr SDK or the Dapr HTTP API)

Or are the publishers publishing directly to kafka via language native client libraries? (with therefore no awareness of Dapr)

small saddle
# jagged crow Thanks for the feedback <@796786508064489473> Can I ask, are your publishers d...

Primarily the latter at the moment - we're using the native Confluent Kafka libraries to produce messages and we've wrapped those in our own .NET libraries that inject custom headers (and extract them on consumption). However we're looking at using Dapr with Kafka for Python/Java applications that would consume these same messages (non-cloud-events) and being able to pass these Kafka headers through as HTTP headers on Dapr subscribe would help tremendously.

jagged crow
#

@small saddle in your use-case, you would specify the custom Kafka header as x-dapr-passthrough.<your-custom-property-name> and then this would be delivered directly to the subscribing application as an HTTP header named x-dapr-passthrough.<your-custom-property-name>. You could then choose to strip the prefix so the HTTP header arrives as just <your-custom-property-name>.

Does all that sound feasible to you?

small saddle
#

OK so at the moment we have a header on Kafka message called traceparent (no prefix). Is the proposal that when Dapr consumes this message it will add a prefix to this and pass it to the application subscriber handler as an HTTP header called x-dapr-passthrough.traceparent? And you could choose to remove that prefix by specifying stripPassthroughPrefix: true such that the HTTP header name becomes traceparent?

jagged crow
# small saddle OK so at the moment we have a header on Kafka message called `traceparent` (no p...

Very nearly almost!

You would have to modify the publish so that the traceparent property is prefixed with x-dapr-passthrough -> x-dapr-passthrough.traceparent

My proposal is about being explicit about the context of these custom properties, so that Dapr knows what to do with them. Adding the x-dapr-passthrough prefix gives a clear instruction to Dapr not to mess with the property, or strip it, or perform any custom behaviour in the pubsub component.

#

"And you could choose to remove that prefix by specifying stripPassthroughPrefix: true such that the HTTP header name becomes traceparent?"
This is correct.
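The behaviour discussed in this exchange could be sketched roughly as follows. This is only an illustration of the proposal as worded in the thread, not anything Dapr implements: the function and parameter names are hypothetical, and leaving unprefixed properties out of the result is an assumption, since the thread does not say what would happen to them.

```python
from typing import Dict

# Prefix proposed in the thread; matched case-insensitively in this sketch.
PASSTHROUGH_PREFIX = "x-dapr-passthrough."


def to_subscriber_headers(
    message_properties: Dict[str, str],
    strip_passthrough_prefix: bool = False,
) -> Dict[str, str]:
    """Select passthrough properties and optionally strip the prefix."""
    headers: Dict[str, str] = {}
    for name, value in message_properties.items():
        if not name.lower().startswith(PASSTHROUGH_PREFIX):
            # Assumption: only explicitly marked properties are passed through.
            continue
        if strip_passthrough_prefix:
            name = name[len(PASSTHROUGH_PREFIX):]
        headers[name] = value
    return headers


props = {"x-dapr-passthrough.traceparent": "00-abc", "partitionKey": "p1"}
print(to_subscriber_headers(props))
# {'x-dapr-passthrough.traceparent': '00-abc'}
print(to_subscriber_headers(props, strip_passthrough_prefix=True))
# {'traceparent': '00-abc'}
```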

small saddle
#

Ok, that won't work for us unfortunately. We have hundreds of publishers in Production, and we're not going to be able to change them. Instead our requirement is that Dapr is able to simply read the Kafka headers and pass them through as HTTP headers. I don't mind if it adds a prefix (we could remove that ourselves in our handlers) but we can't rely on the headers already having a known prefix. That's too limiting a scenario for integrating with existing implementations.
e.g. if there's no known Dapr prefix on the headers, then Dapr could simply add something like x-dapr-kafka or something (i.e. provider specific). There's no need really to remove the prefix - I can see why you're looking at an end-to-end solution that does - but extracting them from Kafka and passing them through in some way is really important.
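The alternative suggested here, where the subscriber side adds a provider-specific prefix to any header that does not already carry a Dapr one, might look something like this. The prefix format (`x-dapr-<provider>.`), the function name and the choice to leave already-prefixed headers untouched are all assumptions made for the sketch.

```python
from typing import Dict

# Any header already starting with this is treated as Dapr-aware (assumption).
KNOWN_DAPR_PREFIX = "x-dapr-"


def add_provider_prefix(broker_headers: Dict[str, str], provider: str) -> Dict[str, str]:
    """Prefix unmarked broker headers so their origin is unambiguous."""
    provider_prefix = f"{KNOWN_DAPR_PREFIX}{provider}."
    result: Dict[str, str] = {}
    for name, value in broker_headers.items():
        if name.lower().startswith(KNOWN_DAPR_PREFIX):
            # Already carries a Dapr prefix: pass it along unchanged.
            result[name] = value
        else:
            # Header from a non-Dapr publisher: mark where it came from.
            result[provider_prefix + name] = value
    return result


kafka_headers = {"traceparent": "00-abc", "x-dapr-passthrough.tenant": "acme"}
http_headers = add_provider_prefix(kafka_headers, "kafka")
print(http_headers)
# {'x-dapr-kafka.traceparent': '00-abc', 'x-dapr-passthrough.tenant': 'acme'}
```

With this shape existing publishers stay unchanged, and a handler that wants the original name can drop the provider prefix itself.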

small saddle
#

@jagged crow Apologies I'm being a doofus - Dapr is already passing through the Kafka header attributes. The reason I couldn't see traceparent alongside the regular Dapr ones like __key and __partition is because the ASP.NET Core middleware is already extracting it and using it to propagate the OpenTelemetry tracing 😏 So no need to derail this thread with that requirement - I'm sure it'll also work for Python & Java.
I've just re-read your proposal though and it would definitely help us with other migration scenarios, so it gets a thumbs up from me! 👏