Deep, often changing data structures, what Marten facilities could we use? | Critter Stack .NET

latent musk May 7, 2025, 7:54 AM

We're modeling MartenDB projections with deep, often changing data structures and are concerned about potential table fragmentation and bloat.

Being newer to event sourcing, we're wondering if this is primarily a data modeling challenge in how we design projections.

Let's say we have a SingleStreamProjection<Library> (creating a table mt_doc_library) where each row contains the data for a single library, with nested authors and all data related to the books. Readers may also add their own highlights/underlines to any page of any book, and these are also stored under the single Library entity.

I'd be interested to know whether we could model this better using the facilities MartenDB provides.

Thank you.

floral vigil May 7, 2025, 8:28 AM

What's the relationship with Library, is it one per tenant/user?

I'd suggest changing the model to be LibraryEntry so you can query the projections as a list rather than a monolithic entity. It'd make updates significantly more performant.

latent musk May 7, 2025, 8:41 AM

Thanks for the suggestion. The library is a monolithic collection/blob that has several users writing to it, and it is not tied to a tenant.

latent musk May 7, 2025, 8:58 AM

Here's a simplified version of the projection's data model:

public record Library
{
    public required Guid Id { get; init; }
    public required string Name { get; init; }
    public required ImmutableDictionary<BookId, Book> Books { get; init; }
    public required ImmutableDictionary<HighlightId, Highlight> Highlights { get; init; }
    public required ImmutableHashSet<User> UsersWithAccess { get; init; }
}

Do you mean we could create, e.g., a BookEntry that would contain the projection of a single book in this library? Effectively refactoring the immutable dictionaries out into "Entries". 🤔

cold elbow May 7, 2025, 9:26 AM

I'm working on the same project as @latent musk. To be clear, the Library itself can have data fields in addition to the Name in the example above, like Address, ConstructionYear, etc.

zenith hound May 7, 2025, 11:35 AM

Agree with @floral vigil that it's almost always very beneficial to have separate projections (Library, Book, Highlight) and then, on retrieval, use something like Marten's included documents feature to get out what you need, unless you really have a high-traffic read model that literally needs to show the Library and all books, all highlights and all users without any form of paging, etc.

floral vigil May 7, 2025, 11:40 AM

Sorry, missed the replies. Always try to avoid projections that are effectively unbounded in size, they're a massive performance trap. If your model becomes a root object with unbounded lists of related entities (in the sense you'd consider them related in a normalized relational model) it's usually a sign that those entities should be separate projections.
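
A minimal sketch of what that split might look like for the Library example in this thread. The LibraryId and BookId reference fields, the Title, Page and Text properties, and the switch from User objects to user ids are all assumptions for illustration, not part of the original model:

```csharp
using System;
using System.Collections.Immutable;

// Root document: only data that belongs to the library itself.
public record Library
{
    public required Guid Id { get; init; }
    public required string Name { get; init; }
    // Assumption: store user ids rather than whole User objects.
    public required ImmutableHashSet<Guid> UserIdsWithAccess { get; init; }
}

// One document per book, pointing back at its library.
public record Book
{
    public required Guid Id { get; init; }
    public required Guid LibraryId { get; init; }   // hypothetical reference field
    public required string Title { get; init; }     // hypothetical property
}

// One document per highlight, pointing back at its book and library.
public record Highlight
{
    public required Guid Id { get; init; }
    public required Guid LibraryId { get; init; }   // hypothetical reference field
    public required Guid BookId { get; init; }      // hypothetical reference field
    public required int Page { get; init; }         // hypothetical property
    public required string Text { get; init; }      // hypothetical property
}
```

With this shape, adding a highlight would only write one small Highlight document instead of rewriting the whole Library row.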

storm nymph May 8, 2025, 7:29 AM

floral vigil Sorry, missed the replies. Always try to avoid projections that are effectively ...

When normalizing like this, should it also be separate streams? Or just separate projections? I imagine that if we can keep it in a single stream, it's easier to determine which version of the related entities we want when time traveling backwards to any given version.

If an (inline?) projection only cares about certain types of events in a stream, is it internally optimized so that it doesn't update the JSONB when events that are not relevant to the projection at hand are added to the stream?

zenith hound May 8, 2025, 7:41 AM

storm nymph When normalizing like this, should it also be separate streams? Or just separate...

Do you want to time travel in the projection to have a history view or what is the use case?

storm nymph May 8, 2025, 10:38 AM

zenith hound Do you want to time travel in the projection to have a history view or what is t...

A history view. Performance of the time traveling isn't as critical, as it'd most likely be a rarely used feature: basically being able to look at the state at any given past version, including the related documents at that same point.

storm nymph May 8, 2025, 1:59 PM

Is making multiple projections out of the same stream a thing? I'm thinking of converting the subcollections into separate projections, mostly to reduce dead tuple bloat in the table. Typically these collections are queried all together with the root object anyway, so I'm not sure if there's a need to have separate streams or individual projections of the nested entities.

zenith hound May 9, 2025, 5:45 AM

Definitely a thing. Typically a stream is about capturing the facts that happen, i.e. the write side of CQRS, whereas the projections are the read side, so it's very common to have many different views on the same stream. Some even take it to extremes and duplicate projections in vertical slices to reduce coupling.

Regarding the time travel: if it's a very rare use case and performance is not crucial, sometimes you can just query the aggregate directly for this sort of thing...
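
As a rough sketch of querying the aggregate directly, Marten's live aggregation can rebuild a stream's state up to a given version or timestamp. The helper class and method names below are hypothetical, and the sketch assumes the aggregate type has the usual Apply/Create methods (or a registered single-stream projection) so Marten knows how to fold the events:

```csharp
using System;
using System.Threading.Tasks;
using Marten;

// Hypothetical helper; only AggregateStreamAsync is Marten's own API.
public static class HistoryQueries
{
    // Rebuilds the aggregate from the stream's events up to and
    // including the given stream version, without touching stored documents.
    public static async Task<T?> AtVersionAsync<T>(
        IQuerySession session, Guid streamId, long version) where T : class
    {
        return await session.Events.AggregateStreamAsync<T>(streamId, version: version);
    }

    // Same idea, but bounded by a point in time instead of a version.
    public static async Task<T?> AtTimeAsync<T>(
        IQuerySession session, Guid streamId, DateTimeOffset asOf) where T : class
    {
        return await session.Events.AggregateStreamAsync<T>(streamId, timestamp: asOf);
    }
}
```

This replays events on every call, so it fits the rarely used history view described here better than a hot read path.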

storm nymph May 9, 2025, 7:52 AM

zenith hound Definetely a thing, typically a stream is about capturing the facts that happen ...

From a modeling perspective I feel like having one stream would still be desirable, as typically we care about the version of the root object and everything nested at the same time. For time traveling, just doing an ad hoc aggregation is what I had in mind.
But for the separate projections for the nested entities, is it possible to somehow project individual entities without projecting the whole collection?
Say in the earlier example I want to know what Book entities are projected out of the stream. If I project it as any Enumerable<Book> I would still have a relatively large JSONB at hand. Can I somehow project these into individual documents even if the stream is the same?

zenith hound May 9, 2025, 8:19 AM

By all means have one stream. And yes, it’s possible to project a subset of data from the stream. There's lots of reading about this in the docs, and it's shown in the sample applications.
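
One way this could look, as a sketch only: a MultiStreamProjection that keys documents by an id carried in the event rather than by the stream id, so a single Library stream produces one small document per book. The event types, the mutable Book document, and the Registration helper are hypothetical, and the namespaces follow Marten 7 (they may differ in other versions):

```csharp
using System;
using Marten;
using Marten.Events.Projections;

// Hypothetical events appended to the Library stream;
// each one carries the id of the book it concerns.
public record BookAdded(Guid BookId, string Title);
public record BookRetitled(Guid BookId, string Title);

// Hypothetical per-book document (kept mutable for brevity).
public class Book
{
    public Guid Id { get; set; }
    public string Title { get; set; } = "";
}

public class BookProjection : MultiStreamProjection<Book, Guid>
{
    public BookProjection()
    {
        // Route each event to the Book document whose id is in the event,
        // instead of to one document per stream.
        Identity<BookAdded>(e => e.BookId);
        Identity<BookRetitled>(e => e.BookId);
    }

    // Conventional Apply methods: mutate the matching document.
    public void Apply(BookAdded e, Book book) => book.Title = e.Title;
    public void Apply(BookRetitled e, Book book) => book.Title = e.Title;
}

public static class Registration
{
    // Assumed wiring; call from your own StoreOptions configuration.
    public static void Configure(StoreOptions opts)
        => opts.Projections.Add<BookProjection>(ProjectionLifecycle.Async);
}
```

An event that is not registered with Identity is not routed to this projection, so a new highlight would leave the Book documents untouched.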
