# Deep, frequently changing data structures: what Marten facilities could we use?


latent musk

We're modeling MartenDB projections with deep, often changing data structures and are concerned about potential table fragmentation and bloat.

As relative newcomers to event sourcing, we're wondering whether this is primarily a data modeling challenge in how we design projections.

Say we have a SingleStreamProjection<Library> (which creates the table mt_doc_library) where each row holds the data for a single library, including nested authors and all data related to its books. Readers can also add their own highlights/underlines to any page of any book, and these are stored under the same Library entity.

I'd be interested to know whether we could model this better using the facilities MartenDB provides.

Thank you.

floral vigil

What's the relationship with Library? Is it one per tenant/user?


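I'd suggest changing the model to LibraryEntry documents so you can query the projections as a list rather than as one monolithic entity. Each update would then rewrite only the small entry that changed instead of the whole library document, which makes updates significantly more performant.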
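To put a rough number on that: PostgreSQL's MVCC writes a new row version on every UPDATE and leaves the old version behind as a dead tuple until VACUUM reclaims it, so with one document per library every small change rewrites the entire JSONB value. The plain C# sketch below has no Marten dependency; the record names MonolithicLibrary and BookDocument are hypothetical, and System.Text.Json stands in for whatever serializer Marten is configured with. It only compares how many bytes of JSON each model would rewrite for a single-book change.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical monolithic shape: every book is nested inside one library document.
public record MonolithicLibrary(Guid Id, string Name, List<BookDocument> Books);

// Hypothetical per-entry shape: each book is its own document pointing at its library.
public record BookDocument(Guid Id, Guid LibraryId, string Title, string Author);

public static class UpdateCost
{
    // JSON bytes rewritten when one book changes but the whole library document is saved again.
    public static int MonolithicBytes(MonolithicLibrary library) =>
        JsonSerializer.SerializeToUtf8Bytes(library).Length;

    // JSON bytes rewritten when only the changed book's own document is saved.
    public static int PerEntryBytes(BookDocument book) =>
        JsonSerializer.SerializeToUtf8Bytes(book).Length;

    // Builds a library with the requested number of books, all sharing one library id.
    public static MonolithicLibrary BuildLibrary(int bookCount)
    {
        var libraryId = Guid.NewGuid();
        var books = Enumerable.Range(0, bookCount)
            .Select(i => new BookDocument(Guid.NewGuid(), libraryId, $"Title {i}", $"Author {i}"))
            .ToList();
        return new MonolithicLibrary(libraryId, "Main branch", books);
    }
}
```

The monolithic cost grows linearly with the number of books (and, in the real model, highlights), while the per-entry cost stays roughly constant, which is why the dead-tuple bloat scales with the size of the library rather than with the size of the change.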

latent musk

Thanks for the suggestion. The library is a monolithic collection/blob that several users write to, and it is not tied to a tenant.

latent musk

Here's a simplified version of the projection's data model:

```csharp
using System.Collections.Immutable;

// BookId, HighlightId, Book, Highlight and User are our own domain types (not shown).
public record Library
{
    public required Guid Id { get; init; }
    public required string Name { get; init; }
    public required ImmutableDictionary<BookId, Book> Books { get; init; }
    public required ImmutableDictionary<HighlightId, Highlight> Highlights { get; init; }
    public required ImmutableHashSet<User> UsersWithAccess { get; init; }
}
```

Do you mean we could create, e.g., a BookEntry that would contain the projection of a single book in this library? Effectively refactoring the immutable dictionaries out into separate "entries". 🤔
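A minimal sketch of that refactoring, assuming the simplified model above: the immutable dictionaries become separate document types that each carry a LibraryId, and the library document keeps only its own fields. The names LibraryInfo, BookEntry, HighlightEntry, LibraryView and LibraryReadModel, and all their fields, are hypothetical. In Marten each record would be stored as its own document type (by default in its own mt_doc_* table); the Assemble method here is just an in-memory illustration that the old combined shape can still be rebuilt at read time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Root document: only the library's own fields.
public record LibraryInfo(Guid Id, string Name);

// One document per book, linked back to its library by LibraryId.
public record BookEntry(Guid Id, Guid LibraryId, string Title, IReadOnlyList<string> Authors);

// One document per highlight, linked to both the library and the book it marks.
public record HighlightEntry(Guid Id, Guid LibraryId, Guid BookId, string UserName, int Page, string Text);

// Read model assembled at query time instead of being stored as one blob.
public record LibraryView(LibraryInfo Library, IReadOnlyList<BookEntry> Books, IReadOnlyList<HighlightEntry> Highlights);

public static class LibraryReadModel
{
    // Filters the separately stored documents by LibraryId and returns them in the old combined shape.
    public static LibraryView Assemble(
        LibraryInfo library,
        IEnumerable<BookEntry> books,
        IEnumerable<HighlightEntry> highlights) =>
        new(
            library,
            books.Where(b => b.LibraryId == library.Id).ToList(),
            highlights.Where(h => h.LibraryId == library.Id).ToList());
}
```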

cold elbow

I'm working on the same project as @latent musk. To be clear, the Library itself can have data fields besides the Name in the example above, such as Address, ConstructionYear, etc.

zenith hound

I agree with @floral vigil that it's almost always very beneficial to have separate projections (Library, Book, Highlight) and then, on retrieval, use something like Marten's included documents feature to get what you need. The exception is a high-traffic read model that literally needs to show the library with all its books, all highlights and all users, without any form of paging.
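In Marten, the included documents feature lets a query against one document type also fetch the documents it references in the same round trip; see the Marten docs on including related documents for the exact Include() overloads. The plain C# sketch below does not use Marten. It only models the shape of such a paged read: one page of books for a library, plus just the library documents that page references. LibraryDoc, BookDoc, BookPage and PagedQuery are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record LibraryDoc(Guid Id, string Name);
public record BookDoc(Guid Id, Guid LibraryId, string Title);

// Result of one page query: the requested books plus the library documents they reference.
public record BookPage(IReadOnlyList<BookDoc> Books, IReadOnlyDictionary<Guid, LibraryDoc> Libraries);

public static class PagedQuery
{
    // In-memory stand-in for "query a page of books and include their library":
    // pages the books first, then resolves only the libraries that page refers to.
    public static BookPage Page(
        IEnumerable<BookDoc> books,
        IReadOnlyDictionary<Guid, LibraryDoc> librariesById,
        Guid libraryId,
        int pageNumber,
        int pageSize)
    {
        var page = books
            .Where(b => b.LibraryId == libraryId)
            .OrderBy(b => b.Title, StringComparer.Ordinal)
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .ToList();

        var included = page
            .Select(b => b.LibraryId)
            .Distinct()
            .Where(librariesById.ContainsKey)
            .ToDictionary(id => id, id => librariesById[id]);

        return new BookPage(page, included);
    }
}
```

The point of the design is that the amount of data read per request is bounded by the page size, not by how many books or highlights the library has accumulated.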

floral vigil

Sorry, I missed the replies. Always try to avoid projections that are effectively unbounded in size; they're a massive performance trap. If your model becomes a root object with unbounded lists of related entities (related in the sense that you'd model them as separate, related tables in a normalized relational schema), it's usually a sign that those entities should be separate projections.

storm nymph

If an (inline?) projection only cares about certain types of events in a stream, is it internally optimized so that it doesn't update the JSONB when events that are not relevant to the projection at hand are added to the stream?

storm nymph

Is making multiple projections out of the same stream a thing? I'm thinking of converting the subcollections into separate projections, mostly to reduce dead-tuple bloat in the table. These collections are typically queried together with the root object anyway, so I'm not sure there's a need for separate streams or for individual projections of the nested entities.

zenith hound

Definitely a thing. A stream is typically about capturing the facts that happen, i.e. the write side of CQRS, whereas the projections are the read side, so it's very common to have many different views on the same stream. Some even take it to the extreme and duplicate projections across vertical slices to reduce coupling.
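A minimal sketch of "many views on the same stream", written as plain C# folds rather than Marten projection classes: the same list of events feeds two unrelated read models, and each one simply ignores the events it doesn't care about. The event and view names are hypothetical. In Marten each view would be its own projection registered against the same event types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical events appended to one library stream.
public abstract record LibraryEvent;
public record LibraryCreated(string Name) : LibraryEvent;
public record BookAdded(Guid BookId, string Title) : LibraryEvent;
public record HighlightAdded(Guid BookId, string UserName, int Page) : LibraryEvent;

// Read model 1: a small summary document, e.g. for a list of libraries.
public record LibrarySummary(string Name, int BookCount);

public static class Views
{
    // Builds read model 1; events this view does not care about leave it unchanged.
    public static LibrarySummary Summary(IEnumerable<LibraryEvent> stream) =>
        stream.Aggregate(new LibrarySummary("", 0), (summary, e) => e switch
        {
            LibraryCreated created => summary with { Name = created.Name },
            BookAdded => summary with { BookCount = summary.BookCount + 1 },
            _ => summary
        });

    // Builds read model 2: highlight counts per user, from the very same events.
    public static IReadOnlyDictionary<string, int> HighlightsPerUser(IEnumerable<LibraryEvent> stream) =>
        stream.OfType<HighlightAdded>()
              .GroupBy(h => h.UserName)
              .ToDictionary(g => g.Key, g => g.Count());
}
```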

Regarding time travel: if it's a very rare use case and performance isn't crucial, you can sometimes just query the aggregate directly for this sort of thing.
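Marten can rebuild an aggregate on the fly without a stored projection via session.Events.AggregateStreamAsync<T>(streamId, ...), which also accepts an optional version (or timestamp) to stop at; check the exact parameters for your Marten version. The plain C# sketch below only illustrates what such an ad hoc "as of version N" aggregation does: replay the events up to that version and ignore the rest. StoredEvent, LibraryState and the event records are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stored event: payload plus its 1-based position (version) in the stream.
public record StoredEvent(long Version, object Data);

public record LibraryRenamed(string Name);
public record BookAdded(string Title);

public record LibraryState(string Name, int BookCount);

public static class TimeTravel
{
    // Rebuilds the state as it was right after the given version; later events are skipped.
    public static LibraryState AsOf(IEnumerable<StoredEvent> stream, long version) =>
        stream.Where(e => e.Version <= version)
              .OrderBy(e => e.Version)
              .Aggregate(new LibraryState("", 0), (state, e) => e.Data switch
              {
                  LibraryRenamed renamed => state with { Name = renamed.Name },
                  BookAdded => state with { BookCount = state.BookCount + 1 },
                  _ => state
              });
}
```

Because nothing is persisted, this costs a full replay per request, which is why it only makes sense when time travel is rare and not performance-critical.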

storm nymph
> zenith hound: Definitely a thing, typically a stream is about capturing the facts that happen ...

From a modeling perspective, I feel that having one stream would still be desirable, since we typically care about the version of the root object and everything nested in it at the same time. For time travel, an ad hoc aggregation is what I had in mind.
But for the separate projections of the nested entities, is it possible to project individual entities without projecting the whole collection?
Say, in the earlier example, I want to know which Book entities are projected out of the stream. If I project them as an IEnumerable<Book>, I would still have a relatively large JSONB value at hand. Can I project these into individual documents even though the stream is the same?

zenith hound

By all means, have one stream. And yes, it's possible to project a subset of the data from the stream. There's a lot of reading about this in the docs, and it's shown in the sample applications.
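For the case asked about above (one library stream, one document per book), the Marten tool usually pointed to is a MultiStreamProjection<TDoc, TId> whose constructor calls Identity<TEvent>(e => e.BookId) for each book event type, so each event is routed to the document with that id even though all events live in one stream; an EventProjection is another option. Conventions differ between Marten versions, so treat this as a pointer to the docs rather than a recipe. The plain C# sketch below, with hypothetical event and document names and no Marten dependency, shows only the slicing logic: group the stream's events by BookId, then fold each group into its own small document.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical book-level events, all appended to the single library stream.
public interface IBookEvent { Guid BookId { get; } }
public record BookAdded(Guid BookId, string Title) : IBookEvent;
public record BookRetitled(Guid BookId, string Title) : IBookEvent;
public record PageHighlighted(Guid BookId, int Page) : IBookEvent;

// One small document per book instead of one large embedded collection.
public record BookDoc(Guid Id, string Title, int HighlightCount);

public static class BookSlicer
{
    // Slices the stream by BookId (the target document's identity) and folds each slice.
    public static IReadOnlyDictionary<Guid, BookDoc> Project(IEnumerable<object> stream) =>
        stream.OfType<IBookEvent>()
              .GroupBy(e => e.BookId)
              .Select(g => g.Aggregate((BookDoc?)null, Apply))
              .OfType<BookDoc>() // drops slices whose BookAdded event was never seen
              .ToDictionary(doc => doc.Id);

    // Create-or-apply step for a single book document.
    private static BookDoc? Apply(BookDoc? doc, IBookEvent e) => e switch
    {
        BookAdded added => new BookDoc(added.BookId, added.Title, 0),
        BookRetitled retitled when doc is not null => doc with { Title = retitled.Title },
        PageHighlighted when doc is not null => doc with { HighlightCount = doc.HighlightCount + 1 },
        _ => doc
    };
}
```

Non-book events in the same stream (the string in a quick test, or library-level events in practice) are simply skipped by OfType<IBookEvent>(), so each BookDoc stays small no matter how large the library grows.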