#Using request-scoped data layer for transactional updates


polar viper
#

Our API is gaining the requirement that certain models in our Mongo database can only be updated in a transaction, due to side effect DB operations that need to happen each time.

All the operations that occur to these various models should be inside the same transaction, scoped to the API request that triggered the updates.

In order to give these operations access to the same transaction, I was thinking of making the data access layer for these models "request scoped" with an injected "TransactionManager" that automatically starts a transaction per request and commits it before the response is finalized.

Is this a potentially bad pattern? I know there are drawbacks to using request-scoped services and making the data layer request-scoped has the knock-on effect of making the majority of the providers in this application request-scoped as well.

chrome drum
#

You will essentially make the whole application request-scoped, which comes with a whole load of considerations. AsyncLocalStorage (https://docs.nestjs.com/recipes/async-local-storage) might be a better option for propagating transactions.

#

This is something that comes built into MikroORM or Sequelize btw, but for others it's not that hard to build a solution yourself
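A minimal sketch of the AsyncLocalStorage idea, assuming Node's built-in `node:async_hooks` module. `FakeSession`, `runInTransaction` and `txStorage` are hypothetical names for illustration; a real version would hold a database session obtained from the driver instead of the stand-in object used here.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical stand-in for a driver session (assumption, not a real API).
interface FakeSession {
  id: number;
  state: 'active' | 'committed' | 'aborted';
}

const txStorage = new AsyncLocalStorage<FakeSession>();
let nextSessionId = 1;

// Runs `work` with a session visible to everything it calls, including
// code reached through awaited calls, then commits or aborts.
async function runInTransaction<T>(work: () => Promise<T>): Promise<T> {
  const session: FakeSession = { id: nextSessionId++, state: 'active' };
  return txStorage.run(session, async () => {
    try {
      const result = await work();
      session.state = 'committed';
      return result;
    } catch (err) {
      session.state = 'aborted';
      throw err;
    }
  });
}

// A singleton service (not request-scoped) looks up the current session
// at call time instead of receiving it through its constructor.
class UserQueryService {
  currentSessionId(): number | undefined {
    return txStorage.getStore()?.id;
  }
}

const users = new UserQueryService();
```

Calling `runInTransaction` from a middleware or interceptor would give every provider access to the per-request session while the providers themselves stay singletons.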

devout nebula
# polar viper correct

Mongoose won't scale well, like Scott showed me some time ago. It is possible to make a multi-tenant application with Mongoose request-scoped providers, but keep in mind that you will create all those modules for every connection, maybe for every request

#

With some models and some users, it will be painful

#

Nevertheless (with just 4 collections and a dozen users) I have it implemented and working in production

#

A solution is typegoose or another library that allows you to create the models beforehand

#

Another solution is AsyncLocalStorage like papooch suggested

polar viper
#

the actual model compilation won't be request scoped, the "data" layer would basically be some request scoped services that import the already-compiled models

devout nebula
#

Request scoped providers bubble up, of course, and not down. But the models are relevant only in relationship with the DB, not the compilation

polar viper
#

just to make sure we're on the same page, what I'm suggesting would look something like

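```ts
import { Injectable, Scope } from '@nestjs/common';
import { InjectModel } from '@nestjs/mongoose';
import { Model } from 'mongoose';
// User and TransactionManager are the application's own classes.

@Injectable({ scope: Scope.REQUEST })
export class UserQueryService {
    constructor(
        // The compiled model is shared across requests.
        @InjectModel(User.name) private UserModel: Model<User>,
        // A new TransactionManager is created for each request.
        private transactionManager: TransactionManager,
    ) {}

    async someQueryWithTransaction() {
        // Query options are the third argument; the second is the projection.
        return this.UserModel.findOne({}, null, { session: this.transactionManager.session });
    }
}
```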
#

so from my understanding, the injected UserModel would be the same across requests, while the transactionManager is different per request

#

so the model itself doesn't have any dynamic injection or multi-tenant stuff

#

it's just mapped to a single DB, single collection

devout nebula
#

"Mongoose is very particular that you are certain that the model you are working with is 1-to-1 with the database and collection you are using", and I suppose you are creating a connection to the DB dynamically, aren't you?

polar viper
#

no, it's established on application startup

#

or through whatever mechanism the Mongoose nest module uses in forRoot

devout nebula
#

Ah, are you creating all the connections for all your tenants at the very beginning?

polar viper
#

there's only one connection shared for all DB access, we don't have it partitioned by tenant like that

#

which isn't the greatest but basically all the data for all the different tenant projects is in the same collections, and all the queries search based on a "project" id within the document

devout nebula
#

That may be different

#

But look at the Durable providers nevertheless and at the AsyncLocalStorage, though

polar viper
#

yeah i definitely will

#

our current (and likely future) level of traffic on this API is very low, like less than 1 request per second

#

so I'm not overly worried about the overhead of instantiation I think? it's more about the drawbacks that come with request scoping things

devout nebula
#

It's hard to manage, but not impossible, that's what I can say about it

autumn briar
#

@polar viper
What requirements do you have that make transactions necessary? I'd question them from the start.

My opinion is, transactions in MongoDB are a band-aid enhancement to make the nay-sayers of RDBMS-land shut up about transactional reads and writes not being possible in MongoDB.

In the end, if you must have transactions on more than rare occasions, you probably need a different database. And, it sounds like you are putting a lot of merit into MongoDB transactions. Not something you really should be doing, if you ask me.

If you use transactions a lot and you need consistency across both reads and writes, then the very fast database you once had will be no more, as you must have majority read and write concerns across all replicas (in a replica set, which you must have to have transactions).

So, if you want, explain the requirements for transactions and we'll see if we can get you to a better place without them. 🙂

#

Oh, and if you base your code on MongoDB transactions:

  1. You are coupling your code highly to MongoDB. Not a total issue if you know you'll stick with MongoDB as the datastore. But if you aren't 100000% sure, it's a bad decision.
  2. You'll never ever be able to move to microservices. A system architecture in which MongoDB thrives. 🙂
#

I think the 2nd point is the most important, especially if you need to scale your app horizontally in the future.

polar viper
#

Hey @autumn briar thanks for all that information. I'll try to explain my use case and see if it holds up. This will be our first experience with Mongo transactions so appreciate the cautionary advice

basically we have a data model where there is a top-level document which owns many related documents. those owned documents can be potentially "moved" to have a relationship with a different top level document (can only be owned by one at a time) making the idea of deeply nesting the data in one collection difficult / impossible

however, there is also a requirement that a single API call should be able to update the parent document and its related owned documents in an all-or-nothing write to the database. At the same time, we want to layer on an "audit" system which can store a record of the historical snapshots of those complete documents, as well as a "metadata" record of which documents were changed in each particular operation / API call (basically a list of references to those historical snapshots)

we want those snapshots and that metadata document to be written in the same transaction as the actual changes so that there is a guarantee that the "audit log" of changes is always accurate to the current data stored in the database. For that reason, we stopped considering approaches like piping a change stream from the database to a separate service that could store those "snapshots" after the fact.

The current idea is to use Mongoose "post" hooks to automatically write those snapshot documents within the same transaction any time something changes, without the API business logic having to worry about it each time.

As for the idea of coupling to MongoDB, I think we're pretty far down that road already as our API code heavily relies on Mongoose to perform updates. I had briefly considered switching to something like Prisma but at our current company size and traffic volume it seemed like way too much lift.

#

we are also a very low-volume B2B application where the APIs don't have a super strict performance requirement, so we're sort of optimizing for "correctness" at this stage over performance
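The guarantee described above (changes, snapshots and the per-operation metadata record become visible together or not at all) can be illustrated with a toy in-memory model. This is a sketch of the requirement only, not Mongoose or MongoDB API; `InMemoryStore`, `UnitOfWork` and `ChangeRecord` are hypothetical names.

```typescript
type Doc = { _id: string; [key: string]: unknown };

// One record per operation, listing the snapshots it produced.
interface ChangeRecord {
  operationId: string;
  snapshotIds: string[];
}

class InMemoryStore {
  docs = new Map<string, Doc>();
  snapshots = new Map<string, Doc>();
  changes: ChangeRecord[] = [];
}

// Buffers writes, then applies them together with their audit records.
class UnitOfWork {
  private staged: Doc[] = [];
  private store: InMemoryStore;
  private operationId: string;

  constructor(store: InMemoryStore, operationId: string) {
    this.store = store;
    this.operationId = operationId;
  }

  // Stages a shallow copy; nothing reaches the store until commit().
  save(doc: Doc): void {
    this.staged.push({ ...doc });
  }

  // Writes each document, its snapshot, and one metadata record.
  commit(): void {
    const snapshotIds: string[] = [];
    this.staged.forEach((doc, i) => {
      const snapshotId = `${this.operationId}:${i}`;
      this.store.docs.set(doc._id, doc);
      this.store.snapshots.set(snapshotId, { ...doc });
      snapshotIds.push(snapshotId);
    });
    this.store.changes.push({ operationId: this.operationId, snapshotIds });
    this.staged = [];
  }
}
```

Discarding a `UnitOfWork` without calling `commit()` plays the role of an aborted transaction: neither the documents nor the audit records are written.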

autumn briar
#

Thanks for the great explanation.

How big and how many "related documents" are there? Can there be an infinite number? Or is the number of related documents finite? How many is "many"? Like a few to a dozen? And what size would the documents be together with the parent? More than 16 MB? I'm sure you've looked over this as you mention "deeply nesting the data in one collection", which is absolutely possible. But, the issue isn't the nesting being in one collection, but rather in one document, which is the only atomic write MongoDB can make without transactions.

> however, there is also a requirement that a single API call should be able to update the parent document and its related owned documents in an all-or-nothing write to the database.

What is the requirement for the all-or-nothing writes to the documents? Just want to be sure the requirement fits the justification of needing a transaction and I'm not understanding the need completely yet.

polar viper
#

The related documents number is definitely finite, and usually in the range of not more than 10 or 20. When I said nested in a single collection I did mean a single “document”, but the reason I was suggesting it was difficult to nest the data together rather than model it with relations is because the “owned” documents can also be accessed and queried separately via their own APIs. Our front end can display a paginated list of those documents in one view (across all the “parent documents”), and also show just the “owned documents” when viewing a single parent document’s page.

The requirement for all or nothing updates comes from the nature of the product. It’ll probably be helpful to just describe what these documents are.

We are a feature flagging provider, and the top-level document is essentially a “rule set” which contains some segmentation rules that define what users should get what values of flags. Those “flags” are the owned documents, and a rule set can hold multiple flags. It then defines “variations” which are permutations of values to apply to all those flags.

The all or nothing update requirement comes from the fact that the definition of those flags and the related rules and variations all need to be changed at once to ensure a consistent output when a user changes the configuration of a particular rule set (which can also involve changing the definitions of the flags, everything is controlled via a single form submission)

The flags are kept separate because they are treated as distinct entities which are independent of the rule set they currently belong to. They can be moved to a different rule set, or removed from them entirely and kept in the system for later use. They also have their own section on the UI where all flags in the project can be viewed together.

Hopefully adding the specifics didn’t confuse matters too much.

#

so basically the UI presents a big form to the user that can involve changes to both the rule set and flags at the same time, with a single “save” button. The expectation when the form is saved is that the new flagging configuration is immediately reflected by the system serving our flagging SDKs and that there can be no “partially applied” update where perhaps the rule set changed while a flag did not

autumn briar
#

Hmm.... ok. I'd say, if you can fit all the rules/flags and the parent document in one UI page, then the data behind it isn't going to blow up a document. Put it all in a single document. You'll thank me later. 😄

I'm confused though about the "flags are kept separate because they are treated as distinct entities which are independent of the rule set they currently belong to." This sounds like a many-to-many relationship. Is that the case? I'd be wary of this model, for these criteria. I'd suggest thinking like this. (Rule set -> flags) -> flag info, where Rule set and flags are in a document. You can put the state of the flag in the document, but the info somewhere else. Not sure, as I'm making assumptions on what is being stored, but it seems like it can still be effective, unless you need to show flag info on the same page too, and even then, you said the system isn't majorly active. It should still work well. 🙂

#

With this, you can have atomic saves for the rule set and flags. No transactions needed. If a flag changes, you change it in the info collection. Also no problem. Removal of flags is going to be a bulk update no matter how you do it, so no problem. Addition of flags only needs to be atomic to the document of the rule set. Case closed. 😄

#

Again, my mental picture is vague of your future system, but I'm pretty certain you can and should do your rule set and flags in a single document.
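A rough sketch of the "(Rule set -> flags) -> flag info" shape suggested here, with the rule set and its flag state in one document. All field names are assumptions made for illustration; the thread does not specify a schema.

```typescript
// Flag state embedded in the rule set; the descriptive "flag info"
// is assumed to live in a separate collection, referenced by id.
interface EmbeddedFlag {
  flagInfoId: string;
  enabled: boolean;
}

interface RuleSetDoc {
  _id: string;
  rules: string[];
  flags: EmbeddedFlag[];
}

interface FormSubmission {
  rules: string[];
  flags: EmbeddedFlag[];
}

// Builds the next version of the whole document from one form submission.
// Persisting the result would be a single-document write.
function applySave(current: RuleSetDoc, form: FormSubmission): RuleSetDoc {
  return {
    ...current,
    rules: [...form.rules],
    flags: form.flags.map((flag) => ({ ...flag })),
  };
}
```

Because the rules and the embedded flag state travel in one document, saving it cannot leave the rules updated while the flags are not.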

polar viper
#

there's basically a one (rule set) to many (flags) relationship, but the flags can have their parent ruleset reassigned, or can exist with no ruleset defined. The flag data stores the unique identifier to use in code to evaluate that flag (eg. new-shop-page) and also the data type of that flag (boolean, string etc.).

the values to set the flag to are stored in the variations of the ruleset. The UI form allows you to change the type of the flag as well as associate a new flag that was previously not associated to any ruleset, or remove an existing association (all of this is stored in the flag documents). At the same time you can modify the values the flag is being given and the segmentation rules being applied (stored in the rule set document)

So I think it's modelled similarly to what you're thinking, in the sense that "flag info" is stored separately while "Flag values and state" are stored in the rule set. However, the "flag info" is also used as part of the flagging configuration output to determine the data type and identifier of each flag, since the ruleset relates to them by document _id alone
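To make the dependency concrete, here is a sketch of how a configuration output might be assembled from the two kinds of documents. The shapes and the `resolveVariation` helper are assumptions for illustration, not the actual schema.

```typescript
interface FlagInfo {
  _id: string;
  key: string; // identifier used in code, e.g. "new-shop-page"
  type: 'boolean' | 'string' | 'number';
}

interface Variation {
  name: string;
  values: Record<string, unknown>; // keyed by flag _id
}

interface RuleSet {
  _id: string;
  variations: Variation[];
}

// Joins a variation's values with the separately stored flag info.
// Throws if the two documents disagree with each other.
function resolveVariation(
  ruleSet: RuleSet,
  flags: FlagInfo[],
  variationName: string,
): Record<string, unknown> {
  const variation = ruleSet.variations.find((v) => v.name === variationName);
  if (!variation) throw new Error(`unknown variation ${variationName}`);

  const output: Record<string, unknown> = {};
  for (const [flagId, value] of Object.entries(variation.values)) {
    const info = flags.find((f) => f._id === flagId);
    if (!info) throw new Error(`unknown flag ${flagId}`);
    if (typeof value !== info.type) {
      throw new Error(`flag ${info.key} expects ${info.type}`);
    }
    output[info.key] = value;
  }
  return output;
}
```

If a flag's type were saved without the matching change to the rule set's values, a resolver like this would reject the combination, which is the "partially applied" state the all-or-nothing requirement is meant to rule out.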

All that aside though, there is still the secondary problem of how to reliably write to a "History" collection with snapshots of all this data as it's changed. We're trying to guarantee that those snapshots are always consistent with the data that has been modified

autumn briar
#

Sounds like a good use case for an event store/ event sourcing.
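A minimal sketch of the event-sourcing idea, under the assumption that every change is recorded as an append-only event and the current state is derived by replaying them. The event names and shapes are hypothetical.

```typescript
type FlagValue = boolean | string | number;

// Each change is an immutable event; the log doubles as the history.
type FlagEvent =
  | { type: 'FlagValueSet'; ruleSetId: string; flagKey: string; value: FlagValue }
  | { type: 'FlagRemoved'; ruleSetId: string; flagKey: string };

type RuleSetState = Record<string, FlagValue>;

// Derives the state of one rule set by folding over the events in order.
function replay(events: FlagEvent[], ruleSetId: string): RuleSetState {
  const state: RuleSetState = {};
  for (const event of events) {
    if (event.ruleSetId !== ruleSetId) continue;
    if (event.type === 'FlagValueSet') {
      state[event.flagKey] = event.value;
    } else {
      delete state[event.flagKey];
    }
  }
  return state;
}

const log: FlagEvent[] = [
  { type: 'FlagValueSet', ruleSetId: 'rs-1', flagKey: 'traffic-split', value: 0.2 },
  { type: 'FlagValueSet', ruleSetId: 'rs-1', flagKey: 'new-shop-page', value: true },
  { type: 'FlagValueSet', ruleSetId: 'rs-1', flagKey: 'traffic-split', value: 0.5 },
  { type: 'FlagRemoved', ruleSetId: 'rs-1', flagKey: 'new-shop-page' },
];
```

Replaying a prefix of the log, for example `replay(log.slice(0, 2), 'rs-1')`, yields the state as it was at that point, so the history cannot drift from the data it describes.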

#

Also, this use case sounds like it could get along fine with eventual consistency and with any very very very rare occasions of data inaccuracy. What would be the consequence if a flag is missing or not configured as intended? I'm certain everything is false initially. So, changes are always to the positive and missing those changes would mean the feature simply isn't turned on. Far from a catastrophe. 😄

polar viper
#

the flags can represent any arbitrary value so it's not always a case of going from "thing is off" to "thing is on". Sometimes a change could be "change variation B's definition so that the traffic-split flag goes from 0.2 to 0.5, but simultaneously stop serving that variation to a subset of traffic that should not receive that split percentage"

autumn briar
#

Ok. I'd still venture to say transactions are overkill. 🙂