#Redis Pipeline Support
@signal wagon - Pipelining is for sending redis commands in batches, not sending data in batches. What do you need this for anyway? I've never seen a Nest dev require this.
I have a scheduler/cron task that is fairly heavy and I am wanting to optimize the time it takes to store to Redis.
Cache data
That is regularly calculated and sent to Redis
Is it just one command to store the data?
Many.
Hmm... then I'd imagine you'd need to work with the client itself. Right?
Sure, if there's no direct support in the cache-manager and the cache-manager-redis-store
Yeah, so "cache manager" and the redis store packages are for caching and not really for batch processing. You'd need to come up with something custom with the client/ driver for sure.
Well this is cache data
I'd just like to batch the cache stores in a single batch, rather than as separate connections
What are you batch processing in a cron for your cache?
Just storing keys.
Caching data.
I have a couple of thousand of individual keys that are set
i.e. await this.cacheService.set(cacheKey, data); in a loop where cacheKey is composed and data is dynamically set.
Part of what process?
i have a separate nest.js deployment just for scheduled tasks
scheduler/scheduler.service.ts, and they run at @Interval()'s
Yes, but what are the cacheKeys used for at what point in the process?
the keys are used to store data to each key in redis, and then they are looked up by other nest.js deployments
i.e. to avoid hitting the database for commonly accessed data (that changes relatively frequently)
And at what point do you attach data to a key?
in this scheduled job
Ah, you changed the code above I see.
Yeah
It was pseudocode
Sounds like cache-manager doesn't have pipelining support.
So, I'd put forward that this isn't how cache should work anyway. My understanding is, you'd normally do the following:
- Process starts.
- Cache is checked for data.
- No data, then data is pulled from the database.
- Data is cached, with a TTL. TTL should be long enough for lowering the database load, but also not to end up with stale data, IF you intend to not have a mechanism to refresh the cache.
- Then it is back to the first step.
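The steps listed above are commonly called the cache-aside pattern. A minimal sketch, assuming an in-memory stand-in for the cache so that it runs without a Redis server: `TtlCache` and `getOrLoad` are hypothetical names, and with a real Redis client the same calls would be asynchronous.

```typescript
// Hypothetical in-memory cache with a per-key TTL (stand-in for Redis).
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: () => number;

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

// Cache-aside: check the cache, call the loader on a miss, store with a TTL.
function getOrLoad<V>(cache: TtlCache<V>, key: string, ttlMs: number, load: () => V): V {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const fresh = load(); // stands in for the database query
  cache.set(key, fresh, ttlMs);
  return fresh;
}
```

The loader only runs on a miss or after the TTL has elapsed, which is the trade-off between database load and staleness that the list describes.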
"Priming" a cache with all results isn't needed. You are only "loading" from the database once (which you'd need to do too for the priming anyway).
Right, that's normal scenarios.
And that's fine, for some cache needs.
In this case, I have a highly realtime need for cached data primed and regularly updated.
I am dealing with millions of inbound data points every day, and there are some heavy queries where Redis caching of specific data saves a lot of load on the DB cluster.
"Regularly updated" data isn't normal for a cache.
That's not true.
There's a lot of uses for Redis, it's not just "page fragment" caches or "api endpoint output" caches or basic JSON data caches
It's fine, I can simply extract this job to a go/rust/dart microservice, I was just trying to keep it in Nest.js for simplicity.
How can regularly updated data be croned in a batch process to a cache?
How do you ensure you aren't ending up with stale data?
A batch can have many SET statements to Redis, and there's significant performance increase. There's no need to serialize or even parallelize the SETs with new connections and lots of roundtrips.
It's not stale because it's regularly updated (remember, it's a scheduled/cron job)
That doesn't answer my question.
I want to firehose it into Redis. I don't care if it's occasionally stale. It updates very frequently.
Of course, a TTL is set, so that if something goes wrong, the data goes away.
The cron/schedule job is very simple -- it pulls data from DB, it shoves into Redis. Other apps can pull from Redis (and then only hit DB when it doesn't exist). It's pretty straightforward.
It's working fine now. It's just slower than I'd like. I'd like to shave off a lot of overhead by pipelining it, since it's thousands of set() calls.
pipelining is truly beneficial for this scenario (I've done it many times before in other frameworks)
Sorry, but "updates very frequently" kills the whole premise of caching in my mind. If you update frequently, that means you regularly batch-query the database for these "heavy queries", which also puts a considerable load on the database. You might also be caching data for users who rarely even need the service, so possibly wasting resources?
Sounds like you actually need a more performant database. 🤷🏻‍♂️ Or, you need a way to capture "heavy queries" to get faster replies, via something like database views, which are also a type of caching.
But, if you insist on making this happen, I'm sorry. I don't know of a Nest way to make it happen. And usually when there isn't a solution already made for your problem out there, it probably means it is either a very niche and yet legit problem or your approach is wrong. 🤷🏻‍♂️ I'll just accept it is the "it's a niche but legit problem" on this one, as I don't have the worldly experience to consider it a bad approach.
This package is a bit older, but it looks like it can offer direct client access. https://www.npmjs.com/package/nestjs-redis
It's alright. I'm going to do it in a go microservice.
Not looking to bend Nest.js to do something it doesn't do easily.
That package works with ioredis, which has pipelining.
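For reference, a rough sketch of how the loop of `set()` calls could be queued on an ioredis-style pipeline instead. `SetPipeline`, `CacheEntry`, `queueSets`, and `chunk` are hypothetical names; the interface only mirrors the `set(key, value, "EX", seconds)` and `exec()` calls that ioredis pipelines expose, so the sketch compiles without the library. With the real client you would likely use its own types.

```typescript
// Structural subset of an ioredis-style pipeline: only the calls used below.
interface SetPipeline {
  set(key: string, value: string, mode: "EX", seconds: number): SetPipeline;
  exec(): Promise<unknown>;
}

// Hypothetical shape of one item produced by the scheduled job.
interface CacheEntry {
  key: string;
  data: unknown;
}

// Queue one SET ... EX per entry. Nothing is sent until the caller runs exec().
function queueSets(pipeline: SetPipeline, entries: CacheEntry[], ttlSeconds: number): SetPipeline {
  for (const { key, data } of entries) {
    pipeline.set(key, JSON.stringify(data), "EX", ttlSeconds);
  }
  return pipeline;
}

// Split a large job into bounded batches so no single pipeline grows without limit.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Assuming `redis` is an ioredis client, usage would look roughly like `for (const batch of chunk(entries, 500)) await queueSets(redis.pipeline(), batch, 60).exec();`. The batch size of 500 and TTL of 60 seconds are placeholders, and the per-command results returned by `exec()` should still be checked for errors.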
Yeah, but if I'm writing non-framework code and trying to shoehorn it into the service layers, I might as well just do it in a microservice and be done with it.
It's not so much an issue with the database. I mean, it is, in that I have billions of rows of data (terabytes of data) and it's just not sane to be hitting it for this data all the time.
You can only tune/optimize the DB so much, before you have to just stop trying to hit it for all requests.
For scale purposes, hitting a keystore in memory is so much faster than trying to let the database burn IO, mem, and CPU to hit giant indexes and burn the planner constantly.
I'm not questioning the reasoning for caching. I'm questioning your methodology of using the cache. But, like I said. I don't have the worldly knowledge to actually say it is a poor methodology. I've just never heard of doing a cache priming batch service that regularly updates a cache. That nestjs-redis package gets you close to using a redis client in a Nestjs manner for your service for sure.
Oh yeah, Redis is used for stuff like this all the time.
Another process I know of for this kind of thing is called a data river.
Made a quick test in go. This will work fine.
Cool. Now you just need to set up the microservice.
Sure, but the drag will be on gathering the data to store from the database, no?
That's already happening.
Currently there's a 12-15 second cost to all the set()'s with the data to Redis without pipelining.
Once I pipeline it, I expect it'll be sub 1 second
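A back-of-the-envelope sketch of why that expectation is plausible, assuming network round trips dominate the cost. The 2000 commands and 7 ms per round trip are illustrative assumptions chosen to be consistent with "a couple of thousand" keys and 12-15 seconds; they are not measurements from this thread. `estimateSeconds` is a hypothetical helper.

```typescript
// Rough model: each network round trip costs `rttMs`; server-side work is ignored.
function estimateSeconds(commands: number, rttMs: number, batchSize: number): number {
  const roundTrips = Math.ceil(commands / batchSize);
  return (roundTrips * rttMs) / 1000;
}

const commands = 2000; // assumed: "a couple of thousand" keys
const rttMs = 7; // assumed per-call latency, consistent with 12-15 s in total

const oneAtATime = estimateSeconds(commands, rttMs, 1); // 2000 round trips -> 14 s
const pipelined = estimateSeconds(commands, rttMs, 500); // 4 round trips -> 0.028 s
console.log({ oneAtATime, pipelined });
```

Under these assumptions the network wait drops from about 14 seconds to a few hundredths of a second, so the remaining time would mostly be serialization and server-side work.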
How is the data for the caching being already gathered?
From the database.
There are highly tuned queries.
Retrieving a batch of results, then iterating on the results.
Pretty basic stuff.
So, a middleware?
Not in the nest sense, but in the integrations sense.
You are building a middleware.
Right now it's queried in the scheduler job in nest.js. But I'll implement the same queries in this new microservice.
No, it's a microservice, not a middleware.
It's a microservice functioning as a middleware. Sure thing. But, a simple one. Ok. I get the process now. And yeah, NestJS covers a lot of bases, but not this one.
If I may ask, what is the overall application being served? Is it something to do with IoT?
Oh. Cool! I'm a private pilot (too?).
I can imagine ingesting all that data is a concern for your database too.
Oh indeed. Spent a lot of time there.
That's a Dart service for the aggregator. Highly optimized, low memory, very tuned. Each ingest is broken out into separate service instances, etc.
Writes to the DB cluster's primary node, while reads go to the replicas.
there we go
Pretty substantial savings!
That's of course a very minimal struct though... will have to expand it out.
Awesome! It's an awesome hobby.
We have a pretty thriving Discord if you're interested.
Looks like it is more about commercial flights. So, nothing I've been directly involved with as a private pilot. It all sounds new to me.
Not just commercial flights. Any airframes that have transponders.
So there are definitely corporate, military, etc
Not a lot of little private cessnas of course
Not too shabby... and that's on my dev system reaching across a VPN to production servers, so deployed in production natively should be even faster.
did you JSON-marshal those structs with Go's standard encoding/json package?
yeah
give go-json a try! it's a drop-in replacement of encoding/json that performs better with large datasets.
also, I'm not sure how frequently you execute those pipelines, but increasing the go-redis connection pool size will also decrease the time spent waiting for a busy connection to become available
Really great tips, thanks
And thank you @rigid quail for the back and forth earlier as well
Production definitely faster for all things of course. Still need to add more data to the JSON structs (nested details), which will bring the DB time back up, but necessary. And the go-json improvement will benefit there of course.