#Spring scheduling, long tasks with a short interval

sage oriole
#

I am building a smallish data analytics platform on real-time data. My clients want to be able to get updates every minute, even if the calculations take 3 minutes per run.

How could I do that?

pulsar copper
#

What kind of calculations are you running?

#

Can't you just store the computed results in a database every 3 minutes?

grave beacon
#

Schedule an action that adds a task to a message queue (MQ) every minute, then create a few workers subscribed to that queue. Have the workers publish their results to the MQ as well, and create another process that consumes those results. Also remember to save the timestamp in the DB, so you don't accidentally overwrite newer results.
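
As a rough illustration of that idea, here is a minimal single-JVM sketch in plain Java. It is not a real MQ setup: a fixed thread pool stands in for the queue and its workers, and an `AtomicReference` stands in for the DB row holding the latest result. The names `OverlappingScheduler`, `Result`, and `calculation` are made up for this example. In a Spring app the ticker would more likely be a method annotated with `@Scheduled(fixedRate = 60000)` that hands the work off to a pool or a broker.

```java
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

class OverlappingScheduler {

    /** A computed value tagged with the instant its run was triggered (hypothetical type). */
    static final class Result {
        final Instant triggeredAt;
        final String value;

        Result(Instant triggeredAt, String value) {
            this.triggeredAt = triggeredAt;
            this.value = value;
        }
    }

    // Stand-in for the DB row that holds the most recent result.
    private final AtomicReference<Result> latest = new AtomicReference<>();
    // Fires once per period and only enqueues work, so it never blocks on a slow run.
    private final ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
    // Stand-in for the MQ plus its workers; several runs can be in flight at once.
    private final ExecutorService workers;
    private final Function<Instant, String> calculation;

    OverlappingScheduler(int workerCount, Function<Instant, String> calculation) {
        this.workers = Executors.newFixedThreadPool(workerCount);
        this.calculation = calculation;
    }

    /** Starts triggering one run per period, regardless of how long each run takes. */
    void start(long period, TimeUnit unit) {
        ticker.scheduleAtFixedRate(() -> {
            Instant triggeredAt = Instant.now();
            workers.execute(() -> publish(new Result(triggeredAt, calculation.apply(triggeredAt))));
        }, 0, period, unit);
    }

    /** Stores the candidate only if it was triggered after the currently stored result. */
    void publish(Result candidate) {
        latest.accumulateAndGet(candidate, (current, next) ->
                current == null || next.triggeredAt.isAfter(current.triggeredAt) ? next : current);
    }

    Result latest() {
        return latest.get();
    }

    void stop() {
        ticker.shutdownNow();
        workers.shutdownNow();
    }
}
```

With a 3-minute calculation and a 1-minute period, at least three workers are needed (for example `new OverlappingScheduler(4, t -> compute(t)).start(1, TimeUnit.MINUTES)`, where `compute` is your own method), otherwise runs pile up in the pool's queue. The timestamp check in `publish` is what keeps a slow, older run from overwriting a newer result.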

urban cedar
# sage oriole I am building a smallish data analytics platform on real-time data. My clients w...

A couple of thoughts come to mind.

  1. I'd recommend some sort of precomputed cache (maybe even at the DB layer) of common aggregate values, so you can do optimizations like this: https://math.stackexchange.com/questions/1153794/adding-to-an-average-without-unknown-total-sum. If you timebox the computations (e.g., calculate the sum up to today at 12:00), then you can order the tasks in a reasonable manner. I'm thinking of things like the sum, average, etc. This can help reduce your overall task time.

Otherwise, by timeboxing the schedule, the only real risk is having too many tasks for the machine, but if we're already in the cloud 🤷
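
To make the cached-aggregate idea concrete, here is a small sketch of an incrementally updated average in plain Java. It assumes you keep only the current mean and the number of values seen so far, and fold each new value in with `mean += (value - mean) / count`. The class name `RunningMean` is made up for this example.

```java
final class RunningMean {
    private long count;   // number of values folded in so far
    private double mean;  // current average of those values

    /** Folds one new observation into the mean without storing the total sum. */
    void add(double value) {
        count++;
        mean += (value - mean) / count;
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }
}
```

Each run then only has to process the values that arrived since the previous timebox instead of rescanning the whole dataset, as long as `count` and `mean` are persisted between runs.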

sage oriole
#

I have thought of some things of my own.

Basically, most of the data is a time series. I can mirror the database to a PostgreSQL database with Timescale.

It is great at computing things over time windows and the full dataset.

And I am going to mirror it to Apache Spark. Apache Spark should be better at doing the forecasts and the complex and recursive calculations.

I know that this is a lot of overhead, but I would argue it would be by far the fastest option.

urban cedar
#

That is good

sage oriole
#

I'll have a look at it. It looks promising