# Spring scheduling, long tasks with a short interval
What kind of calculations are you running?
Can't you just store the computed results in a database every 3 minutes?
Schedule an action that adds a task to the MQ every minute, then create a few workers subscribed to that MQ. Have the workers publish their results to an MQ as well, and create another process that consumes the results from it. Also remember to save the timestamp in the DB, so you won't accidentally overwrite newer results.
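A minimal sketch of that producer/worker/consumer layout, assuming in-process `java.util.concurrent` queues as stand-ins for a real message broker (RabbitMQ, Kafka, etc.) and an in-memory map as a stand-in for the results table. `Task`, `Result`, `ResultStore`, `Pipeline` and `compute` are hypothetical names. In a Spring app, `onTick` would be called from a method annotated with `@Scheduled(fixedRate = 60_000)`; the point is that the tick only enqueues work, so a long calculation never blocks the scheduler.

```java
import java.time.Instant;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;

// One unit of work: which calculation to run and the cut-off time it covers.
record Task(String calculationId, Instant asOf) {}

// A finished result, stamped with the cut-off time it was computed for.
record Result(String calculationId, Instant asOf, double value) {}

// Stand-in for the results table (assumption): keeps only the newest result per calculation.
class ResultStore {
    private final ConcurrentMap<String, Result> latest = new ConcurrentHashMap<>();

    // Stores r only if it is newer than the stored result, so a slow worker that
    // finishes late cannot overwrite a fresher one. Returns true if r was kept.
    boolean saveIfNewer(Result r) {
        Result kept = latest.merge(r.calculationId(), r,
                (old, incoming) -> incoming.asOf().isAfter(old.asOf()) ? incoming : old);
        return kept == r;
    }

    Result get(String calculationId) {
        return latest.get(calculationId);
    }
}

// In-process stand-ins for the broker queues (assumption: a real MQ in production).
class Pipeline {
    final BlockingQueue<Task> tasks = new LinkedBlockingQueue<>();
    final BlockingQueue<Result> results = new LinkedBlockingQueue<>();
    final ResultStore store = new ResultStore();

    // Body of the once-a-minute trigger: enqueue and return immediately.
    void onTick(String calculationId, Instant now) {
        tasks.add(new Task(calculationId, now));
    }

    // Starts n workers that pull tasks and publish results until interrupted.
    void startWorkers(ExecutorService pool, int n) {
        for (int i = 0; i < n; i++) {
            pool.execute(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Task t = tasks.take();
                        results.put(new Result(t.calculationId(), t.asOf(), compute(t)));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // pool.shutdownNow() ends up here
                }
            });
        }
    }

    // Separate consumer: persists results through the timestamp guard.
    void startConsumer(ExecutorService pool) {
        pool.execute(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    store.saveIfNewer(results.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Placeholder for the real long-running calculation (hypothetical).
    double compute(Task t) {
        return 0.0;
    }
}
```

Without the `asOf` comparison in `saveIfNewer`, an older run that happens to finish last would replace a newer result, which is exactly the overwrite the reply warns about.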
Future projections, forecasts, recursive calculations and statistical functions over a huge dataset.
My dataset is somewhere in the range of several 10^9 to 10^12 rows.
I'm already using Spring Cloud. Can't I just spawn a bunch of tasks for Spring Cloud workers?
A couple of thoughts come to mind.
- I'd recommend some sort of precomputed cache (maybe even at the DB layer) of common aggregate values such as the sum and average, so you can do optimizations like this: https://math.stackexchange.com/questions/1153794/adding-to-an-average-without-unknown-total-sum. If you timebox the computations (e.g., calculate the sum up to today at 12:00), you can order the tasks sensibly, and each run only has to fold in the rows added since the previous cut-off. This can reduce your overall task time.
Otherwise, by timeboxing the schedule, you're really only risking having too many tasks for the machine, but if we're already in the cloud…
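The incremental-average idea from the linked answer can be sketched as a cached checkpoint per series, assuming a mean and a row count are stored together with the cut-off time (`asOf`) they cover. Each timeboxed run then folds in only the rows newer than `asOf` instead of rescanning the table. `Checkpoint`, `add` and `addAll` are hypothetical names, and where the checkpoint lives (a DB table, a cache) is left open.

```java
import java.time.Instant;

// Hypothetical cached aggregate for one series, valid for all rows up to `asOf`.
record Checkpoint(Instant asOf, long count, double mean) {

    // Folds in one new value without needing the original total:
    // mean' = mean + (x - mean) / (count + 1)
    Checkpoint add(double x, Instant newAsOf) {
        long n = count + 1;
        return new Checkpoint(newAsOf, n, mean + (x - mean) / n);
    }

    // Folds in a batch of new rows, e.g. everything between the old and new cut-off.
    Checkpoint addAll(double[] xs, Instant newAsOf) {
        Checkpoint c = this;
        for (double x : xs) {
            c = c.add(x, newAsOf);
        }
        return c;
    }

    // The running sum can be recovered from the cached mean and count.
    double sum() {
        return mean * count;
    }
}
```

Starting from `new Checkpoint(t0, 0, 0.0)` and folding in `{2, 4, 6}` gives a mean of 4 over 3 rows; adding one more value of 8 later moves it to 5 without touching the first three rows again.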
I've come up with some ideas of my own.
Basically, most of the data is time series data. I can mirror the database to a PostgreSQL database with TimescaleDB.
It is great at computing aggregates over time windows as well as over the full dataset.
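To make "computing over time windows" concrete: TimescaleDB's `time_bucket` function groups rows into fixed-width intervals. The same idea as a minimal in-memory Java sketch (not how TimescaleDB implements it) floors each timestamp to the start of its bucket and aggregates per bucket. `Sample` and `TimeBuckets` are hypothetical names, and buckets here are aligned to the Unix epoch, which is an assumption of this sketch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One time series point (hypothetical shape).
record Sample(Instant time, double value) {}

class TimeBuckets {
    // Floors a timestamp to the start of its bucket; buckets are aligned to the Unix epoch.
    static Instant bucketStart(Instant t, Duration width) {
        long w = width.toMillis();
        return Instant.ofEpochMilli(Math.floorDiv(t.toEpochMilli(), w) * w);
    }

    // Average value per bucket, keyed and sorted by bucket start.
    static Map<Instant, Double> averages(List<Sample> samples, Duration width) {
        Map<Instant, double[]> acc = new TreeMap<>(); // bucket start -> [sum, count]
        for (Sample s : samples) {
            double[] a = acc.computeIfAbsent(bucketStart(s.time(), width), k -> new double[2]);
            a[0] += s.value();
            a[1] += 1;
        }
        Map<Instant, Double> out = new TreeMap<>();
        acc.forEach((start, a) -> out.put(start, a[0] / a[1]));
        return out;
    }
}
```

With 5-minute buckets, samples at 12:01 and 12:04:59 land in the 12:00 bucket and a sample at 12:05 starts a new one.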
I'm also going to mirror it to Apache Spark, which should be better at the forecasts and the complex, recursive calculations.
I know this adds a huge amount of overhead, but I'd argue it would be by far the fastest option.
That is good
I wonder if you would benefit from https://questdb.io/
I'll have a look at it. It looks promising.