Spring scheduling, long tasks with a short interval | Java Community

sage oriole Sep 18, 2023, 3:31 PM

I am building a smallish data analytics platform on real-time data. My clients want to be able to get updates every minute, even if the calculations take 3 minutes per run.

How could I do that?

pulsar copper Sep 18, 2023, 3:56 PM

What kind of calculations are you running?

Can't you just store the computed results in the database every 3 minutes?

grave beacon Sep 18, 2023, 6:06 PM

Schedule an action that adds a task to a message queue (MQ) every minute, then create a few workers subscribed to that queue. Have the workers publish their results to the MQ as well, and create another process that consumes the results from it. Also remember to save the timestamp in the DB, so you won't accidentally overwrite newer results.
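A minimal sketch of that idea, using only `java.util.concurrent` as an in-process stand-in for the MQ and workers. All names here (`MinuteScheduler`, `Result`, `publish`) are hypothetical, and the `AtomicReference` only models the "check the timestamp before saving" step that a real setup would do against the DB.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongFunction;

/** Hypothetical sketch: a ticker enqueues one run per period, a worker pool executes them. */
final class MinuteScheduler {

    /** A computed result tagged with the time its run was scheduled. */
    static final class Result {
        final long scheduledAtMillis;
        final String payload;

        Result(long scheduledAtMillis, String payload) {
            this.scheduledAtMillis = scheduledAtMillis;
            this.payload = payload;
        }
    }

    // Stand-in for the stored result; assumption: a real system keeps this in the DB.
    private final AtomicReference<Result> latest = new AtomicReference<>();
    private final ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService workers;
    private final LongFunction<String> computation;

    MinuteScheduler(int workerCount, LongFunction<String> computation) {
        this.workers = Executors.newFixedThreadPool(workerCount);
        this.computation = computation;
    }

    /** Hands one run per period to the worker pool; the ticker never waits for the computation. */
    void start(long period, TimeUnit unit) {
        ticker.scheduleAtFixedRate(() -> {
            long scheduledAt = System.currentTimeMillis();
            workers.execute(() -> publish(new Result(scheduledAt, computation.apply(scheduledAt))));
        }, 0, period, unit);
    }

    /** Stores the candidate only if it is newer than the stored result; returns true if stored. */
    boolean publish(Result candidate) {
        Result stored = latest.accumulateAndGet(candidate, (current, next) ->
                (current == null || next.scheduledAtMillis > current.scheduledAtMillis) ? next : current);
        return stored == candidate;
    }

    Result latest() {
        return latest.get();
    }

    void stop() {
        ticker.shutdownNow();
        workers.shutdownNow();
    }
}
```

With runs that take about 3 minutes and a 1-minute period, something like `new MinuteScheduler(4, t -> compute(t)).start(1, TimeUnit.MINUTES)` would keep roughly three runs in flight at once (`compute` is a placeholder for the real calculation). The timestamp guard matters because a slow run can finish after a faster, later one.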

sage oriole Sep 18, 2023, 7:42 PM

pulsar copper cant you just store computed results in database every 3 mins?

Future projections, forecasts, recursive calculations and statistical functions over a huge dataset.

My dataset is somewhere in the range of 10^9 to 10^12 rows.

sage oriole Sep 18, 2023, 7:46 PM

grave beacon Schedule an action that will add a task to MQ each minute, then create a few wor...

I'm using Spring Cloud already. Can't I just spawn a bunch of tasks for Spring Cloud workers?

urban cedar Sep 18, 2023, 7:58 PM

sage oriole I am building a smallish data analytics platform on real-time data. My clients w...

A couple of thoughts come to mind.

I'd recommend some sort of precomputed cache (maybe even at the DB layer) of common aggregate values, so you can do optimizations like this: https://math.stackexchange.com/questions/1153794/adding-to-an-average-without-unknown-total-sum. If you timebox the computations (e.g., calculate the sum up to today at 12:00), then you can order the tasks in a reasonable manner. I'm thinking of things like the sum, the average, etc. This can help reduce your overall task time.
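To make the cached-aggregate idea concrete, here is a small sketch of an incrementally updated average. The class name `RunningMean` and its methods are hypothetical; the point is only that the cached count and mean are enough to fold in new values, or a whole timeboxed partial result, without rescanning old rows.

```java
/** Hypothetical helper: keeps a count and a mean so new data folds in without a rescan. */
final class RunningMean {
    private long count;
    private double mean;

    /** Folds one new value into the cached mean. */
    void add(double value) {
        count++;
        mean += (value - mean) / count;
    }

    /** Merges another partial aggregate, for example one computed per time box. */
    void merge(RunningMean other) {
        if (other.count == 0) {
            return;
        }
        long total = count + other.count;
        mean += (other.mean - mean) * other.count / total;
        count = total;
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }
}
```

Each one-minute run would then only need to `add` the rows that arrived since the last time box, or `merge` a partial aggregate computed elsewhere. This works for sums, counts, and means; forecasts and recursive calculations may not decompose this way.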

Otherwise, by timeboxing the schedule, the only real risk is having too many tasks for the machine, but if we're already in the cloud... *shrugs*
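If the "too many tasks" risk does become a problem, one possible guard is to cap the number of runs in flight and skip a tick when the cap is reached. A rough sketch, with the hypothetical name `BoundedLauncher` and a plain thread per run for brevity:

```java
import java.util.concurrent.Semaphore;

/** Hypothetical guard: at most maxConcurrentRuns tasks run at once; extra ticks are skipped. */
final class BoundedLauncher {
    private final Semaphore permits;

    BoundedLauncher(int maxConcurrentRuns) {
        this.permits = new Semaphore(maxConcurrentRuns);
    }

    /** Starts the task on a new thread if a slot is free; returns false without running it otherwise. */
    boolean tryLaunch(Runnable task) {
        if (!permits.tryAcquire()) {
            return false;
        }
        Thread runner = new Thread(() -> {
            try {
                task.run();
            } finally {
                permits.release(); // free the slot even if the task throws
            }
        });
        runner.start();
        return true;
    }

    int freeSlots() {
        return permits.availablePermits();
    }
}
```

Skipping a tick means clients occasionally get a slightly older result instead of the machine being overloaded; whether that trade-off is acceptable depends on the use case.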

sage oriole Sep 19, 2023, 3:22 PM

I have thought of some things of my own.

Basically, most of the data is time series. I can mirror the database to a PostgreSQL database with TimescaleDB.

It is great at computing things over time windows and over the full dataset.

And I am going to mirror it to Apache Spark as well. Apache Spark should be better at the forecasts and the complex, recursive calculations.

I know that this is a boatload of overhead, but I would argue it would be the fastest option by far.

urban cedar Sep 19, 2023, 3:25 PM

That is good

pulsar copper Sep 19, 2023, 4:18 PM

I wonder if you would benefit from https://questdb.io/

QuestDB | Fast SQL for time-series

QuestDB is an open source time-series database. The database is column-oriented and optimized for high-speed ingest and blazingly fast SQL analytics.

sage oriole Sep 19, 2023, 6:36 PM

I'll have a look at it. It looks promising
