#ChronoGrapher - One Unified Scheduler, Unlimited Power

1 messages · Page 5 of 1

fervent lark
#
fn simulate_duration(interval: TaskScheduleInterval) -> Duration {
    let now = SystemTime::now();
    let next = interval.schedule(now).unwrap();

    next.duration_since(now).unwrap()
}

@formal sedge seems kind of hacky, though i do get the reason

#

also i've split the parsing to a TimeLiteral for parsing individual literals

#

this way. we can reuse the logic in future macros

fervent lark
#

but

#

basically

#

since we have Duration which is consistent, its better to trust it vs just doing the schedule method and getting the duration from there

#

its this close...

fervent lark
#

im removing Vec<T> allocations from timing wheel and even in the SchedulerEngine

#

in an effort to reduce temporary objects and improve performance

fervent lark
formal sedge
fervent lark
#

in main_cg.rs

#

`struct MyTaskFrame;

#[async_trait]
impl TaskFrame for MyTaskFrame {
    type Error = Box<dyn TaskError>;

    async fn execute(&self, _ctx: &TaskFrameContext) -> Result<(), Self::Error> {
        yield_now().await;
        COUNTER.fetch_add(1, Ordering::Relaxed);
        Ok(())
    }
}

pub async fn benchmark_chronographer() {
    println!("LOADING TASKS");
    let t = tokio::time::Instant::now();
    let scheduler = Scheduler::<DefaultSchedulerConfig<Box<dyn TaskError>>>::default();

    const EXEC_TIMES: usize = 6;
    const TASKS_ALLOCATED: usize = 450_000;

    let spread_millis = 1000.0 / ((TASKS_ALLOCATED * EXEC_TIMES) as f64);

    let mut millis = 0f64;
    for _ in 0..TASKS_ALLOCATED {
        millis = (millis + spread_millis).rem_euclid(1000.0);

        let task = Task::new(
            TaskScheduleInterval::duration(Duration::from_millis(millis.round() as u64)),
            MyTaskFrame
        );

        let _ = scheduler.schedule(task).await;
    }

    scheduler.start().await;

    println!("STARTED {}", t.elapsed().as_secs_f64());
}
#

then in main.rs

#

mod main_cg;
mod main_tokio;

pub static COUNTER: LazyLock<AtomicUsize> = LazyLock::new(|| AtomicUsize::new(0));

pub async fn benchmark() {
    let mut last = COUNTER.load(Ordering::Relaxed);

    let mut file = OpenOptions::new()
        .append(true)
        .create(true)
        .open("tasks_per_sec.csv")
        .unwrap();

    writeln!(file, "time_sec,tasks_per_sec").unwrap();

    for i in 0..=50 {
        tokio::time::sleep(Duration::from_secs(1)).await;

        let delta = COUNTER.swap(0, Ordering::SeqCst);

        println!("{}", i);
        writeln!(file, "{:.2},{:.2}", i, delta).unwrap();
    }
}

#[tokio::main(flavor = "multi_thread", worker_threads = 16)]
#[allow(clippy::empty_loop)]
async fn main() {
    benchmark_chronographer().await;
    benchmark().await;
}
fervent lark
#

guys...

#

Honestly imma take a beak

#

i really can't work for some reason, i have pumped some stuff

#

but im just not as productive as i was

#

that doesn't mean ofc work won't be done

placid drum
#

@fervent lark can you tell me about ConditionalFrame i am writing doc for with_condition

fervent lark
#

if it returns true

#

if it returns false, then depending on whenever you specified to return an errror or just success it will do that

#

optionally you can specify a secondary TaskFrame to execute upon a falsey value

fervent lark
#

should be mentuioned the predicate function is not a boolean

placid drum
#

okay

fervent lark
#

btw we do need to move quick

#

its 19th March

#

less than one month

fervent lark
#

they are a bit cumbersome to mantain

#

ferrisThink i don't have any other solution in mind, so this is the best we have

fervent lark
#

i have a very hacky idea for CRON expressions for the cron! macro

#

extract the tokens, convert them to string and pass it down to TaskScheduleCron::from_str to create the instance

#

mapping any errors from Result into compile-time ones

#

then take the fields of the CRON, and pasting them into the macro i presume

#

that way we have one source of truth

fervent lark
#

@placid drum your with_fallback method is perfect

#

like 10/10 perfect

#

This part feels eh

/// ... the result depends on how the underlying [`ConditionalFrame`] is configured; it
/// can either return an error, or just resolve as a success without executing the task. In the context of `with_condition`,
/// it acts as a no-operation and returns a success by default upon a falsey value.
#

It bores the user with many details

#

on with_dependency you don't list the other method

#

fixed and merged the PR

placid drum
placid drum
#

@fervent lark you change almost everything

#

you can comment on pr so i can improve furthur

formal sedge
#

feels bit hacky

#

but it doesnt seems that there is other way to do it

fervent lark
#

if i were to comment, well i'd have to wait for you to change it then wait for me to notice and review then merge

#

wwhich takes time

fervent lark
formal sedge
#

Iteration 495198: Memory: 986 KB (Delta: 0 KB)
Final memory: 986 KB
Total leaked: 0 KB
Memory growth was minimal.

fervent lark
#

im also considering another approach for handling Tasks

#

which (probably) will naturally delete

formal sedge
#

The leak is gone

#

Or i didnt see an API update

placid drum
fervent lark
#

it will naturally be gone after the TaskHandle approach im gonna take

formal sedge
#

But it is already deleted

fervent lark
#

ok

#

wait wdym its already deleted?

#

it still exists

#

Task doesn't notify the hook registry to remove the hooks

formal sedge
#

Maybe my bench is not up to date

fervent lark
#

this is complicated... Im trying to set up an epoch-based storage solution

#

the idea is as follows

#

the SchedulerTaskStore is a fixed array of shards. The number of shards that live in a store are (cpus * 4).next_power_of_two()

#

each shard has its own buffer inside (a Vec<T> wrapped in an RwLock) and a free list. (The entire buffer is wrapped in an Arc<T> for reasons explained below)

#

Each element is a slot in the Vec<T> managed by epoch-based GC

#

When a new Task is pushed, the store checks which shards have a free slot by linearly searching (walks through) all shards, optionally we could use a global SegQueue by im not sure about that. If it doesn't find any then it picks a random shard to allocate increase its buffer size via RwLock to fit in an extra slot for the Task (which is kind of expensive but quite rare so it can be avoided)

#

if it does find though a free slot, then it replaces the Slot's contents with the Task. Each Slot has a generational counter which just counts how many times the slot has been reallocated to prevent ABA-related problems (even though epochs can manage it for us, combing both approach can lead to better throughput theoritically). The generational counter is incremented every time a slot is reused (from free it becomes owned)

#

then when a Task is allocated on the specific BufferShard, the generation and index is returned. Then we create a EphemeralTaskRef<C> which stores the index and generation PLUS a Weak<BufferShard> (that was why we needed the Arc<T>)

#

and its returned for the user to clone and use as much as they want

#

for reading slots, we simply use this shard reference, peek in the buffer with the index to get its contents, check for the generational counters matching (if not then its invalid) and simply return the contents

#

for removing a Task it marks the slot as free and pushes it on the free list

fervent lark
#

i've made use of hazard pointers over epoch based GC

fervent lark
#

man im quite close

#

i just have a couple of issues 🫩

#

wait wha

fervent lark
# fervent lark i just have a couple of issues 🫩

one of the issues is TaskFrameContext, i need some effecient way to be able to send commands to the TaskHandle and let it execute but not expose the TaskHandle inside the TaskFrameContext as it would require C: SchedulerConfig everywhere

#

one of the ideas is to use a command queue-based system but this can get expensive if i manually create one myself and delete it, alternatively each Task could have its own queue pre-made but still expensive

#

the idea is to use the workers to do this for me perhaps

fervent lark
#

bois...

#

these diagrams are for explaining the TaskHandle based approach

fervent lark
fervent lark
#

man the TaskFrameContext is one of the bigger problems

#

and ofc il be doing this stuff alone...

fervent lark
#

this is pain in the ass

#

completely

placid drum
#

@fervent lark did latest commit have some issue ?

#

like i am not able to build

fervent lark
#

which is why i put a [HEAVY WIP]

#

@placid drum btw

placid drum
placid drum
fervent lark
fervent lark
#

and we need well help bc of how hard the problems are with this migration

#

well not we, mostly me, @jolly frost doesn't respond since he has a busy schedule

#

and the only other active contributor is you really

fervent lark
# fervent lark

honestly the docs can wait, this one is quite important change imo

placid drum
fervent lark
# placid drum i'll try to look into code

look in the diagrams and if you need anything let me know, should i explain the problem and the 2 solutions and then the difficulties of the TaskHandle solution or things are suffecient?

placid drum
fervent lark
#

ok

placid drum
#

@fervent lark can you give me issue ?

fervent lark
placid drum
#

anyway i was going to delete comment but reply

fervent lark
#

if you mean gh issues, well

placid drum
#

thanks

fervent lark
#

look on the alternatives

#

but its not in full detail

#

il write up in more detail the approach

placid drum
fervent lark
# placid drum that would be wonderful

ok so the problems as you understand is basically How do we reference and manipulate a Task instance in outside code. Think like TaskDependency or anything else similar which may require a task instance and use it

#

It should be mentioned in the system, the Scheduler should OWN the Task instance when scheduled and later on (just for other outside code we need a reference mechanism)

#

now the 2 approaches are Task<T1, T2> -> TaskIdentifier. It involves us creating a Task<T1, T2> instance with the TaskFrame / TaskTrigger which has its own TaskHooks, this is typed and can be shared around via &Task<T1, T2>

#

however it has to be delivered to a Scheduler to take an effect, and you have to pass it by ownership. Which then returns a TaskIdentifier

#

you have to carry around the Scheduler and the identifier, and basically you can call Scheduler operations and supply the identifier as an argument

fervent lark
# fervent lark

in the diagram, towards left are the pros/cons of this approach

fervent lark
fervent lark
#

then there are TaskHandles, they are sort of the TaskIdentifier. Basically you immediately ask a Scheduler given a TaskFrame and a TaskTrigger to create a Task instance, then it returns a TaskHandle<C> which allows you to interact with said instance without ever knowing the Scheduler directly.

Not only it allows task-based operations such as getting TaskFrame / TaskTrigger and working with TaskHooks. It also allows Scheduler based operations such as scheduling, cancelling... etc. This handle can be cloned around freely

#

the TaskHandle<C> isn't a trait but rather a struct which contains a TaskRef, this is where the SchedulerTaskStore comes in play. Imagine this, the store declares how the task instance is meant to be interacted with and then the handle wraps this along with the other Scheduler composites and provides the Scheduler based operations on top

#

its a 2 step process

#

again look the diagrams for pros/cons and how its layed out

#

@placid drum is this suffecient to explain the system?

placid drum
#

this should be sufficient

fervent lark
# placid drum ya

also one of the issues tackled in both solutions are What happens if a Task is removed and a reference to it is still kept around, don't worry about this one since its already solved but just know it exists

#

should also mention, focus on the TaskFrameContext problem

fervent lark
#

@placid drum you've come up perhaps with any solution?

fervent lark
#

man for the love of god, if i try to ask for help from the outside community apart from us. They'd need to either dig through the code which most won't, or i'd have to explain everything in a manifesto format which no one reads and ignores or omit details on purpose which then peps ask about the problem. What the fuck is this TaskHandle problem ferrisCluelessest

#

ITS BEEN 2 WEEKS OF BANGING MY HEAD CONTINIOUSLY

fervent lark
#

i really, and i mean really wanna move on from this problem

#

im litterally going insane over one problem

#

like i try this, fails, i try that it fails. Send help

placid drum
#

I'll ask

placid drum
#
fervent lark
placid drum
fervent lark
#

ye i checked it out

#

at least this guy suggested stuff tbh

#

unlike leaving us blind

placid drum
#

ya

#

i guess he will help out

fervent lark
#

il do need then to explain the problem differently

placid drum
#

solution what is think - can we keep state and then we can access that, but we have to pass that down the line like state in axum, what do you think ?

fervent lark
#

give a sec

placid drum
#

ya sure take your time

fervent lark
#

hmmm il need to edit it tbh

#

i gotta admit, i like this transprency and honesty instead of ghosting

#

as now i can just improve yknow

fervent lark
placid drum
fervent lark
#

ok

#

il work on it when i can

fervent lark
#

its gonna be complicated to figure out how to best describe the issue

#

il try to tackle the problem at a different angle

fervent lark
#

so i've figured out a sort of system

#

the idea is there will be 2 systems for communicating with the Scheduler

#

the command system and the query system

#

everything happens through the SchedulerWorker

#

each worker has its own command/query system

placid drum
placid drum
fervent lark
fervent lark
fervent lark
fervent lark
#

this is fucking hell

#

i've created a new branch

#

to do all sorts of stuff with the handles

#

if we can't really resolve this, then i will honestly delete the branch

placid drum
#

best of luck

#

why there is lot of option to do same thing in rust ahh just learning makes headach

placid drum
fervent lark
#

its so fucking hard to abstract the config out and not require a generic at all

#

just end my misery and let the TaskHandle approach work as i want it to be ferrisPray

#

litterally i beg to stop this misery

fervent lark
# fervent lark

man... I have no other idea than to either embrace this which has problems with TaskHooks or something else ferrisMoyai

#

tbh since we don't make any progress

#

lets forget the TaskHandle based approach and return to Task<T1, T2> with a couple of adjustments

zenith urchin
#

No worries, it wasn’t rude or needy. We have a target to achieve after all. I will submit my thesis on June, so it might not be possible before that because I am falling behind schedule on the thesis right now. I haven’t answered because discord was banned in my country for the last two weeks 😄

fervent lark
fervent lark
#

ok i fucking give up

#

like the problem is hard to explain properly

#

i litterally don't know how to formulate it without writing a manifesto

#

im out of ideas and its honestly a huge roadblock

fervent lark
#

a horrible sign

#

we have like 3.5 years to deliver this project to a presentable shape (which has cloud support and distributed systems stuff, plus other stuff im lazy to mention)

#

honestly il set a deadline for the TaskHandle API, if well we don't get it through April 1st, we'll just ditch it entirely

supple maple
#

is this project still open for contribution? im not sure i have the prerequisite knowledge but based on the original message i am potentially interested, and i have a decent amount of free time at the moment

fervent lark
#

though i do reccomend like a strong foundation in Rust, design patterns and so on in order to be most useful. But the minimal stuff is like 5-6 months of experience in Rust to be somewhat viable for the team

#

currently the most useful thing you can provide is being a design partner, essentially helping with how the API feels and its architecture, then comes unit tests, documentation and implementing some systems

#

from the jist of it, you seem to be somewhat viable for the team

#

Also should note that currently the biggest roadblock is designing the TaskHandle API, which is why rn design partners are heavily needed

supple maple
#

okay, interesting. how should i go about catching myself up?

fervent lark
supple maple
#

i meant, catching myself up in terms of this project specifically

fervent lark
#

well

#

there are some issues you can try to tackle

fervent lark
#

@supple maple il work on some diagrams to better explain the architecture, if you want, just let me know

supple maple
#

that might be useful, for now ill just read through the codebase and examples

fervent lark
#

again lmk if you need any help, im open to questions / guidance

supple maple
#

i guess my primary issue is that its a huge codebase and i only have a vague idea of what a scheduler is (i havent used one extensively in the past). some broad explanations would be useful, if you dont mind

#

like, my understanding from the examples and what ive previously encountered is that you define a bunch of tasks (in the form of functions) and then use the proc macros (if youre using the rust api) to define when and how tasks should run. then you use the scheduler api to make a scheduler and then pass tasks into it

fervent lark
#

haven't warned you yet, but ig you might have seen it from the README, the macros aren't yet ready. So far they are a conceptual idea

fervent lark
fervent lark
#

ok so @supple maple, il get to work on explaining the concepts

#

should i go broad or any specifics you need?

supple maple
#

broad

fervent lark
#

both have 3 components that make them up

#

they can coordinate between each other (with the exception of TaskTrigger and TaskFrame, plus TaskTrigger and TaskHooks but this might change). It should be noted SchedulerEngine has its own component called SchedulerClock but it isn't managed directly by the Scheduler, rather the SchedulerEngine

#

Let's start from the Task side of things, in basic terms a Task is a unit in which the Scheduler manages to execute at a specific time (via its TaskTrigger). Now Tasks are more powerful than just "A function with a schedule" via various patterns

#

TaskFrames are conceptually the code in which you execute, though they are more powerful as they can define workflows. Workflows in simple terms are TaskFrames wrapping one and the other (decorating pattern from OOP). Execution works from top to bottom, each TaskFrame controls itself how to execute nested ones

#

ChronoGrapher already provides its own various TaskFrames (more specifically they are called workflow primitives). Examples may include:

  • RetriableTaskFrame Retries the nested part up to "N" times with a specified delay (or immediate)
  • TimeoutTaskFrame Enforces a time limit on the nested part
  • FallbackTaskFrame Executes the first nested part, if it fails then it executes the second nested part as a fallback. If both fail it fails entirely.
    <...>
#

on their own they are simple, the power comes when chaining these simple primitives with your code

supple maple
#

interesting

fervent lark
# fervent lark on their own they are simple, the power comes when chaining these simple primiti...

NOTE 1# The order matters significantly, chaining retry -> fallback is NOT the same as fallback -> retry. While it may seem an annoynace / nuasance of ChronoGrapher at first, its what allows for basiccally infinite types of workflows. You may want to execute a fallback to catch common errors without retrying or a global fallback for any errors that leaked through

NOTE 2# You can stack multiple retry, fallback or any other workflow primitive even on top of each other, this is especially useful as some workflow primitives have their own scope. For example you can stack a global timeout for the entire workflow and nest a timeout on a fallback to only enforce this time limit on that fallback

supple maple
#

by "retries" do you mean it repeats it or only retries in the case of an error?

supple maple
#

which one?

fervent lark
#

the second

supple maple
#

okay interesting

#

how would you make a task repeat several times (regardless of if it succeeds or not)?

fervent lark
# supple maple okay interesting

then there are TaskTriggers, these allow you to define WHEN you want to execute the Task, they are also just functions but they are supposed to remain more simple and barebones compared to TaskFrames. ChronoGrapher also provides its own primitives such as:

  • TaskScheduleImmediate for immediate execution
  • TaskScheduleInterval for basic interval-based execution
  • TaskScheduleCron for CRON-based expressions (ex. * * * * ? *, follows Quartz style cron)
  • TaskScheduleCalendar for more complex scheduling rules

NOTE: You may notice i use TaskSchedule and not TaskTrigger. There is a TaskSchedule trait which implements the TaskTrigger trait automatically, the main difference is TaskTrigger may sit idle for a while and only announce the time it calculates when it feels like it. Whereas TaskSchedule immediately announces it

  • The former is more non-deterministic and non-immediate (like an API request or monitering)
  • The latter is more mathematical / computational (like an interval or CRON)
fervent lark
#

like repeatedly schedule and then execute?

formal sedge
fervent lark
# fervent lark like repeatedly schedule and then execute?

if you mean this, a Task by default can execute infinitely. The cycle goes:

  • Task is scheduling (figuring out its own time)
  • Task announces the time -> Scheduler sorts it and fills it in a slot of time
  • Scheduler waits around, executing other Tasks in the meantime
  • Task's time is due, so Scheduler starts executing
  • Repeat
fervent lark
#

i gtg, il be back in 15-30 minutes. Do let me know @supple maple what questions you have, any confusions and so on so i can resolve those

supple maple
#

alr, sounds good

#

okay so like if i can say a concrete example

#

lets say the task MyTask just involves printing Hi to the console, and I want to print Hi to the console 10 times at once every hour

supple maple
# fervent lark > **NOTE 1#** The order matters significantly, chaining ``retry`` -> ``fallback`...

let me check my intuition here
let say i have tasks T, U. then retry (fallback T, U) 2 (making up my own notation) would try T, if it failed try U, and if both of those failed, try T again, and if that failed then try U again, and finally if that failed the whole thing fails.

fallback (retry T 2) U would try T, if it failed try T again, if that failed try U, and if that failed the whole thing fails

#

so the first one tries T -> U -> T -> U and the bottom T -> T -> U

fervent lark
#

or you could make your own TaskSchedule that does just that

fervent lark
supple maple
#

okay what if i wanted a task T to run, output some integer N, and then run task U N times

fervent lark
#

if you mean TaskFrames ig just tell frames since its kind of ambigious what you want

supple maple
#

what im getting at is can taskframes like communicate with/influence each other

fervent lark
#

there is an API called Shared Objects, this allows a parent TaskFrame to declare a shared object which is then passed down to its child TaskFrames, this object can be accessed by these child objects in read-mode (with interrior mutability, you can make it mutable as well)

#

the shared object can be in any shape of data, it can be a single integer, a struct, an enum... etc.

#

they work like React's context API

#

NOTE 1# There can be multiple shared objects of different types, for example a TaskFrame T could have a shared object called MyT whereas U could have a shared object MyU. Each shared object has its own scope of where it can be accessed. For instance if i define at the leaf of the workflow a shared object, the parent TaskFrame can't really access it (thinking about it, they can with some hacky workarounds, but its really an anti-pattern).

NOTE 2# If there is a case where there are multiple T, with their own MyT shared objects nesting each other (directly or indirectly). Then only the latest / closest / last shared object is accessible, on other scopes it could be the first, second.. etc.

NOTE 3# An internal detail but worth mentioning, the shared object API is not itself a true API in the sense but a wrapper around TaskHooks. Which means you can replicate the API yourself if you want via TaskHooks, its just a nicer wrapper

supple maple
#

okay interesting

#

you mentioned TaskHooks, what are those?

fervent lark
# supple maple you mentioned TaskHooks, what are those?

these are the backbone of ChronoGrapher when it comes to extensibility (you will see these around on various integrations, extensions and so on. Even in the core they are used internally), il start with the problem:

"Suppose I have a Task with a complex workflow, how can I listen to / observe whats happening when its running?"

They mostly solve this issue where they listen to a specified number of events defined (and attached, there are 2 phases to this), HOWEVER they are far more powerful and allow for patterns such as state management (additional data can be embedded on a Task), markers (same as state management, but with no data, just they mark a Task via their presence), post-error handling (by listening to when a Task ends, you can try to resolve an error with indirect means)

#

consider something like this (taken from the README)

struct PrometheusMetricsHook;

This is a TaskHook used for integrating with Prometheus, its a dummy example and there will be a serious extension on top later down the line.

I want this TaskHook to react to specific events, for our case its 4. We can do:

impl TaskHook<OnTaskStart> for PrometheusMetricsHook {
  async fn on_event(&self, event: OnTaskStart, ctx: Arc<TaskContext>, payload: &OnTaskStart::Payload) {
      // ...Increment the number of running Tasks and update metrics...
  }
}

For listening to when a Task is about to start

impl TaskHook<OnTaskEnd> for PrometheusMetricsHook {
  async fn on_event(&self, event: OnTaskEnd, ctx: Arc<TaskContext>, payload: &OnTaskEnd::Payload) {
      // ...Decrement the number of running Tasks and update metrics...
  }
}

For listening to when a Task is about to end

impl TaskHook<OnTimeout> for PrometheusMetricsHook {
    async fn on_event(&self, event: OnTimeout, ctx: Arc<TaskContext>, payload: &OnTimeout::Payload) {
        // ...Executes when a TimeoutTaskFrame throws a timeout...
    }
}

For listening when a TimeoutTaskFrame reports a timeout error

impl TaskHook<OnHookAttach<OnTaskStart>> for PrometheusMetricsHook {
    async fn on_event(
        &self, 
        event: OnHookAttach<OnTaskStart>,
        ctx: Arc<TaskContext>,
        payload: &OnHookAttach<OnTaskStart>::Payload
    ) {
        // ...You can initialize logic for when it is attached to a OnTaskStart event...
    }
}

For listening to when that hook is attached to a OnTaskStart event.

Then you have the second phase where you attach those events to a Task like so:

let hook = Arc::new(PrometheusMetricsHook);
task.attach_hook::<OnTaskStart>(hook).await;
task.attach_hook::<OnTimeout>(hook).await;
#

every event may contain a payload, this payload can be used to extract info about the event and what happened.

#

there are various patterns to TaskHooks. First of, if you want you can generalize a method to listen to specific events like so:

impl<E: TaskHookEvent> TaskHook<E> for PrometheusMetricsHook {
  async fn on_event(&self, ...) {...}
}

This method will activate for every type of TaskHookEvent, it doesn't know directly the type of event it got activated from. Though it allows for you to attach it onto any type of event.

Though be warned you do have to manually attach the events you want, this is sadly one thing i do have in my mind to fix

However if you want more restrictions to impose on what kinds of events are allowed (helpful for narrowing, there are THEGs (TaskHookEvent Groups). They are just traits that implement TaskHookEvent and may contain a specific payload type shape (but depends).

An example of this are the Task lifecycle event:

// Only the OnTaskStart and OnTaskEnd is allowed, NOTHING ELSE
impl<E: TaskLifecycleEvents> TaskHook<E> for PrometheusMetricsHook {
  async fn on_event(&self, ...) {...}
}
#

Another pattern is Hook-To-Hook communication. TaskHooks can basically communicate with one and another via their own sets of events (yes you can create your own events just like how the core does). The idea is a TaskHook emits an event and some other TaskHook implements the event method to listen to (and ofc gets attached)

TaskHooks can also inspect if there are TaskHooks around, attach their own or detach some specific ones. But its not limited to TaskHooks, ANYTHING can achieve those activities without any restriction (just know the type of TaskHook you want to act upon), even the Scheduler, even TaskFrames... etc.

There is also a conceptual idea, not yet formed. But you may ask yourself:

"What happens if I have 2 or more same type TaskFrames emitting events and I want to narrow which TaskFrame is allowed to emit those events?"

Well introducing (not yet) the Mute & Transform pattern, there is a special TaskFrame called MuteTaskFrame, it allows you to define various events to mute (effectively if any nested part emits those events, it never gets through to the TaskHooks, its a middleman).

Now the interesting thing you can do is take those muted events and capture them (you can run side effects per event). You can emit one or more events back if you want (which is where the transform comes in).

The pattern basically involves being able to run specific events on a certain scope in a filter which will then emit different event(s), or i guess mute it entirely if you want.

To get an example of how this pattern works, suppose the workflow:

TaskFrameA
|- TaskFrameB
|- TaskFrameC
   |- TaskFrameB
   |- TaskFrameD

And then suppose TaskFrameB fires a MyEventB even for TaskHooks to listen, but i don't want the nested B (with the sibling of D) fire, so i can change the workflow to:

#
TaskFrameA
|- TaskFrameB
|- MuteTaskFrame
   |- TaskFrameC
     |- TaskFrameB
     |- TaskFrameD
#

now only the top most TaskFrameB gets to emit freely MyEventB. Depending on how you set up MuteTaskFrame, you could re-emit the event as say MyEventB2 or something along the lines, or you can fully mute it. You can of course put different mutes in your workflow all tuned with their own settings

#

@supple maple you got the idea i presume?

supple maple
#

sorry i was away, give me a few min to catch up

#

yes okay i think i understand

fervent lark
fervent lark
#

btw you have used any job scheduler / distributed task queue / workflow orchestrator before? (ex. Celery, Temporal, Apache Airflow, BullMQ... etc.)

fervent lark
supple maple
#

what are TaskHandles? you mentioned that earlier

fervent lark
# supple maple what are TaskHandles? you mentioned that earlier

its a shift in how things are managed, in the previous model you created a Task object right? This contained your TaskHooks, TaskFrame, TaskTrigger and so on right? Then you deliver this Task onto the Scheduler for it to store it somewhere for scheduling and stuff (fully owned)

#

then the Scheduler returns a TaskIdentifier which using it and the Scheduler, you can trace that Task back and do all sorts of modifications

#

such as cancelling a Task, removing it fully, rescheduling it or immediately executing

fervent lark
fervent lark
fervent lark
# fervent lark bois...

this model has severeal pros / cons, which you can check it out here, btw the problem more precisely is:

"How do I reference and interact with a Task in outside code (may include TaskFrames, TaskHooks... etc.)?"

#

these 2 models are just approaches (the more i think about it, the more it may be better to mix them, keep the intermediate pattern but empower the identifier in some way). Either way this is still an open ended discussion and being implemened (or at least trying to) as we speak

light pierBOT
fervent lark
#

ngl imma pin this to guide other people like you who have questions since ARCHITECTURE.md is somewhat outdated and not enough information is currently availaible such as API docs and the Guidebook (due to WIP status)

supple maple
#

yea thats a good diagram

supple maple
fervent lark
#

the TaskHandle is a middleman, it gives you methods in which you can operate upon the Task. WITHOUT knowing any details about the scheduler side

#

this middleman can be cloned around and freely shared. Currently the idea is

fervent lark
supple maple
fervent lark
#

except i don't pay much of the cost of Arc<T> (unless i want to, depends)

#

mainly the costs are cloning it around and memory wise

fervent lark
#

in summary its a slotmap (custom implementation for high concurrency). And basically the only stuff i store are 2 usizes + a u16 (though may change to a u32)

#

so 10 bytes per key

#

whereas the Arc<T> is 8 bytes per instance

#

wait...

#

actually the u16 will remain as that nvm

#

oh shit, apparently SlotKey is more expensive in memory around 24 bytes

#

but the good thing is if you don't use it, the memory is freed so ye...

#

previously the idea was to use a DashMap<usize, Arc<ErasedTask>>

#

initially with zero entries, the slotmap uses 16 bytes over dashmap's 40

#

the slotmap in terms of allocation, is also quite faster than dashmap

#

actually nvm

#

same-ish speed

#

for blocking_allocate

#

dashmap uses btw ~59.68MB, the slotmap uses ~24.77MB which is nuts

#

and there might be a chance i can compress it even further

fervent lark
#

oh shit...

#

apparently without Arc<T>, my slotmap is heavier

#

ye il fix this

placid drum
#

great recap

fervent lark
#

thx

fervent lark
fervent lark
#

actually nah imma optimise the slotmap

fervent lark
#

tbh fuck the TaskHandle based approach

#

we'll revert the system

#

done

#

@supple maple

#

you can run ChronoGrapher like normal, forget about the TaskHandle appoach and shenanigans (though its being considered to add this back in the future)

#
GitHub

What area has an architectural problem? Please describe. The Task area specifically allocating Tasks to the Scheduler and referencing them elsewhere. Why does this area have this architectural prob...

GitHub

Is your feature request related to a problem? Please describe. Currently FallbackTaskFrame doesn't share the error with the secondary TaskFrame, the error can be indirectly shared via a hacky w...

fervent lark
#

imma work on https://github.com/GitBrincie212/ChronoGrapher/issues/150, the idea is users toggle via compiler features tokio, smol or even use sync. I provide primitives for spawning, declaring functions and so on via macros and depending which feature is set, that section is active which is compatible with what we want

GitHub

Powerful, developer-experience centric, blazingly fast and extensible job scheduler and workflow orchestration platform - Issues · GitBrincie212/ChronoGrapher

#

basically ChronoGrapher's code is being "rewritten" to support other runtimes than just tokio

fervent lark
#

I'm removing the TaskSchedule trait btw

#

its just an alias really and its more confusing than good

fervent lark
#

the code won't be the prettiest thing

#

you will see like lots of macro magic used

#

also 90% of ChronoGrapher is being rewritten to support multiple runtimes

#

actually no

#

there are a few reasons for not doing it apparently

fervent lark
#

man rn i am lazy

#

not april fools joke lol

fervent lark
#

btw @supple maple howz progression going? What you planning to do?

supple maple
#

sorry, some school stuff came up, ill be much more free starting this weekend

#

definitely still interested tho

fervent lark
#

ok

fervent lark
#

mainly rewriting the ARCHITECTURE.md and removing TaskSchedule in favour of TaskTrigger

fervent lark
#

i've finished the ARCHITECTURE.md document

fervent lark
#

btw @placid drum and @jolly frost ig its best to help out on the Task stuff. Honestly since we can't really unify the APIs, its best to split them in two just like before. However, now instead of an identifier and the need to carry the Scheduler around, im gonna mix in the TaskHandle based approach

#

the idea is explained in ARCHITECTURE.md

#

bascially Task<T1, T2> now is a temporary representation, its ineffecient since its just basic storage and not meant to interacted very frequently. Then once it gets stored on the Scheduler, it destructures it revealling all information this container held, reassembles it to its own representation (for optimization) and then stores it. Afterward returning the handle

placid drum
fervent lark
#

but

#

its not updated with the new approach

#

what you need btw in terms of info?

placid drum
#

give me anything i'll somehow manage

#

just forgot i'll look into it

#

i'll ask if needed

fervent lark
#

ok

fervent lark
fervent lark
#

btw

fervent lark
# placid drum thanks

one of the issues we will again encounter is how would the TaskFrameContext communicate with the Scheduler WITHOUT knowing anything about the Scheduler

fervent lark
placid drum
#

well the basic insense, what its is use and other mentioned in doc and ARCHITECTURE.md

fervent lark
#

ok

fervent lark
#

man quite lazy

#

im also thinking about the "Mute & Transform" pattern

#

btw im thinking if there is a better data structure for TaskHooks that results in constant fetch time with the operations i have

fervent lark
#

i do imagine an edge case though on the top of my head

#
let hook1: Arc<MyHook> = Arc::new(...);
let hook2: Arc<MyHook> = Arc::new(...);

// ...

task.attach_hook::<MyEvent1>(hook1.clone()); // MyEvent1 -> [hook1]
task.attach_hook::<MyEvent2>(hook1.clone()); // MyEvent2 -> [hook1]
task.attach_hook::<MyEvent2>(hook2.clone()); // MyEvent1 -> [hook1, hook2]

Since MyHook produces the same TypeId regardless of which instance, if i emit MyEvent1, how do i know which instance corresponds? Its supposed to be hook1 but since i've appended hook2 it would run that thinking its the last instance added onto the event when its not

#

solving this is tricky, SlotMap only allows reading contents from slots, not being able to modify them

fervent lark
#

Guys we gotta get movin more

#

we're supposed to have 16 days to finish the project's core (NOT including the SDKs and other features)

#

but i will need to delay the deadline

placid drum
#

my brain is refuses to braining

fervent lark
fervent lark
#

ladies and gentlemen

#

LADIES AND GENTLEMEN

#

we've done it

#

over 2 million Tasks per second on average

#

2.1x faster than tokio_schedule

#

2.1 million now

#

and its rising

#

for context btw

#

where blue is ChronoGrapher and orange is tokio_schedule

#

the main compromise is now the scheduler composites are no longer object safe, though i do think some weren't and besides its not their intended use case as even if you make them trait objects, you'd need the SchedulerConfig, so... A worthwhile tradeoff

formal sedge
fervent lark
fervent lark
# fervent lark for context btw

there is lots of potential for optimization via async fn, i've used impl Future<Output = ()> + Send which has increased performance but it may be better to switch back to async fn with adjustments to make sure cache friendliness, a smaller state machine footprint... etc.

fervent lark
#

i've optimized more

#

though there still may be room

fervent lark
#

its kinda crazy how i've improved the performancce to be over 2.1x faster than tokio_schedule

#

and over 10x the performance from the initial benchmark

#

i've also got an idea for TaskHooks to fully avoid Arc<T>

#

what if we apply the same handle-based approacch over there as well?

#
let hook = ctx.instantiate_hook(MyHook::new(...))
hook.subscribe::<MyEvent1>()
hook.subscribe::<MyEvent2>()
#

something like this

placid drum
fervent lark
#

yes its still open

fervent lark
placid drum
fervent lark
#

wdym exactly?

placid drum
# fervent lark ?

its just i wanna get started with chronographer but i am not getting enough time

fervent lark
#

oh

fervent lark
formal sedge
fervent lark
#

but the Task based approach is more important imo

formal sedge
fervent lark
fervent lark
#

i mean docs and unit tests

formal sedge
#

ok

fervent lark
#

though this does require a bit more experience than just unit tests and docs

#

but not much

fervent lark
fervent lark
#

also for the slotmap im gonna make an issue

fervent lark
formal sedge
# fervent lark not really

Sry, but how do you run your benchmarks ? Like bench and tests on bin and benches and make test gives nothing. Or it is that bin/ contains the benches ?

fervent lark
#

typically you do cargo test

#

but now you gotta cd into the folder to run the command

formal sedge
fervent lark
#

cargo bench

#

and ofc cd

formal sedge
#

Why does not it gives me any file or info during comp about the benches ?

fervent lark
#

hold on

formal sedge
#

Above contains warnings

#

Why am i that lost 😂

#

Wait

#

Does not it haves something to do with target?

fervent lark
#

i've updated the makefile

formal sedge
#

Let me see

formal sedge
#

Why there is no [[bench]] attr in benchs toml file?

#

It may be why ? Divan docs indicates to indicates benches there

placid drum
fervent lark
#

ig

#

though this does involve the above resolving issue ^

placid drum
#

so which to work on first

fervent lark
#

we have to make it mostly lock-free for most performance (memory and runtime)

placid drum
#

okay i'll work on this one

fervent lark
placid drum
#

ok

#

i'll discuss details with you

fervent lark
#

ok

fervent lark
#

do write a small benchmark script

#

and run it vs DashMap

placid drum
#

okay

fervent lark
#

hold on il give one benchmark script myself

placid drum
#

take your time i'll sleep then work on this one

fervent lark
#

interestingly

#
=== DashMap Initialization ===
MEMORY USED: 0.008192MB
RUNTIME: 72.585µs
=======

=== SlotMap Initialization ===
MEMORY USED: 0.024576MB
RUNTIME: 37.234µs
=======

=== DashMap Allocations ===
MEMORY USED: 17.826816MB
RUNTIME: 207.784582ms
=======

=== SlotMap Allocations ===
MEMORY USED: 12.39276MB
RUNTIME: 221.595047ms
=======

=== DashMap Clear ===
MEMORY USED: 0.0MB
RUNTIME: 5.436083ms
=======

=== SlotMap Clear ===
MEMORY USED: 0.0MB
RUNTIME: 45.899µs
=======

=== DashMap Deallocations ===
MEMORY USED: 0.0MB
RUNTIME: 86.878349ms
=======

=== SlotMap Deallocations ===
MEMORY USED: -7.833848MB
RUNTIME: 274.703377ms
=======
#

it uses slightly less memory but the runtime is a lil more expensive

raven plumeBOT
#

No transcript returned for file ⁨voice-message.ogg⁩ in ⁨17.708546ms⁩ (⁨0.520⁩ second file)

fervent lark
fervent lark
#

we might need to review how slotmap implements its SlotMap, learn from there and adapt it to be concurrent ready

formal sedge
#

Like what are the difference between those 2 ?

fervent lark
#

yes you can use a single Mutex<T> to make it accessible through but this is bad for performance due to contention, a better approach is sharding the slotmap for multiple locks

#

but even then, i think it can be made better

#

also im gonna make a new issue about TaskTriggers

fervent lark
#

im thinking for a TaskTrigger to act as a state machine, switching between other TaskTriggers

#

the idea is for it to be a macro that generates the state machine TaskTrigger implementation

fervent lark
#

im gonna change the structure of my slot map

#

myself

placid drum
fervent lark
#

probably not

#

tbh try to document something

#

the SlotMap has to be mostly lock-free which is where tons of issues begin

placid drum
#

oh okay thanks

fervent lark
#

like using an intrusive linked list, more formally a trieber stack as a linked list

fervent lark
#

actually

#

it may be better to use a sharded lock of the slotmap

fervent lark
#

but not a huge problem though as most of the time you initialize the logic earlier on like a constructor

fervent lark
#

im gonna have issues with TaskHooks...

fervent lark
#

seems like an AI.

EDIT: Of course it is, its not like i can get anything good quality on this world because of fuckass AI

fervent lark
#

maybe its best to keep instances task-local

#

and for TaskHooks that need a shared state across Tasks, they could use a registry-based pattern

fervent lark
#

ye so there will be changes in TaskHooks, from now on, basically everything implements TaskHook<()>

#

which means you can do cursed stuff like this:

let hook = ctx.instantiate_hook(3usize);
#

but meh its an anti-pattern in most cases

fervent lark
#

imma commit my changes

#

on the handles branch

#

@placid drum @jolly frost you can view what approach im going with

fervent lark
#

the handles based approach

placid drum
fervent lark
#

btw i should mention one of the changes not mentioned in ARCHITECTURE.md is how TaskHooks now behave

#

first of everything is a so called "stale" TaskHook, these are meant to contain only metadata / state

#

stale TaskHooks are either TaskHook<()> or via the alias StaleTaskHook

#

second when attaching a TaskHook, you need "ownership" of the TaskHook instance (this can be circumvented il explain later)

#

and once attachement is done, a handle is returned. Now this handle is used to interact with the TaskHook instance

#

you can subscribe an instance to multiple events at a time, or even unsubscribe them

fervent lark
#

you can also detach them from the handle or get their instance as a reference

#

more importantly one change is you can emit an event for ONE TaskHook

placid drum
#

so task side let say i wanna change taskframe can i do that ?

fervent lark
#

or not?

#

if not you can change it as much as you would like, though at some point you will have to deliver it to a Scheduler

fervent lark
#

you can emit an event for all TaskHooks to listen

#

you can fully detach a TaskHook from all of its events by knowing its type (or via a single event)

placid drum
#

interesting

fervent lark
#

you can get an instance of a TaskHook either from its type or if you want more specificity from its type and event

#

NOTE

#

these methods are different

#

suppose we have one TaskHook type and two instances, lets call those A and B, we attach A to an event called MyEvent1 and MyEvent2, then we attach B MyEvent1

#

when you fetch via event, lets say MyEvent2 you get A

#

BUT

#

when you fetch globally you get B

#

both the get hook methods fetch the last instance of a TaskHook's type, what counts as a last instance is their difference

placid drum
#

what make it which is local or global

#

is type ?

fervent lark
#

whereas the local ones are defined by the TaskHookHandle

placid drum
#

make sense also why we get B not A, i am not able to understand..

fervent lark
#

hence we get B

fervent lark
#

if you have say

let task1: TaskHandle<...>;
let hook1: TaskHookHandle<...>;

task1.emit_event::<T>(); // Acts upon the entire container 
hook1.emit::<T>(); // Acts upon only the TaskHook and nothing else
placid drum
#

when we gonna write test cases ?

fervent lark
#

later

fervent lark
#

these are global

#

hence only accessible through TaskHandle

placid drum
#

also

#

i will start writing doc if you have any intermediate level issue open do let me know

#

architectural issue is out off league for now.

fervent lark
#

ok

fervent lark
#

ideally don't touch Scheduler stuff

#

nor Task stuff (not the TaskFrames, just directly)

placid drum
#

okay

fervent lark
#

actually imma tell you what to document

#

base/src/task/trigger/schedule/calendar.rs
base/src/task/dependency.rs
base/src/task/dependency

#

these are some

placid drum
#

okay

placid drum
placid drum
#

ya i am appreciating it, i read it in code the other day.

fervent lark
#

my brain ain't braining much

#

you meant virtual clock, since you are testing things deterministically right?

fervent lark
#

TaskHooks are a huge problem...

#

we'd have to get a bit crafty on how TaskHooks behave ferrisBanne

#

this is gonna be a tricky problem

#

we need the following:

  • Getting the last TaskHook instance GLOBALLY based on TaskHook type
  • Getting the last TaskHook instance LOCALLY from an event based on TaskHook type and the event's type
  • Attaching a TaskHook instance GLOBALLY
  • Detaching the last TaskHook instance GLOBALLY
  • Subscribing a TaskHook instance (from the attachement) LOCALLY from an event type
  • Unsubscribing a TaskHook instance (from the attachement) LOCALLY from an event type
  • Iterate over all TaskHooks based on an event
fervent lark
fervent lark
#

man the registry-based stuff will be extremely hard to make it as optimal as possible

fervent lark
#

honestly if i get the new design to like over 4-5 million Tasks per second, il quit the optimizations since its enough

fervent lark
#

and i want it to be extremely effecient

placid drum
#

good luckk

fervent lark
fervent lark
#

Why is it so so SO hard to design the SchedulerTaskStore with effeciency in mind ferrisForgor

fervent lark
#

WHAT THE FUCK

#

NO WAY

#

@placid drum

#

i reverted to the standard approach, after one change i did

#

holy shit

#

wait

#

i spawned 450k Tasks that execute about 6 times per second

placid drum
#

which is chronographer.

fervent lark
#

is the new benchmark

placid drum
fervent lark
#

we have a race condition

#

as it seems

#

the results are skewed

fervent lark
placid drum
#

how do you benchmark ?

fervent lark
#

450k times 6 should be at best 2.7 million

placid drum
#

isnt this good ?

fervent lark
#

no?

#

it surpassed 2.7 million

#

which means duplicate Tasks ran

#
struct MyTaskFrame;

#[async_trait]
impl TaskFrame for MyTaskFrame {
    type Error = Box<dyn TaskError>;

    async fn execute(&self, _ctx: &TaskFrameContext) -> Result<(), Self::Error> {
        yield_now().await;
        COUNTER.fetch_add(1, Ordering::Relaxed);
        Ok(())
    }
}

pub async fn benchmark_chronographer() {
    println!("LOADING TASKS");
    let t = tokio::time::Instant::now();
    let scheduler = Scheduler::<DefaultSchedulerConfig<Box<dyn TaskError>>>::default();

    const EXEC_TIMES: usize = 6;
    const TASKS_ALLOCATED: usize = 450_000;

    let spread_millis = 1000.0 / ((TASKS_ALLOCATED * EXEC_TIMES) as f64);

    let mut millis = 0f64;
    for _ in 0..TASKS_ALLOCATED {
        millis = (millis + spread_millis).rem_euclid(1000.0);

        let task = Task::new(
            TaskScheduleInterval::duration(Duration::from_millis(millis.round() as u64)),
            MyTaskFrame
        );

        let _ = scheduler.schedule(task).await;
    }

    scheduler.start().await;

    println!("STARTED {}", t.elapsed().as_secs_f64());
}
#

also here is the script

placid drum
#

how do you actually plot ?

#

the values.

fervent lark
#

desmos

#

it just copies the contents to a CSV which i ctrl C + ctrl V it to desmos

fervent lark
#

freezes the program entirely

formal sedge
fervent lark
#

i've been spawning multiple tokio tasks for each Task

#

ig that could contribute to how slow it is

#

very weid

#

actually let me zoom it more

#

i've rewritten the benchmark to basically increase the Tasks by a batch (1000) every second, each scheduling every 10 milliseconds

fervent lark
#

caught a bug

fervent lark
#

caugh a bunch actually

fervent lark
#

btw i realize the timing wheel can be made lock-free

#

actually its lock-free but

#

it does require a central queue though

#

which can be a bottleneck

#

the naive approach is to define SegQueue instead of Vec<T>

#

but

#

its costly in memory, around 327KB for one timing wheel 256b * 256 * 5

fervent lark
#

i will need to make the data structure as minimal in memory as possible

fervent lark
fervent lark
#

seems like im running to multiple issues with concurrent data structures, instead of bloating ChronoGrapher in the utils folder with the data structures, it may be best to create a new crate / project (the aformentioned conckit)

placid drum
#

@fervent lark

#

 

error[E0603]: constant `TIME_FIELD` is private
  --> proc/src/every.rs:5:47
   |
 5 | use crate::utils::time_literal::{TimeLiteral, TIME_FIELD};
   |                                               ^^^^^^^^^^ private constant
   |
note: the constant `TIME_FIELD` is defined here
  --> proc/src/utils/time_literal.rs:82:1
   |
82 | const TIME_FIELD: [&str; 5] = ["milliseconds", "seconds", "minutes", "hours", "days"];
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#

getting this error for master branch

fervent lark
#

oh ye

#

just make it public and open a PR

placid drum
#

okay t

fervent lark
#

merged

#

btw im thinking about the slot map (this will be availaible as another crate and NOT just in ChronoGrapher though it will be used), here is how i would lay it out:

The slot map is AtomicPtr<[Segment<T>]>, basically a Vec<T> without the length/capacity information (thats stored alongside it). Now each segment is conceptually the same as the above just a Vec<T> stripped of its length/capacity information but the difference is now we are in the slots. Each slot contains a SlotContents and a packed AtomicU32 containing a generational counter in the 30 lower bits + a state in the 2 higher bits. Where:

  • 00 = Free
  • 01 = Reserved
  • 11 = Occupied

SlotContents is a union, that either contains the value or points to the next free slot, this is concluded via the state above. Now an interesting problem arises

"How do we track free slots?"

This is where we can use an Intrusive Linked List. BUT, the change is we don't store it per segment, no no, thats ineffecient. We store a global linked list where the index usize encodes both the segment and the index of the slot relative to the segment. Now another problem arises:

"What happens when 2 slots are free but live in different segments?"

The solution is to encode in the SlotContents where the segment lives. One idea i had is if the free index is out of range then it means it lives in another segment so we just div & modulo by the segment size to find the offset of segments and the slot it lives in. Then from there navigate

The linked list is of course a frieber stack otherwise it would be hella expensive to navigate to the end of the linked list. Another problem arises though with the free slots

"How do we free memory when there are free slots but on different segments?"

The answer is we don't, but what we do is bias the allocation towards segments that are more full, leaving empty-ish ones to slowly drain and once fully drained then remove

placid drum
#

good one

#

i understand half of it, but sound nice nice to me

fervent lark
placid drum
fervent lark
placid drum
fervent lark
#

turns out going lock-free may not be optimal

#

there are very specific cases where going lock-free makes sense for performance

placid drum
#

so this arch is final ?

fervent lark
#

?

placid drum
#

we were changing architecture right ?

fervent lark
#

fixed some bugs

#

and rewrote the benchmark

#

now the benchmark measures throughput over time as Tasks are added (1k per second)

#

the results are ||not supring lol||

#

blue for tokio_schedule, purple for ChronoGrapher

#

ChronoGrapher at worst is slightly faster than tokio_schedule

#

on average its 3x faster

#

and on best case even 4x

#

now granted, as i pointed out to @formal sedge, tokio_schedule is slower since i always spawn a tokio task per Task which can easily get out of hand petty quick

#

also the throughput dropped to 400k, as the previous benchmark wasn't correct, it benchmarked thoughput as Tasks executed instantly

fervent lark
# fervent lark

@formal sedge suprisingly, even if i allocate a batch of 64 tokio Tasks that act as workers and then the schedules get pushed there, it still doesn't improve it

#

tbh i think its a hard cap of 1 million Tasks per second for some reason

fervent lark
# fervent lark

Performance could be improved even more by sharding the Hierarchical Timing Wheel

fervent lark
#

the idea is make the workers poll the engine, instead of just a single SegQueue pushing the commands and having one thread manage ticking (single-threaded)

fervent lark
#

@jolly frost We should look toggether on how best to shard the timing wheel

#

the goal is for ticking to be consistent (exactly ~1ms) but allow workers to fetch Tasks in parallel and process

#

and improve the work stealing code

fervent lark
#

Honestly this is one of those changes that will increase massively performance

fervent lark
#

very suprisingly:

#

i've added 2 more benchmarks

#

tokio_cron_scheduler in green

#

and a pure tokio based interval loop is purple

#

and ChronoGrapher outperforms them all in most scenarios

fervent lark
#

i have omitted some other crates, i tried to benchmark on crates like tsuki_scheduler but their model just doesn't map cleanly

#

their API kinda sucks ngl (no bias but it actually doesn't feel clean), neither is my base API but the macros will be significantly cleaner for sure

#

the frustrating bit is how their docs are unmantained and how i can't use it in a multi-threaded scenario

fervent lark
fervent lark
#

@placid drum tbh, it may be better to touch on docs, if we roll out the handles based approach then we will definitely need to rewrite it but for now should be good

fervent lark
#

and they can be combined for extreme deterministic replays

#

it allows you to record exactly every step the Scheduler took

#

I feel like it can be made even more powerful

#

giving the ability to basically inspect through composites, not just in the Scheduler though this will be hard to get right

fervent lark
molten loom
#

I have zero memory of anything going on here lol

fervent lark
#

im actually sorry

#

didn't meant to mention you, wanted to mention @placid drum but ye

fervent lark
#

ngl man, we do need to finish the core asap

#

we supposedly had like 8 days to finish the core

#

and basically all of us are slacking back (for me personally its just irl stuff and a bit of the burnout effect)

#

i do wish we had more manpower (specifically a bit more of expertises, not asking like a PhD but neither asking for basically blind workers, i do wish we just had more of it) for the project

placid drum
#

my plan -

  1. rename Scheduler -> LiveScheduler (update all refrence)
  2. simulacrum will become part of LiveScheduler

and then processed on @fervent lark should i work on this

fervent lark
#

this is more of a backlog

#

i suggest better do docs

placid drum
#

Ok

fervent lark
# placid drum Ok

also forgot, one issue is the website, we haven't even touched it which we should. Honestly the main force blocking me from writing the guidebook docs is https://github.com/GitBrincie212/ChronoGrapher/issues/141

GitHub

Is your feature request related to a problem? Please describe. Currently FallbackTaskFrame doesn't share the error with the secondary TaskFrame, the error can be indirectly shared via a hacky w...

#

its a hard problem that has to go

#

If we find a way then things can go faster

fervent lark
#

plus it has to be adaptable

placid drum
placid drum
fervent lark
#

what if i make a FallbackTaskFrame2 (not realistic but suppose something else that does require the system the fallback uses)?

#

tbh try the arguments injection based approach

#

the idea is you have an additional field called args, a reference to an associated type

#

then for DynTaskFrame you just add an additional generic

placid drum
fervent lark
fervent lark
#

FallbackTaskFrame would need to tell the first TaskFrame it accepts no arguments, then the secondary has to accept as argument the error type of the primary

#

and

#

consider the fact the secondary TaskFrame may not be direct

#

like say a RetriableTaskFrame may be present

#

or even hell another FallbackTaskFrame like you said

#

one idea is to make the arguments recursive, basically

#

Initially: ()
1st Fallback: (Args, Err1)
2nd Fallback: (Args, Err2) -> ((Args, Err1), Err2)

#

the other TaskFrames woudl transport the arguments

placid drum
#

hmm okay, but need to add docs asap

fervent lark
fervent lark
placid drum
fervent lark
fervent lark
#

i rewrote the issues

#

wrong issue

#

ok now correct

placid drum
placid drum
#

@fervent lark check pr if any suggestion please let me know.

fervent lark
#

ok

fervent lark
#

also why clone the arguments?

#

you need pass by reference

#

for fallback

#

for dynamictaskframe it should require the arguments

#

do wish it was a bit more ergonomic

#

macros i guess will help out with this one

placid drum
#

okay i try to make it little bit ergonomic

fervent lark
#

ye

#

and do fix these issues i pointed out

placid drum
#

yea sure

fervent lark
#

for fallback, you can just require the argument of the second TaskFrame to be (&T::Args, ...)

#

so it would then be

#

&(&T::Args, ...)

#

not the best but definitely more performant and less restrictive

placid drum
#

okay

placid drum
#

@fervent lark adding refrence will cause lot changes regrading lifetime should i do that ?

#

every TaskFrame

fervent lark
#

you probably should

#

its gonna be difficult

#

but do aim for it + making it more ergonomic

placid drum
#

okay i try my best

fervent lark
#

for the macro, i do imagine it would automatically unwrap the arguments from this tuple (sort of like being varargs)

#

but

#

thats just for the macros

placid drum
#

okay

fervent lark
placid drum
#

well lets see till tomorow how much i can do it

fervent lark
placid drum
fervent lark
#

eh its a hard issue even for me

placid drum
#

lol why tf i am doing this ferrisPensive.

#

well i can only see red in codebase

placid drum
#

@fervent lark i need help so right now, erased task cant use dyn taskframe due GAT, should i remove dynamic task frame we use &dyn Any so do dynamic dispatch is possible.

fervent lark
#

why remove it though?

#

oh

#
#[async_trait]
pub trait TaskFrame: 'static + Send + Sync + Sized {
    type Error: TaskError;

    async fn execute(&self, ctx: &TaskFrameContext) -> Result<(), Self::Error>;
}

Thats not GAT btw, just an associated type

#

also the Sized requirement prohibits you to use object safe code

#

plus i want to keep this kind of interface

#

also for &dyn Any, well... downcasting would slow us down a bit, so we have to use some unsafe code

fervent lark
#

btw for the handle-based approach, apparently we can't remove the SchedulerHandle

#

its a part thats needed

placid drum
#

oh okay

#

#[async_trait]
pub trait TaskFrame: 'static + Send + Sync + Sized {
    type Error: TaskError;
    type Args<'a> : Send + Sync;
    async fn execute<'a>(&self, ctx: &TaskFrameContext, args: &Self::Args<'a >) -> Result<(), Self::Error>;
}

#

i was using this approach

fervent lark
#

oh

#

you can unelide the lifetimes though?

#
#[async_trait]
pub trait TaskFrame: 'static + Send + Sync + Sized {
    type Error: TaskError;
    type Args<'a> : Send + Sync;
    async fn execute(&self, ctx: &TaskFrameContext, args: &Self::Args) -> Result<(), Self::Error>;
}
#

im pretty sure like that

#

or if you must use the placeholder lifetime then just add <'_>

fervent lark
#

it might, just might be possible to remove it

#

though only for like the TaskFrame side, the TaskHook side is something more difficult

#

we could take advantage of unsafe code and have it call the TaskHandle indirectly

placid drum
#

@fervent lark


impl has stricter requirements than trait
impl has extra requirement `'a: 'async_trait`

#

 #[async_trait]
pub trait TaskFrame: 'static + Send + Sync {
    type Error: TaskError;
    type Args<'a>: Send + Sync;
    async fn execute(
        &self,
        ctx: &TaskFrameContext,
        args: &Self::Args<'_>,
    ) -> Result<(), Self::Error>;
}

#[async_trait]
pub trait DynTaskFrame<'a, E: TaskError, A: Send + Sync + 'a>: 'static + Send + Sync {
    async fn erased_execute(&self, ctx: &TaskFrameContext, args: &A) -> Result<(), E>;
    fn erased(&self) -> &dyn ErasedTaskFrame;
}

#[async_trait]
impl<'a, T: TaskFrame> DynTaskFrame<'a, T::Error, T::Args<'a>> for T
where
    T::Args<'a>: Send + Sync + 'a,
{
    async fn erased_execute(
        &self,
        ctx: &TaskFrameContext,
        args: &T::Args<'a>,
    ) -> Result<(), T::Error> {
        self.execute(ctx, args).await
    }

    fn erased(&self) -> &dyn ErasedTaskFrame {
        self
    }
}

fervent lark
#

a lot of lifetime hell as it seems

#

ye try to avoid lots of named lifetimes as much as possible

#

if its annonymous / placeholder then its ok

#

but this just gets out of hand

placid drum
#

ya

#

if we remove async trait it would work but will need to write future by hand

#

wtf i am suggesting lol

fervent lark
placid drum
#

any suggetion ?

fervent lark
#

tbh idk

#

im trying to do the TaskHandle approach

#

its difficult af, but the more i analyze it the more i realize it is needed

#

take single-node persistence for example, TaskHooks are in a global registry which means its impossible to persist them, no easy way to inspect the structure of Tasks or let the SchedulerTaskStore add its own metadata on top of the Tasks without using TaskHooks

#

with the handle-based approach, storage is more flexible at the cost of more complexity ofc but definitely worth it

placid drum
#

ya its worth it, regardless of complexity.

fervent lark
#

one of the problems will be how to make TaskFrameContext and TaskHookContext allow communication regardless if a TaskHandle or Task<T1, T2> is used

fervent lark
#

my mind is being f_cked by the complexity of TaskHandles

#

btw the more i see it the more better it may be to simplify both context objects, removing all of their data

#

the only data they would hold is the handle

fervent lark
#

honestly its best to leave TaskHandles

#

and just focus on shipping the product in an alpha stage

placid drum
#

Yes ig

placid drum
#

@fervent lark should i keep working on that or should i write doc ?

fervent lark
#

try to work on it

#

if we get over with it

#

i can continue the guidebook documentation

#

this thing keeps me off from continuing due to its unknown shape

placid drum
#

okay

fervent lark
#

look try your best in order to get this issue done

placid drum
#

yes

#

well i need to tackel this

fervent lark
#

yup

fervent lark
#

man i wish we moved more quickly

fervent lark
#

i think this rocks, i've edesigned a bit the logo

#

nvm i nailed it

#

i think i cooked

#

this is more for marketing side of things, the logo will be simplified for say the website (since its quite complex)

fervent lark
#

final logo

fervent lark
#

im wishing we get more help

#

like we're capped in this size of a team

#

and we move at a snail pace

fervent lark
#

though gotta admit, absolute cinema, we reached 31 github stars

#

though i'd argue help is more important

fervent lark
#

jesus fuck i just woke up and now its like 39 stars, kinda impressive

#

yet... No help which the one thing i need most rn

placid drum
fervent lark
#

the problem is when

#

help should arrive soon since thats the critical stage

#

if it arrives towards the completion of the project, then its kinda useless

placid drum
#

well you are right about this.

placid drum
#

Soon 100

#

@fervent lark about 141 issue what kind of ergonomic you're looking for? Like can you give example it would be nice to have reference.

placid drum
#

okay

fervent lark
# placid drum <@822874616947146793> about 141 issue what kind of ergonomic you're looking for?...

for https://github.com/GitBrincie212/ChronoGrapher/issues/141 tbh i don't have in mind though i do want the use to be simple in the sense:

#[async_trait]
impl TaskFrame for MyTaskFrame {
   type Error = ...;
   type Args = ...; // Maybe use GATs for lifetimes and stuff

   async fn execute(&self, ctx: &TaskFrameContext, args: &Self::Args) -> Result<(), Self::Error> {
        // ...
   }
}
GitHub

Is your feature request related to a problem? Please describe. Currently FallbackTaskFrame doesn't share the error with the secondary TaskFrame, the error can be indirectly shared via a hacky w...

placid drum
#

did you checked my last pr ?

fervent lark
#

i did

#

and i told you about it

#

you had stuff like Clone which is restrictive

placid drum
#

i am thinking of making it static what do you think

#[async_trait]
pub trait TaskFrame: 'static + Send + Sync + Sized {
    type Error: TaskError;
    type Args<'a>: Send + Sync;
    async fn execute(&self, ctx: &TaskFrameContext, args: &Self::Args<'_>) -> Result<(), Self::Error>;
}

#[async_trait]
impl<T: TaskFrame<Error: Into<T::Error>>> DynTaskFrame<T::Error, T::Args<'static>> for T {
    async fn erased_execute(&self, ctx: &TaskFrameContext, args: &(T::Args<'static>)) -> Result<(), T::Error> {
        self.execute(ctx, args).await
    }

    fn erased(&self) -> &dyn ErasedTaskFrame {
        self
    }
}
fervent lark
#

heck no

#

the GAT will be basically useless

placid drum
#

ya thought so

#

you want arg to have its own lifetime ?

fervent lark
#

well the lifetime of execute method

placid drum
#

and so that we dont need to clone ?

placid drum
fervent lark
#

kind of starting to feel your pain though, i've attempted to do this myself just in case i crack it

#

lifetime parameters or bounds on method erased_execute do not match the trait declaration [E0195]
lifetimes do not match method in trait

placid drum
#

i dont have clear picture like if it have life time 2nd execute from task frame isnt it will drop when moving to fallback frame ?

fervent lark
#

?

placid drum
#

i was thinking about error anyway

#

its not related

fervent lark
#

hm async is not the problem

placid drum
#

i guess i will become mad someday

fervent lark
#

lol

#

one way to solve this is via Arc<T> but this is just disguisting

#

both ergonomically and performance-wise

placid drum
#

Ya arc will add extra overhead

#

Can't we just pass memory address ferrisCluelesser

#

It will need lifetime so no memory sharing

fervent lark
#

thats basically asking for a UB to happen

fervent lark
#

honestly

#

we might have to go with the route of 'static

fervent lark
#

btw im going to strip down the information the context gives

placid drum
fervent lark
placid drum
# fervent lark they are quite useless

Hmm I read about Kafka architecture yesterday they use offset not any I'd to track the messages

Can't we do same for task it will same extra overhead?

fervent lark
#

hm, explain it a bit more

#

elabrorate

placid drum
#

Like we give option to put I'd for task, instead of that can't we create in serial number as task. If user needed he can use that to get task

formal sedge
#

error[E0599]: no method named into_erased found for struct DynamicTaskFrame<T> in the current scope
--> tests/src/task/frames/dynamic_taskframe_test.rs:35:11
|
35 | frame.into_erased().run().await?;
Is it normal ?

placid drum
#

It's error so, it should not be normal

formal sedge
#

I think i should transform it into a task

#

yh that was that

fervent lark
#

i don't think i get it

placid drum
fervent lark
#

you gotta fix your spelling man, its kinda hard to understand

placid drum
#

I'll improve

#

Also should I do that issue with static?

fervent lark
#

done

placid drum
#

Ok I'll do and add pr

fervent lark
#

tbh

#

its too complex

#

we need to rethink TaskFrames

fervent lark
#

This is a mindfuck on so many levels

fervent lark
# fervent lark we need to rethink TaskFrames

i want simplicity, like the execution tree it should feel like stacking LEGO on top of each other while being familiar in typical programming, no like DAGs in the mix and this is really hard

#

@placid drum

#

ig do docs

#

this shit is too complex

#

focus on scheduler stuff for now

fervent lark
#

retry and fallback are such pain

#

either requires Clone

fervent lark
#

@placid drum its solved

#

which means i can FINALLY

#

continue the guidebook docs

placid drum
#

I'll check pr

fervent lark
#

its in the commits

#

basically one of the sacrafices i had to do is:

impl<T, T2> TaskFrame for FallbackTaskFrame<T, T2>
where
    T: TaskFrame,
    T2: TaskFrame<Args = T::Error>
{
    type Error = T2::Error;
    type Args = T::Args;

    async fn execute(&self, ctx: &TaskFrameContext, args: &Self::Args) -> Result<(), Self::Error> {
        // ...
    }
}
#

no additional Args parameter

#

but...

#

you can get around this limitation, via a middleman that attaches this argument type as a TaskHook more specifically a shared object

#

dynamic? Yes, does it happens too often to worry about? Definitely no

#

i've realized something...

#

ye the clones...

#

this doesn't work

#
struct TaskFrameA;
struct TaskFrameB;

impl TaskFrame for TaskFrameA {
    type Error = String;
    type Args = ();

    async fn execute(&self, _ctx: &TaskFrameContext, _args: &Self::Args) -> Result<(), Self::Error> {
        Err("error".to_string())
    }
}

impl TaskFrame for TaskFrameB {
    type Error = String;
    type Args = String;

    async fn execute(&self, _ctx: &TaskFrameContext, args: &Self::Args) -> Result<(), Self::Error> {
        Err(*args) // We are taking String by reference
    }
}
#

end me...

#

im done, this is the user's choice

#

not mine

#

either make the type cloneable or use an Arc<T>

#

nothing else

placid drum
#

feeling the heat

fervent lark
#

unbelievable

formal sedge
#

git so fckin hard ferrisSob

placid drum
#

haa if we do x post 100 star easy

fervent lark
#

ig do it though yourself

#

just remember not to share it prematurely as otherwise we won't maximize the effect and may damage our image

fervent lark
#

what the heck

#

github is vibe coding their shit, i see...

formal sedge
#

commits endpoint works well so this should work

fervent lark
#

also the guidebook's workflows chapter is slowly being finished

#

the funny thing is the Commits button works

#

il have to finish the Dependencies chapter

#

tweak "Time In A Wokflow" a bit

#

and add a Patterns & Summary chapter summarizing everything i taught

formal sedge
#

wth 😂

formal sedge
#

Waiting 49ms

thread 'task::frames::timeout_taskframe_test::task_finishing_after_timeout_returns_error' (532316) panicked at tests/src/task/frames/timeout_taskframe_test.rs:62:5:
assertion failed: exec.is_err()

TimeoutTaskFrame is precise at less than a millisecond 😮

fervent lark
formal sedge
lament owl
#

Hey I would like to focus on cron! Macro. I have few questions tho. Sorry if they may sound dumb or obvious to You - I’m yet to understand the repo 🙂

I digged a little bit through the repo.
The docstrings mention that cron! already exist and I found some stub only - I assume it’s just assuming that the pending issue as already finalized ?

I found the dynamic logic here and I assume I may use/base heavily on those
https://github.com/GitBrincie212/ChronoGrapher/blob/master/base/src/task/trigger/schedule/cron.rs

As far as I understand it, dod of the macro is:

  • Validate the input literals and raise compiler error if those fail
  • Construct TaskScheduleCron from multiple cron fields input literals
  • Add unit tests
GitHub

Powerful, developer-experience centric, blazingly fast and extensible job scheduler and workflow orchestration platform - GitBrincie212/ChronoGrapher

formal sedge
#

ask McBrincie212 he will enlighten you

fervent lark
# lament owl Hey I would like to focus on cron! Macro. I have few questions tho. Sorry if the...

Its ok, i get you're new to the project, no one fully knows everything (except me since i provision the project).

The docstring yes, they assume the feature is finalized (as to not have to update it continiously and only under changing the feature). The cron! macro is not implemented but the dynamic logic is already done. I would advise refining the dynamic logic since its not the best so you can extract it more easily

All of the bullets of the dod are true and exactly what i want. However, do not reimplement the cron parser, rather find a way to exract the parser and transpiler in such a way that its used in the dynamic logic and the macro logic to reduce mantainability cost

lament owl
#

Understood, thanks for clarification

placid drum
#

@fervent lark loved they frame dependency work like you can disable but even its resolved this will treated as unresolved by scheduler.

#

without changing dependency graph

fervent lark
#

thx ig

placid drum
#

Well that seems small but make lot difference

I never used other complex cron so don't know it's common feature or not

fervent lark