Just a simple idea on how to integrate a thinking model for Neuro/Evil. Parallel processing is not a novelty, but anyway, I'll give my input on this:
The idea here is to separate Neuro's logic into two layers: one is responsible for her direct answers (as she's been doing so far), the other is for thinking (handling tougher questions that need more time and processing).
I've attempted to show this process in a simple diagram (see pic.):
VC or Twitch chat queries go into the Q blocks, which make the decision: does the question require thinking? In other words, Q is a submodel that decides whether a given prompt should be processed as a simple or a complex one. If the Q block's output is 'no', the question is passed to the main-layer model in the R block (whichever way Neuro answers at the moment), so it is processed instantly. If the Q block's output is 'yes', the question is forwarded to the thinking model P, which acts as an inner monologue. Meanwhile, that query can be skipped for now (if it's a regular Twitch chat query) and Neuro moves on to the next one. Once the next question has gone through its own Q block, regardless of that block's answer, the previous P block delivers its result, and the earlier question can now be answered in the R' block. (Note: if the Q2 block outputs 'no', meaning the question can be processed instantly, it will be answered before the thinking layer P1 outputs the answer to the previous query Q1.)
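To make the output order concrete, here is a rough Python sketch of the parallel flow described above. It only simulates the ordering of responses, not real concurrency or any actual model, and every name in it (`parallel_output_order`, `needs_thinking`) is made up for illustration.

```python
from typing import Callable, List, Optional


def parallel_output_order(
    queries: List[str],
    needs_thinking: Callable[[str], bool],
) -> List[str]:
    """Return the order in which responses would be spoken.

    needs_thinking stands in for the Q block (hypothetical classifier).
    "R1" is an instant answer to query 1, "R1'" is the answer after thinking.
    """
    outputs: List[str] = []
    pending: Optional[int] = None  # query whose P block is still "thinking"

    for i, query in enumerate(queries, start=1):
        think = needs_thinking(query)  # Q block decision

        if not think:
            # Simple query: answered instantly, ahead of any pending R'.
            outputs.append(f"R{i}")

        if pending is not None:
            # The previous P block has had time to finish; deliver its answer.
            outputs.append(f"R{pending}'")
            pending = None

        if think:
            # Complex query: hand it to a P block and move on.
            pending = i

    if pending is not None:
        # No more queries to wait behind, so flush the last thought.
        outputs.append(f"R{pending}'")

    return outputs
```

For example, with a toy rule where anything ending in "?" needs thinking, `parallel_output_order(["why is the sky blue?", "hi"], lambda q: q.endswith("?"))` gives `["R2", "R1'"]`, which is the case from the note: the instant answer to Q2 comes out before the thought-out answer to Q1.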
If there are no parallel queries going (e.g. Neuro is talking to Vedal without Chat, Priority Messages, etc., so a 1-on-1 dialogue), thinking will have to be processed in Serial mode instead of Parallel (Q1 -> {no: R1, yes: P1 -> R1'}), which will obviously slightly increase latency in a regular dialogue.
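The Serial mode could look roughly like this in Python. Again, this is just a sketch: `needs_thinking`, `think` and `respond` are placeholder callables standing in for the Q, P and R blocks, not anything Neuro actually uses.

```python
from typing import Callable, Optional


def answer_serial(
    query: str,
    needs_thinking: Callable[[str], bool],
    think: Callable[[str], str],
    respond: Callable[[str, Optional[str]], str],
) -> str:
    """Serial mode: Q1 -> {no: R1, yes: P1 -> R1'}."""
    if needs_thinking(query):      # Q block says 'yes'
        thoughts = think(query)    # P block; the dialogue waits here
        return respond(query, thoughts)  # R' block
    return respond(query, None)    # R block, instant answer
```

The extra latency comes from the `think` call: in a 1-on-1 dialogue there is no other query to answer while P is running, so the reply has to wait for it.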
Short designations:
Q - thinking-decision blocks;
P - thinking processing models/units;
R - simple response;
R' - response after thinking.
gets himself a B100 or a high end GPU of such sort)