#What exactly triggers a Flux StartOfTurn message?


umbral ivy BOT

Hi @hushed python. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:


The knowledge sources do not describe the internal algorithm Flux uses to decide when to emit StartOfTurn, so I can only explain it based on the documented behavior, not the exact model logic.

From the docs:

  • Flux emits structured turn events (StartOfTurn, Update, EagerEndOfTurn, TurnResumed, EndOfTurn) instead of raw continuous transcripts. [Flux vs Nova-3]
  • A typical turn starts with one or more Update events, then a StartOfTurn once Flux has detected that the user has begun speaking. You can see this in the turn lifecycle example, where silence Updates are followed by a StartOfTurn as soon as speech appears in the transcript. [Turn lifecycle]
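As a rough illustration of that lifecycle, here is a minimal sketch that scans a list of already-parsed messages for the first StartOfTurn. It assumes each message is a dict with `event` and `transcript` keys; the helper name `first_start_of_turn` is hypothetical, and the sample transcripts after "Hi I" are made up for the example rather than taken from the docs.

```python
from typing import Iterable, Optional


def first_start_of_turn(messages: Iterable[dict]) -> Optional[int]:
    """Return the index of the first StartOfTurn message, or None if absent."""
    for i, msg in enumerate(messages):
        if msg.get("event") == "StartOfTurn":
            return i
    return None


# Illustrative sequence only: silence Updates, then speech onset.
# The message shape (keys "event" and "transcript") is an assumption.
messages = [
    {"event": "Update", "transcript": ""},
    {"event": "Update", "transcript": ""},
    {"event": "StartOfTurn", "transcript": "Hi I"},
    {"event": "Update", "transcript": "Hi I need help"},
    {"event": "EndOfTurn", "transcript": "Hi I need help"},
]

start_index = first_start_of_turn(messages)
```

In this made-up sequence the empty-transcript Updates come first and the StartOfTurn arrives with the first non-empty transcript, which mirrors the documented example.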

However:

  • The docs do not specify a numeric threshold, VAD rule, or confidence metric that directly controls StartOfTurn (unlike EndOfTurn, which is explicitly controlled by eot_threshold, eager_eot_threshold, and eot_timeout_ms). [EOT config; State machine]
  • The configurable parameters (eot_threshold, eager_eot_threshold, eot_timeout_ms) only affect end-of-turn behavior and eager events, not when StartOfTurn fires. [EOT parameters; Quickstart config]
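To make the asymmetry concrete, here is a small sketch that accepts only the three documented end-of-turn parameters and rejects anything else. How these options are actually passed to Flux is an assumption here (they are simply encoded as a query string), the helper `build_eot_query` is hypothetical, and the values are placeholders, not recommendations.

```python
from urllib.parse import urlencode

# The only tunable turn parameters named in the docs; all relate to end of turn.
END_OF_TURN_PARAMS = ("eot_threshold", "eager_eot_threshold", "eot_timeout_ms")


def build_eot_query(**params) -> str:
    """Encode end-of-turn options as a query string, rejecting unknown keys."""
    unknown = sorted(set(params) - set(END_OF_TURN_PARAMS))
    if unknown:
        raise ValueError(f"not a documented turn parameter: {unknown}")
    return urlencode(params)


# Placeholder values for illustration only.
query = build_eot_query(eot_threshold=0.7, eot_timeout_ms=5000)
```

A made-up key such as `sot_threshold` would raise `ValueError` in this sketch, reflecting that the docs list no StartOfTurn knob.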

So, based on the available docs:

  • What triggers StartOfTurn?
    When Flux’s internal turn-state machine decides that a new user turn has begun, i.e., it has detected the onset of speech after a period of non-speech or after the previous turn completed. This is visible in the example where the first non-empty transcript (“Hi I”) is delivered with event: "StartOfTurn". [Turn lifecycle]

  • How is it determined?
    The exact criteria (acoustic thresholds, timing, internal confidence scores) are not documented. The docs only state that Flux “has built-in turn state machine” and “handles barge-in and turn-taking natively,” but they do not expose tunable knobs or formulas for StartOfTurn. [Flux overview; Nova-3 migration]

If you need to reason about it operationally:

  • Treat StartOfTurn as: “Flux is confident a new utterance has started; interrupt TTS if needed and start listening for this turn’s content.”
  • You can tune when turns end (and thus when the next StartOfTurn can occur) via eot_threshold, eager_eot_threshold, and eot_timeout_ms, but the public documentation does not currently expose any way to tune the StartOfTurn trigger itself.
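A minimal sketch of that operational reading, assuming messages arrive as dicts with an `event` field: on StartOfTurn, stop any TTS playback and begin listening. The `TurnHandler` class, its `tts_playing` flag, and the log strings are hypothetical stand-ins for your own playback and pipeline code, not part of any Flux SDK.

```python
from dataclasses import dataclass, field


@dataclass
class TurnHandler:
    """Hypothetical handler that reacts to Flux turn events."""
    tts_playing: bool = False
    log: list = field(default_factory=list)

    def handle(self, msg: dict) -> None:
        event = msg.get("event")
        if event == "StartOfTurn":
            # Barge-in: the user started speaking, so stop agent audio first.
            if self.tts_playing:
                self.tts_playing = False
                self.log.append("tts_interrupted")
            self.log.append("listening")
        elif event == "EndOfTurn":
            # Turn finished: hand the transcript to the rest of the pipeline.
            self.log.append(f"turn_complete: {msg.get('transcript', '')}")


handler = TurnHandler(tts_playing=True)
handler.handle({"event": "StartOfTurn", "transcript": "Hi I"})
```

Other events (Update, EagerEndOfTurn, TurnResumed) are ignored here to keep the sketch short; a real handler would likely act on them too.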

Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon: