🎙️ Feature Proposal: Persistent Voice Profiles with Full Parameter Control
TL;DR – Suno Needs True Voice Profiles, Not Just Voice Labels
The Core Problem
Currently, a "Voice" in Suno is just a weak label. As soon as genre, mood, or instruments are entered in the prompt, the model completely overrides the voice. Result:
Voices sound different from track to track
It's nearly impossible to consistently develop an established character
Voice descriptions lose out to the rest of the prompt
The Vision
A Voice Profile must function as a strong pre-compiled prior – a fixed anchor that takes precedence over all other prompt signals (similar to ElevenLabs). If I write "Voice: Harald," Harald should remain Harald.
The Proposed Solution – Three Levels
Level 1 – Identity (foundation)
voice_name, gender, age_group, voice_type, pitch_range, primary_language, accent_region + strength, etc.
Level 2 – Acoustic & Tonal Profile (the actual fingerprint)
Timbre (brightness, breathiness, raspiness, nasality…), formants, vibrato (rate, depth, onset), dynamics, pitch stability, etc.
Level 3 – Behavioral Parameters (the underestimated gold)
Articulation, melodic movement, technique toggles (melisma, falsetto, growl, belt…), emotional defaults, context override resistance (0–100) ← the key parameter
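To make the three levels concrete, here is a rough sketch of what a saved profile could look like as a data structure. This is purely illustrative: Suno exposes nothing like this today, the class names are made up, and the value scales (0–100 sliders, Hz, ms) and the example values for Harald are my own assumptions. Only the parameter names come from the list above.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Identity:
    # Level 1: field names from the proposal; value formats are assumed
    voice_name: str
    gender: str
    age_group: str
    voice_type: str
    pitch_range: tuple[str, str]  # lowest and highest note, format assumed
    primary_language: str
    accent_region: str
    accent_strength: int = 50  # assumed 0-100 scale


@dataclass
class AcousticProfile:
    # Level 2: assumed 0-100 sliders for timbre, assumed units for vibrato
    brightness: int = 50
    breathiness: int = 50
    raspiness: int = 50
    nasality: int = 50
    vibrato_rate_hz: float = 5.5
    vibrato_depth: int = 50
    vibrato_onset_ms: int = 300


@dataclass
class Behavior:
    # Level 3: technique toggles plus the override resistance (0-100)
    techniques: dict[str, bool] = field(
        default_factory=lambda: {
            "melisma": False,
            "falsetto": False,
            "growl": False,
            "belt": False,
        }
    )
    context_override_resistance: int = 80

    def __post_init__(self) -> None:
        # Reject values outside the 0-100 range named in the proposal
        if not 0 <= self.context_override_resistance <= 100:
            raise ValueError("context_override_resistance must be 0-100")


@dataclass
class VoiceProfile:
    identity: Identity
    acoustic: AcousticProfile = field(default_factory=AcousticProfile)
    behavior: Behavior = field(default_factory=Behavior)

    def to_json(self) -> str:
        # Serialize the whole profile so it can be stored and reloaded
        return json.dumps(asdict(self), indent=2)


# Illustrative values only
harald = VoiceProfile(
    identity=Identity(
        voice_name="Harald",
        gender="male",
        age_group="adult",
        voice_type="baritone",
        pitch_range=("E2", "E4"),
        primary_language="en",
        accent_region="Scandinavian",
    )
)
print(harald.to_json())
```

The point of the structure is that everything about Harald lives in one saved object, separate from whatever the song prompt says.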
Additionally: Phonetic Language Profiles (Part 2)
Dedicated system for language variants (e.g., zh-TW-Mandarin vs. Cantonese, Arabic dialects, etc.) + Community Reference Packs – structured audio uploads from native speakers who teach Suno true phonetics.
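As a loose sketch of what "structured" could mean for a reference pack: each upload carries its transcript and a native-speaker flag, grouped under one language variant. All names here (ReferenceClip, PhoneticLanguageProfile, the file path) are hypothetical and only meant to illustrate the idea, not an existing Suno feature.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceClip:
    audio_path: str        # hypothetical location of the uploaded recording
    transcript: str        # the text spoken or sung in the clip
    native_speaker: bool   # self-declared by the uploader (assumed field)


@dataclass
class PhoneticLanguageProfile:
    variant: str  # free-form variant label, e.g. "zh-TW-Mandarin"
    clips: list[ReferenceClip] = field(default_factory=list)

    def add_clip(self, clip: ReferenceClip) -> None:
        # A clip without a transcript cannot teach pronunciation
        if not clip.transcript.strip():
            raise ValueError("a reference clip needs a transcript")
        self.clips.append(clip)

    def native_clip_count(self) -> int:
        return sum(1 for c in self.clips if c.native_speaker)


pack = PhoneticLanguageProfile(variant="zh-TW-Mandarin")
pack.add_clip(ReferenceClip("clips/example_001.wav", "你好", native_speaker=True))
print(pack.native_clip_count())
```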
Why this must happen as a saved profile and not in the prompt:
Prompts are context → everything competes.
A saved profile is a fixed prior that is loaded before generation. This fundamentally solves the consistency problem.
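A toy illustration of the "fixed prior" idea, under heavy assumptions: I have no insight into how Suno's model conditions on prompts, so this is just simple linear blending on made-up numeric parameters. The function name resolve_voice_params and the 0–100 parameter values are hypothetical. It shows how a context override resistance of 0–100 could decide how far the prompt is allowed to pull a voice parameter away from the saved profile.

```python
def resolve_voice_params(
    profile: dict[str, float],
    prompt_hints: dict[str, float],
    resistance: int,
) -> dict[str, float]:
    """Blend saved profile values with values implied by the prompt.

    resistance=100 keeps the profile untouched, resistance=0 lets the
    prompt win completely. Parameters the prompt does not mention always
    keep their profile value.
    """
    if not 0 <= resistance <= 100:
        raise ValueError("resistance must be 0-100")
    pull = 1.0 - resistance / 100.0  # how much influence the prompt gets
    resolved = {}
    for name, base in profile.items():
        hint = prompt_hints.get(name, base)
        resolved[name] = base + pull * (hint - base)
    return resolved


# Saved profile: Harald has a clean, fairly bright voice (assumed values)
harald = {"raspiness": 20.0, "brightness": 70.0}
# A "gritty metal" prompt would push raspiness up (assumed value)
metal_prompt = {"raspiness": 90.0}

print(resolve_voice_params(harald, metal_prompt, resistance=100))
print(resolve_voice_params(harald, metal_prompt, resistance=50))
print(resolve_voice_params(harald, metal_prompt, resistance=0))
```

With resistance at 100 Harald stays at raspiness 20 no matter what the prompt says; at 50 he lands halfway (55); at 0 the prompt fully takes over (90), which is roughly how it feels today.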