#Improved Vocal Cloning (After RVC)


raven fossil
#

RVC can be useful and excellent, but with the advancement of AI I think it needs improvements for cloning. Look at what we have in Udio's AI; if we could do something like that but for vocals, it would be perfect.
I believe a change of AI model would be necessary to make this happen.

tacit kayakBOT
#
𝙈𝙐𝙎𝙄𝘾 𝙏𝙍𝘼𝘾𝙆 𝘾𝙀𝙉𝙏𝙀𝙍

👍 Upvotes:

3

👎 Downvotes:

1

worn frost
#

"Guys, we just have to make the next Udio/Suno, but with RVC!"

Udio and Suno are trained on an absurd amount of data (hundreds to *thousands* of hours) with industrial-grade GPUs (we are talking literally *hundreds* of gigs of VRAM) for weeks to months. This data spans many different kinds of music and voices, and is generalized to create **one** product.
RVC is community-supported by model makers, who train on limited datasets (*minutes* of audio) with consumer-grade GPUs (12 GB) for several *hours*, so that **many** voices can be made at "good enough" quality for a relatively limited range of contexts.
To match the quality of Udio or Suno, you'd need to train every single model on a similar amount of data, with similar-grade GPUs, over a very long time, to get the result you are suggesting.

#

If you wanted this to become reality, you'd need to fund or find the funding for:

Voice data acquisition: For almost every voice, there literally isn't enough audio out there to train at this level of quality. You need hundreds to thousands of hours per voice. Non-celebrity actors: $200k–$500k per voice. Celebrity voices: millions each. That includes recording sessions, editing, noise cleanup, labeling, and licensing contracts. You can’t skip this; quality AI depends entirely on the dataset.

GPU infrastructure for training: Industrial-grade GPUs (A100/H100-class) are required, linked in multi-GPU nodes to pool enough memory. One node costs $200k–$400k if you buy the hardware, or $50k–$100k in cloud compute per model. You also need datacenter space, cooling, power, and redundancy, which adds thousands per month in operational overhead.

**ML Engineering team:** You need engineers to build the training pipeline, tune models, handle multi-GPU scheduling, checkpointing, debugging, and evaluating results. This team alone costs $150k–$500k per year, depending on size and expertise.

Development team for deployment: A cloud platform to serve the models, APIs, web interfaces, GPU scheduling, scaling, monitoring, and support. Expect $50k–$200k+ per project to develop and maintain.

Legal, licensing, and HR: You need contracts for voice talent, licensing for music if used in training, employment contracts, GDPR/privacy compliance, and intellectual property safeguards. Costs: $50k–$200k at minimum; multiply heavily if celebrities are involved.

Operations staff: People to manage day-to-day running of servers, handle cloud infrastructure, monitor costs, troubleshoot failures, and manage backups. Add several full-time employees at $50k–$100k/year each.

#

Business development / talent acquisition: Staff to reach out to celebrities, negotiate deals, and handle scheduling.

Marketing and advertising: To sell or license the product, you need marketing strategists, content creation, ad campaigns, outreach, and social media management. Easily $50k–$500k per campaign, and you’ll likely need multiple campaigns.

Accounting and finance: Budgeting, tax compliance, payroll, revenue tracking, financial planning.

Customer support / community management: If you provide access to end users, each new model may generate questions, bug reports, or requests for customization. You need support staff, potentially 24/7.

Operational costs per model: Storage, bandwidth, cloud hosting, monitoring, and backups. Even with infrastructure in place, each new model can add $5k–$20k annually. Scale that to 100+ models, and it’s $500k–$2M a year in recurring costs.

So let’s tally a single high-quality voice model after infrastructure exists:

**Voice data:** $200k–$500k

GPU training: $50k–$100k

**ML engineering & dev:** $200k–$700k

Legal/HR: $50k–$200k

**Operations staff/support:** $50k–$150k

Marketing & advertising: $50k–$200k

Accounting/finance: $10k–$50k

Cloud/storage ops: $5k–$20k

**Total** for 1 new model: roughly $615k–$1.9M (non-celebrity), and multiples higher if celebrity voices or aggressive marketing campaigns are included.

Scaling this to 100 models: easily $60M–$190M.

Scaling to 1,000 models: roughly $615M–$1.9B.
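
For anyone who wants to check the math, here is a rough Python sketch that adds up the line items from the tally above and scales them. The names `COSTS_K`, `total_range_k`, and `fmt` are made up for this example, and it assumes cost scales linearly with the number of models (no bulk discounts, no shared-infrastructure savings), which is the same simplification the estimate itself makes.

```python
# Per-model cost ranges in thousands of USD, copied from the tally above.
# (low, high) for a single non-celebrity voice model.
COSTS_K = {
    "Voice data": (200, 500),
    "GPU training": (50, 100),
    "ML engineering & dev": (200, 700),
    "Legal/HR": (50, 200),
    "Operations staff/support": (50, 150),
    "Marketing & advertising": (50, 200),
    "Accounting/finance": (10, 50),
    "Cloud/storage ops": (5, 20),
}


def total_range_k(n_models=1):
    """Return (low, high) total cost in thousands of USD for n_models.

    Assumption: every model costs the same, so the total is simply
    the per-model sum multiplied by the number of models.
    """
    low = sum(lo for lo, _ in COSTS_K.values()) * n_models
    high = sum(hi for _, hi in COSTS_K.values()) * n_models
    return low, high


def fmt(k):
    """Format an amount given in thousands of USD as $k / $M / $B."""
    if k >= 1_000_000:
        return f"${k / 1_000_000:.2f}B"
    if k >= 1_000:
        return f"${k / 1_000:.1f}M"
    return f"${k}k"


for n in (1, 100, 1000):
    low, high = total_range_k(n)
    print(f"{n:>5} model(s): {fmt(low)} to {fmt(high)}")
```

The exact sums come out to $615k–$1.92M for one model, so the 100- and 1,000-model figures above are just those numbers scaled up and rounded.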

#

**Just financially,** this is impossible for Weights to support, much less its sub-owned AI HUB Discord server. That is before considering how incredibly hard it'd be to make it work from an organizational, legal, and ethical standpoint.
However, if you find some kind of God-given shortcut (you have the key to quantum computing), or are God himself, I'd be open to taking your suggestion with utmost care, caution, and religious fervor.

"Just be better guys! We just have to make a change in AI model (whatever this means) 📈🤑 💰 !"