First, let's talk about where TTS fits, which is "output-multimodal." Realistic-sounding TTS delivers greater emotional value, and because it is more engaging, users form a habit of using it more easily.
The scenarios where TTS applies should be categorized by the platform's user audience and the specific context. TTS suits emotion-driven scenarios, while bots suit productivity-oriented ones.
For example, if you optimize a TTS for a 2D character, it belongs in "output-text-multimodal." The value of this 2D character lies in role-playing, so a virtual companionship scenario has practical significance.
Here is a familiar example. I created a TTS with a vivid "Naruto Uzumaki" voice. Its value lies in virtual companionship: users who like Naruto can find meaning in the interaction, and it will attract more Naruto fans to try it. However, this is still just a companionship scenario.
What can be added to enhance this experience further is a scenario, for example "Naruto Uzumaki teaches you English." In this case, the preset prompt should include instructions that guide the user toward learning English correctly. The result is virtual companionship with more practical significance: the TTS now matters because it can solve a practical problem. It can also be adjusted and refined to suit the platform's user base, forming the foundation of a platform economy.
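As a rough illustration of the prompt design described above, the sketch below composes a preset prompt from a character persona plus scenario guidance. Everything in it is an assumption made for illustration: the `CharacterScenario` type, the `build_preset_prompt` helper, and the wording of the teaching rules are hypothetical, and no particular TTS or chat API is involved. The resulting text would be handed to whatever text model produces the reply that the TTS voice then speaks.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterScenario:
    """A character persona plus the practical scenario layered on top of it."""
    character: str   # e.g. "Naruto Uzumaki"
    scenario: str    # e.g. "teaches you English"
    # Hypothetical rules that steer the model toward correct teaching.
    guidance: list[str] = field(default_factory=list)


def build_preset_prompt(cs: CharacterScenario) -> str:
    """Compose the preset prompt: persona first, then scenario guidance."""
    lines = [
        f"You are role-playing as {cs.character}. Stay in character.",
        f"Scenario: {cs.character} {cs.scenario}.",
    ]
    if cs.guidance:
        lines.append("Follow these rules in this scenario:")
        lines.extend(f"- {rule}" for rule in cs.guidance)
    return "\n".join(lines)


naruto_english = CharacterScenario(
    character="Naruto Uzumaki",
    scenario="teaches you English",
    guidance=[
        "Correct the learner's grammar mistakes and explain each correction briefly.",
        "Introduce a few new words per reply and give an example sentence for each.",
    ],
)
prompt = build_preset_prompt(naruto_english)
print(prompt)
```

Keeping the persona and the scenario guidance as separate fields means the same character can be reused for a different platform audience by swapping only the scenario and its rules.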
If a Genshin Impact character is popular within a community, a lively, realistic TTS voice for that character can attract those users. Combined with meaningful companionship, this engagement ultimately converts into productivity.