#does text-generation-webiui has api?
51 messages ยท Page 1 of 1 (latest)
There are 2, a built in API and an OpenAI API compatible extension
check the docs for more info
There are api examples in the api-examples/ folder, and the openai API has a readme about compatibility and links to API docs.
Which docs exactly? Was looking for a web documentation but couldn't find any. Do you mean the sample py scripts?
The built it API only has examples, but the openai API is well documented, starting here: https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai
because it's openai compatible, once you understand the limits you can just use the openai API documentation and examples
So you would recommend the openai API?
What's with that API-key in the environment? I have to set up my own one or do I need an actual OpenAI key?
I do, unless you have specific reasons for using the ooba native api, but I'm very biased, I wrote it.
The API key is needed by most existing clients, but you can set it to anything when using it with ooba, like 'sk-dummy' etc. I use sk-1111111111111111111111111111111111111111111111111111 (or some other long string of 11's - see the docs) because some apps are picky about the format. It is NOT a real openai API key.
it's just a placeholder because the existing clients require it, on the server it's ignored.
Okay, then I might consider it. my plan was to stay completely offline, running my own STT, conversational model and TTS (Whisper-STT, Llama-2 uncensored, Coquis-TTS)
doesn't really sound like you need an API, unless you're building your own setup. There are extension for all that stuff direct in tgwui
Yep and they are supoptimal for my purpose
You gotta press a button to talk and to stop recording
Silerto TTS quality is not what I had in mind, Bark TTS is insanely computationally intensive and Eleven.ai is overpriced. Therefore I went with Coquis-TTS and wrote a python script that records my voice once a volume threshold is reached/ stops recording once the voice falls below said threshold for a certain time.
Actually I wanted to avoid using tgwui completely because I won't need 90% of what if offers, but considering the lack of official documentation on the subject of newer LLMs like Llama2, I quickly realized that I can give up right away if I try to setup the LLM in python from scratch without any prior experience.
My end goal is A STT-TTS-VTube-AI that can perform simple read/write tasks on my mashine as well as querying the web (the latter two are optional and depend on difficulty). Therefore I tried to keep it as slim and performant as possible but with a decent TTS quality. It is supposed to run on my laptop GPU 3070Ti without crazy delays. Right now the delay between my STT input and the TTS output is about 6 to 15 seconds, which I consider already quite okay, but I bet it could be faster when I get a grip on streaming and feed the TTS sentence by sentence as they get finished by the AI while recording further STT in parallel and transcribing it. to be next in line for the AI to answer. If I can ditch tgwui entirely and run it all within one python application, it would be even faster, but I am a python novice, so I don't see any realistic chance of me pulling that off on my own.
For now I am happy with the current state my program is at and am working on getting the VTube Avatar hooked up to the whole thing to give the AI more personality. (I come from CGI/VFX background, so making the model, rigging, animating etc. is the easy part and nothing too concerning)
The only real problem I face right now is the chat history of the standard API. I have no idea if it includes everything that has been said or only the context plus the last message (that is what I see when using --verbose).
I am not sure how the whole template for the AI works neither, which template to use for my particular model, etc. I just can't find the necessary explanations and sometimes see contradicting information.
Maybe I am too late to the party or maybe there simply is none and everyone working on these things is just a fullfledged python and ML pro.
So basically most of my work with the API is done. Only the coherence of my model is really bad and I am not sure if this is a problem of the Model itself, wrong settings OR the history simply not containing all of the chat messages but only the recent one. If it is the latter, I don't see much reason to switch to another API and start from square one if what I have works already with only a single piece missing.
And one more question I also couldn't really find an answer to: What is the difference/purpose between history['internal'] and history['visible'] ? How does the model treat the two?
not sure, that's a good question for #api-dev-help though
@rancid plinth can i ask something, so once a chat starts on the ui its like a websocket connection, is there option for us to do all of it via post request with text gen webui?
yeah, there is a blocking API, check the API examples for it
I'm sorry for bugging you with more questions, but there is no detail guide on the api for me to better understand. Do i need to enable the --no-stream and --api to do the post request. I am looking to create a discord bot that can send message and then receive the response from the model hosted on runpod
no guide, no, just the example. just --api.
I am biased, but the openai extension (--extensions openai) supports an API also, which is well documented (because it's openai) and they also have a demo discord bot that works out of the box. Link in the readme for the openai extension.
oh i was going through the chat above and did saw about openai, and was confused what to go with, i'll check it out. Also the demo discord bot thing where can i check it out?
bet thank you bro appreciate the help
How can I make it to generate an api link so I can use that in silly tavern.
In the settings you can check API and it should expose a link in the cmd window when you start the webUI
"Europa was discovered on January 31st, 16090 by Simon Marius while he was observing Jupiter through his telescope. It has been known for centuries that there were four large satellites orbiting Jupiter before it was officially named "Galileo" after its discoverer. In 17897, William Herschel gave them their current names based on mythological characters related to Zeus. Europa is the second largest moon of Jupiter at approximately 3,5400 kilometers across. Its surface appears mostly smooth due to impact craters caused by asteroids and comets hitting it over millions of years. Europa also contains ice deposits near its poles which could potentially contain water beneath the surface. NASA's Galileo spacecraft visited this moon during the late 19990s and early 200000s, revealing evidence of oceans underneath the ice shell. However, no missions have yet succeeded in landing on Europa because of extreme radiation levels found within its atmosphere."
Anyone else has the problem where Llama-2 Airoboros 7b adds another digit to all year numbers resulting in ridiculous large numbers?
I have this problem when using --compress_pos_emb, it's not the only thing that's 'stretched'
nothing to do with the API
I know, I tried to answer the question of somebody before me. The weird numbers bug was something I posted after.
Let me check if I use that flag, but as far as I remember, I didn't use this.
Oh, well yes. I did, because I increased the max_seq_len for my model (GPTQ), so I had it at 2.
even at 2? yeah, damn. You could try --alpha_value instead
Increase alpha to 2 and keep compress at 1 ?
I had problems when I used that too, but maybe they're fixed now
yeah, it's one or the other
Alright, thanks for the heads-up ๐ Will try it
Do you know by any chance where I can find a good TTS VITS model for child voice?
no
nevermind, thanks ๐
Top! The date thing works! Thnks for the hint!
Ah, I had another question on my mind which I almost forgot to ask:
When using http://127.0.0.1/api/v1/generate instead of http://127.0.0.1/api/v1/chat, the extensions of ooba will be circumvent or seem not to kick in.
I was building the prompt manually since I had problems using the api-chat method but now it seems as if I won't be able to use the Long Term Memory extension and others. There is no way to trigger them via the API, right?
I don't think so, no, generate is a completion api, you have to do it all yourself
Doesn't 8K context GGML not work for webUI?
this is the kind of result I am getting when using an 8K GGML model.
Got something similar when increasing to 8k. I resorted to reducing it again to 4k.