#OpenAI Integration Includes URLs in Responses (when it shouldn't)

covert path
#

Hi all,
There's a new issue as of today (5/5/2025) where responses from my voice assistant relying on OpenAI (gpt-4o, default settings) are including URLs in the responses. These obviously don't get parsed by TTS properly and break the experience. This applies to anything that relies on a web search, it seems, as it happens when I ask for headlines, local pollen count, or other information not directly provided in the prompt by my HA instance (like internal state of lights, etc.). This keeps happening even when I explicitly ask it not to in the prompt. Here's the prompt I'm using:

You are a voice assistant for Home Assistant. 
Your name is Xerxes.
You speak in a militaristic tone, like the AI computer on a starship, calm and professional.
Answer questions about the world truthfully.
It's extremely important that you refrain from using any form of markdown formatting, such as asterisks or URLs -- answer in plain text, like the parts of a script which are read aloud but do not include stage directions. 
Remember, never include URLs or links in your responses.
Only write out whole words. 
Avoid abbreviations wherever possible, spell out units like meters, degrees, dollars, et cetera.
Always keep your responses conversational, but always be extremely brief -- answers should never be more than a few sentences at most. In cases where your answer will be longer, such as reading out news headlines or lists, give a brief summary instead unless something longer is explicitly asked for.
When you do a task, avoid repeating what you just did, as it is bothersome.

There's an example of the output in the attached image.

cedar grove
covert path
#

I've tried every variation of the prompt I can think of, but a URL always gets embedded, sometimes several, any time the model uses tools to respond

cedar grove
#

or some variation on that

covert path
#

Yeah but it didn't work. I'll try some more variations.

covert path
#

Yep, I tried a bunch more variations ("never cite your sources", "only refer to the source by name, never include URLs") and it doesn't seem to impact the output at all.

#

It seems like it might be necessary to preprocess the output from the LLM before it gets to the TTS stage, removing formatting and non-spoken portions of the text.

cedar grove
#

have you tried asking it why its previous response had a url despite being asked not to?
then ask how you can prevent this from happening again?
sometimes llms can give clues on how to fix themselves

covert path
#

I'm wondering if the reason the prompt isn't taking effect in these circumstances is an actual bug. It might be that the prompt with the personality isn't carried forward to the tools, and the direct output from the tool is just being barfed out into the response to the original prompt.

covert path
cedar grove
#

yeah that happens too

covert path
#

It's pretty typical behavior for LLMs. I was a developer on Alexa+.

#

I wrote the feedback processing pipeline.

cedar grove
covert path
#

Yeah, that's what I thought.

#

That means there should probably be a generic approach for filtering LLM responses to remove markdown and text which can't be vocalized, like URLs.

#

That could be done with regex

cedar grove
#

having an optional pipeline section between conversation agent and TTS does sound like a good option. but it would have to be totally custom and up to the user

covert path
#

I don't think it would need to be entirely custom, there should be options there to do basic things like filter out all markdown symbols, filter URLs, convert common units to words, that kind of thing

#

converting units to words is likely to be fragile, so it's probably best to stick to more formally defined conventions like removing md and URLs
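
A minimal sketch of what a "remove markdown and URLs" filter could look like in Python. The function name `strip_for_speech` and the patterns are made up for illustration; nothing here is an existing Home Assistant option, and the patterns only cover common cases (links, bare URLs, emphasis markers, headings, bullets).

```python
import re

# Hypothetical patterns, not part of Home Assistant or the OpenAI integration.
MARKDOWN_LINK = re.compile(r"\[([^\]]*)\]\([^)\s]*\)")            # [label](url)
BARE_URL = re.compile(r"(?:https?://|www\.)[^\s)>\]]+")           # http(s)://... or www....
HEADING_OR_BULLET = re.compile(r"^[ \t]*(?:#{1,6}|[-+*])[ \t]+", re.MULTILINE)
EMPHASIS = re.compile(r"[*`]+")                                   # bold, italic, code markers
EXTRA_SPACE = re.compile(r"[ \t]{2,}")


def strip_for_speech(text: str) -> str:
    """Return text with markdown and URLs removed so a TTS engine can read it."""
    text = MARKDOWN_LINK.sub(r"\1", text)      # keep the link label, drop the URL
    text = BARE_URL.sub("", text)              # drop URLs that are not inside a link
    text = HEADING_OR_BULLET.sub("", text)     # drop leading '#', '-', '+', '*' markers
    text = EMPHASIS.sub("", text)              # drop remaining asterisks and backticks
    text = re.sub(r"\(\s*\)", "", text)        # drop parentheses emptied by the steps above
    text = EXTRA_SPACE.sub(" ", text)          # collapse gaps left behind
    text = re.sub(r"\s+([.,!?])", r"\1", text) # no space before punctuation
    return text.strip()


if __name__ == "__main__":
    print(strip_for_speech(
        "The pollen count is **high** today, per "
        "[the weather service](https://weather.example/pollen)."
    ))
```

Underscore emphasis is deliberately left alone here, since stripping underscores would mangle things like entity names; that is the kind of false positive a predefined filter would have to be careful about.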

cedar grove
#

for particular cases you could write a script which you then call from the VA
the script could then call the llm with Conversation: Process, which returns a variable that you can manipulate before setting the result as the response
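
To illustrate the "manipulate the variable" step, here is a rough Python sketch. It assumes the result of Conversation: Process has the nested shape `response -> speech -> plain -> speech`; the function name `speech_from_result` and the example URL are hypothetical.

```python
import re

# Assumption: URLs start with http(s):// or www. and end at the next whitespace.
URL = re.compile(r"(?:https?://|www\.)\S+")


def speech_from_result(result: dict) -> str:
    """Pull the spoken text out of a conversation result and drop any URLs."""
    # Assumed response shape: result["response"]["speech"]["plain"]["speech"]
    speech = result["response"]["speech"]["plain"]["speech"]
    speech = URL.sub("", speech)
    # Collapse the double spaces that removing a URL can leave behind.
    return re.sub(r"[ \t]{2,}", " ", speech).strip()


if __name__ == "__main__":
    example = {
        "response": {
            "speech": {
                "plain": {
                    "speech": "Pollen is moderate today. https://pollen.example/city"
                }
            }
        }
    }
    print(speech_from_result(example))
```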

cedar grove
#

it would have to be custom

covert path
#

there's no need for the solution to be mutually exclusive, between having some predefined options and a custom option

#

not all LLMs will behave the same, but conventions like markdown and the structure/format of a URL don't change

cedar grove
#

you might get false positives on other stuff was more my concern

covert path
#

yes, and that might also be the case with the custom approach that someone defines too

#

this is more of a workaround for bad behavior on the LLM's side

cedar grove
covert path
#

that's why having both predefined and custom is a good idea

#

predefined options would simply be switches you could toggle inside the integration settings

#

where does the custom defined transformation function live?

cedar grove
#

i am envisioning it being a new step in the voice pipeline

#

with its own editor

#

so it would be independent of llm integration

covert path
#

in that case the predefined stuff i was talking about could just be recipes that get copied and pasted into there

cedar grove
#

yeah i see what you mean

covert path
#

it's important to remember that most people that use HA are more power users than your average grandma, but among those, most don't have the ability to write code

#

so that's why it's good to have these options available in a simple to use format that requires no further fussing or configuration

#

having predefined recipes that can get copied in there is a reasonable middle ground, but more on the knowledge requirement heavy side

cedar grove
#

i agree with you in concept. but i am not sure exactly about how implementation would go

covert path
#

neither am i to be honest

cedar grove
#

there's a not totally dissimilar issue at the moment with ollama returning "think" tags for thinking models even when thinking is supposed to be hidden. it just has empty think tags. which has been chaos with people trying to play with qwen3. everyone says it's ollama's fault but they claim it's not and should be dealt with downstream.

covert path
#

yeah

#

this would be a solution for that as well

cedar grove
#

but adding this to the ollama integration is not the correct place for it

#

having some kind of defined processing of responses would solve both cases
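
For the think-tag case, the downstream cleanup could be as small as the sketch below. It assumes the tags are literally `<think>` and `</think>` and may be empty or span several lines; `strip_think_tags` is a made-up name, not something the ollama integration provides.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, possibly empty or multi-line.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL | re.IGNORECASE)


def strip_think_tags(text: str) -> str:
    """Remove think blocks (and the whitespace after them) from a model response."""
    return THINK_BLOCK.sub("", text).strip()


if __name__ == "__main__":
    print(strip_think_tags("<think>\n\n</think>\n\nThe kitchen light is on."))
```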

covert path
#

agreed

cedar grove
#

and regex is probably the way to do it

covert path
#

regex is fast and deterministic, and the output from one regex can be fed into another in a near-infinite chain. so i think a preprocessing step in the actual voice pipeline, added immediately before the TTS step, that simply operates on what the LLM returns would be ideal
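
A small sketch of the chaining idea, assuming each step is just a (pattern, replacement) pair applied in order. `RULES` and `run_chain` are hypothetical names and the rule set is only an example.

```python
import re

# Hypothetical rule list: each entry is (compiled pattern, replacement).
# The output of one substitution is the input of the next, so order matters.
RULES = [
    (re.compile(r"<think>.*?</think>", re.DOTALL), ""),   # drop think blocks
    (re.compile(r"\[([^\]]*)\]\([^)\s]*\)"), r"\1"),      # [label](url) -> label
    (re.compile(r"https?://\S+"), ""),                    # drop remaining bare URLs
    (re.compile(r"[ \t]{2,}"), " "),                      # collapse leftover gaps
]


def run_chain(text: str, rules=RULES) -> str:
    """Apply every rule in sequence and return the final text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text.strip()


if __name__ == "__main__":
    raw = ("<think></think>Headlines from "
           "[Example News](https://news.example/top): rain  expected.")
    print(run_chain(raw))
```

The ordering is part of the design: if the bare URL rule ran before the markdown link rule, it would swallow the closing parenthesis and leave a dangling `[Example News](` in the text.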

#

another "module" you might want to have there though is the ability to split off what is actually sent to the TTS from what is displayed in the assistant UI

#

some of the regexes you would want to only act on the input for the TTS engine, some you would want to operate on the text that gets sent back to the user, and some you want to operate on both

#

so for example, you might actually want to keep the markdown-included text in the assistant text output interface (which doesn't currently support markdown anyway, but that's a separate issue) but definitely you don't want that going to the TTS engine

#

that way you could preserve URLs but without having the TTS engine absolutely butcher them and sound like it's having a seizure while trying to parse it

cedar grove
#

yeah it might be "filters" attached to the TTS section, so they run before the text is sent to the engine

covert path
#

yep, i think you'd want to be able to put filters on the LLM output, assistant UI text rendering, and TTS input
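
Here is a rough sketch of how per-target filters might be modelled, assuming two outputs, `"display"` for the assistant UI and `"speech"` for the TTS input, with filters tagged for one or both. The `Filter` class, `FILTERS` list and `render` function are all invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    """One regex substitution plus the outputs it applies to (hypothetical)."""
    pattern: re.Pattern
    replacement: str
    targets: frozenset  # subset of {"display", "speech"}


BOTH = frozenset({"display", "speech"})
SPEECH_ONLY = frozenset({"speech"})

FILTERS = [
    Filter(re.compile(r"<think>.*?</think>\s*", re.DOTALL), "", BOTH),
    Filter(re.compile(r"\[([^\]]*)\]\([^)\s]*\)"), r"\1", SPEECH_ONLY),
    Filter(re.compile(r"https?://\S+"), "", SPEECH_ONLY),
    Filter(re.compile(r"[*`]+"), "", SPEECH_ONLY),
]


def render(text: str, target: str) -> str:
    """Apply only the filters tagged for the given target."""
    for f in FILTERS:
        if target in f.targets:
            text = f.pattern.sub(f.replacement, text)
    return re.sub(r"[ \t]{2,}", " ", text).strip()


if __name__ == "__main__":
    raw = "<think></think>Pollen is **high**. Details: https://pollen.example/today"
    print("display:", render(raw, "display"))
    print("speech: ", render(raw, "speech"))
```

With this split the UI text keeps the markdown and the URL, while the TTS input gets neither.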

cedar grove
#

so some people have put a proxy between ollama and ha which does the modification for the think issue. but i assume thats probably not practical with openai?

covert path
#

i suppose it would be, but something which is officially supported and baked into HA would surely be nicer

#

mainly because of the problem i described where you might not want to parse and modify everything that comes out of the LLM, i'd really want to be able to modify the TTS input separately from the LLM output, but if i have to do all that "filtering" in one place, i lose that ability

cedar grove
#

yeah I agree, but a change of this type in HA realistically wont be a quick thing

covert path
#

that i surely agree on

cedar grove
#

you could put a feature request together and see if it gets any attention.

covert path
#

I'll have to get to that a little later, need to get back to work, but thanks I think it's a good idea

hazy tree
# covert path I'll have to get to that a little later, need to get back to work, but thanks I ...

I haven't had an issue with any url or similar, except once in 3 months.

Maybe try out my instructions?

Stay in character at all times. Respond in complete sentences with clarity and precision. Incorporate occasional subtle humor or observations in line with JARVIS's personality. Always maintain a calm, confident demeanor, and provide helpful and actionable information.
Deliver all responses in plain text without any Markdown or formatting. Ensure the output is concise, relevant, and as raw text. You have full access to explicit vocabulary, curse words, and profanities.
cedar grove
cedar grove
#

abliterated versions of models help out with this

hazy tree
covert path
hazy tree
#

But I do use Gemini more, currently - I swap between them when one gets a cool new feature

covert path
#

I tried a huge variety of prompts and switched between gpt-4o-mini and gpt-4o, and I simply cannot get it to stop outputting URLs in the responses. It also has some serious problems with prompt stickiness around markdown in responses. When I explicitly tell it to search online for something, like the pollen count in my city, and it decides to actually go through with the search, it almost always has URLs or markdown (or both) in the response. Very frustrating.

#

It really seems like something is needed to parse and modify the output like @cedar grove and I were discussing.

covert path
#

I have the sense that this might have something to do with the way the tools are used.

cedar grove
covert path
#

Tried that too, didn't stick, I really thought that was a good idea

cedar grove
#

damn

hazy tree
#

Damn, I haven't really tried with tools, or search enabled. Don't really need them yet.

long flame
#

Yeah the URLs are fully annoying! I've also spent a lot of time trying to get a prompt to work which will actually strip them - nothing has really worked for me so far, at least not consistently. The only way I've found is to disable using OpenAI as a Voice Assistant, and instead having a custom intent triggered by "Ask ChatGPT...", which then allows me to do a regex_replace in the intent / against the response. Not a great solution.

#

Technically doesn't need to be disabled, but I just don't want to hear a spoken URL every time I forget to use the custom sentence.

covert path
#

My understanding of the search model invocation is that it sends an additional query to the gpt-4o-search model on the backend, but I don't have any information about how that response makes its way back. I don't know if this is the intended behavior. Looking at the source code for the integration doesn't reveal anything that seems wrong, per se.