#devs_voice-archived
If there are any projects already started I would love to participate. Otherwise I would try to start some experiments with it
shouldn't such an integration expose its own conversation agent and not necessarily rely on the current intents as a dependency? however, I see no issue with multiple PRs for new intents if you want to use the existing intents.
Yeah, that is where I'm not sure about how to start. intents are only used by the home assistant conversation agent?
for now, yes, as far as i am aware. i don't know of any other integration that uses them
ok thx. I will take a look at the current ChatGPT agent and try to integrate functions to it
it's all new for me (LangChain and the conversation agents) so it will be very experimental
I have a personal project that does this, by adding a new conversation agent (which I started by copy pasting the openai one and then started to add callable functions (like control_light, shopping list and calendar) to it). Works pretty good.
I decided to branch it off a bit though, so that I run the conversation agent as a small api inside a docker container instead and I will make the conversation agent component in home assistant a thin client that only passes on the conversation to this api. I find it much faster to develop this way, probably because I'm not experienced enough with how to set up a dev environment in home assistant (I end up restarting the dev container a lot).
i tried creating a wyoming service using elevenlabs because i thought i would be able to reduce latency by processing one sentence at a time and sending one AudioChunk for each sentence. but it seems like the client waits for AudioStop before starting to play audio (only tried in the config "try voice" in the browser). is there a way around this?
Same with the mobile assistant, anyone know if this is the expected behaviour for all clients? E.g. the wyoming-satellite also?
Hey everyone, I have a very small question. Could you explain the difference between slots and requires_context? When should I use slots? It would also be nice if you could share an example, e.g. what is the difference between these two YAML configurations?
- sentences:
    - "close <name> [in <area>]"
  slots:
    domain: "cover"
and
- sentences:
    - "close <name> [in <area>]"
  requires_context:
    domain: cover
I found information about slots only at this link https://developers.home-assistant.io/docs/voice/intent-recognition/template-sentence-syntax#responses but about requires_context I found only this information https://developers.home-assistant.io/docs/voice/intent-recognition/template-sentence-syntax#requiresexcludes-context. I also checked the example from the hassil repository https://github.com/home-assistant/hassil/blob/main/examples/en.yaml but I still can't understand the differences.
requires_context: {domain: "cover"} means you are only going to match that sentence if the provided {name} (<name> is shorthand for [the] {name}) is a cover
slots: {domain: "cover"} means you are setting the value of a slot named domain to cover, regardless of what domain the {name} has, which may have implications in how the intent is handled
this part makes 100% sense to me and that's exactly how I understood it
unfortunately, I still don't understand this part: why should I set the domain value via slots?
Generally, you shouldn't. But say you have a sentence like "is any door locked". There is no entity name there to reference the lock domain, so you have to let the intent handler know that it should query for entities pertaining to the lock domain
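To make the distinction above concrete, here is a minimal toy sketch in Python. It is not hassil's real data model or matching code; the `Sentence` class, the `match` helper and the entity names are all made up for illustration. It only shows the idea from the explanation: requires_context filters on the context of the named entity, while slots just attach fixed values to the result.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sentence:
    """Toy stand-in for one sentence block (hypothetical, not hassil's model)."""
    intent: str
    requires_context: dict = field(default_factory=dict)
    slots: dict = field(default_factory=dict)


def match(sentence: Sentence, entity_name: Optional[str], entities: dict) -> Optional[dict]:
    """Return the resulting slots, or None if a requires_context check fails."""
    # The context comes from the entity referenced by {name}, if there is one.
    context = entities.get(entity_name, {}) if entity_name else {}
    # requires_context filters: every key must carry the required value.
    for key, required in sentence.requires_context.items():
        if context.get(key) != required:
            return None
    result = {}
    if entity_name:
        result["name"] = entity_name
    # slots never filter anything; they are simply attached to the result.
    result.update(sentence.slots)
    return result


ENTITIES = {"garage door": {"domain": "cover"}, "desk lamp": {"domain": "light"}}

with_requires = Sentence("HassTurnOff", requires_context={"domain": "cover"})
with_slots = Sentence("HassTurnOff", slots={"domain": "cover"})

# "close desk lamp": rejected by requires_context, accepted (and relabelled) by slots.
lamp_filtered = match(with_requires, "desk lamp", ENTITIES)
lamp_fixed = match(with_slots, "desk lamp", ENTITIES)
# "is any door locked": no {name}, so a slot is the only way to pass the domain on.
no_name = match(Sentence("HassGetState", slots={"domain": "lock"}), None, ENTITIES)
print(lamp_filtered, lamp_fixed, no_name)
```

In this toy model the lamp is rejected by the requires_context variant but accepted by the slots variant with a hard-coded cover domain, which is the kind of mismatch the explanation warns about.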
@worthy wave I converted your message into a file since it's above 15 lines :+1:
That is correct, but i can't remember off the top of my head if you need requires_context if you don't have {name}. I am inclined to say no
Now it's much more understandable for me. Thank you very much for your help!
If I may ask one more thing, I'll change the topic a bit. I have two M5Atom Echo devices. My experience with using them and wake words is not very good. Is there any way to check how sound is transmitted from the M5Atom to HA and how HA tries to recognize wake words live? I don't really know where the problem is. I have HA installed on a NUC 13 Pro and it is a powerful machine. Very often the wake word is not recognized.
I also did tests with a Jabra SPEAK 510 MS speaker (via USB) and here I also had a lot of problems getting wake words to work properly. Do you know how to examine where the problem is?
I tested the script locally https://github.com/dscripka/openWakeWord/blob/main/examples/detect_from_microphone.py and in each case, each selected word is recognized correctly.
@worthy wave I converted your message into a file since it's above 15 lines :+1:
I watched the entire last presentation (Voice Assistant Contest Launch) and it suggested using the debug mode in Pipeline. But unfortunately it doesn't work very well for me.
That is a question answered multiple times (and pinned) in #voice-assistants-archived
Thanks, I'm going to look for it
No need, i linked it
@worthy wave I converted your message into a file since it's above 15 lines :+1:
Sorry, I created the message above by mistake. @scarlet echo I have another question. Is it possible to use the same sentences to support two different domains? E.g. in Poland we can say: "Open the door". The meaning of such a sentence is twofold: open the door (e.g. patio doors with an electric motor) and unlock the door (just unlock the door lock). So we can cover two different domains: cover.door and lock. Can I do something like that? (Check the example below)
Yes. It requires a combination of requires_context and excludes_context so as to make the sentences mutually exclusive. I've had the same issue with opening covers in Romanian, where "opening" is a subset of "turning on" (there are a few more words for turning on)
In your example you are missing a requires_context: {domain: "lock"} (or cover, respectively). If there are any clashes with words in the generic homeassistant_HassTurnOn or Off, then you need to specifically excludes_context there
Otherwise, the sentence might match the generic homeassistant_HassTurnXx
However, the door and the lock in your example need to have different names/aliases, otherwise the first matching sentence's intent will be handled
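A rough Python sketch of the mutual-exclusion idea described above. The rule list, the intent labels and the entity names are hypothetical and this is not how Home Assistant actually dispatches intents; it only illustrates that the generic rule must exclude the domains that the specific rules require, so that each entity can match exactly one rule.

```python
from typing import Optional


def pick_intent(spoken_name: str, entities: dict, rules: list) -> Optional[str]:
    """Return the label of the first rule whose context constraints accept the entity."""
    domain = entities[spoken_name]["domain"]
    for rule in rules:
        required = rule.get("requires_domain")
        excluded = rule.get("excludes_domains", ())
        if required is not None and domain != required:
            continue  # requires_context not satisfied
        if domain in excluded:
            continue  # excludes_context rejects this entity
        return rule["intent"]
    return None


# Hypothetical rules that all share the same sentence, e.g. "open <name>".
RULES = [
    {"intent": "generic turn on", "excludes_domains": ("cover", "lock")},
    {"intent": "open cover", "requires_domain": "cover"},
    {"intent": "unlock", "requires_domain": "lock"},
]

# The door and the lock carry different names, as advised above.
ENTITIES = {
    "patio door": {"domain": "cover"},
    "front door lock": {"domain": "lock"},
    "desk lamp": {"domain": "light"},
}

results = {name: pick_intent(name, ENTITIES, RULES) for name in ENTITIES}
print(results)
```

Without the `excludes_domains` entry on the first rule, every entity would fall through to the generic rule, which is the clash mentioned above.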
i just wrote Whisper API wyoming server. https://github.com/ser/wyoming-whisper-api-client
using it with local whisper.cpp server instance
it's not dramatically faster than faster-whisper, but its advantage is that it uses a much more popular API
Hey, I know you can't judge language you don't understand, but if anyone notices some obvious stupidities on this PR concerning something else than the language structure itself, please let me know. https://github.com/home-assistant/intents/pull/1990
I've never done a PR this large, or this fast, so... I'll have to fix some missing tests. I wonder why none of my local commands complained about that?
Hi, does anyone know if there is a plan to add GPU support for Whisper and Piper in Home Assistant, so that we can plug in and use a GPU? Thanks
https://github.com/ser/wyoming-whisper-api-client but no docker support atm
Hi @ivory vessel What is the minimum length/number of voice samples needed to create a new piper voice? Of course, I am aware that the more the better... I rather wonder whether 4 hours is enough or whether much more is needed? My second question is whether it is possible for you to train a new language from such samples or do I have to do it myself. My only problem is that I don't have the right equipment to do something like this. Of course, the audio will be publicly available. PS. I am talking about the Polish language
@worthy wave have you tried xtts2?
@ivory vessel On a completely different topic, I'm wondering whether it is possible to train the current wake words using your own voice, or is it rather difficult to do? I'm asking because after my recent tests I noticed that the M5 Stack Echo microphone works properly in my configuration, but because my accent is not English, the word itself does not wake up the microphone very reliably. And I was wondering if I could record several dozen uses of such a word by household members and whether it was possible to train such a model to work more correctly.
What is this?
you talk about https://github.com/BoltzmannEntropy/xtts2-ui?
yes yes.. I saw this, but it works only with english accent
so no, there is nothing else and custom verifying models do not work for home assistant
https://github.com/coqui-ai/TTS with XTTS2 model https://huggingface.co/coqui/XTTS-v2
the quality of this model will blow you away
that is the reason why I ask if there is such a possibility to add several dozen additional voice samples to create "my own" wake words from current existing models
great, do you have any example?
you need to read openwakeword github issues and eventually chat with mr dscripka
i think there are examples on https://huggingface.co/coqui/XTTS-v2
yep, but without polish language example
so polish is identically awesome
you won't recognise it from real speech
please note, GPU is a must for that model
to have real time inference
Both topics seem a bit more suitable for #voice-assistants-archived (where they have been discussed previously)
I have also seen it, I already have samples but I do not know what to do with them: 1500 examples (about 2 hours)
Piper recording studio provides the "right" amount of samples you need to read aloud to train a TTS voice model
XTTS2 needs just a few samples, btw
when is the DL to merge translations for 2024.3?
@scarlet echo I have a small question. I'm one of the Polish language leaders, but I don't know why I can't merge a PR prepared by me and approved by another person. Can you explain what I should do to be able to merge PRs related to the Polish language as well? Sorry if this question isn't for you
i'm not a repo admin, but i see you're not in the language-leaders group https://github.com/orgs/home-assistant/teams/language-leaders?query= so I don't think you have write access to the intents repo. only @ivory vessel can help
for example @shell dirge is in that group and he never had issues with merging PRs
Working on the Slovenian translation of sentences in VA and one thing I can't figure out (for now on my priority list) is how to tackle the speech output of numbers. Example: the text output (of a sensor value) is 22.5 °C. Which is fine, but when spoken on VA I get "two-two-five C" (in Slovenian). Any advice on how to get the whole number spoken? And decimals? The spoken output ignores the decimal point... Thanks
Does slovenian use a comma , as a decimal separator?
If so, take a look at this and the following messages #devs_voice-archived message
doesn't the intents repo use the same sentence parser?
On HA, for "coloca batatas na lista" I get "batanas n" as item.
Using script.intentfest parse from the intents repo, I get "batatas "
https://github.com/home-assistant/hassil/issues/92
looks like that bug, but in this case it only happens on HA, which is weirder
you have to look what version of hassil you have both in HA and in the intents devcontainer (or whatever you're using)
I have some Polish sentences here: https://github.com/rhasspy/piper-recording-studio/tree/master/prompts/Polish (Poland)_pl-PL
This is about 2 hours worth of audio, and is definitely enough since I've trained a few voices so far
If you'd like to use the contribution website to record (https://contribute.rhasspy.org/) let me know at voice@nabucasa.com and I can get you a login code.
This is possible with snowboy: https://github.com/rhasspy/wyoming-snowboy
For openWakeWord, I will need to train a large multi-speaker Polish model in order to train Polish wake words. The data exists, I just need to find the time
I have recordings with text. However, at the moment I do not have the equipment to train new models. I can share these recordings now, then maybe new models can be trained
I plan to create at least one more female voice
Sure, if you're willing to share the audio I can train a new voice
Regarding the snowboy, unfortunately I had problems with it and ultimately I was unable to train the new model with my voice. I also noticed that it only supports English and Chinese languages
For now, I have focused on a major update of the Polish language for intents
I don't know if there is any option to speed up PR checking, because I don't have the ability to merge them
It looks like the language leaders are taking a look at your PR (assuming https://github.com/home-assistant/intents/pull/1996). We still have more than a week before the next HA release, so plenty of time.
Yes, but unfortunately this is only part of the changes... the rest, related to checking sensor and binary_sensor, is still missing... and I don't want to work on it until this PR is closed (they are a bit related)
@ivory vessel If I am currently training a voice and the test output seems to have hit a plateau, can I simply halt training, update the dataset wav and csv with more data, run prep again, then resume from my current checkpoint? Will doing so start training with the additional data or do I need to start over with a clean set of training folders?
Yes, this should work just fine. I'm guessing this would be the better approach, but an alternative would be to just prepare the new data and resume training on that alone. It would be worth experimenting with if you have the time, and I'd be interested in the results
Well at the moment I am working with limited data as I am trying to train a voice off of existing samples not samples I am creating..
As it takes hours to see the results I was just curious whether I had the correct approach
one question about the voice assistant: Do we want to add basic-but-not-strictly-smart-home-related commands like "what time is it?"
I find myself missing asking alexa what time it is, adding timers while I'm cooking, etc...
I think adding them would make for a smoother transition for people already using alexa and google speakers
"what time is it" you can add directly from an automation, it is very easy..
@worthy wave you mean with custom sentences?
no, just automation inside HA
I'm not following I'm afraid
so what I said, with custom sentences ^
I see. I saw them as one and the same
for sure, one can add any sentence to perform anything they want. My question is if we should ship by default several of the most common ones
much like we have sentences for the weather, we could have them for time
and similarly to how we have sentences to manage a shopping list, i'd expect sentences to set alarms and timers
I'm speaking from a user's perspective that is looking to replace alexa with HA
@spare forge go ahead and propose some intents and/or sentences. There's nothing set in stone
There is a VA expectations poll in the forums that might give hints as to what people expect
https://community.home-assistant.io/t/poll-what-do-you-use-your-voice-assistant-for-what-do-you-expect-it-to-do-multiple-selections/693669/5
Setting timers and alarms is very high on that list
@scarlet echo in the architecture repo?
I think you can just go for it. @ivory vessel ? Thoughts?
I did a couple proposals in the past, but I wasn't sure if the architecture one was the right repo for it or voice had another one
@spare forge Can you link those to me so I can collect them into a list? Thanks!
I'm not aware of a comprehensive list of basic tasks that alexa knows how to handle, but I'll try to search one or compile one myself
some of them, like timers, might require creating new services, others like asking for the time seem rather simple
All of them require new intents
Here is one for Google https://support.google.com/assistant/answer/7172842?hl=en
Thanks @scarlet echo same in Slovenian. The value with (,) is properly spoken out by TTS, the value with (.) is not. But I could check it only with the Try voice button in VA Settings, since the jinja2 replace filter doesn't change the sensor value properly. -> The sensor state in Dev tools is shown with . (see pic1 https://ibb.co/X5XG1dx ) but when I open it, it's localized with , (see pic2 https://ibb.co/ygfF7XG ). It's slowly driving me mad... In the template editor in the dev section I get the sensor state with a dot (.) (see pic3 https://ibb.co/zVsJyBN) but I have configured the number format in personal settings as 1.234.567,89 (see pic4 https://ibb.co/4WSzHhv ). If I use the replace filter in a template I get this: https://ibb.co/WHcQB95
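For illustration only, the separator swap being discussed (a "." state turned into a "," value that the TTS reads correctly) could look like this in plain Python. This is a standalone sketch, not a Home Assistant template or filter, and `localize_decimal` is a made-up helper name.

```python
def localize_decimal(state: str, decimal_sep: str = ",") -> str:
    """Swap the decimal point of a numeric state string for a local separator."""
    try:
        float(state)
    except ValueError:
        # Non-numeric states such as "unavailable" pass through untouched.
        return state
    # Replace only the first (and, for a valid number, only) decimal point.
    return state.replace(".", decimal_sep, 1)


spoken = localize_decimal("22.5")
untouched = localize_decimal("unavailable")
print(spoken, untouched)
```

The numeric check is there so that states which are not numbers are never rewritten.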
new intents for sure, but some might require even new services that don't exist
Timers are going to require HA to be able to initiate a TTS response on the satellite, which it currently can't do. But I think this will be pretty straightforward.
Well, really just an event when the timer elapses. Not necessarily TTS.
A timer is just a future time stamped event with a name and destination media device really
some assistants allow you to set named timers
named timers are critical for people who cook IMO. I set timers to know when something i'm boiling is done, while I'm baking something else, while there's another one for the max screen time of my daughter
I 100% need names on my timers
Named timers will definitely be supported
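The "future time stamped event with a name and destination media device" description can be sketched as a small Python data structure. Everything here (`VoiceTimer`, `start_timer`, `due_timers`, the `device_id` field) is hypothetical and only illustrates the idea from the discussion; it says nothing about how timers would actually be implemented in Home Assistant.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class VoiceTimer:
    """A future event with an optional name and a destination device (hypothetical)."""
    name: Optional[str]
    fires_at: datetime
    device_id: str  # assumed identifier of the satellite or media device to notify

    def is_due(self, now: datetime) -> bool:
        return now >= self.fires_at


def start_timer(now: datetime, duration: timedelta, device_id: str,
                name: Optional[str] = None) -> VoiceTimer:
    """Create a timer that fires `duration` after `now`."""
    return VoiceTimer(name=name, fires_at=now + duration, device_id=device_id)


def due_timers(timers: List[VoiceTimer], now: datetime) -> List[VoiceTimer]:
    """Return the timers whose event should be emitted at `now`."""
    return [timer for timer in timers if timer.is_due(now)]


now = datetime(2024, 3, 1, 12, 0, 0)
pasta = start_timer(now, timedelta(minutes=10), "kitchen_satellite", name="pasta")
cake = start_timer(now, timedelta(minutes=45), "kitchen_satellite", name="cake")
due = due_timers([pasta, cake], now + timedelta(minutes=15))
print([timer.name for timer in due])
```

Keeping the name on the timer is what lets several of them run side by side, as in the cooking example above.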
I am using HA 2024.2 (the problem exists for 2 or 3 releases already), and checked out intents tag 2024.2.2, so I think they are on the same version
well, even HA on dev has the same issue, so it is not a version mismatch...
found it! it's because HA uses recognize_all while the intents parse script uses recognize
and recognize_all returns two results: the correct one and the one with the "n"
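A toy Python illustration of why a "first result" call and an "all results" call can disagree. This is not hassil's code: the two regular expressions are invented stand-ins for two ways a template might match the same text, and the function names merely mirror the ones mentioned above.

```python
import re
from typing import Iterator, Optional

# Two made-up patterns that both accept the same sentence but capture
# a different {item}; they are assumptions for this sketch only.
PATTERNS = [
    re.compile(r"^coloca (?P<item>.+?) na lista$"),
    re.compile(r"^coloca (?P<item>.+?)a lista$"),
]


def recognize_all(text: str) -> Iterator[dict]:
    """Yield every possible parse of the text."""
    for pattern in PATTERNS:
        found = pattern.match(text)
        if found:
            yield {"item": found.group("item")}


def recognize(text: str) -> Optional[dict]:
    """Return only the first parse, or None when nothing matches."""
    return next(recognize_all(text), None)


first = recognize("coloca batatas na lista")
everything = list(recognize_all("coloca batatas na lista"))
print(first, everything)
```

A caller that consumes all results has to decide which one to keep; picking anything other than the first parse here would yield the item with the stray "n".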
@ivory vessel I opened two discussions in the architecture repo. One for adding support for basic sentences like asking the time, setting alarms, etc..., and another one to add a "Brief mode" similar to the one alexa has, which makes it prefer shorter responses over verbose ones
Can one set up a Wyoming server or something and capture all recorded audio from an ESPHome satellite without trying to use it for HA? I.e. without running it through a pipeline?
@ivory vessel Is this intent release still just a beta release? Will there be another release before the official launch? I wanted to perfect the cover and valve parts for area management today, but you were super fast this month.
Thanks! I'll take a look.
Yeah, I'll do another bump on release day so there's still plenty of time
As a follow-up: since you need a fairly significant minimum amount of data for the prep to complete without error, creating a completely new dataset wasn't practical. Instead I have added some new incremental data targeted at problem words and removed some redundant data from the main set to try and PULL it to a better spot... It is still slurring some words, however. I might abandon the current checkpoints and go back to starting from a good one with a revised dataset, but at the moment my loss gen is going down so I am going to give it some time to bake
Not sure if I should be on ESPHome Discord or here... I'm trying to setup the S3-BOX-3 and the wakeword works... but then when I try to interact with my voice pipeline/assistant, the last log entry I get is:
[D][esp_adf.microphone:273]: Microphone started
[D][voice_assistant:414]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
... I'm not sure if anything is making it to whisper or not
Hmm even after re-training my voice model seems to be slurring words as compared to the source data, I wonder if it is a factor of having too small of a dataset or if the training prep stage is attributing the wrong phonetic breakdown of the source for some reason.
Do I get it right, that when I want to limit the intent to a specific domain, e.g.:
requires_context:
  domain:
    - cover
This only works if the sentence contains the entity {name}, so there is no way to say I want to control entities of a specific domain in an {area}?
And if you do not mind a second question: how do I address devices in the same area as the assistant device? I have found 2 ways:
requires_context:
  area:
    slot: true
and
slots:
  name: "all"
Are they equal? Which one is right? How do they work? I could not find it in the docs.
It's probably better to explain what you want to achieve. Generically speaking, requires_context: {domain: ['cover']} applies to entities referenced by {name}, indeed
If you want to target devices which are in a specific area, you need to requires_context: {area: 'bedroom'}slots: {area: 'bedroom'} for example
what this does is that it creates a condition to filter only entities which have an area assigned, but slot: true means that the area of the satellite is promoted to a slot and used for filtering entities from the same area that the satellite is in (which is provided automatically in the context)
this one does next to nothing. it's used for automatically upgrading the service call to name: "all", but it would do the same if name was None. for example in requests like turn on the lights in the kitchen
either way, it has nothing to do with area
Thanks
i think i made a mistake here and I corrected it
if i want to ask for the temperature in the bedroom, I'd like to ask what is the temperature in the {area} bedroom. But I think I have to ask what is the temperature on the bedroom temperature sensor to get the entity name in it
(and in this case this is going to crash into the climate domain, but that's another thing)
are we talking about built-in sentences?
Yes, have been contributing the sentences in cs. It works, but in these two areas I do not fully understand how it works.
understood. there was a voluntary choice not to respond to what's the temperature in the bedroom with temperature sensors, but only with climate current_temperature
a sensor with device_class: temperature could just as well be a 3D printing nozzle temp sensor, a fridge temperature sensor etc. whereas climate entities are pretty straightforward
that said, you can still ask (in English, at least) what is the <device_class> of <sensor name> and some variations, but you have to name the sensor (or its aliases)
understood. probably works for thermostats. if you use TRVs they usually do not measure temperature, but I guess it is what it is
some TRVs expose climate entities. what exact entities do you have?
just the TRV switch?
No, zwave or zigbee TRVs. And they show as climate entities. Anyway, I can always have a custom automation to inject the current temperature from the room sensor if I am desperate
so what's the problem, then?
i mean if they expose climate entities
got sidetracked, the question was about filtering by device class when referring to an area, not the device name. And using the same area as the satellite
specifically for temperatures, you don't need to filter anything, as the HassClimateGetTemperature intent only applies to the climate domain, for which there are no device_classes. you can't query sensors and I've briefly explained why
for querying/issuing commands in the same area as the satellite, this is how to do it
that piece of YAML makes sure that area was included in the context (coming from the area assigned to the satellite) and it promotes the area to a slot, which is then used for filtering entities
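The promotion step described above can be sketched in a few lines of Python. This is a simplified toy model, not the hassil or default_agent implementation; `promote_context`, `entities_in_area` and the sample entities are made up for illustration.

```python
from typing import List, Optional


def promote_context(requires_context: dict, context: dict) -> Optional[dict]:
    """Check requires_context against the context and promote flagged keys to slots.

    Returns None when a required key is missing or has the wrong value.
    """
    slots = {}
    for key, spec in requires_context.items():
        if key not in context:
            return None  # e.g. the satellite has no area assigned
        if isinstance(spec, dict):
            if spec.get("slot"):
                # "slot: true": copy the context value into a slot of the same name.
                slots[key] = context[key]
        elif context[key] != spec:
            return None  # a plain value must match exactly
    return slots


def entities_in_area(entities: dict, area: str) -> List[str]:
    """Filter entity names by the promoted area slot."""
    return [name for name, info in entities.items() if info.get("area") == area]


REQUIRES = {"area": {"slot": True}}
ENTITIES = {
    "ceiling light": {"area": "kitchen"},
    "bedside lamp": {"area": "bedroom"},
}

from_kitchen = promote_context(REQUIRES, {"area": "kitchen"})  # satellite in the kitchen
no_area = promote_context(REQUIRES, {})  # satellite without an area
targets = entities_in_area(ENTITIES, from_kitchen["area"])
print(from_kitchen, no_area, targets)
```

In this model a satellite without an area simply never matches the sentence, which lines up with the "makes sure that area was included in the context" part of the explanation.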
if you're good with Python, i strongly recommend going through the hassil code and the default_agent to really understand how intent recognition and handling works
Can you provide an audio example? Also, please remind me of the voice model's language.
What is a good way / your prefered way of sharing audio files? I am training an English model from existing samples and synthetic ones from another model.
@ivory vessel sent you some samples as a direct message
yup, that looks ok to me
Thanks
Well, I managed to include everything, even the weather status display is now working in Hungarian. Thank you for waiting this long.
Do we have any intent for stopping opening/closing cover? Example: when the sentence is called: 'open the blinds' the HA starts opening the blinds which takes some time...if we want to stop in the middle or in some desired position - is there any intent already? Or did I miss something? Thanks!
we don't currently
Thanks @scarlet echo, probably the only way is with automation and custom sentence as a trigger?
i guess, yes. but we could add the sentence(s) and intent. could you open a ticket please?
Speaking of covers, the documentation says that HassOpenCover and HassCloseCover are deprecated, and we shall use HassTurnOn/Off. Is that the goal? I haven't seen any language that has this implemented (at least EN does not have it). How does the Stop fit in?
Isn't it cover_HassTurnOn and cover_HassTurnOff we are talking about? In intents I mean
Easy solution for your last comment - take a sharpie and draw a scale on the wall next to the blind. I am sure the partner will understand
that's IF the cover supports HassSetPosition. If it doesn't, you're out of luck
what do you mean it's not implemented? there are no HassOpenCover and HassCloseCover being pushed forward, as only HassTurnOn and HassTurnOff are used in Assist, in all languages https://github.com/home-assistant/intents/blob/main/sentences/en/cover_HassTurnOn.yaml
I mean that if I look at the HassTurnOn intents:
- sentences:
    - "<turn> on (<area> <name>|<name> [in <area>])"
    - "[<turn>] (<area> <name>|<name> [in <area>]) [to] on"
    - "activate (<area> <name>|<name> [in <area>])"
It reacts to turn: "(turn|switch|change)" or activate. So the words open or raise that are used in HassOpenCover are not implemented.
The desired position is relative... for instance relative to the shade from the sun, and it's 'different' every day (even if not noticeably)
Yes, cover is in the domain. So you can turn or switch it on, or activate it (whatever it means)
no, it's listed in the excludes_context, which specifically does not match these sentences with entities from the cover domain
I couldn't agree. Cover is open or close. Not on or off like switch
Sorry, Mea Culpa
what does match is what i've linked 5 messages before: #devs_voice-archived message
Aaa, I was confused. Sorry for wasting the time/space here
@scarlet echo just checking your PR https://github.com/home-assistant/intents/pull/2045 because I had some trouble with the valve intent (I had to differentiate between set position and open valve, so I had to use a synonym in sl). Is this the reason the homeassistant_HassSetPosition.yaml was deleted? Thanks!
I have split the homeassistant_HassSetPosition into domains and nothing else supports the intent other than cover and valve, so yes, i deleted the homeassistant domain
Some help please: the response for sensor_HassGetState, which has the form {{ slots.name | capitalize }} je {{ state.state_with_unit }}, gives me a clumsy response in Slovenian.
How can <class> (from expansion rules) be added in front of slots.name, so that the response is more human friendly? E.g. for a duration sensor: "**Trajanje** trenutnega programa pomivalnega stroja je 64 min". I need the bolded (**) word, which comes from the <class> expansion rules. If I put slots.device_class in front I get the device class untranslated, e.g. duration and not trajanje in Slovene.
I'm interested in designing open source Voice Assistant hardware and an elegant case compatible with Echo Dot V3 accessories. What component level hardware would be best for the community to get far-field audio capture with 3+ mics, and be able to exclude its own audio? I'm currently working on the project over here https://community.home-assistant.io/t/far-field-satellite-with-an-elegant-3d-printed-enclosures/699893
I don't feel like there are any ideal off the shelf modules that can quite compete with Amazon or Google, especially when it comes to the mic arrays for directional/far-field capture. Has the community found a good voice processing unit that works with an ESP32?
the Onju Voice is pretty decent from a few meters away https://github.com/justLV/onju-voice
It's a nice project, though still lacking a dedicated VPU, and I would guess it has issues with detecting voice during barge-in, or with echo cancellation. Wouldn't this type of chip make the audio pickup easier on the ESP32 https://www.microsemi.com/document-portal/doc_download/136798-zl38063-datasheet
I suppose the question is, what hardware would make the life of the developers easier to make an ideal smart speaker? To start with, focusing on the mic array that can work in noisy environments from a distance
Hey, for some reason whenever i say "Säädä", Assist hears "Saada" (which is also a Finnish word). Should I fix this in intents, or open an issue elsewhere? Translation is roughly "Modify". "Saada" isn't something that I can immediately think of for any voice commands.
That's an issue with the STT engine. What are you using?
Mike H has a workaround for similar sounding words which would help in exactly these situations, but it's not ready for prime time yet (mostly due to missing text-to-phoneme libraries with a usable license)
To answer your question, adding nonsensical words to the sentences just because that's what the STT hears is a band-aid on a broken bone and i would advise against it
This is what I thought as well that it's just a "bandaid". I'm using HA Cloud for STT
Ouch! Take a look at this discussion for details on the other thing i mentioned https://github.com/home-assistant/hassil/pull/80
Hello
is there any way to set up Assist so that when it doesn't understand a request it sends the message to the GPT API? That way we could use both the control of Assist and the power of AI at the same time
There is no built in way as of right now, only with a custom integration.
@scarlet echo I was excited to see my media_player intents in the release today! I totally appreciate dev time is precious but was wondering if any of the other service calls, particularly media_previous_track (as we now have next track) are on the roadmap? I have the custom intents I am using for all the other service calls ready to go!
Thank you for those contributions, btw! I have no clue about the roadmap. But I encourage you to add those new intents whenever you want (just mark them as unsupported). If you need any help, I'm here
for example, I've just opened a PR for the implementation of a HassClimateSetTemperature intent. No idea if it was on the roadmap, but I've heard many times that the roadmap was largely influenced by community contributions
Great I can do that. What do you have to do to mark them as unsupported?
OK so I will need to amend that file, add the sentences file and the tests file. Will try and get to it soon!
...just for English and then make sure all linters and tests pass
Last question. I saw the discussion above about homophones. In one of my intents I have "Clear <media_player> (queue|cue|Q|cube)". Is this kosher? I can foresee frowns about cube (must be my poor pronunciation, but it saves me a lot of "didn't understand"s!) but the other three are homophones, so should I list at least those three?
you should not list them, especially for a first iteration of a new intent. however, if they help you, i'd totally suggest having that particular custom sentence on your system, tied to the same intent
@ivory vessel @noble copper were the last intents updates included in yesterdays release? The new Dutch intents are not working (eg volume of media players and vacuum start)
They don't work in 2024.3, but they do work on the 2024.4 nightly
I messed up and got the PR in too late
We'll have to wait for the point release, unfortunately.
okay, no worries, but that explains it
something else
STT sometimes adds commas. If it does so everywhere all is fine, but not if it only adds one
Weird, seems to work in English
Oh, wait. It does fail with the same intent. I'll take a look.
Thanks!
Oh, this is such good news, I've been searching for two hours what the difference is between the dev and the stable weather π Because what I put in doesn't work. But now I'm relieved.
i also noticed today that "set curtain to 90%" worked but for some reason "set curtain to 100%" became "set curtain, to 100%" (translated from finnish)
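Until the matching itself copes with stray commas, one conceivable workaround is to normalize the transcript before it reaches the intent matcher. This is only a sketch of that idea, not how the fix was (or will be) done; `normalize_transcript` is a hypothetical helper and the choice of punctuation to drop is an assumption.

```python
import re

# Drop , ; : ! ? anywhere, and "." unless it is followed by a digit (keeps "21.5").
PUNCTUATION = re.compile(r"[,;:!?]|\.(?!\d)")


def normalize_transcript(text: str) -> str:
    """Replace stray punctuation with spaces and collapse repeated whitespace."""
    return " ".join(PUNCTUATION.sub(" ", text).split())


print(normalize_transcript("set curtain, to 100%"))
```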
Sorry about that! I was trying to get some last-minute PRs merged and I missed the window
Oh, I didn't write this in a negative way. I was just really happy that I didn't mess something up. I couldn't do what you guys do month after month. I do pay attention to how much work goes into releasing a main build.
Is there a list of ADCs/VPUs that are currently supported by the project, be it with an ESP32 or RPi? I'm aiming to design a new satellite with beamforming
Is the ZL38063 supported? https://www.microsemi.com/product-directory/connected-home/4432-zl38063
ESPHome supports I2S microphones directly: https://esphome.io/components/microphone/i2s_audio
The Raspberry Pi would need an I2S kernel module.
@scarlet echo Just doing these extra media player intents. I think I have got it all working. Tests are all passing EXCEPT when I add a response key to the tests. Then I get an assertion error even though it is an identical format to the previous ones I have done. Ideas? The error looks like AssertionError: No response template for intent HassMediaClearPlaylist named default: clear TV queue
Testing issues
should we have support for (custom) integrations to define their own set of intents and sentences which could be added by default to the Home Assistant conversation agent?
so for example, the Alarmo integration could expose an "arm Alarmo" or "arm [the] home alarm" sentence that the default conversation agent would adopt and have ready to use instantly
or since I've been discussing with Gav from Music Assistant, the MA integration could expose specific intents for media playback or other media-related actions
thoughts?
So I assume that I would treat the processed output from the ZL38063 as a microphone input to the ESP32 then
How advanced is the current audio processing on the ESP32-S3? Can we do acoustic echo cancellation and/or beamforming in software?
Has anyone managed to show the beam form direction picked up on the mic array back to the end user? Specifically in a similar way to an amazon echo?
The ESP32-S3 is capable of echo cancellation and other audio clean up (not sure about beam forming), but it is not being used currently in ESPHome. Espressif's ADF libraries are not exactly easy to use outside of their examples
I only see echo cancellation, blind source separation, and noise suppression listed here: https://www.espressif.com/en/news/esp-afe-algorithms
Ah, that's good to know that it's technically possible. I'm guessing that boils down to Espressif IDF has the library but Platformio is lacking. That also explains why the development boards I've seen omit having advanced sound processors (as they expect the S3 to do a bit more heavy lifting)
Thank you for the link, that's what I was looking for, I just couldn't find the keywords to track that down
If there currently isn't support due to the Espressif library not being ported to Platformio, do you think there is need to offload that processing into a dedicated hardware chip such as https://www.microsemi.com/product-directory/connected-home/4432-zl38063
How is development going in regard to flash and RAM usage? Looking through the ESP32-S3 datasheet, it seems that it supports up to 1 GB of both. Would that be of any use (above the 16 MB and 8 MB)?
It looks like 64 MB would be easy enough to get up to and still be inside the virtual address space
I've opened a discussion in the architecture repo for including devices among the things Assist can query to do its job (e.g. what is the <device_class> in <device_name> - what is the temperature in the fridge). If you think it's worthy, please vote and/or comment https://github.com/home-assistant/architecture/discussions/1060
bump ^^
What can I do to get this merged? - I tried making the same changes via the preferred github codespace method however I don't have the necessary permission to push, I think I need to be a language leader? feels like a chicken-egg problem, need two PRs to be language lead, but can't get these merged
@ivory vessel
Hi @worldly narwhal, sorry about the wait! I've assigned myself these PRs to take a look
@worldly narwhal There was some problem with the CI and I couldn't get your PRs to run the tests. I pulled the changes into a single PR and got it merged. Adding you as a language leader for sw now. Thanks for your patience
I think the documentation is slightly incorrect. You need to publish it to your repo first, then make a PR from there. I do not think people have the right to publish the branch to home-assistant/intents. Correct?
You need to open 2 PRs, not to merge them yourself
I was not talking about Merge, but creating the PR. I think when I follow the documentation step by step, when I create PR, it tries to publish the branch to homeassistant/intents first, and then create PR from this branch to main. But I have no permission to create branch on homeassistant/intents. So I had to publish it to my account first, and create PR from there (I think it automatically creates a fork of homeassistant/intents first - don't catch me there, I am not a git expert).
thanks! wait was well worth it. I'll get started on the rest - if there's something I should change/fix in the future to avoid CI problems I'm all ears
that step was indeed a bit confusing, but clear for me personally now, going to stick to the codespace method (these first PRs were from a forked intents repo - hadn't yet seen the message that codespace was preferred), after trying a few times i'll see if there's a way/need to improve the tutorial
codespaces have nothing to do with forks (you can create a codespace on your fork) and it's a good idea to update the sentences on your own fork and then create PRs. You can also work on branches in the intents repo, but never commit directly (without a PR) on the main branch
I think the last sentence is not correct. People are generally not allowed to create or commit to branches on homeassistant/intents (and I do not mean the main). You might not see that as you have more rights.
So it is not only a good idea to update the sentences in your own fork, but that's the only way. And this is also what is confusing on the documentation.
Nobody was talking about making commits to the main directly I think.
Now that @worldly narwhal is a language leader, he can commit directly to the repo, which is not advisable. that was my point
Ok. I think we were talking about the documentation in general.
@scarlet echo not sure if I've asked here before, I'm looking at making an "ideal" open source smart speaker PCB, with whatever hardware would be best suited to this project and Willow. Could you please advise who would be the best members to contact to collaborate with?
Can't say i can think of too many people, you'll probably have more success on the ESPHome server. Here are a few that come to mind:
@static stump, founder of Raspiaudio, probably up to his ears in closed source hardware design
@lyric harbor, founder of Willow, unsure about his availability
I don't know if he's on this server, but Sebastian from SmartSolutions4Home may be another good pick as a skilled electrical engineer https://smartsolutions4home.com/about-me/
Note that the above message has not notified the tagged people that they were tagged
Are there docs on how to stream sound over websockets to the assist_pipeline integration for wake word detection? Or is it the same as for stt?
Also, in what format does the audio stream have to be? (I'm not really experienced in working with audio formats)
Same format. You just need to set the start stage to wake.
And in what format should the audio stream be? Can't find that in that document.
It's near the bottom: https://developers.home-assistant.io/docs/voice/pipelines/#sending-speech-data
You send one byte with the handler id, then raw 16 kHz mono audio with 16-bit (signed) samples.
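As a rough illustration of that framing, the sketch below prefixes each chunk of raw PCM with the handler id byte. It only builds the binary payloads; opening the WebSocket and starting the pipeline run are left out. The handler id of `1` and the 20 ms chunk size are placeholders chosen for the example, and `frame_audio` / `chunk_pcm` are hypothetical helper names, not part of any Home Assistant API.

```python
from typing import Iterator

SAMPLE_RATE = 16_000  # 16 kHz
SAMPLE_WIDTH = 2      # 16-bit signed samples, in bytes
CHANNELS = 1          # mono


def chunk_pcm(pcm: bytes, chunk_ms: int = 20) -> Iterator[bytes]:
    """Split raw PCM into fixed-duration chunks (the last one may be shorter)."""
    size = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * chunk_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def frame_audio(handler_id: int, pcm: bytes) -> bytes:
    """Build one binary message: a single handler-id byte followed by raw PCM."""
    if not 0 <= handler_id <= 255:
        raise ValueError("handler id must fit in one byte")
    if len(pcm) % SAMPLE_WIDTH:
        raise ValueError("PCM data must contain whole 16-bit samples")
    return bytes([handler_id]) + pcm


# One second of silence stands in for real microphone audio.
silence = bytes(SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)
messages = [frame_audio(1, chunk) for chunk in chunk_pcm(silence)]
print(len(messages), len(messages[0]))
```

Each element of `messages` would then be sent as one binary WebSocket message, in order.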
Thanks!
@lean beacon I converted your message into a file since it's above 15 lines :+1:
Is there any development done for the recommended M5 Atom Echo to solve the issues with it when used with Homeassistant?
That's far too vague to answer, and not related to development
Please continue in #voice-assistants-archived
It will resample to 16 kHz automatically, but this will cause more CPU usage in HA.
Ah okay thanks.
What power supply is everyone using to power the M5 stack? I am looking to replace my Amazon Echo devices and will need 8 of them.
Oops... Apologies, I thought I was in that channel.
pressing the button on the Echo to say 'Doe de espresso uit' ("turn off the espresso" in Dutch; Espresso is an alias) and the response is 'Sorry, ik kan geen apparaat vinden met de naam De Espresso' ("Sorry, I can't find a device named De Espresso"). Would that be an issue to report here, or would that be expected?
I could have sworn it did act properly before, so guess there was some development that changed its behavior
Echo -> Alexa right?
my voip assistant connectors don't disconnect when the call is finished. What more info is needed to file a bug?
one is on for 20 hours now.
no! Atom Echo, my apologies (wake word is disabled, that's why I need to push the button)
That's a bit strange
Can't reproduce it
Is that when you talk to the device, or when you type? in my case its when I give the voice command
i have this switch, with aliases
I just want to thank the devs, the latest update fixed my over-a-month-fight with the voice assistant ecosystem
aaaand didn't survive a reboot, damn pulseaudio, you're savage!
heya. the voip assistant does strange things when asterisk breaks in and moves the call to another channel.
the assist stays open and the assist processing in the debug runs forever.
The lists on the _common.yaml file of the intents cannot have the same "in" value for multiple "out" values, or can it?
in portuguese, "persiana" can be used for both blind and shutter...
and also "estore"
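This doesn't answer whether the parser accepts it, but a quick way to see where such collisions would occur is to group a list's values by their "in" text. A minimal sketch, assuming the list values are already loaded as plain dicts with `in` and `out` keys; `find_ambiguous_inputs` is a made-up helper, and the `cortina` entry is just filler for the example.

```python
from collections import defaultdict


def find_ambiguous_inputs(values: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each "in" text to its "out" values, keeping only texts with several outs."""
    outs_by_in: dict[str, set[str]] = defaultdict(set)
    for value in values:
        outs_by_in[value["in"]].add(value["out"])
    return {
        text: sorted(outs)
        for text, outs in outs_by_in.items()
        if len(outs) > 1
    }


# Example data mirroring the Portuguese case above (structure is an assumption).
cover_classes = [
    {"in": "persiana", "out": "blind"},
    {"in": "persiana", "out": "shutter"},
    {"in": "estore", "out": "blind"},
    {"in": "estore", "out": "shutter"},
    {"in": "cortina", "out": "curtain"},
]
ambiguous = find_ambiguous_inputs(cover_classes)
print(ambiguous)
```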
Hi @ivory vessel, is it possible to pass metadata from a sentence to its response (I mean in render_response)? E.g. for an example sentence like the one below, it would be nice to have metadata <key>: <value> available in the response, e.g. one_sensor: "{{ metadata.response_text }} {{ state.state_with_unit }}".
# Wind speed
- sentences:
    - "<what_is_the_class_of_name>"
  response: one_sensor
  requires_context:
    domain: sensor
    device_class: wind_speed
  slots:
    domain: sensor
    device_class: wind_speed
  expansion_rules:
    class: "(prędkość|szybkość) [wiatru]"
  metadata:
    response_text: Prędkość wiatru wynosi
The problem in Polish is that, in order to build a correct response for the given device, you would need the name of the device in its base form (without inflection). But with the current YAML configuration that is impossible. That's why I wanted to prepare answers that don't include the name of the device. Instead they would mention the class of the device we are asking about.
take my upvote!
Currently, to do this I have to prepare a large number of responses for each device..
something like this (but it doesn't look good in the main configuration):
one_sensor_apparent_power: Moc pozorna urządzenia wynosi {{ state.state_with_unit }}
one_sensor_aqi: Indeks jakości powietrza wynosi {{ state.state_with_unit }}
one_sensor_atmospheric_pressure: Ciśnienie atmosferyczne wynosi {{ state.state_with_unit }}
one_sensor_battery: Poziom baterii wynosi {{ state.state_with_unit }}
one_sensor_carbon_dioxide: Stężenie dwutlenku węgla wynosi {{ state.state_with_unit }}
one_sensor_carbon_monoxide: Stężenie tlenku węgla wynosi {{ state.state_with_unit }}
one_sensor_current: Natężenie prądu elektrycznego wynosi {{ state.state_with_unit }}
one_sensor_data_rate: Prędkość transferu danych wynosi {{ state.state_with_unit }}
one_sensor_data_size: Rozmiar danych wynosi {{ state.state_with_unit }}
one_sensor_date: Data w kalendarzu to {{ state.state_with_unit }}
one_sensor_distance: Odległość wynosi {{ state.state_with_unit }}
...
@worthy wave although it's an absolutely excellent idea and I can try to do that PR myself, just FYI you can hack this as we speak with slots instead of metadata
and/or if it is possible to expose the exact text that was recognised from expansion_rules in the response. E.g. in the response we could have an extra key like {{ rules.class }}, which would contain text like prędkość wiatru or szybkość wiatru. I know it won't be easy, but it would certainly make it easier to create correct answers
@scarlet echo yes I know that I can use slots..
but slots are not a good place to add just a response.. I prefer to create a lot of responses instead
it may work now, but in the future it could cause a lot of problems
sentences/xx/homeassistant_HassWhatever.yaml
intents:
  HassWhatever:
    data:
      - sentences:
          - "abracadabra"
        slots:
          testslot: "test value"
        response: testslot
responses/xx/HassWhatever.yaml
responses:
  intents:
    HassWhatever:
      testslot: "{{ slots.testslot }}"
$ python3 -m script.intentfest parse --language en --sentence 'abracadabra'
{
  "text": "abracadabra",
  "match": true,
  "intent": "HassWhatever",
  "slots": {
    "testslot": "test value"
  },
  "context": {},
  "response_key": "testslot",
  "response": "test value"
}
yes, I know I can do that... but I don't want to do it like that
i totally agree
@scarlet echo maybe you've tested it on HA, small question: if I have a real device in HA, e.g. living room door, and I create an alias like door in living room, then when I ask "What is the state of door in living room?", what will the value of {{ slots.name }} be? living room door or door in living room?
the slot text is what you said, in this case door in living room
Big thanks.. so it still doesn't solve the problem of converting the Polish name of a device to its base form
maybe there is some magic field in HA that I can fill in to always use this form for answers.. hehe
we have loads of issues with not having both the slot "text" and value being available in responses (and other places). basically, there are places where we need both the "translated" and "untranslated" versions of a slot (e.g. a zone name + ID or an entity_id along with the friendly name etc.). i'm not sure of the timeline for this (or if there even is one) and i'm reluctant to implement it, as many of my contributions have become severely outdated by the time someone reviewed them and i don't have enough time to keep them up to date
yes, I saw your PR (and branch) related with translations.. that is the reason why in polish language I use lot of conditions to create correct response, again I know that is not a good solution, but without this the response will not make a sense in polish language
you're gonna love this, then: https://community.home-assistant.io/t/entity-metadata-like-number-gender-or-number-for-localization-or-user-generated-names/535963
I have exactly the same problem in Polish
I don't know the Romanian language, but I see a lot of similarities to the Polish language
Created a PR: https://github.com/home-assistant/intents/pull/2108
FYI @scarlet echo I decided to rename the range scale to multiplier per your suggestion.
I didn't realize the modification was in the intents repo. After a quick skim, i thought the core had to be altered
It does, I forgot to mention that this is to try it out first before I modify core.
But then again, after a week of vacation and hundreds of emails both at work and personal, my context switching fu was not at its peak
Nope, you're definitely right
Since you're online, Mike, great work with a certain Andean mammal! Can't wait to test it out
Lol, thanks! It can't control HA just yet, but that's the next step. I have a proof-of-concept working, but we decided to generalize things just a bit more
@worthy wave I converted your message into a file since it's above 15 lines :+1:
result:
===================================================================================================================== test session starts
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: .../Home Assistant/intents
configfile: pyproject.toml
collected 79 items / 75 deselected / 4 selected
tests/test_language_intents.py .. [ 50%]
tests/test_language_sentences.py .. [100%]
============================================================================================================== 4 passed, 75 deselected in 1.08s
@ivory vessel big thanks for these small changes.. They will really help to design a better voice experience
Sorry, I'm just dropping in here... I think there is a linguistic aspect that you both are getting at that might need to be handled differently than a slot. Some languages have prepositional phrases while others have postpositional phrases. Adding to the fun, within the phrase the order of the linguistic object will change.
So having specific software objects that reflect the parts of speech is useful. My thinking is that the slot could be derived based on the language setting affecting the construction of the parts of speech determined through an NLP library pulling apart the words through chunks, stemming, and lemmatization.
That last sentence was trying to do too much.
What is your point?
Yes... apologies... The point is to handle the linguistic differences prior to attempting to handle the intent and contents of the slot.
STT -> Pragmatics -> Semantics -> Syntax -> Morphology -> Translate -> Intent and Slot -> Action
Agreed, but all those pieces are missing atm. Some may be a bit of overkill. I doubt anyone will be against somebody implementing those things.
I had a similar thought as I was typing, "Aren't you doing this right now, you big dummy? If not you, then who?"
My inner dialogue can be brutal.
So I played around a bit. That pipeline, as I typed it, is miserable. If nothing else, incredibly slow.
Now I'm mulling over if this is feasible as typed.
Hello guys! I've just finished the initial version of a custom integration for AllTalk TTS.
AllTalk TTS is in my opinion the best currently available TTS system.
tests, comments, patches are really welcome
Hey guys, I have an RTX 4090 and I'm trying to train a new Polish voice. But I have one problem with torch version 1.13.1
>>> import torch
>>> print(torch.__version__)
1.13.1+cu117
>>> out = torch.fft.rfft(torch.randn(1000).cuda())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
Does anyone know how to solve this problem? I looked for some solutions but unfortunately I can't find anything..
It is worth adding that I am working on Ubuntu 22.04
I found a solution: https://github.com/rhasspy/piper/issues/295 the first tests look really great
@scarlet echo could you please review this PR? https://github.com/home-assistant/intents/pull/2108
done. sorry for the delay
I have something I don't understand.
In cover_HassTurnOn I have the following intent
- sentences:
    - open [de|het] <curtain> <in> <area>
    - "[de|het] <curtain> <in> <area> openen"
    - "[<doe>] [de|het] <curtain> (<open> <in> <area>|<in> <area> <open>)"
  response: "cover"
  requires_context:
    device_class: "curtain"
    domain: "cover"
<curtain> refers to "(gordijn[en]|vitrage[s])"
These are the tests (which pass just fine)
- sentences:
    - Open het gordijn in de woonkamer
    - Vitrage woonkamer open
    - Doe het gordijn open in de woonkamer
  intent:
    name: HassTurnOn
    slots:
      area: Woonkamer
    context:
      device_class: curtain
      domain: cover
  response: Geopend
but if I try an individual sentence from those tests, it doesn't work
@TheFes ➜ /workspaces/intents (nl_volume) $ python3 -m script.intentfest parse --language nl --sentence 'Open het gordijn in de woonkamer'
{
  "text": "Open het gordijn in de woonkamer",
  "match": false
}
that sounds like it's because you're not inferring context in your command. i can't remember how you can do that, though
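If that guess is right, the mechanics are roughly these: a sentence guarded by `requires_context` can only match when the caller supplies a context containing the same values, and a bare CLI call supplies none. A toy model of that check (not the real matcher, which as far as I know also accepts richer value forms; `context_matches` is a hypothetical name):

```python
def context_matches(requires_context: dict[str, str], context: dict[str, str]) -> bool:
    """True only if every required key is present in the context with the same value."""
    return all(context.get(key) == value for key, value in requires_context.items())


requires = {"device_class": "curtain", "domain": "cover"}

# The tests file supplies a context, so the sentence is allowed to match.
print(context_matches(requires, {"device_class": "curtain", "domain": "cover"}))

# A plain parse call has an empty context, so the same sentence is filtered out.
print(context_matches(requires, {}))
```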
Hello guys with privileges, I need you to review my AllTalkTTS HACS PR: https://github.com/hacs/default/pull/2457
You will not regret, it's seriously the most advanced existing free TTS
The folks here have nothing to do with that
that's bad, while making this integration i did not realise that the HACS backlog is 3 months long
i would not bother
it seriously negatively affects HA project as it drains steam from developers
a few things here @hollow hollow
- HACS (as the name implies) is not HA, but the community store. this channel is dedicated to developers on HA voice stuff
- you can always list any repo as a custom Github repo in your HACS instance and use it without waiting for HACS to merge a PR adding the repo to the default collection. you can also instruct your potential users to do that
- as the HACS documentation for publishers states, the backlog is quite long and it will take a while to get to yours. out of personal experience, it took about 4 months for me
- "it seriously negatively affects HA project as it drains steam from developers" - although I agree (again, out of personal experience) that it can be frustrating and exhausting to wait as a developer for your contribution to get merged and used somehow, i seriously doubt that this process (be it in regards to HA or HACS) affects the HA project. there are just too many great things happening all at once for your great thing to make it a dealbreaker. and only so little manpower to handle and organize all that greatness
- when you created the PR in HACS, this was in the description. don't believe me? edit your PR (unless you've deleted everything there)
<!--
DO NOT REQUEST REVIEWS, THAT IS JUST RUDE, IF YOU DO THE PULL REQUEST WILL BE CLOSED!
Make sure to check out the guide here: https://hacs.xyz/docs/publish/start
-->
thanks for your lecture, it was funny!
But I would really like that review
Github clearly writes on the PR page: review required
I suppose it must be a Schrodinger review then - it is required and not requested in the same time!
And going back to your funny lecture a bit, the only person I see doing reviews is a Nabu Casa employee, it sounds like HA-related thing
"...those who refuse will be shot at dawn" is in the fineprint
3 months delays are evidently their internal problem which may be related to lack of workforce or bad procedures
i suggest you ask for a refund
as they could delegate someone from a community
now please stop spamming this channel with off-topic things you don't (want to) grasp
I will just ignore your dumb comments, it will be easier
They are funny though
with that attitude you are almost guaranteeing nobody will review your PR, devs read these channels too
And frankly, you're alienating potential users by being difficult to deal with
I'm going to step in as a moderator and say: you're in the wrong place, you need to follow the rules of this and GitHub
Or technical limitations maybe? You are making assumptions there.
Making a negative scene, without having actual foundations or context
As provoked by mr telelele I checked there is only one reviewer, so I think Mr Stefan you can't blackmail me efficiently
It isn't blackmailing IMHO, it is true. Such approach wouldn't be received well in general
You might not want to hear that, that is fine
I clearly presented the actual foundation: 3 months no review and hundreds of reviews waiting, so you are simply lying, Mr Frenck
anyways, HACS != Home Assistant development, so this might not be the right place for this
Pro tip: do not accuse the Home Assistant core team of lying
Yes we can close this topic indeed
@hollow hollow Sorry, that felt offensive, where was I lying?
Why not if he is lying
That is not what I said
Mr Frenck said: "without having actual foundations"
I said: You are making assumptions on what is happening or the reasons what is going on, while there is no response and thus no foundations for those conclusions. You are guessing
yes, you had no response and no context, you cannot make such conclusions out of thin air
no you wrote "without having actual foundations "
it was a lie
you just jumped on me
I understand you are unhappy with the wait, but you are also drawing conclusions based on nothing but the wait
probably because it's a real problem
Alright, ok this is going nowhere. Let's stop this here. This is not HA voice development related.
OK
I just opened this PR 5 days ago that improves the voice assistant Arabic language https://github.com/home-assistant/intents/pull/2125 but I found a test that failed, however I only edited the yaml file https://github.com/home-assistant/intents/actions/runs/8646997707/job/23722202820?pr=2125 so what's the problem?
you need to edit the tests to match the changes you made https://github.com/home-assistant/intents/blob/main/tests/ar/light_HassLightSet.yaml
I modified it and it still fails!
@subtle sierra I converted your message into a file since it's above 15 lines :+1:
sorry, i can't read Arabic so it's pretty hard for me to help out. i'd suggest tagging the AR language leaders in your PR, asking for help
Hey guys, has anyone tried adding own pretrained voice to piper? Do you know how to do this? I tried various ways but unfortunately it doesn't work..
Finally, it started working for me. Here are a few tips for using your own voice:
- Add the new files to /share/piper, but the correct names for the files are wg_glos_meski.onnx and wg_glos_meski.onnx.json. Don't use pl_PL-wg_glos_meski-medium.onnx
- Restart the Piper add-on and HA core to see the changes
- Update your pipeline and select the new voice that was added to your HA; in my case it was just wg_glos_meski (medium)
- Update all automations where you use the tts.speak service. You should select your new voice using the options: configuration and set the correct value voice: wg_glos_meski, where the value (wg_glos_meski) is the name of the voice:
service: tts.speak
data:
  media_player_entity_id: media_player.korytarz_homepod
  cache: false
  options:
    voice: wg_glos_meski
  message: Wykryto wyciek wody w kuchni pod ekspresem do kawy.
target:
  entity_id: tts.piper
The only small problem is the inability to set the voice directly in the add-on. Even when I try to set it directly via the YAML configuration.. it is just impossible, because the add-on voice list is hardcoded in the add-on configuration: https://github.com/home-assistant/addons/blob/master/piper/config.yaml#L28. If I set something else, I get an error message
@scarlet echo How do you rate onju-voice? Are you satisfied with this speaker? How do wake words perform when there is slight background noise? And most importantly, does it cope well when there is slight noise and we say a command? PS. I'm asking because I saw that you prepared a video on YT with instructions on replacing a PCB
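Since the file naming was the part that tripped things up, here is a small sketch that checks a voice directory for the two files described in the tips above. It only encodes what was reported in this thread (`<voice>.onnx` plus `<voice>.onnx.json` in /share/piper); `voice_files` and `missing_voice_files` are hypothetical helpers, not part of Piper or the add-on.

```python
from pathlib import Path


def voice_files(voice_dir: str, voice_name: str) -> tuple[Path, Path]:
    """Return the expected model and config paths for a custom voice."""
    directory = Path(voice_dir)
    return directory / f"{voice_name}.onnx", directory / f"{voice_name}.onnx.json"


def missing_voice_files(voice_dir: str, voice_name: str) -> list[str]:
    """List the expected file names that are not present in the directory."""
    return [path.name for path in voice_files(voice_dir, voice_name) if not path.is_file()]


# On a machine without /share/piper this simply reports both files as missing.
print(missing_voice_files("/share/piper", "wg_glos_meski"))
```

An empty list would mean both files are in place and named the way the tips describe.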
i'm pretty happy with it. you can find multiple posts about it in #voice-assistants-archived (search for "onju"). however, I am not a "power user", i don't use it along with TV and whatnot. for my needs, it's perfectly suited
thanks for the information, I'm asking you because I know you use it yourself, unfortunately in my case Atom Echo did not work properly even though many people had no problem
The hardware in the Onju is pretty good. I am waiting for the software to improve so as to fully utilize it
Hi how do I create custom models for micro wake word?
Or is it better to use openwakeword for the time being?
I think we may have made the wrong decision in regards to querying cover entities for questions such as "Which windows are open?". But switching to binary_sensor would be a breaking change, so I started a poll here to get some feedback on which type of entities people have https://github.com/home-assistant/intents/discussions/2168
If it will reveal that binary_sensors are more prevalent, is it ok to switch the default target domain to binary_sensor? @ivory vessel @noble copper
for custom models, at the moment it is better to use openWakeWord or snowboy; creating them for mWW is quite an in-depth and complex task. The process is described in detail here https://github.com/kahrendt/microWakeWord
We shouldn't make a decision but support both
a decision was made ~1 year ago. i've discussed with Mike a potential solution to implement support for such a thing, i.e. "entity" slot lists, so you can have more than one entity referenced in a sentence (e.g. "is Paulus at the supermarket?"), with one or more filters which entities should match
the trouble is that PRs in that area get stale and I, for one, don't have the time to redo everything from the ground up after 2 months because everything got overhauled
so the proposed solution was simply just as bad, but more general
alternatively, we could create a binary_sensor_as_x helper, which turns binary_sensors into covers only from the interface, similarly to switch_as_x https://github.com/home-assistant/architecture/discussions/1084
I've opened a bounty to get outbound calls working for our voip utils https://github.com/home-assistant-libs/voip-utils/issues/17
All paths lead to pjsip, as with most SIP UAs you can probably think of. If you want to get hacky with it, I'm sure you could hack together a bastardised UA that just makes a call and does not care too much about playing nicely, as with the original UA specific to a Grandstream HT801. It really depends on how long you want the call to be and how the UAS reacts. If the audio you want to transmit is under 30 seconds you can probably get away with murder and just send an INVITE, wait for a 200 OK, ACK it and send/receive audio (with very specific codec choices, as with the original). What's your actual use case? I can be a bit more specific with more information on what you want to achieve (Does it need to auth? What does it actually need to call?).
@scarlet echo I updated to the latest version of onju-voice-microwakeword from your repo and it seems to only recognize the wake up word once and then never ever again. Does that sound like a known issue to you? Also, playing media doesn't seem to do anything
https://github.com/tetele/onju-voice-satellite/issues/46
Also, this is rather a question for #voice-assistants-archived
What sort of hardware is recommended to train/finetune a new voice for Piper? Is a single gpu with 24GB of VRAM enough, or it would be preferred to have a multi-gpu setup?
(another way to phrase this question is "What hardware is Mike training piper on")
(i'm interested in contributing voices for Piper)
Also, are there tools for processing public domain audio book recordings into a dataset?
I was thinking of like, a tool that uses whisper to transcribe the audio files and save all that metadata into a csv file
I remember Mike saying somewhere that he used public domain audio books to create voices for piper, so I imagine he didn't create datasets manually and used some tools for automating the process
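A sketch of what the "transcribe and save to a csv" step could look like, assuming an LJSpeech-style `id|text` metadata file is what the training side wants (that is my understanding, not something confirmed here). The transcriber is passed in as a plain callable so Whisper or anything else can be plugged in; `build_metadata` and `fake_transcribe` are made-up names, and the fake transcripts only exist so the sketch runs on its own.

```python
from pathlib import Path
from typing import Callable, Iterable


def build_metadata(
    wav_paths: Iterable[Path],
    transcribe: Callable[[Path], str],
    out_path: Path,
) -> int:
    """Write one 'id|text' line per clip and return the number of lines written."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as handle:
        for wav in wav_paths:
            # Collapse whitespace and drop '|' so the text cannot break the columns.
            text = " ".join(transcribe(wav).replace("|", " ").split())
            if not text:
                continue  # skip clips the transcriber returned nothing for
            handle.write(f"{wav.stem}|{text}\n")
            written += 1
    return written


# Stand-in transcriber so the sketch runs without any speech model installed.
FAKE_TRANSCRIPTS = {
    "clip_0001": "  It was a dark and   stormy night. ",
    "clip_0002": "",
}


def fake_transcribe(wav: Path) -> str:
    return FAKE_TRANSCRIPTS.get(wav.stem, "")
```

Calling `build_metadata(sorted(Path("wavs").glob("*.wav")), my_transcriber, Path("metadata.csv"))` would be the typical use, with `my_transcriber` wrapping whichever speech-to-text model is on hand; the transcripts still need a manual pass afterwards.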
It was a PAIN to set up (had to compile freaking gcc 11 to build one ancient module old enough to go to school) and it errors out or stalls unless you feed it a single 48khz WAV file (and I also had to run it in CPU mode because it was video memory oom-ing) but this achieved what I wanted:
https://github.com/davidmartinrius/speech-dataset-generator
The whisper transcription is OK but dodgy in a few places, still looking for a good software to edit metadata
Hi all, I would like to extend the conversation component to allow sending TTS messages to, for example, ESPHome's voice_assistant. I have been looking into the code and it seems easy to do without needing to change very much. Sadly I have neither the skills nor the environment to do it myself; I tried. Is there anyone willing to help me out?
that sounds out of scope for a conversation agent. have you opened up an architecture discussion?
you can use the tts.speak service and target any media_player, one of which could be in the ESPHome voice satellite
I know i can use the media player. But some devices do not support the media_player option. And from what i understand the media player requires the MP3 codec to play audio, while VA uses WAV audio.
I see this suggestion as a conversation starter, like HAOS: "Did you take your meds?", ME: "Yes"/"No" etc. HAOS: when "Yes" => "Okay, noted", else "Time to take them now." etc.
Anyway i will read the architecture discussion forum and place my suggestion there.
Oh, ok. I had the same proposal last year https://github.com/home-assistant/architecture/discussions/907
And what was the response to that?
There were a lot of reactions to your suggestion, I see, but so far none seem to be in the direction of "let me implement it". Am I right?
correct. there needs to be a decision from an architectural standpoint in order to guarantee the merging of the feature
From looking at the code, it is almost there; no architectural changes are needed, imho, it is just extending the already existing code. The assist_pipeline has all the setup options that are needed to send the TTS messages.
you're looking at it simplistically, i fear. OK, so you can emit a TTS message, but how will that tie into your response? the proper approach, in my view, is to build the foundations for a conversation (i.e. back and forth messages) that can be started by either party
No kidding, this would be my optimal solution as well, and I fully agree with that. I still believe this is possible within the current pipeline architecture. Maybe not as extensive as your proposal, but I see some light in the dark.
My approach is to do it step by step: first the initial message from HA, and later controlling responses. With a proper automation setup this can be done by adding different triggers that respond to what is said.
working on the timer intents now, but I get this error when doing the tests
FAILED tests/test_language_intents.py::test_homeassistant_HassCancelTimer[nl] - AssertionError: Intent HassCancelTimer does not support slot 'seconds'. See intents.yaml for supported slots
any idea where this comes from? until now I only made direct translations from EN to NL, didn't use seconds directly anywhere
https://github.com/home-assistant/intents/pull/2194 this is what I have now, still not sure where those unsupported slots come from
@thefly what happens when someone says something like pizzatimer instead of pizza timer?
or keukentimer vs keuken timer
there's an optional space between them, so it works with or without
{area}[ ]timer and {timer_name:name}[ ]timer
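To illustrate why the optional space makes both spellings match, here is a toy translation of such a template into a regular expression. This is only an illustration, not how the intents matcher is implemented: `template_to_regex` is a hypothetical helper, it handles just `{list}` / `{list:slot}` and `[optional]` (no nesting, no alternatives), and the list contents are passed in as regex fragments.

```python
import re


def template_to_regex(template: str, lists: dict[str, str]) -> re.Pattern:
    """Translate a tiny subset of the sentence template syntax into a regex."""
    parts = []
    token_re = r"\{(\w+)(?::(\w+))?\}|\[([^\]]*)\]|([^\[{]+)"
    for token in re.finditer(token_re, template):
        list_name, slot_name, optional, literal = token.groups()
        if list_name:
            # {list} or {list:slot}: capture the value under the slot name
            parts.append(f"(?P<{slot_name or list_name}>{lists[list_name]})")
        elif optional is not None:
            # [text]: the text may be present or absent
            parts.append(f"(?:{re.escape(optional)})?")
        else:
            parts.append(re.escape(literal))
    return re.compile("".join(parts), re.IGNORECASE)


named = template_to_regex("{timer_name:name}[ ]timer", {"timer_name": r"\w+"})
print(named.fullmatch("pizzatimer").group("name"))
print(named.fullmatch("pizza timer").group("name"))
```

Both calls capture the same timer name, because `[ ]` turns into an optional single space in front of `timer`.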
@scarlet echo I converted your message into a file since it's above 15 lines :+1:
Ah nice, there is a party in my area this weekend to celebrate the intents repo https://www.intentsfestival.nl/en/
If the winds are good, I can just listen to it for free (also when I intend to sleep)
there is a campsite! so you can stay in**tents**
i feel bad about my on-topic message question becoming less relevant, but do they test for compliance at the entrance, before allowing you to go to the main stage?
on-topic then
It's a bit unclear to me when to use slots and when to use requires_context
I had requires_context here, but that didn't work. changing it to slots makes it work
- sentences:
    - "open [de] garage[ ][deur]"
    - "[de] garage[ ][deur] openen"
    - "[<doe>] [de] garage[ ][deur] <open>"
    - "<zou> [de] garage[ ][deur] ((<open> willen | <open> kunnen | <open>[ ])<doe>|openen)"
    - "<zou> [de] garage[ ][deur] (kunnen|willen) [<open>[ ]<doe>|openen]"
  response: "cover_device_class"
  slots:
    device_class: "garage"
    domain: "cover"
I think I initially just copied this from the EN version
requires_context (in sentences definition) is for when:
- you use a `{name}` in the sentence and want to make sure the sentence matches a certain `domain` (although that can be enforced through the filename - `domain_IntentName`), `device_class` etc.
- you want to make sure that the satellite used had an `area` assigned, to treat sentences as area-aware without the user naming the area
slots (in sentences definition) is for when you want to specify a certain slot value without the user saying it. for example
- sentences:
    - "start a half hour timer"
  slots:
    minutes: 30
this will populate the slot value with a value you specify, then will hand it over to the conversation agent to use it
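A toy sketch of the difference, under stated assumptions: `resolve` is a hypothetical function, not the real matcher, and the `context` dict stands for whatever the matched `{name}` entity or the satellite brings along (domain, device class, area). `requires_context` acts as a gate on whether the sentence may match at all; `slots` adds fixed values nobody said out loud.

```python
from typing import Any, Optional


def resolve(
    block: dict[str, Any],
    spoken_slots: dict[str, Any],
    context: dict[str, Any],
) -> Optional[dict[str, Any]]:
    """Return the final slot values for a sentence block, or None if it may not match."""
    for key, wanted in block.get("requires_context", {}).items():
        if key not in context:
            return None  # required context is missing entirely
        if wanted is not None and context[key] != wanted:
            return None  # context is present but has the wrong value
    # Fixed slots from the block are added on top of what the user actually said.
    return {**spoken_slots, **block.get("slots", {})}


half_hour = {"slots": {"minutes": 30}}
cover_only = {"requires_context": {"domain": "cover"}}

print(resolve(half_hour, {}, {}))
print(resolve(cover_only, {"name": "garage door"}, {"domain": "cover"}))
print(resolve(cover_only, {"name": "desk lamp"}, {"domain": "light"}))
```

The first call yields the injected `minutes` slot, the second passes the gate and keeps only the spoken slot, and the third is rejected because the entity's domain does not match.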
ah thanks
there's a slight issue with the context in tests, as far as i see it. you can't send context without expecting it as a slot, so the sentences MUST send out context as a slot, and i see that as a bug which I have tried correcting https://github.com/home-assistant/intents/pull/2142
basically, at the moment, input context should be the same as output context in a test
which should not be the case, as i see it
I'm not sure who needs to hear this, but the new code review bot in the intents repo seems very good. A welcome addition, thanks!
yeah, it's active on all Home Assistant repos, but it seems to provide good information
How can I apply as language leader? I see that for DE there are quite a few open PRs that are neither commented nor reviewed. To distribute the work load I'd like to help out and join the current leaders.
Hi all, apologies if it's the wrong place to ask a voice question - I have voice working from an ESPHome device to HA. The wake words are running fine, but when the command is spoken, HA doesn't detect the end of the sentence (long pauses until the timeout) - the phrase is correct, just lots of silence.
is there a setting I can butcher to experiment more?
More for #voice-assistants-archived
You can try tweaking these:
noise_suppression_level: 2
auto_gain: 31dBFS
volume_multiplier: 2.0
Thank you!
Dear @neon swan, not sure if this is the right way to contact you. I'm working on a mobile VA device (https://discord.com/channels/429907082951524364/1171818251983011920). For this I'd like to create a solution that lets HA talk to the VA directly, without it having to be the answer to a question. I know this can be done using the media_player. The thing is that I don't want to add a heavy component like this in esphome just to get announcements from HA. I have been looking into the HA core code and I believe that the current architecture has everything that is needed to allow this.
But I'd like to confirm what I figured out and talk about how to implement this new feature.
Hello
So I managed to have the answer to my query on the Atom Echo from GPT (OpenAI Extended) spoken either via an mp3 sent to it, or directly with the text being spoken by Alexa's voice on my Echo Dot. The thing is that I did all this by modifying the components' code. If possible, I'd like to do this directly from the Atom Echo yaml conf and send the text of the answer via the notify service that makes a request to the Echo Dot, but I don't know how to retrieve the text of the answer. I only manage to get the audio, or I can get the text of what I've said, i.e. my STT.
To get the text of what I have said, it is
on_stt_end:
  then:
    - lambda: id(stt_text) = x.c_str();
I tried tts_text but it's not working. Does anyone have a clue? Thanks
please don't crosspost, and this is not a dev question
you'll get more support for that over in the ESPHome Discord
How do folks reviewing new sentences feel about using an LLM to generate alternative sentences? If I'm not blindly committing but instead using it to come up with alternative natural sounding permutations, will that be accepted?
After the new release talking about sticking an LLM in to parse intents in an online system, I got thinking that some of that flexibility could be achieved offline by letting the LLM come up with sentence structures for each intent. So basically the inverse.
if it makes sense, nobody cares you used AI to generate alternative sentences
Mike's initial plan with the intents repo was precisely to have grammatically correct sentences for doing stuff and then to have some ML model come up with new ones based on them
that PR seems good, i'll formally review it on Github today
Great! Thanks. I just wanted to make sure it wasn't against some policy. I'll carry on. I have a few other PRs in draft to expand sentences quite a bit. I also noticed that there are a lot of repeated patterns between files. I've got another branch started to kind of refactor that a bit. I'll make sure to keep the patches manageable though.
i've left some feedback. you might want to apply it throughout the things you want to modify
please don't create huge PRs. one per domain or new piece of functionality seems about right
Yea. That sounds good. Mind having some discussion about the feedback here?
Mostly around precise grammar structure vs same intent. Targeting precision rather than capturing intent seems unintuitive to me given that it will force users to speak very precisely. Some users may have different grasps of the English language.
For example: If my toddler says: "is all the shades open?" It is not correct grammar, however the intent is unambiguous. It feels to me that it would be best to respond to this kind of query with the expected intent all the same.
fo' shizzle
joking aside, i don't know what to say about that. I think sentences should be grammatically correct, as @ivory vessel initially wanted them. if that changed, he'll let us know
but to answer the initial part of your question, this is a good place to discuss such issues. the best place would be here https://github.com/home-assistant/intents/discussions
Thanks for the pointer. And for this topic, probably specifically here: https://github.com/home-assistant/intents/discussions/871
i guess that's a good place and there are already people engaged there
Some slightly "incorrect" sentences are fine, like subject/verb disagreement. If I could get a spare moment, I could try and implement the second stage of the plan and train a small machine learning model on the existing sentences
Ah, ok. Thanks for the guidance! I've got a few PRs that I'm working on to hopefully make a lot of sentences more general. I've found that it still has the feeling of needing magic incantations while the sentence count is small. Expanding to cover more variants, even grammatically incorrect ones, will help with positive response rates.
Discuss m5 Atom stuff here? I've had 2 working as voice assistants for 9 months, but now one never detects speech. I've factory reset, rebuilt firmware and uploaded it, power cycled, whatever I could think of. It sees and logs button presses but as for speech it just stays in the WAITING_FOR_VAD state forever, whereas the other unit "detects speech" even in a quiet empty room every 5 seconds. Does the hardware just die?
Is it possible to have Assist either (1) not respond with a vocal answer, or (2) respond by just saying "Done"?
I know what I've asked it to do, so I really don't need to be informed that switch has been turned on when that's what I've asked it to do.
does home assistant support streaming chunked TTS responses?
Seems like it's waiting for the full file to download and then it plays back instead of playing immediately
Not yet, but it's been proposed to have 3 options: (1) always respond, (2) only respond when targeting things outside the current area, and (3) no responses.
No, the TTS system is tied in with the media system which is file/URL based.
Do you know if there are plans to change this? I figure it would improve the interaction with voice assistants (especially when generation is local)
No plans for now. It would be a major overhaul to the TTS and media integrations. Right now, everything assumes a complete file.
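For reference, the sentence-at-a-time idea discussed in this thread can be sketched like this. It is a rough illustration only: `synthesize` is a hypothetical stand-in for any TTS call that returns audio bytes, and the naive regex split is an assumption that will mis-split abbreviations. The generator makes the first chunk available before the rest is synthesized, but that only helps if the receiving side starts playback per chunk, which the file/URL based media path described above does not do.

```python
import re
from typing import Callable, Iterator


def split_sentences(text: str) -> list[str]:
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def stream_tts(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield one audio chunk per sentence instead of one blob for the whole reply."""
    for sentence in split_sentences(text):
        # Each chunk is handed to the caller as soon as it has been synthesized.
        yield synthesize(sentence)


# Stand-in "TTS" that just encodes the text, to show the chunk boundaries.
for chunk in stream_tts("Turned on the light. Anything else?", lambda s: s.encode()):
    print(chunk)
```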
I've used conversation triggers and intents only because I want to control the response. Ideally, I could just choose a tone to emulate Alexa
It won't squelch the tts response but you can use wyoming protocol to send an event that HA listens to and responds with an mp3 file
So mine, for example, uses a little Python blip to pick a random file from a folder of Star Trek computer beeps when it hears the wake word and when it is done transcribing STT; you could do something similar at any point in the pipeline
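Something along these lines would do the random pick. This is a guess at the shape of such a script, not the poster's actual code: `pick_random_sound` is a hypothetical name, and how the chosen file is then played (event handling, media player call) is left out.

```python
import random
from pathlib import Path
from typing import Optional


def pick_random_sound(
    folder: Path,
    pattern: str = "*.mp3",
    rng: Optional[random.Random] = None,
) -> Optional[Path]:
    """Return a random file matching `pattern` in `folder`, or None if there is none."""
    files = sorted(folder.glob(pattern))
    if not files:
        return None
    # An explicit Random instance can be passed in for reproducible picks.
    return (rng or random).choice(files)
```

Calling `pick_random_sound(Path("/config/www/beeps"))` (a made-up path) at each pipeline event would give a different beep each time.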
Are there any already available open source solutions that we might be able to explore as a stop-gap or tack-on solution until something more holistic can be drawn up?
I know in my own home latency between end of stt and response is the biggest friction point right now so it would be great to explore this as one avenue towards better responsiveness. I haven't made any contributions to the project yet but I'd be happy to jump in now that I'm happy with where my individual setup is
I looked into some ways to do it and all seemed kind of hacky, so I settled for 'Done' (compared to 'turned on input underscore Boolean', which is just silly)
Hmmm yeah now that I look, it doesn't seem like there's a way to specifically play an event when it succeeds at the action it performed. Just when it succeeds at listening to you, or generating a response, which may not in itself be indicative of success
That being said, I think you can get what you want by playing a success noise on one of those two conditions and returning no speech from the intent. That way, if there's an error firing or finding the intent, it will still inform you verbally, but if it succeeds, it will play a success noise and then should proceed silently
I'd be happy to share the code to get that working but I don't want to spam the dev chat
Appreciate the offer, but I'm okay with what I have for now. I'm more focused on walk-up-and-talk reliability across all my devices for now
If you find one, let me know!
I thought music assistant might offer a way but I need to check further into it.
It could be a great idea to bark up their tree. Probably a lot of experience among their contributors hacking media_player to extend functionality
I assume that's why the integration makes duplicates the way it does
I'm going to modify the OpenAI TTS extension to use chunked audio and will try to pass it to a Music Assistant entity... let's see!
I was a little too hopeful, there's still some more work to make that work.
@ivory vessel first of all, thank you for the great work on Assist (I've been following you since Rhasspy), and thanks to the whole team here too. I'm currently experimenting with a fairly small LLM (without a GPU): Ollama gemma:2B, on a server with 8GB of memory. Without any "sensors" data in the template, the response time is acceptable (of course, given the limits of the system, it takes a couple of milliseconds to respond). With the sensor data added to the template context it takes considerably longer (it depends on the amount of data to process, sometimes minutes, and from the tests I've done it is roughly proportional to the amount of data in the context). I thought it might be possible to combine Gemma with Assist. The basic idea is to use Assist to detect the user's intent and provide the context to Gemma, which formulates the response (user: "what is the temp. in Living Room" -> Assist gets the data for Living Room and provides the context to Gemma, or user: "turn on Entrance light" -> Assist gets the intent and does the action -> Gemma gets the result context). Since everything passes through Assist for intent detection, it would also be possible to control the house this way. I know this probably isn't a real "AI", or rather it's a kind of AI team implementation... but it could help keep everything local and give performance similar to the current OpenAI implementation, even on limited systems. When the conversation detects a "general topic" intent, such as user: "why is the sky blue" -> Assist would not find the device "sky" -> the user input should go to Gemma directly. I could invest some time playing around with it. This is just to share the idea with you guys. Thanks once again for the great work.
I think any solution would need to exist outside HA's TTS system (either as a custom integration or externally). The Wyoming protocol that's used for Piper is streaming by default, so it's at least possible to create something based on it. The next version of Piper will include streaming voices as well, so we're getting closer.
I'm very interested in these sorts of experiments
Assist could definitely be used to detect the intent, though (as everyone knows) it's fairly rigid. Some ideas I've had for doing this differently:
- Use the LLM via text to categorize the intent (slow)
- Use the LLM to get an embedding of the user's sentence and compare it with pre-computed embeddings of the various intents (faster)
- Use a tiny BERT model to train an intent classifier (should be even faster)
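A rough sketch of the second idea (compare the user's sentence with pre-computed embeddings per intent). To keep it self-contained, `embed` is a bag-of-words stand-in for a real LLM embedding, and the example sentences, function names and the 0.5 threshold are all made up for illustration. Returning `None` leaves room for handing unmatched input to the LLM directly.

```python
import math
import re
from collections import Counter
from typing import Optional


def embed(sentence: str) -> Counter:
    """Stand-in embedding: word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(examples: dict[str, list[str]]) -> list[tuple[str, Counter]]:
    """Pre-compute one embedding per example sentence (done once, up front)."""
    return [(intent, embed(s)) for intent, sentences in examples.items() for s in sentences]


def classify(sentence: str, index: list[tuple[str, Counter]], threshold: float = 0.5) -> Optional[str]:
    """Return the intent of the closest example, or None if nothing is close enough."""
    query = embed(sentence)
    best_intent, best_score = None, 0.0
    for intent, vector in index:
        score = cosine(query, vector)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None


index = build_index({
    "HassTurnOn": ["turn on the light", "switch on the lamp"],
    "HassGetState": ["what is the temperature", "is the door open"],
})
print(classify("please turn on the kitchen light", index))
print(classify("why is the sky blue", index))
```

With these toy examples the first sentence lands on `HassTurnOn`, and the sky question stays below the threshold and comes back as `None`.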
What kind of server are you running?
the "media_content_type" attribute of media_player might be a good axis for that tts convo from earlier.
Might provide an angle to extend some tts optimizations into that integration at least. It would probably make sense from an end-user perspective as well to offer them an intended mode for voice, since media_player will increasingly be used for tts
I'm running it on a Fujitsu Server Primergy Tx140 S1 (old and cheap server). HA runs on Docker (it's a supervised version); the OS is Debian Server 11 (amd64). Ollama is installed on the OS directly, so I use localhost to connect to it. I think I'll open a branch soon and start working on it, if you don't mind.
I also ordered an AMD GPU just in case, but the target is to get it working smoothly, and I can assure you (I can do a demo video to show what I mentioned) that when Ollama gets, for example, only the sun position stats, the response is handled in a few milliseconds. I'm currently studying the docs of LLM.py.