#devs_voice-archived

cedar moon
#

also it would be possible to create automations by voice with the correct entities, if GPT knew them

#

If there are any projects already started I would love to participate. Otherwise I would try to start some experiments with it

scarlet echo
#

shouldn't such an integration expose its own conversation agent rather than rely on the current intents as a dependency? 🤔 That said, I see no issue with multiple PRs for new intents if you want to use the existing intents.

cedar moon
#

Yeah, that's where I'm not sure how to start. Are intents only used by the Home Assistant conversation agent?

scarlet echo
cedar moon
#

ok thx. I will take a look at the current ChatGPT agent and try to integrate functions to it

#

It's all new to me (LangChain and the conversation agents), so it will be very experimental

nimble ferry
#

I have a personal project that does this by adding a new conversation agent (which I started by copy-pasting the OpenAI one and then adding callable functions to it, like control_light, shopping list and calendar). Works pretty well.

#

I decided to branch it off a bit though, so that I run the conversation agent as a small api inside a docker container instead and I will make the conversation agent component in home assistant a thin client that only passes on the conversation to this api. I find it much faster to develop this way, probably because I'm not experienced enough with how to set up a dev environment in home assistant (I end up restarting the dev container a lot).
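For the "callable functions" approach described above, a minimal dispatcher could look like this. It assumes the model returns a function name plus JSON-encoded arguments (the general shape of OpenAI-style function calls); the handlers `control_light` and `add_to_shopping_list` and their in-memory state are hypothetical stand-ins, not code from Home Assistant or from the project mentioned.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical in-memory stand-ins for Home Assistant state; a real agent
# would call Home Assistant services here instead.
LIGHTS: Dict[str, bool] = {"kitchen": False}
SHOPPING_LIST: List[str] = []


def control_light(area: str, on: bool) -> str:
    LIGHTS[area] = on
    return f"Turned {'on' if on else 'off'} the {area} light"


def add_to_shopping_list(item: str) -> str:
    SHOPPING_LIST.append(item)
    return f"Added {item} to the shopping list"


# Registry of the functions advertised to the model, keyed by name.
FUNCTIONS: Dict[str, Callable[..., str]] = {
    "control_light": control_light,
    "add_to_shopping_list": add_to_shopping_list,
}


def dispatch(name: str, arguments_json: str) -> str:
    """Run the handler the model asked for; arguments arrive as a JSON string."""
    handler = FUNCTIONS.get(name)
    if handler is None:
        return f"Unknown function: {name}"
    args: Dict[str, Any] = json.loads(arguments_json)
    return handler(**args)


print(dispatch("control_light", '{"area": "kitchen", "on": true}'))  # Turned on the kitchen light
```

The string returned by `dispatch` would then be sent back to the model so it can phrase the final answer; keeping the registry in one place also makes the thin-client split easy, since only the API side needs to know about it.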

nimble ferry
#

I tried creating a Wyoming service using ElevenLabs because I thought I could reduce latency by processing one sentence at a time and sending one AudioChunk per sentence. But it seems the client waits for AudioStop before it starts playing audio (I've only tried the "try voice" button in the config, in the browser). Is there a way around this?
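The per-sentence idea from the question can be sketched as a generator that yields one sentence at a time, so each could be synthesized and sent as its own audio chunk. The regex splitter is a simplifying assumption, and the synthesis and Wyoming event handling are deliberately left out; as the question notes, the latency gain only materializes if the client starts playing chunks before AudioStop arrives.

```python
import re
from typing import Iterator

# Naive splitter: break after ".", "!" or "?" followed by whitespace.
# Abbreviations such as "e.g." will be split wrongly; this is only a sketch.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences(text: str) -> Iterator[str]:
    """Yield the sentences of `text` one at a time, skipping empty pieces."""
    for part in _SENTENCE_END.split(text.strip()):
        if part:
            yield part


for sentence in sentences("The lights are on. The door is locked! Anything else?"):
    # A streaming TTS service would synthesize each sentence here and send
    # its audio before moving on to the next one.
    print(sentence)
```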

nimble ferry
#

Same with the mobile assistant, anyone know if this is the expected behaviour for all clients? E.g. the wyoming-satellite also?

worthy wave
#

Hey everyone, I have a very small question 🙂 Could you explain the difference between slots and requires_context? When should I use slots? It would also be nice if you could share an example 🙂 e.g. what is the difference between these two YAML configurations?

- sentences:
    - "close <name> [in <area>]"
  slots:
    domain: "cover"

and

- sentences:
    - "close <name> [in <area>]"
  requires_context:
    domain: cover

I found information about slots only at this link https://developers.home-assistant.io/docs/voice/intent-recognition/template-sentence-syntax#responses, and about requires_context only this: https://developers.home-assistant.io/docs/voice/intent-recognition/template-sentence-syntax#requiresexcludes-context. I also checked the example in the hassil repository https://github.com/home-assistant/hassil/blob/main/examples/en.yaml but I still can't understand the differences.

scarlet echo
#

requires_context: {domain: "cover"} means you are only going to match that sentence if the provided {name} (<name> is shorthand for [the] {name}) is a cover

#

slots: {domain: "cover"} means you are setting the value of a slot named domain to cover, regardless of what domain the {name} has, which may have implications in how the intent is handled
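To make the distinction concrete, here is a toy model of the two keys, written from the explanation above rather than from hassil's source: `Entity`, `SentenceRule` and `match` are hypothetical names, and real matching involves much more (lists, areas, wildcards).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entity:
    name: str
    domain: str


@dataclass
class SentenceRule:
    # Hypothetical, heavily simplified stand-in for a template sentence's YAML keys.
    requires_context: Dict[str, str] = field(default_factory=dict)
    slots: Dict[str, str] = field(default_factory=dict)


def match(rule: SentenceRule, entity: Entity) -> Optional[Dict[str, str]]:
    """Return the resulting slots if `rule` matches `entity`, else None."""
    # requires_context is a filter: the matched {name} must satisfy every key.
    for key, value in rule.requires_context.items():
        if getattr(entity, key, None) != value:
            return None
    # slots are added unconditionally, whatever entity {name} resolved to.
    return {"name": entity.name, **rule.slots}


garage = Entity("garage door", "cover")
lamp = Entity("desk lamp", "light")

filtered = SentenceRule(requires_context={"domain": "cover"})
forced = SentenceRule(slots={"domain": "cover"})

print(match(filtered, garage))  # {'name': 'garage door'}
print(match(filtered, lamp))    # None: the lamp is not a cover
print(match(forced, lamp))      # {'name': 'desk lamp', 'domain': 'cover'}
```

The last line is the case the warning is about: the slot claims `cover` even though the lamp is a light, and the intent handler will act on that claim.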

worthy wave
worthy wave
scarlet echo
#

Generally, you shouldn't. But say you have a sentence like "is any door locked". There is no entity name there to reference the lock domain, so you have to let the intent handler know that it should query for entities pertaining to the lock domain

west gulchBOT
#

@worthy wave I converted your message into a file since it's above 15 lines :+1:

scarlet echo
#

That is correct, but i can't remember off the top of my head if you need requires_context if you don't have {name}. I am inclined to say no

worthy wave
worthy wave
#

If I may ask one more thing, I'll change the topic a bit. 🙂 I have two M5Atom Echo devices. My experience using them with wake words is not very good. Is there any way to check how sound is transmitted from the M5Atom to HA, and how HA tries to recognize wake words live? I don't really know where the problem is. I have HA installed on a NUC 13 Pro, which is a powerful machine. Very often the wake word is not recognized.

I also did tests with a Jabra SPEAK 510 MS speaker (via USB), and there too I had a lot of problems getting wake words to work properly. Do you know how to find out where the problem is?

west gulchBOT
worthy wave
#

I watched the entire last presentation (Voice Assistant Contest Launch), where they recommended using the debug mode in the Pipeline. But unfortunately it doesn't work very well for me.

worthy wave
#

Thanks, I'm going to look for it

worthy wave
#

Sorry, I created the message above by mistake 🙂 @scarlet echo I have another question. Is it possible to use the same sentences to support two different domains? E.g. in Poland we can say: "Open the door". Such a sentence has two meanings: open the door (e.g. patio doors with an electric motor) and unlock the door (just unlock the door lock). So we would cover two different domains, cover.door and lock. Can I do something like that? (See the example below.)

scarlet echo
#

In your example you are missing a requires_context: {domain: "lock"} (or cover, respectively). If any words clash with the generic homeassistant_HassTurnOn or Off sentences, you need to add a specific excludes_context there

#

Otherwise, the sentence might match the generic homeassistant_HassTurnXx

#

However, the door and the lock in your example need to have different names/aliases, otherwise the first matching sentence's intent will be handled
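The "first matching sentence wins" rule can be illustrated with a toy resolver, assuming rules are tried in order and each carries a requires_context domain filter. The labels `open_cover` and `unlock` and the entity names are hypothetical, not the real intent names or files.

```python
from typing import List, Optional, Tuple

Entity = Tuple[str, str]  # (name or alias, domain)
Rule = Tuple[str, str]    # (hypothetical intent label, domain required by requires_context)

# Both rules share the same sentence, "open <name>", and are tried in this order.
RULES: List[Rule] = [("open_cover", "cover"), ("unlock", "lock")]


def resolve(name: str, entities: List[Entity], rules: List[Rule] = RULES) -> Optional[str]:
    """Return the label of the first rule whose domain filter accepts an entity called `name`."""
    for label, domain in rules:
        if (name, domain) in entities:
            return label
    return None


distinct = [("patio door", "cover"), ("front door", "lock")]
print(resolve("patio door", distinct))  # open_cover
print(resolve("front door", distinct))  # unlock

clashing = [("door", "cover"), ("door", "lock")]
print(resolve("door", clashing))  # open_cover: the lock can never be reached
```

With distinct names each sentence reaches the right domain; with a shared name, the rule listed first always wins.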

hollow hollow
#

using it with local whisper.cpp server instance

#

It's not dramatically faster than faster-whisper, but its advantage is that it uses a much more popular API

broken elbow
#

Hey, I know you can't judge a language you don't understand, but if anyone notices obvious mistakes in this PR concerning anything other than the language structure itself, please let me know. https://github.com/home-assistant/intents/pull/1990
I've never done a PR this large, this fast, so... I'll have to fix some missing tests. I wonder why none of my local commands complained about that?

cunning veldt
#

Hi, does anyone know if there is a plan to add GPU support for Whisper and Piper in Home Assistant, so that we can plug in and use a GPU? Thanks

worthy wave
#

Hi @ivory vessel, what is the minimum length/number of voice samples needed to create a new Piper voice? Of course, I am aware that the more the better... 😀 I'm wondering whether 4 hours is enough or whether much more is needed. My second question is whether it is possible for you to train a new voice from such samples, or whether I have to do it myself. My only problem is that I don't have the right equipment to do something like this. Of course, the recordings will be publicly available.. 🙂 PS. I'm talking about the Polish language

hollow hollow
#

@worthy wave have you tried xtts2?

worthy wave
#

@ivory vessel A completely different topic I'm wondering is whether it is possible to train the current wakeWords using your own voice or is it rather difficult to do? I'm asking because after my recent tests I noticed that the M5 Stack Echo microphone works properly in my configuration, but due to the fact that my accent is not English, the word itself does not work super accurately to wake up the microphone. And I was wondering if I could record several dozen uses of such a word by household members and whether it was possible to train such a model to work more correctly.

worthy wave
hollow hollow
hollow hollow
#

the quality of this model will blow you away

worthy wave
#

that is why I'm asking whether it's possible to add several dozen additional voice samples to create "my own" wake words from the existing models 🙂

worthy wave
hollow hollow
#

you need to read openwakeword github issues and eventually chat with mr dscripka

worthy wave
#

yep, but without polish language example

hollow hollow
#

so polish is identically awesome

#

you won't recognise it from real speech

#

please note, GPU is a must for that model

#

to have real time inference

scarlet echo
worthy wave
scarlet echo
#

Piper recording studio provides the "right" amount of samples you need to read aloud to train a TTS voice model

hollow hollow
#

XTTS2 needs just few samples, btw

hollow hollow
#

OMG, this model is totally crazy super fast

broken elbow
#

when is the deadline to merge translations for 2024.3?

worthy wave
#

@scarlet echo I have a small question 🙂 I'm one of the Polish language leaders, but I don't know why I can't merge a PR that I prepared and that was approved by someone else. Can you explain what I should do to be able to merge PRs for the Polish language as well? Sorry if this question isn't for you 🙂

scarlet echo
#

for example @shell dirge is in that group and he never had issues with merging PRs

severe oyster
#

Working on the Slovenian translation of sentences in VA, and one thing I can't figure out (for now it's on my priority list) is how to handle the speech output of numbers. Example: the text output (of a sensor value) is 22.5 °C, which is fine, but when spoken by VA I get "two-two-five C" (in Slovenian). Any advice on how to get whole-number output? And decimals? The spoken output ignores the decimal point... Thanks

scarlet echo
#

Does slovenian use a comma , as a decimal separator?

scarlet aurora
#

Doesn't the intents repo use the same sentence parser?
On HA, for "coloca batatas na lista" ("put potatoes on the list") I get "batanas n" as the item.
Using script.intentfest parse from the intents repo, I get "batatas "

scarlet echo
ivory vessel
ivory vessel
worthy wave
#

I plan to create at least one more women's voice

ivory vessel
worthy wave
#

For now, I have focused on a major update of the Polish language for intents 🙂

#

I don't know if there is any way to speed up PR reviews.. 😅 because I don't have the ability to merge them

ivory vessel
worthy wave
#

Yes, but unfortunately this is only part of the changes... the part related to checking sensor and binary_sensor is still missing... and I don't want to work on it until this PR is closed (they are a bit related) 🙂

cobalt needle
#

@ivory vessel If I am currently training a voice and the test output seems to have hit a plateau, can I simply halt training, update the dataset WAV and CSV with more data, run prep again, then resume from my current checkpoint? Will doing so start training with the additional data, or do I need to start over with a clean set of training folders?

ivory vessel
cobalt needle
spare forge
#

one question about the voice assistant: do we want to add basic commands that aren't strictly smart-home related, like "what time is it?"

#

I find myself missing asking Alexa what time it is, adding timers while I'm cooking, etc...

#

I think adding them would make for a smoother transition for people already using alexa and google speakers

worthy wave
spare forge
#

@worthy wave you mean with custom sentences?

worthy wave
#

no, just automation inside HA

spare forge
#

I'm not following I'm afraid

spare forge
#

so what I said, with custom sentences ^

worthy wave
#

😄

#

these are two different configurations

spare forge
#

I see. I saw them as one and the same

#

for sure, one can add any sentence to perform anything they want. My question is if we should ship by default several of the most common ones

#

much like we have sentences for the weather, we could have them for time

#

and similarly to how we have sentences to manage a shopping list, I'd expect sentences to set alarms and timers

#

I'm speaking from a user's perspective that is looking to replace alexa with HA

scarlet echo
#

@spare forge go ahead and propose some intents and/or sentences. There's nothing set in stone

cobalt needle
#

Setting timers and alarms is very high on that list

spare forge
#

@scarlet echo in the architecture repo?

scarlet echo
spare forge
#

I did a couple proposals in the past, but I wasn't sure if the architecture one was the right repo for it or voice had another one

ivory vessel
#

@spare forge Can you link those to me so I can collect them into a list? Thanks!

spare forge
#

some of them, like timers, might require creating new services, others like asking for the time seem rather simple

scarlet echo
#

All of them require new intents

severe oyster
#

Thanks @scarlet echo, same in Slovenian. A value with a comma (,) is spoken correctly by TTS, a value with a dot (.) is not. But I could only check this with the Try voice button in the VA settings, since the Jinja2 replace filter doesn't change the sensor value properly. In Dev tools I get the sensor state with a dot (.) (see pic1 https://ibb.co/X5XG1dx ), but when I open the entity it's shown localized, with a comma (see pic2 https://ibb.co/ygfF7XG ) 🤨 It's slowly driving me mad... In the template editor in the developer tools I get the sensor state with a dot (.) (see pic3 https://ibb.co/zVsJyBN), even though I have configured 1.234.567,89 in my personal settings (see pic4 https://ibb.co/4WSzHhv ). If I use the replace filter in a template I get this 🙃 https://ibb.co/WHcQB95
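Since TTS reads "22,5" correctly but not "22.5", one workaround is to convert the state string before it reaches TTS. The sketch below does this in Python as a stand-in for what a template would do; it assumes the stored state uses "." as the decimal separator (the localized UI display does not change the stored state, consistent with the Dev tools observation) and that the target format uses "." for thousands and "," for decimals, as in the personal settings mentioned. `localize_number` is a hypothetical helper.

```python
def localize_number(value: float, decimals: int = 1) -> str:
    """Format a number with '.' for thousands and ',' for decimals (e.g. 1.234.567,89)."""
    us_style = f"{value:,.{decimals}f}"  # e.g. "1,234,567.89"
    # Swap both separators in a single pass so they don't overwrite each other.
    return us_style.translate(str.maketrans({",": ".", ".": ","}))


state = "22.5"  # sensor states are strings
print(localize_number(float(state)))   # 22,5
print(localize_number(1234567.89, 2))  # 1.234.567,89
```

Swapping in one pass matters: two sequential replace calls would turn every separator into the same character.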

spare forge
ivory vessel
#

Timers are going to require HA to be able to initiate a TTS response on the satellite, which it currently can't do. But I think this will be pretty straightforward.

#

Well, really just an event when the timer elapses. Not necessarily TTS.

cobalt needle
#

A timer is just a future time stamped event with a name and destination media device really

#

some assistants allow you to set named timers
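Following the description of a timer as a future timestamped event with a name and a destination device, a minimal sketch could keep named timers in a heap and report the ones that have elapsed. Everything here is hypothetical: the class names, the device ids, and passing `now` explicitly (done to keep it testable); in Home Assistant the elapsed timer would presumably become an event that the satellite reacts to.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Timer:
    due: float                          # absolute time (seconds) when the timer fires
    name: str = field(compare=False)    # e.g. "pasta"
    device: str = field(compare=False)  # hypothetical satellite/media player to notify


class TimerQueue:
    """Named timers kept in a min-heap ordered by due time."""

    def __init__(self) -> None:
        self._heap: List[Timer] = []

    def start(self, name: str, seconds: float, device: str, now: float) -> None:
        heapq.heappush(self._heap, Timer(now + seconds, name, device))

    def cancel(self, name: str) -> bool:
        """Remove every timer called `name`; return True if any was removed."""
        before = len(self._heap)
        self._heap = [t for t in self._heap if t.name != name]
        heapq.heapify(self._heap)
        return len(self._heap) < before

    def pop_due(self, now: float) -> List[Timer]:
        """Remove and return the timers whose due time has passed, earliest first."""
        fired: List[Timer] = []
        while self._heap and self._heap[0].due <= now:
            fired.append(heapq.heappop(self._heap))
        return fired


timers = TimerQueue()
timers.start("pasta", 600, "kitchen_satellite", now=0)
timers.start("screen time", 1800, "living_room_satellite", now=0)
timers.start("cake", 2400, "kitchen_satellite", now=0)
for t in timers.pop_due(now=1800):
    print(t.name, "->", t.device)  # pasta, then screen time; cake is still pending
```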

spare forge
#

named timers are critical for people who cook IMO. I set timers to know when something i'm boiling is done, while I'm baking something else, while there's another one for the max screen time of my daughter

#

I 100% need names on my timers

ivory vessel
#

Named timers will definitely be supported 👍

scarlet aurora
#

well, even HA on dev has the same issue, so it is not a version mismatch...

scarlet aurora
#

found it! It's because HA uses recognize_all while the intents parse script uses recognize

#

and recognize_all returns two results: the correct one and the one with the "n"

spare forge
#

@ivory vessel I opened two discussions in the architecture repo. One for adding support for basic sentences like asking the time, setting alarms, etc..., and another one to add a "Brief mode" similar to the one alexa has, which makes it prefer shorter responses over verbose ones

scarlet echo
#

Can one set up a Wyoming server or something and capture all recorded audio from an ESPHome satellite without trying to use it for HA? I.e. without running it through a pipeline?

reef anchor
#

@ivory vessel Is this intent release still just a beta release? Will there be another release before the official launch? I wanted to perfect the cover and valve parts for area management today, but you were super fast this month. 🙂

ivory vessel
cobalt needle
# ivory vessel Yes, this should just fine. I'm guessing this would be the better approach, but ...

As a follow-up: since you need a fairly significant minimum amount of data for the prep to complete without errors, creating a completely new dataset wasn't practical. Instead, I added some new incremental data targeted at problem words and removed some redundant data from the main set to try to PULL it to a better spot... It is still slurring some words, however. I might abandon the current checkpoints and go back to starting from a good one with a revised dataset, but at the moment my loss gen is going down, so I am going to give it some time to bake

marsh roost
#

Not sure if I should be on ESPHome Discord or here... I'm trying to setup the S3-BOX-3 and the wakeword works... but then when I try to interact with my voice pipeline/assistant, the last log entry I get is:

[D][esp_adf.microphone:273]: Microphone started
[D][voice_assistant:414]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE

... I'm not sure if anything is making it to whisper or not

cobalt needle
#

Hmm even after re-training my voice model seems to be slurring words as compared to the source data, I wonder if it is a factor of having too small of a dataset or if the training prep stage is attributing the wrong phonetic breakdown of the source for some reason.

floral path
#

Do I get it right, that when I want to limit the intent to a specific domain, e.g.:

requires_context:
  domain:
    - cover

this only works if the sentence contains the entity {name}, so there is no way to say I want to control entities of a specific domain in an {area}?

#

And if you don't mind a second question: how do I address devices in the same area as the assistant device? I have found 2 ways:

requires_context:
  area:
    slot: true

#

and

slots:
  name: "all"

Are they equal? Which one is right? How do they work? I could not find it in the doc.

scarlet echo
#

If you want to target devices which are in a specific area, you need to requires_context: {area: 'bedroom'} slots: {area: 'bedroom'} for example

scarlet echo
scarlet echo
#

either way, it has nothing to do with area

floral path
#

Thanks

scarlet echo
floral path
#

(and in this case this is going to clash with the climate domain, but that's another thing)

scarlet echo
#

are we talking about built-in sentences?

floral path
#

Yes, have been contributing the sentences in cs. It works, but in these two areas I do not fully understand how it works.

scarlet echo
#

understood. There was a deliberate choice not to answer "what's the temperature in the bedroom" with temperature sensors, but only with the climate current_temperature

#

a sensor with device_class: temperature could just as well be a 3D printing nozzle temp sensor, a fridge temperature sensor etc. whereas climate entities are pretty straightforward

#

that said, you can still ask (in English, at least) what is the <device_class> of <sensor name> and some variations, but you have to name the sensor (or its aliases)

floral path
#

understood. probably works for thermostats. if you use TRVs they usually do not measure temperature, but I guess it is what it is

scarlet echo
#

some TRVs expose climate entities. what exact entities do you have?

#

just the TRV switch?

floral path
#

No, Z-Wave or Zigbee TRVs, and they show up as climate entities. Anyway, I can always have a custom automation inject the current temperature from the room sensor if I am desperate 🙂

scarlet echo
#

i mean if they expose climate entities

floral path
#

got sidetracked: the question was about filtering by device class when referring to an area, not the device name. And about using the same area as the satellite

scarlet echo
#

specifically for temperatures, you don't need to filter anything, as the HassClimateGetTemperature intent only applies to the climate domain, for which there are no device_classes. you can't query sensors and I've briefly explained why

scarlet echo
#

that piece of YAML makes sure that area was included in the context (coming from the area assigned to the satellite) and it promotes the area to a slot, which is then used for filtering entities
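As a rough mental model of that YAML (not the actual default_agent code), the area from the satellite's context is required and then treated like any other slot when filtering entities. The function names and entity ids are hypothetical; a fixed area, as in the `requires_context: {area: 'bedroom'} slots: {area: 'bedroom'}` example, would simply put that value into the slots directly.

```python
from typing import Dict, List, Optional

Slots = Dict[str, str]


def promote_area(context: Dict[str, str], slots: Slots) -> Optional[Slots]:
    """Toy model of `requires_context: {area: {slot: true}}`.

    The sentence only matches if the context carries an area (e.g. the area
    assigned to the satellite); that area is then copied into the slots.
    """
    area = context.get("area")
    if area is None:
        return None
    return {**slots, "area": area}


def filter_entities(entities: List[Dict[str, str]], slots: Slots) -> List[str]:
    """Keep entity ids whose area (and domain, if a domain slot is set) match the slots."""
    return [
        e["entity_id"]
        for e in entities
        if e["area"] == slots["area"] and e["domain"] == slots.get("domain", e["domain"])
    ]


ENTITIES = [
    {"entity_id": "light.kitchen_ceiling", "domain": "light", "area": "kitchen"},
    {"entity_id": "light.bedroom_lamp", "domain": "light", "area": "bedroom"},
    {"entity_id": "switch.kitchen_kettle", "domain": "switch", "area": "kitchen"},
]

slots = promote_area({"area": "kitchen"}, {"domain": "light"})
print(slots)                                  # {'domain': 'light', 'area': 'kitchen'}
print(filter_entities(ENTITIES, slots))       # ['light.kitchen_ceiling']
print(promote_area({}, {"domain": "light"}))  # None: no area in the context
```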

#

if you're good with Python, i strongly recommend going through the hassil code and the default_agent to really understand how intent recognition and handling works

ivory vessel
cobalt needle
cobalt needle
#

@ivory vessel sent you some samples as a direct message

floral path
floral path
reef anchor
severe oyster
#

Do we have any intent for stopping opening/closing cover? Example: when the sentence is called: 'open the blinds' the HA starts opening the blinds which takes some time...if we want to stop in the middle or in some desired position - is there any intent already? Or did I miss something? Thanks!

severe oyster
#

Thanks @scarlet echo, probably the only way is with automation and custom sentence as a trigger?

scarlet echo
#

i guess, yes. but we could add the sentence(s) and intent. could you open a ticket please?

floral path
#

Speaking of covers, the documentation says that HassOpenCover and HassCloseCover are deprecated, and we shall use HassTurnOn/Off. Is that the goal? I haven't seen any language that has this implemented (at least EN does not have it). How does the Stop fit in?

severe oyster
floral path
scarlet echo
scarlet echo
floral path
#

I mean that if I look at the HassTurnOn intents:

      - sentences:
          - "<turn> on (<area> <name>|<name> [in <area>])"
          - "[<turn>] (<area> <name>|<name> [in <area>]) [to] on"
          - "activate (<area> <name>|<name> [in <area>])"

It reacts to <turn>: "(turn|switch|change)" or to activate. So the words open or raise that are used in HassOpenCover are not implemented.

severe oyster
floral path
#

Yes, cover is in the domain. So you can turn or switch it on, or activate it (whatever it means)

scarlet echo
#

no, it's listed in the excludes_context, which specifically does not match these sentences with entities from the cover domain
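A toy illustration of both points: the `<turn>` alternatives expand only to turn/switch/change (so "open" never matches), and an excludes_context on the domain drops cover entities even when the verb does match. The expander handles only non-nested `(a|b)` groups, not hassil's full template syntax, and only the `cover` entry of excludes_context is modeled here; all names are hypothetical.

```python
import itertools
import re
from typing import Dict, List, Optional

_GROUP = re.compile(r"\(([^()]*)\)")


def expand(template: str) -> List[str]:
    """Expand non-nested (a|b|c) groups into every concrete string."""
    parts = _GROUP.split(template)
    # With one capture group, re.split alternates literal text and group contents.
    choices = [[p] if i % 2 == 0 else p.split("|") for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


TURN_ON_PREFIXES = expand("(turn|switch|change) on ")
EXCLUDED_DOMAINS = {"cover"}  # only the cover entry of excludes_context is modeled


def match_turn_on(text: str, entities: Dict[str, str]) -> Optional[str]:
    """Return the entity name if `text` is a turn-on sentence for an allowed entity."""
    for prefix in TURN_ON_PREFIXES:
        if text.startswith(prefix):
            name = text[len(prefix):]
            domain = entities.get(name)
            if domain is None or domain in EXCLUDED_DOMAINS:
                return None
            return name
    return None


ENTITIES = {"kitchen light": "light", "blinds": "cover"}
print(TURN_ON_PREFIXES)                                  # ['turn on ', 'switch on ', 'change on ']
print(match_turn_on("turn on kitchen light", ENTITIES))  # kitchen light
print(match_turn_on("switch on blinds", ENTITIES))       # None: cover is excluded
print(match_turn_on("open blinds", ENTITIES))            # None: "open" is not a <turn> verb
```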

severe oyster
scarlet echo
floral path
#

Aaa, I was confused. Sorry for wasting the time/space here

severe oyster
#

@scarlet echo just checking your PR https://github.com/home-assistant/intents/pull/2045 because I had some trouble with the valve intent (I had to differentiate between set position and open valve, so I had to use a synonym in sl). Is this the reason homeassistant_HassSetPosition.yaml was deleted? Thanks!

scarlet echo
severe oyster
#

Some help please: the response for sensor_HassGetState, which has the form {{ slots.name | capitalize }} je {{ state.state_with_unit }}, sounds clumsy in Slovenian.
How can <class> (from the expansion rules) be added in front of slots.name, so the response is more human-friendly? E.g. for a duration sensor: "**Trajanje** trenutnega programa pomivalnega stroja je 64 min" ("The **duration** of the current dishwasher program is 64 min"). I need the bolded (**) word, which comes from the <class> expansion rule. If I put slots.device_class in front, I get the device class untranslated, e.g. duration instead of trajanje in Slovenian. 🙁

dire oar
#

I don't feel like there are any ideal off the shelf modules that can quite compete with Amazon or Google, especially when it comes to the Mic arrays for directional/far field arrays. Has the community found a good Voice processing unit that works with a ESP32?

scarlet echo
dire oar
#

I suppose the question is: what hardware would make the developers' lives easier for building an ideal smart speaker? To start with, focusing on a mic array that can work in noisy environments from a distance

broken elbow
#

Hey, for some reason whenever I say "Säädä", Assist hears "Saada" (which is also a Finnish word). Should I fix this in intents, or open an issue elsewhere? The translation is roughly "Modify". "Saada" isn't a word I can immediately think of for any voice command.

scarlet echo
# broken elbow Hey, for some reason whenever i say "Säädä", Assist hears "Saada" (which is also...

That's an issue with the STT engine. What are you using?
Mike H has a workaround for similar sounding words which would help in exactly these situations, but it's not ready for prime time yet (mostly due to missing text-to-phoneme libraries with a usable license)
To answer your question, adding nonsensical words to the sentences just because that's what the STT hears is a band-aid on a broken bone and i would advise against it

broken elbow
scarlet echo
quasi blade
#

Hello
is there any way to configure Assist so that when it doesn't understand a request, it sends the message to the GPT API? That way we could use both the device control of Assist and the power of AI at the same time

copper yacht
#

There is no built in way as of right now, only with a custom integration.

dense sphinx
#

@scarlet echo I was excited to see my media_player intents in the release today! I totally appreciate dev time is precious but was wondering if any of the other service calls, particularly media_previous_track (as we now have next track) are on the roadmap? I have the custom intents I am using for all the other service calls ready to go!

scarlet echo
#

for example, I've just opened a PR for the implementation of a HassClimateSetTemperature intent. No idea if it was on the roadmap, but I've heard many times that the roadmap was largely influenced by community contributions

dense sphinx
dense sphinx
scarlet echo
dense sphinx
# scarlet echo ...just for English and then make sure all linters and tests pass

Last question. I saw the discussion above about homophones. In one of my intents I have "Clear <media_player> (queue|cue|Q|cube)". Is this kosher? I can foresee frowns about cube (must be my poor pronunciation, but it saves me a lot of "didn't understand"s!), but the other three are homophones, so should I list at least those three?

scarlet echo
#

you should not list them, especially for a first iteration of a new intent. however, if they help you, i'd totally suggest having that particular custom sentence on your system, tied to the same intent

hollow silo
#

@ivory vessel @noble copper were the latest intents updates included in yesterday's release? The new Dutch intents are not working (e.g. volume of media players and vacuum start)

#

They don't work in 2024.3, but they do work on the 2024.4 nightly

ivory vessel
hollow silo
#

okay, no worries, but that explains it 🙂

#

something else

#

STT sometimes adds commas. If it adds them everywhere, everything works, but not if it adds only one

ivory vessel
#

Weird, seems to work in English

#

Oh, wait. It does fail with the same intent. I'll take a look.

hollow silo
#

Thanks!

reef anchor
broken elbow
#

i also noticed today that "set curtain to 90%" worked but for some reason "set curtain to 100%" became "set curtain, to 100%" (translated from finnish)
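One way to make matching robust to stray commas is to normalize the STT text first. This is a hypothetical pre-processing step, not what Home Assistant does (the thread only says the failure is being looked into); it deliberately keeps commas between digits, because Finnish and several other languages use the comma as a decimal separator.

```python
import re

# Clause punctuation followed by whitespace or end of text, e.g. "curtain, to".
# A comma between digits ("21,5") is kept, since it can be a decimal separator.
_CLAUSE_PUNCT = re.compile(r"[,;:!?]+(?=\s|$)")
_TRAILING_DOTS = re.compile(r"\.+$")
_SPACES = re.compile(r"\s+")


def normalize_stt(text: str) -> str:
    """Strip stray STT punctuation before sentence matching (hypothetical helper)."""
    text = _CLAUSE_PUNCT.sub("", text.strip())
    text = _TRAILING_DOTS.sub("", text)
    return _SPACES.sub(" ", text).strip()


print(normalize_stt("set curtain, to 100%"))      # set curtain to 100%
print(normalize_stt("set temperature to 21,5."))  # set temperature to 21,5
```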

ivory vessel
reef anchor
dire oar
#

Is there a list of ADCs/VPUs currently supported by the project, be it with an ESP32 or an RPi? I'm aiming to design a new satellite with beamforming

dire oar
ivory vessel
dense sphinx
#

@scarlet echo Just doing these extra media player intents. I think I have got it all working. Tests are all passing EXCEPT when I add a response key to the tests. Then I get an assertion error even though it is an identical format to the previous ones I have done. Ideas? The error looks like AssertionError: No response template for intent HassMediaClearPlaylist named default: clear TV queue

scarlet echo
#

Testing issues

scarlet echo
#

should we have support for (custom) integrations to define their own set of intents and sentences which could be added by default to the Home Assistant conversation agent?
so for example, the Alarmo integration could expose an "arm Alarmo" or "arm [the] home alarm" sentence that the default conversation agent would adopt and have ready to use instantly
or since I've been discussing with Gav from Music Assistant, the MA integration could expose specific intents for media playback or other media-related actions
thoughts?

dire oar
#

Has anyone managed to show the beam form direction picked up on the mic array back to the end user? Specifically in a similar way to an amazon echo?

ivory vessel
ivory vessel
dire oar
#

Thank you for the link, that's what I was looking for, I just couldn't find the keywords to track that down

dire oar
dire oar
#

How is development going with regard to Flash and RAM usage? Looking through the ESP32-S3 datasheet, it seems to support up to 1 GB of each. Would that be of any use (beyond the 16 MB and 8 MB)?
It looks like 64 MB would be easy enough to get up to and still be inside the virtual address space

scarlet echo
#

I've opened a discussion in the architecture repo for including devices among the things Assist can query to do its job (e.g. what is the <device_class> in <device_name> - what is the temperature in the fridge). If you think it's worthy, please vote and/or comment https://github.com/home-assistant/architecture/discussions/1060

worldly narwhal
#

bump ^^

What can I do to get this merged? I tried making the same changes via the preferred GitHub Codespaces method, but I don't have permission to push; I think I need to be a language leader? It feels like a chicken-and-egg problem: I need two PRs to become a language leader, but I can't get these merged 😅

ivory vessel
ivory vessel
#

@worldly narwhal There was some problem with the CI and I couldn't get your PRs to run the tests. I pulled the changes into a single PR and got it merged. Adding you as a language leader for sw now. Thanks for your patience 🙂

floral path
scarlet echo
#

You need to open 2 PRs, not to merge them yourself

floral path
#

I was not talking about Merge, but creating the PR. I think when I follow the documentation step by step, when I create PR, it tries to publish the branch to homeassistant/intents first, and then create PR from this branch to main. But I have no permission to create branch on homeassistant/intents. So I had to publish it to my account first, and create PR from there (I think it automatically creates a fork of homeassistant/intents first - don't catch me there, I am not a git expert).

worldly narwhal
worldly narwhal
scarlet echo
floral path
# scarlet echo codespaces have nothing to do with forks (you can create a codespace on your for...

I think the last sentence is not correct. People are generally not allowed to create or commit to branches on homeassistant/intents (and I do not mean the main). You might not see that as you have more rights.
So it is not only a good idea to update the sentences in your own fork, but that's the only way. And this is also what is confusing on the documentation.
Nobody was talking about making commits to the main directly I think.

scarlet echo
#

Now that @worldly narwhal is a language leader, he can commit directly to the repo, which is not advisable. That was my point

floral path
#

Ok. I think we were talking about the documentation in general.

dire oar
#

@scarlet echo not sure if I've asked here before: I'm looking at making an "ideal" open-source smart speaker PCB, with whatever hardware would be best suited to this project and Willow. Could you please advise who the best members to contact to collaborate with would be?

scarlet echo
#

Can't say i can think of too many people, you'll probably have more success on the ESPHome server. Here are a few that come to mind:
@static stump, founder of Raspiaudio, probably up to his ears in closed source hardware design
@lyric harbor, founder of Willow, unsure about his availability
I don't know if he's on this server, but Sebastian from SmartSolutions4Home may be another good pick as a skilled electrical engineer https://smartsolutions4home.com/about-me/

#

Note that the above message has not notified the tagged people that they were tagged

lean beacon
#

Are there docs on how to stream sound over websockets to the assist_pipeline integration for wake word detection? Or is it the same as for stt?
Also, in what format does the audio stream have to be? (I'm not really experienced in working with audio formats)
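Assuming wake word audio is streamed the same way as STT audio, as the question suggests, the client side could be sketched as below. The format and framing are assumptions to verify against the Home Assistant WebSocket API documentation for `assist_pipeline/run`: raw 16 kHz, 16-bit signed little-endian mono PCM, sent as binary messages prefixed with the one-byte binary handler id the pipeline reports when the run starts (the `stt_binary_handler_id`), and a message containing only that byte to signal the end of audio. The helper names are hypothetical and the websocket connection itself is left out.

```python
import struct
from typing import Iterator, List

# Assumed audio format; check it against the Home Assistant docs.
SAMPLE_RATE = 16_000  # Hz
SAMPLE_WIDTH = 2      # bytes per sample (16-bit signed, little-endian)
CHANNELS = 1          # mono


def to_pcm16(samples: List[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM, clipping outliers."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def frames(pcm: bytes, handler_id: int, samples_per_chunk: int = 1024) -> Iterator[bytes]:
    """Yield binary messages: the handler id byte followed by one chunk of PCM."""
    step = samples_per_chunk * SAMPLE_WIDTH * CHANNELS
    prefix = bytes([handler_id])
    for start in range(0, len(pcm), step):
        yield prefix + pcm[start:start + step]
    yield prefix  # assumed end-of-audio marker: the prefix byte alone


one_second = to_pcm16([0.0] * SAMPLE_RATE)
messages = list(frames(one_second, handler_id=1))
print(len(messages))  # 17: 16 chunks (the last one partial) plus the end marker
```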

lean beacon
ivory vessel
lean beacon
#

Thanks!

west gulchBOT
#

@lean beacon I converted your message into a file since it's above 15 lines :+1:

broken beacon
#

Is there any development done for the recommended M5 Atom Echo to solve the issues with it when used with Homeassistant?

compact gate
#

That's far too vague to answer, and not related to development

ivory vessel
lean beacon
#

Ah okay thanks.

left socket
#

What power supply is everyone using to power the M5 stack? I am looking to replace my Amazon Echo devices and will need 8 of them.

left socket
unborn saddle
#

Pressing the button on the Echo to say 'Doe de espresso uit' ("Turn off the espresso"; Espresso is an alias), the response is 'Sorry, ik kan geen apparaat vinden met de naam De Espresso' ("Sorry, I can't find a device named De Espresso"). Would that be an issue to report here, or is it expected?

#

I could have sworn it worked properly before, so I guess some development changed its behavior.

neat relic
#

Echo -> Alexa right?

severe forum
#

My VoIP assistant connectors don't disconnect when the call is finished. What other information is needed to file a bug?

#

One has been on for 20 hours now.

unborn saddle
hollow silo
#

Can't reproduce it

unborn saddle
#

Is that when you talk to the device, or when you type? In my case it's when I give the voice command.

#

I have this switch, with aliases

young wadi
#

I just want to thank the devs: the latest update ended my month-long fight with the voice assistant ecosystem.

young wadi
#

aaaand didn't survive a reboot, damn pulseaudio, you're savage!

severe forum
#

Hey. The VoIP assistant does strange things when Asterisk breaks in and moves the call to another channel.
The Assist session stays open and the Assist processing in the debug view runs forever.

scarlet aurora
#

Can a list in the intents' _common.yaml file map the same "in" value to multiple "out" values, or not?

#

In Portuguese, "persiana" can be used for both blind and shutter...

#

and also "estore"

worthy wave
#

Hi @ivory vessel, is it possible to pass metadata from a sentence to its response (i.e. to render_response)? E.g. for the example sentence below, it would be nice to have metadata <key>: <value> available in the response, e.g. one_sensor: "{{ metadata.response_text }} {{ state.state_with_unit }}".

# Wind speed
- sentences:
    - "<what_is_the_class_of_name>"
  response: one_sensor
  requires_context:
    domain: sensor
    device_class: wind_speed
  slots:
    domain: sensor
    device_class: wind_speed
  expansion_rules:
    class: "(prędkość|szybkość) [wiatru]"
  metadata:
    response_text: Prędkość wiatru wynosi
#

The problem in Polish is that to create a correct response for the indicated device, you would need the device name in its basic form (without inflection), which is impossible with the current YAML configuration. That's why I wanted to prepare answers that don't include the device name; instead, they will contain information about the class of the device we are asking about.

scarlet echo
#

take my upvote!

worthy wave
#

Currently, to do this I have to prepare a large number of responses, one for each device class.

#

Something like this (but it doesn't look very good in the main configuration):

#
one_sensor_apparent_power: Moc pozorna urządzenia wynosi {{ state.state_with_unit }}
one_sensor_aqi: Indeks jakości powietrza wynosi {{ state.state_with_unit }}
one_sensor_atmospheric_pressure: Ciśnienie atmosferyczne wynosi {{ state.state_with_unit }}
one_sensor_battery: Poziom baterii wynosi {{ state.state_with_unit }}
one_sensor_carbon_dioxide: Stężenie dwutlenku węgla wynosi {{ state.state_with_unit }}
one_sensor_carbon_monoxide: Stężenie tlenku węgla wynosi {{ state.state_with_unit }}
one_sensor_current: Natężenie prądu elektrycznego wynosi {{ state.state_with_unit }}
one_sensor_data_rate: Prędkość transferu danych wynosi {{ state.state_with_unit }}
one_sensor_data_size: Rozmiar danych wynosi {{ state.state_with_unit }}
one_sensor_date: Data w kalendarzu to {{ state.state_with_unit }}
one_sensor_distance: Odległość wynosi {{ state.state_with_unit }}
...
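The per-class keys above essentially form a lookup from device class to a lead-in phrase. As a rough illustration outside Home Assistant's template engine, a minimal Python sketch of a single response built from such a table; `SENSOR_PHRASES` and `render_sensor_response` are hypothetical names, and the phrases are taken from the list and the metadata example above.

```python
# Hypothetical table: device_class -> Polish lead-in phrase (from the examples above).
SENSOR_PHRASES = {
    "wind_speed": "Prędkość wiatru wynosi",
    "battery": "Poziom baterii wynosi",
    "distance": "Odległość wynosi",
    "date": "Data w kalendarzu to",
}


def render_sensor_response(device_class: str, state_with_unit: str) -> str:
    """Build one response string instead of one response key per device class."""
    phrase = SENSOR_PHRASES.get(device_class)
    if phrase is None:
        # Hypothetical fallback: speak only the value when no phrase is defined.
        return state_with_unit
    return f"{phrase} {state_with_unit}"
```

In the intents repo, this table would live in per-sentence metadata (as proposed above) rather than in Python.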
scarlet echo
#

@worthy wave although it's an absolutely excellent idea and I can try to do that PR myself, just FYI you can hack this as we speak with slots instead of metadata

worthy wave
#

Or/and, would it be possible to add the exact text recognised from expansion_rules to the response? E.g. the response could have an extra key like {{ rules.class }} containing text like prędkość wiatru or szybkość wiatru 🙂 I know it won't be easy, but it would certainly make it easier to create correct answers 😀

#

@scarlet echo Yes, I know that I can use slots.

#

But slots are not a good place to add just a response. I prefer to create a lot of responses instead 🙂

#

It may work now, but in the future it could cause a lot of problems 🙂

scarlet echo
#

sentences/xx/homeassistant_HassWhatever.yaml

intents:
  HassWhatever:
    data:
      - sentences:
          - "abracadabra"
        slots:
          testslot: "test value"
        response: testslot
#

responses/xx/HassWhatever.yaml

responses:
  intents:
    HassWhatever:
      testslot: "{{ slots.testslot }}"
#
$ python3 -m script.intentfest parse --language en --sentence 'abracadabra'
{
  "text": "abracadabra",
  "match": true,
  "intent": "HassWhatever",
  "slots": {
    "testslot": "test value"
  },
  "context": {},
  "response_key": "testslot",
  "response": "test value"
}
worthy wave
#

Yes, I know I can do that... but I don't want to do it like that 🙂

scarlet echo
#

i totally agree

worthy wave
#

@scarlet echo Maybe you've tested this on HA, small question: if I have a real device on HA, e.g. living room door, and I create an alias like door in living room, then ask "What is the state of door in living room?", what will the value of {{ slots.name }} be: living room door or door in living room?

scarlet echo
#

the slot text is what you said, in this case door in living room

worthy wave
#

Big thanks. So it still doesn't solve the problem of converting the Polish device name to its base form 🙂

#

Maybe there is some magic field in HA that I can fill in so this form is always used for answers... hehe 😅

scarlet echo
#

we have loads of issues with not having both the slot "text" and value being available in responses (and other places). basically, there are places where we need both the "translated" and "untranslated" versions of a slot (e.g. a zone name + ID or an entity_id along with the friendly name etc.). i'm not sure of the timeline for this (or if there even is one) and i'm reluctant to implement it, as many of my contributions have become severely outdated by the time someone reviewed them and i don't have enough time to keep them up to date
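To make the "text" vs value distinction concrete, a minimal Python sketch of a slot match that keeps both the spoken text and the resolved value, using the living room door example from this thread. `Entity`, `SlotMatch`, and `match_name` are hypothetical types for illustration, not Home Assistant's or hassil's actual classes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Entity:
    entity_id: str
    friendly_name: str
    aliases: Tuple[str, ...] = ()


@dataclass(frozen=True)
class SlotMatch:
    text: str   # what the user actually said (may be an alias or an inflected form)
    value: str  # the resolved, "untranslated" identifier (here an entity_id)


def match_name(spoken: str, entities: Sequence[Entity]) -> Optional[SlotMatch]:
    """Resolve a spoken name against friendly names and aliases, keeping both forms."""
    needle = spoken.strip().lower()
    for entity in entities:
        candidates = (entity.friendly_name, *entity.aliases)
        if needle in (name.lower() for name in candidates):
            return SlotMatch(text=spoken, value=entity.entity_id)
    return None
```

With both forms available, a response could speak the entity's friendly name (looked up via `value`) instead of echoing the spoken `text`, which is what the Polish base-form problem needs.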

worthy wave
#

Yes, I saw your PR (and branch) related to translations. That is why in Polish I use a lot of conditions to create a correct response; again, I know that is not a good solution, but without it the response would not make sense in Polish.

worthy wave
#

I have exactly the same problem in the Polish language 😅

#

I don't know the Romanian language, but I see a lot of similarities to the Polish language

ivory vessel
#

FYI @scarlet echo I decided to rename the range scale to multiplier per your suggestion.

scarlet echo
#

I didn't realize the modification was in the intents repo. After a quick skim, I thought the core had to be altered

ivory vessel
#

It does, I forgot to mention that this is to try it out first before I modify core.

scarlet echo
#

But then again, after a week of vacation and hundreds of emails both at work and personal, my context switching fu was not at its peak 😅

ivory vessel
#

Nope, you're definitely right 😄

scarlet echo
#

Since you're online, Mike, great work with a certain Andean mammal! 😋 Can't wait to test it out

ivory vessel
#

Lol, thanks! It can't control HA just yet, but that's the next step. I have a proof-of-concept working, but we decided to generalize things just a bit more 😉

west gulchBOT
#

@worthy wave I converted your message into a file since it's above 15 lines :+1:

#

@worthy wave I converted your message into a file since it's above 15 lines :+1:

worthy wave
#

result:

===================================================================================================================== test session starts
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: .../Home Assistant/intents
configfile: pyproject.toml
collected 79 items / 75 deselected / 4 selected

tests/test_language_intents.py ..                                                                                                                                                                                                                       [ 50%]
tests/test_language_sentences.py ..                                                                                                                                                                                                                     [100%]

============================================================================================================== 4 passed, 75 deselected in 1.08s 
#

@ivory vessel Big thanks for these small changes. They will really help to design a better voice experience ❤️ 💪 😀

tidal ridge
# scarlet echo we have loads of issues with not having both the slot "text" and value being ava...

Sorry, I'm just dropping in here… I think there is a linguistic aspect you are both getting at that might need to be handled differently than a slot. Some languages have prepositional phrases while others have postpositional phrases. Adding to the fun, the order of the linguistic objects within the phrase will change.

So having specific software objects that reflect the parts of speech is useful. My thinking is that the slot could be derived based on the language setting, which affects how the parts of speech are constructed, as determined by an NLP library pulling the words apart through chunking, stemming, and lemmatization.

#

That last sentence was trying to do too much.

scarlet echo
#

What is your point?

tidal ridge
#

Yes… apologies… The point is to handle the linguistic differences prior to attempting to handle the intent and contents of the slot.

tidal ridge
#

STT -> Pragmatics -> Semantics -> Syntax -> Morphology -> Translate -> Intent and Slot -> Action
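A toy Python sketch of the idea of normalizing language before intent matching: text passes through a chain of stages, and matching only ever sees base forms. The lemma table, the stage functions, and the `query_open_covers` intent label are hypothetical; a real system would use an NLP library for chunking, stemming, and lemmatization, and this collapses the pragmatics/semantics/syntax stages above into simple placeholders.

```python
import re
from typing import Callable, List, Optional

# Hypothetical, hand-written lemma table standing in for an NLP library.
TOY_LEMMAS = {"shades": "shade", "is": "be", "are": "be", "opened": "open"}


def tokenize(text: str) -> List[str]:
    """Lowercase and split into word tokens (punctuation is dropped)."""
    return re.findall(r"\w+", text.lower())


def lemmatize(tokens: List[str]) -> List[str]:
    """Map inflected forms to base forms using the toy table."""
    return [TOY_LEMMAS.get(token, token) for token in tokens]


def match_intent(lemmas: List[str]) -> Optional[str]:
    """Match on base forms only, so grammatical variants land on the same intent."""
    if "shade" in lemmas and "open" in lemmas:
        return "query_open_covers"  # hypothetical intent label
    return None


def run_pipeline(text: str) -> Optional[str]:
    """Run the stages in order, feeding each stage's output into the next."""
    stages: List[Callable] = [tokenize, lemmatize, match_intent]
    result = text
    for stage in stages:
        result = stage(result)
    return result
```

Because matching happens after lemmatization, "Is all the shades open?" and "Are the shades opened?" resolve to the same intent without separate sentence templates.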

scarlet echo
tidal ridge
scarlet echo
hollow hollow
#

Hello guys! I've just finished the initial version of a custom integration for AllTalk TTS.

#

AllTalk TTS is in my opinion the best currently available TTS system.

#

tests, comments, patches are really welcome

worthy wave
#

Hey guys, I have an RTX 4090 and I'm trying to train a new Polish voice, but I have a problem with torch version 1.13.1:

>>> import torch
>>> print(torch.__version__)
1.13.1+cu117
>>> out = torch.fft.rfft(torch.randn(1000).cuda())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

Does anyone know how to solve this problem? I've looked for solutions but unfortunately I can't find anything.

#

It is worth adding that I am working on Ubuntu 22.04 πŸ™‚

worthy wave
worthy wave
hollow silo
#

I have something I don't understand.
In cover_HassTurnOn I have the following intent

      - sentences:
          - open [de|het] <curtain> <in> <area>
          - "[de|het] <curtain> <in> <area> openen"
          - "[<doe>] [de|het] <curtain> (<open> <in> <area>|<in> <area> <open>)"
        response: "cover"
        requires_context:
          device_class: "curtain"
          domain: "cover"

<curtain> refers to "(gordijn[en]|vitrage[s])"

#

These are the tests (which pass just fine)

  - sentences:
      - Open het gordijn in de woonkamer
      - Vitrage woonkamer open
      - Doe het gordijn open in de woonkamer
    intent:
      name: HassTurnOn
      slots:
        area: Woonkamer
      context:
        device_class: curtain
        domain: cover
    response: Geopend
#

but if I try an individual sentence from those tests, it doesn't work

#
@TheFes ➜ /workspaces/intents (nl_volume) $ python3 -m script.intentfest parse --language nl --sentence 'Open het gordijn in de woonkamer'
{
  "text": "Open het gordijn in de woonkamer",
  "match": false
}
scarlet echo
#

that sounds like it's because you're not passing any context in your command. i can't remember how you can do that, though

hollow hollow
#

You won't regret it; it's seriously the most advanced free TTS available

compact gate
#

The folks here have nothing to do with that

hollow hollow
#

That's bad. While making this integration I didn't realise that the HACS backlog is 3 months long 😦

#

i would not bother

#

it seriously negatively affects HA project as it drains steam from developers

scarlet echo
#

a few things here @hollow hollow

  1. HACS (as the name implies) is not HA, but the community store. this channel is dedicated to developers on HA voice stuff
  2. you can always list any repo as a custom GitHub repo in your HACS instance and use it without waiting for HACS to merge a PR adding the repo to the default collection. you can also instruct your potential users to do that
  3. as the HACS documentation for publishers states, the backlog is quite long and it will take a while to get to yours. out of personal experience, it took about 4 months for me
  4. "it seriously negatively affects HA project as it drains steam from developers" - although I agree (again, out of personal experience) that it can be frustrating and exhausting to wait as a developer for your contribution to get merged and used somehow, i seriously doubt that this process (be it in regards to HA or HACS) affects the HA project. there are just too many great things happening all at once for your great thing to make it a dealbreaker. and only so little manpower to handle and organize all that greatness
  5. when you created the PR in HACS, this was in the description. don't believe me? edit your PR (unless you've deleted everything there)
<!--
DO NOT REQUEST REVIEWS, THAT IS JUST RUDE, IF YOU DO THE PULL REQUEST WILL BE CLOSED!
Make sure to check out the guide here: https://hacs.xyz/docs/publish/start
-->
hollow hollow
#

thanks for your lecture, it was funny!

hollow hollow
#

But I would really like that review

#

Github clearly writes on the PR page: review required

#

I suppose it must be a Schrödinger review then: it is required and not requested at the same time!

#

And going back to your funny lecture a bit, the only person I see doing reviews is a Nabu Casa employee, it sounds like HA-related thing

scarlet echo
hollow hollow
#

3 months delays are evidently their internal problem which may be related to lack of workforce or bad procedures

scarlet echo
#

i suggest you ask for a refund

hollow hollow
#

as they could delegate someone from the community

scarlet echo
#

now please stop spamming this channel with off-topic things you don't (want to) grasp

hollow hollow
#

I will just ignore your dumb comments, it will be easier 🙂

#

They are funny though

drowsy inlet
split bison
#

And frankly, you're alienating potential users by being difficult to deal with 😉

#

I'm going to step in as a moderator and say: you're in the wrong place, you need to follow the rules of this and GitHub

earnest marsh
#

Making a negative scene, without having actual foundations or context

hollow hollow
#

As provoked by Mr telelele, I checked: there is only one reviewer, so I think, Mr Stefan, you can't blackmail me effectively

earnest marsh
#

It isn't blackmailing IMHO, it is true. Such an approach wouldn't be received well in general

#

You might not want to hear that, that is fine 🤷‍♂️

hollow hollow
#

I clearly presented the actual foundation: 3 months no review and hundreds of reviews waiting, so you are simply lying, Mr Frenck

earnest marsh
#

anyways, HACS != Home Assistant development, so this might not be the right place for this

split bison
hollow hollow
#

Yes we can close this topic indeed

earnest marsh
#

@hollow hollow Sorry, that felt offensive, where was I lying?

hollow hollow
#

Why not if he is lying

hollow hollow
#

You said I had not presented the foundation

#

Which I clearly did

earnest marsh
#

That is not what I said

hollow hollow
#

Mr Frenck said: "without having actual foundations"

earnest marsh
#

I said: You are making assumptions about what is happening or the reasons for what is going on, while there is no response and thus no foundation for those conclusions. You are guessing

#

yes, you had no response and no context, you cannot make such conclusions out of thin air

hollow hollow
#

no you wrote "without having actual foundations "

#

it was a lie

#

you just jumped on me

earnest marsh
#

I understand you are unhappy with the wait, but 🤷‍♂️ You are also drawing conclusions based on nothing but the wait

hollow hollow
#

probably because it's a real problem

earnest marsh
#

Alright, ok this is going nowhere. Let's stop this here. This is not HA voice development related.

hollow hollow
#

OK

subtle sierra
subtle sierra
#

I modified it and it still fails!

west gulchBOT
#

@subtle sierra I converted your message into a file since it's above 15 lines :+1:

scarlet echo
#

sorry, i can't read Arabic so it's pretty hard for me to help out. i'd suggest tagging the AR language leaders in your PR, asking for help

worthy wave
#

Hey guys, has anyone tried adding their own pretrained voice to Piper? Do you know how to do this? I tried various ways but unfortunately it doesn't work.

worthy wave
#

Finally, it started working for me 🙂 Here are a few tips for using your own voice:

  1. Add the new files to /share/piper, but name them wg_glos_meski.onnx and wg_glos_meski.onnx.json. Don't use pl_PL-wg_glos_meski-medium.onnx
  2. Restart the Piper add-on and HA Core to see the changes
  3. Update your pipeline and select the new voice that was added to your HA; in my case it was just wg_glos_meski (medium)
  4. Update all automations where you use the tts.speak service. Select your new voice under options: and set the correct value voice: wg_glos_meski, where the value (wg_glos_meski) is the name of the voice:
service: tts.speak
data:
  media_player_entity_id: media_player.korytarz_homepod
  cache: false
  options:
    voice: wg_glos_meski
  message: Wykryto wyciek wody w kuchni pod ekspresem do kawy.
target:
  entity_id: tts.piper
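Following tip 1, a small Python sketch that lists which voices in a folder such as /share/piper have both the `<name>.onnx` model and its `<name>.onnx.json` config. `find_piper_voices` is a hypothetical helper for sanity-checking file names; it does not reproduce how the add-on itself discovers voices.

```python
from pathlib import Path
from typing import List, Union


def find_piper_voices(folder: Union[str, Path]) -> List[str]:
    """Return voice names that have both <name>.onnx and <name>.onnx.json."""
    voices = []
    for model in sorted(Path(folder).glob("*.onnx")):
        config = model.with_name(model.name + ".json")  # e.g. wg_glos_meski.onnx.json
        if config.is_file():
            voices.append(model.stem)  # "wg_glos_meski.onnx" -> "wg_glos_meski"
        else:
            print(f"missing config for {model.name}")
    return voices
```

With both files from tip 1 in place, `find_piper_voices("/share/piper")` would include `wg_glos_meski`, the same value used for `voice:` above.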
#

The only small problem is that you can't set the voice directly in the add-on 🙂 Even when I try to set it via the YAML configuration, it's just impossible 🙂 because the add-on's voice list is hardcoded in its configuration: https://github.com/home-assistant/addons/blob/master/piper/config.yaml#L28. If I set something else, I get an error message 🙂

worthy wave
#

@scarlet echo How do you rate onju-voice? Are you satisfied with this speaker? How does wake word detection perform with slight background noise? And most importantly, does it cope well when there is slight noise while we say a command? PS. I'm asking because I saw that you made a video on YT with instructions for replacing the PCB 😉

scarlet echo
worthy wave
scarlet echo
#

The hardware in the Onju is pretty good. I am waiting for the software to improve so as to fully utilize it 😬

meager onyx
#

Hi, how do I create custom models for microWakeWord?

#

Or is it better to use openwakeword for the time being?

scarlet echo
#

I think we may have made the wrong decision in regards to querying cover entities for questions such as "Which windows are open?". But switching to binary_sensor would be a breaking change, so I started a poll here to get some feedback on which type of entities people have https://github.com/home-assistant/intents/discussions/2168
If it turns out that binary_sensors are more prevalent, is it OK to switch the default target domain to binary_sensor? @ivory vessel @noble copper

hardy cargo
noble copper
scarlet echo
# noble copper We shouldn't make a decision but support both

a decision was made ~1 year ago. i've discussed with Mike a potential solution to implement support for such a thing, i.e. "entity" slot lists, so you can have more than one entity referenced in a sentence (e.g. "is Paulus at the supermarket?"), with one or more filters that the entities should match
the trouble is that PRs in that area get stale and I, for one, don't have the time to redo everything from the ground up after 2 months because everything got overhauled
so the proposed solution was simply just as bad, but more general
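To illustrate supporting both entity types, a minimal Python sketch of answering "Which windows are open?" over a list of entities filtered by domain and device class, treating a cover as open when its state is `open` and a binary_sensor as open when its state is `on`. The `Entity` type and `open_windows` function are hypothetical stand-ins for the proposed entity slot lists with filters.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Entity:
    entity_id: str
    device_class: str
    state: str

    @property
    def domain(self) -> str:
        return self.entity_id.split(".", 1)[0]


# State that means "open" for each supported domain.
OPEN_STATE = {"cover": "open", "binary_sensor": "on"}


def open_windows(
    entities: Iterable[Entity],
    domains: Tuple[str, ...] = ("cover", "binary_sensor"),
) -> List[str]:
    """Return entity_ids of open windows across all requested domains."""
    return [
        e.entity_id
        for e in entities
        if e.domain in domains
        and e.device_class == "window"
        and e.state == OPEN_STATE.get(e.domain)
    ]
```

Passing `domains` explicitly keeps the current cover-only behaviour available while letting binary_sensor windows be included by default.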

scarlet echo
noble copper
hollow steeple
#

All paths lead to pjsip, as with most SIP UAs you can probably think of. If you want to get hacky with it, I'm sure you could make a bastardised UA that just makes a call and doesn't care too much about playing nicely; as with the original UA specific to a Grandstream HT801, it really depends on how long you want the call to be and how the UAS reacts. If the audio you want to transmit is under 30 seconds, you can probably get away with murder and just send an INVITE, wait for a 200 OK, ACK it and send/receive audio (with very specific codec choices, as with the original). What's your actual use case? I can be more specific with more information on what you want to achieve (Does it need to auth? What does it actually need to call?).
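For the hacky route described above, the first step is just a text message. A minimal Python sketch that builds a bare-bones SIP INVITE with an SDP audio offer (header names per RFC 3261, SDP fields per RFC 4566). It only builds the message: sending it over UDP, waiting for the 200 OK, sending the ACK, authentication, and RTP are not shown. Offering only PCMU (payload type 0) is an assumption for illustration, not the codec the original UA uses, and the `caller` user part is hypothetical.

```python
import uuid


def build_invite(target_uri: str, local_ip: str, sip_port: int, rtp_port: int) -> str:
    """Return a minimal SIP INVITE (no auth, no retransmissions) as a string."""
    call_id = uuid.uuid4().hex
    from_tag = uuid.uuid4().hex[:8]
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic-cookie prefix

    # SDP offer: one audio stream, PCMU only (assumption for this sketch).
    sdp = "\r\n".join([
        "v=0",
        f"o=- 0 0 IN IP4 {local_ip}",
        "s=-",
        f"c=IN IP4 {local_ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
        "a=sendrecv",
    ]) + "\r\n"

    headers = [
        f"INVITE {target_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{sip_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:caller@{local_ip}>;tag={from_tag}",
        f"To: <{target_uri}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:caller@{local_ip}:{sip_port}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",
    ]
    # A blank line (CRLF CRLF) separates the headers from the SDP body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp
```

The ACK after the 200 OK reuses the same Call-ID and From tag with `CSeq: 1 ACK`, which is why those values are generated once per call.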

spare forge
#

@scarlet echo I updated to the latest version of onju-voice-microwakeword from your repo and it seems to recognize the wake word only once and then never ever again. Does that sound like a known issue to you? Also, playing media doesn't seem to do anything

naive parcel
#

What sort of hardware is recommended to train/finetune a new voice for Piper? Is a single GPU with 24GB of VRAM enough, or would a multi-GPU setup be preferred?
(another way to phrase this question is "What hardware is Mike training piper on")

#

(i'm interested in contributing voices for Piper)

naive parcel
#

Also, are there tools for processing public domain audio book recordings into a dataset?
I was thinking of like, a tool that uses whisper to transcribe the audio files and save all that metadata into a csv file
I remember Mike saying somewhere that he used public domain audio books to create voices for Piper, so I imagine he didn't create the datasets manually but used some tools to automate the process.
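A minimal Python sketch of the dataset step described above: transcribe each clip and write one `id|text` line per clip. This assumes the LJSpeech-style `metadata.csv` layout that Piper's training preprocessing accepts (check Piper's training docs for the exact format). The transcriber is passed in as a function, so the sketch does not assume any particular Whisper API; `build_metadata` is a hypothetical name.

```python
from pathlib import Path
from typing import Callable, Iterable


def build_metadata(
    wav_files: Iterable[Path],
    transcribe: Callable[[Path], str],
    out_path: Path,
) -> int:
    """Write "<clip id>|<transcript>" lines and return how many clips were kept."""
    kept = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for wav in sorted(wav_files):
            # Collapse whitespace and remove the "|" delimiter from the transcript.
            text = " ".join(transcribe(wav).replace("|", " ").split())
            if not text:
                continue  # skip clips the transcriber returned nothing for
            out.write(f"{wav.stem}|{text}\n")
            kept += 1
    return kept
```

Because the output is a plain one-line-per-clip text file, dodgy transcriptions can be corrected by hand before training.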

naive parcel
#

The Whisper transcription is OK but dodgy in a few places; still looking for good software to edit the metadata

little robin
#

Hi all, I would like to extend the conversation component to allow sending TTS messages to, for example, ESPHome's Voice_Assistant. I have been looking into the code and it seems easy to do without changing very much. Sadly, I don't have the skills or the environment to do it myself; I tried. Is anyone willing to help me out?

scarlet echo
little robin
#

I know I can use the media player, but some devices do not support the media_player option. And from what I understand, the media player requires the MP3 codec to play audio, while VA uses WAV audio.

little robin
#

Anyway, I will read the architecture discussion forum and post my suggestion there.

little robin
#

I see there were a lot of reactions to your suggestion, but so far none of them are along the lines of "let me implement it". Am I right?

scarlet echo
#

correct. there needs to be a decision from an architectural standpoint in order to guarantee the merging of the feature

little robin
#

From looking at the code, it is almost there; no architectural changes are needed, IMHO, it's just extending the existing code. The assist_pipeline has all the setup options needed to send the TTS messages.

scarlet echo
#

you're looking at it simplistically, i fear. OK, so you can emit a TTS message, but how will that tie into your response? the proper approach, in my view, is to build the foundations for a conversation (i.e. back and forth messages) that can be started by either party

little robin
#

No kidding, this would be my optimal solution as well, and I fully agree with that. I still believe it is possible within the current pipeline architecture. Maybe not as extensive as your proposal, but I see some light in the dark.

#

My approach is to do it step by step: first the initial message from HA, and later controlling the responses. With a proper automation setup, this can be done by adding different triggers that respond to what is said.

hollow silo
#

working on the timer intents now, but I get this error when doing the tests
FAILED tests/test_language_intents.py::test_homeassistant_HassCancelTimer[nl] - AssertionError: Intent HassCancelTimer does not support slot 'seconds'. See intents.yaml for supported slots

#

Any idea where this comes from? So far I've only made direct translations from EN to NL and didn't use seconds directly anywhere.

little robin
#

did you do a full search on 'seconds'?

#

I'm sure you did 😉

hollow silo
little robin
#

@thefly what happens when someone says something like pizzatimer instead of pizza timer?

#

or keukentimer vs keuken timer

hollow silo
#

{area}[ ]timer and {timer_name:name}[ ]timer
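The `[ ]` in those templates makes the space optional. A rough Python regex analogue of `{timer_name:name}[ ]timer` (not how hassil matches templates internally; the pattern and the `timer_name` function are illustrative) shows that compound and split spellings yield the same name:

```python
import re
from typing import Optional

# Regex analogue of "{timer_name:name}[ ]timer": the "?" after the space makes it optional.
TIMER_PATTERN = re.compile(r"^(?P<name>\w+?) ?timer$", re.IGNORECASE)


def timer_name(utterance: str) -> Optional[str]:
    """Return the captured timer name, or None if the utterance doesn't fit."""
    match = TIMER_PATTERN.match(utterance.strip())
    return match.group("name") if match else None
```

So "pizzatimer" and "pizza timer" both give `pizza`, and "keukentimer" and "keuken timer" both give `keuken`.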

west gulchBOT
#

@scarlet echo I converted your message into a file since it's above 15 lines :+1:

hollow silo
scarlet echo
#

do you... <cough cough> intend on going?

#

(i'll see myself out)

hollow silo
#

If the winds are good, I can just listen to it for free (also when I intend to sleep 😅 )

hardy cargo
#

there is a campsite! so you can stay in**tents**

scarlet echo
#

i feel bad about my on-topic question becoming less relevant, but do they test for compliance at the entrance, before allowing you to go to the main stage?

hollow silo
#

on-topic then

#

It's a bit unclear to me when to use slots and when to use requires_context

#

I had requires_context here, but that didn't work. changing it to slots makes it work

      - sentences:
          - "open [de] garage[ ][deur]"
          - "[de] garage[ ][deur] openen"
          - "[<doe>] [de] garage[ ][deur] <open>"
          - "<zou> [de] garage[ ][deur] ((<open> willen | <open> kunnen | <open>[ ])<doe>|openen)"
          - "<zou> [de] garage[ ][deur] (kunnen|willen) [<open>[ ]<doe>|openen]"
        response: "cover_device_class"
        slots:
          device_class: "garage"
          domain: "cover"
#

I think I initially just copied this from the EN version

scarlet echo
#

requires_context (in sentences definition) is for when:

  • you use a {name} in the sentence and want to make sure the sentence matches a certain domain (although that can be enforced through the filename - domain_IntentName), device_class etc.
  • you want to make sure that the satellite used has an area assigned, to treat sentences as area-aware without the user naming the area
#

slots (in sentences definition) is for when you want to specify a certain slot value without the user saying it. for example

- sentences:
    - "start a half hour timer"
  slots:
    minutes: 30
#

this will populate the slot value with a value you specify, then will hand it over to the conversation agent to use it
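A toy Python model of the difference described above, using the garage door rule from this thread: `requires_context` acts as a filter on context the caller supplies, while `slots` injects fixed values. This is an illustration only, not hassil's implementation, and `match_rule` is a hypothetical name; it mirrors why the rule started matching once `requires_context` was changed to `slots` when no context was supplied.

```python
from typing import Optional


def match_rule(rule: dict, provided_context: dict) -> Optional[dict]:
    """Apply one sentence rule whose text has already matched.

    requires_context: every key must be present (with an allowed value) in the
    context supplied by the caller, otherwise the rule does not match at all.
    slots: fixed values added to the result without the user saying them.
    """
    for key, required in rule.get("requires_context", {}).items():
        allowed = required if isinstance(required, list) else [required]
        if provided_context.get(key) not in allowed:
            return None  # context missing or different: no match
    return dict(rule.get("slots", {}))


garage_with_context = {"requires_context": {"device_class": "garage", "domain": "cover"}}
garage_with_slots = {"slots": {"device_class": "garage", "domain": "cover"}}
```

With an empty context, the first rule returns `None` while the second returns the injected values, just as the half hour timer example injects `minutes: 30`.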

hollow silo
#

ah thanks

scarlet echo
#

there's a slight issue with the context in tests, as far as i see it. you can't send context without expecting it as a slot, so the sentences MUST send out context as a slot, and i see that as a bug which I have tried correcting https://github.com/home-assistant/intents/pull/2142

#

basically, at the moment, input context should be the same as output context in a test

#

which should not be the case, as i see it

scarlet echo
#

I'm not sure who needs to hear this, but the new code review bot in the intents repo seems very good. A welcome addition, thanks!

hollow silo
#

yeah, it's active on all Home Assistant repos, but it seems to provide good information

fair hill
#

How can I apply as language leader? I see that for DE there are quite a few open PRs that are neither commented nor reviewed. To distribute the work load I'd like to help out and join the current leaders.

pseudo bobcat
#

Hi all, apologies if it's the wrong place to ask a Voice question. I have voice working from an ESPHome device to HA, and the wake words are running fine, but when the command is spoken, HA doesn't detect the end of the sentence (long pauses until the timeout). The phrase is recognized correctly, there's just lots of silence.

#

Is there a setting I can butcher to experiment more?

compact gate
#

You can try tweaking these:

  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
little robin
#

Dear @neon swan, not sure if this is the right way to contact you. I'm working on a mobile VA device (https://discord.com/channels/429907082951524364/1171818251983011920). For this I'd like to create a solution that allows HA to talk to VA directly, without the need to answer a question 😄. I know this can be done using the media_player, but I don't want to add a heavy component like that in ESPHome just to get announcements from HA. I have been looking into the HA core code and I believe the current architecture has everything needed to allow this.
But I'd like to confirm what I figured out and talk about how to implement this new feature.

rotund frigate
#

Hello

So I managed to have the answer to my query on the Atom Echo from GPT (OpenAI Extended) spoken either via an MP3 sent to it, or directly, with the text spoken in Alexa's voice on my Echo Dot. The thing is that I did all this by modifying the components' code. If possible, I'd like to do this directly from the Atom Echo YAML config and send the text of the answer via the notify service that makes a request to the Echo Dot, but I don't know how to retrieve the text of the answer. I only manage to get the audio, or the text of what I've said (my STT).

To get the text of what I have said, it is

on_stt_end:
  then:
    - lambda: id(stt_text) = x.c_str();

I tried tts_text but it's not working. Does anyone have a clue? Thanks

compact gate
#

please don't crosspost, and this is not a dev question

#

you'll get more support for that over in the ESPHome Discord

calm sleet
#

How do folks reviewing new sentences feel about using an LLM to generate alternative sentences? If I'm not blindly committing but instead using it to come up with alternative natural sounding permutations, will that be accepted?

#

After the new release, which talked about sticking an LLM in to parse intents in an online system, I got thinking that some of that flexibility could be achieved offline by letting the LLM come up with sentence structures for each intent. So basically the inverse.

scarlet echo
#

if it makes sense, nobody cares you used AI to generate alternative sentences
Mike's initial plan with the intents repo was precisely to have grammatically correct sentences for doing stuff and then to have some ML model come up with new ones based on them

scarlet echo
calm sleet
#

Great! Thanks. I just wanted to make sure it wasn’t against some policy. I’ll carry on. I have a few other PRs in draft to expand sentences quite a bit. I also noticed that there’s a lot of repeated patterns between files. I’ve got another branch started to kind of refactor that a bit. I’ll make sure to keep the patches manageable though.

scarlet echo
calm sleet
#

Yea. That sounds good. Mind having some discussion about the feedback here?

Mostly around precise grammar structure vs same intent. Targeting precision rather than capturing intent seems unintuitive to me given that it will force users to speak very precisely. Some users may have different grasps of the English language.

For example: If my toddler says: “is all the shades open?” It is not correct grammar, however the intent is unambiguous. It feels to me that it would be best to respond to this kind of query with the expected intent all the same.

scarlet echo
#

fo' shizzle

scarlet echo
calm sleet
scarlet echo
#

i guess that's a good place and there are already people engaged there

ivory vessel
calm sleet
#

Ah, ok. Thanks for the guidance! I've got a few PRs that I'm working on to hopefully make a lot of sentences more general. I've found that it still feels like it needs magic incantations while the sentence count is small. Expanding to cover more variants, even grammatically incorrect ones, will help with positive response rates.

verbal arch
#

Discuss M5 Atom stuff here? I've had 2 working as voice assistants for 9 months, but now one never detects speech. I've factory reset it, rebuilt and uploaded the firmware, power cycled it, whatever I could think of. It sees and logs button presses, but for speech it just stays in the WAITING_FOR_VAD state forever, whereas the other unit "detects speech" every 5 seconds, even in a quiet empty room. Does the hardware just die?

smoky forge
#

Is it possible to have Assist either (1) not respond with a vocal answer, or (2) respond with just saying "Done".

I know what I've asked it to do, so I really don't need to be told that the switch has been turned on when that's exactly what I asked for.

fallow cedar
#

Does Home Assistant support streaming chunked TTS responses?
It seems to wait for the full file to download and then play it back, instead of starting playback immediately.

ivory vessel
ivory vessel
fallow cedar
ivory vessel
compact gate
blissful cave
#

Mine, for example, uses a little Python snippet to pick a random file from a folder of Star Trek computer beeps when it hears the wake word and when it finishes STT transcription. You could do something similar at any point in the pipeline.
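A minimal sketch of the random-beep picker described above, assuming the sounds are .wav, .mp3 or .flac files in a single folder. `pick_random_sound` is a hypothetical helper, not the code from that message, and how the chosen file gets played on the wake-word and STT-end events depends on your setup and isn't shown.

```python
import random
from pathlib import Path
from typing import Optional

# Assumed set of audio formats your player can handle.
SOUND_EXTENSIONS = {".wav", ".mp3", ".flac"}


def pick_random_sound(folder, rng: Optional[random.Random] = None) -> Optional[Path]:
    """Return a random audio file from *folder*, or None if it contains none."""
    rng = rng or random.Random()
    candidates = sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in SOUND_EXTENSIONS
    )
    return rng.choice(candidates) if candidates else None
```

Passing a seeded `random.Random` makes the choice reproducible, which is handy when testing the pipeline hook.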

blissful cave
#

In my own home, the latency between the end of STT and the response is the biggest friction point right now, so it would be great to explore this as one avenue towards better responsiveness. I haven't contributed to the project yet, but I'd be glad to jump in now that I'm happy with where my own setup is.

compact gate
blissful cave
#

That being said, I think you can get what you want by playing a success noise on one of those two conditions and returning no speech from the intent. That way, if there's an error firing or finding the intent, it will still tell you verbally, but if it succeeds, it will play a success noise and then stay silent.
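The decision described above, sketched as a hypothetical helper (`feedback_for` and `PipelineFeedback` are illustrative names, not Home Assistant APIs): on success it returns no speech plus a chime, and on any failure it keeps a spoken message so errors are never silent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineFeedback:
    speech: str           # text to send to TTS; an empty string means stay silent
    chime: Optional[str]  # hypothetical sound name to play instead, if any


def feedback_for(intent_matched: bool, action_succeeded: bool,
                 error_message: str = "") -> PipelineFeedback:
    """Chime on success; speak on any failure to find or fire the intent."""
    if intent_matched and action_succeeded:
        return PipelineFeedback(speech="", chime="success")
    if not intent_matched:
        return PipelineFeedback(speech="Sorry, I couldn't understand that.", chime=None)
    return PipelineFeedback(speech=error_message or "Sorry, something went wrong.", chime=None)
```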

#

I'd be happy to share the code to get that working but I don't want to spam the dev chat πŸ™‚

compact gate
#

Appreciate the offer, but I'm okay with what I have. For now I'm more focused on walk-up-and-talk reliability across all my devices.

fallow cedar
blissful cave
#

I assume that's why the integration makes duplicates the way it does

fallow cedar
fallow cedar
delicate pike
#

@ivory vessel First of all, thank you for the great work on Assist (I've been following you since Rhasspy), and thanks to the whole team here too. I'm currently experimenting with a fairly small LLM without a GPU: Ollama with gemma:2B, on a server with 8 GB of memory. Without any sensor data in the template, the response time is acceptable (of course, given the limits of the system, it still takes a couple of milliseconds to respond). Adding sensor data to the template context makes it take much longer, sometimes minutes depending on the amount of data; from my tests, the time is roughly proportional to the amount of data in the context.

I thought it might be possible to combine Gemma with Assist. The basic idea is to use Assist to detect the user's intent and provide the context to Gemma, which formulates the response:

  • User: "What is the temperature in the Living Room?" → Assist gets the Living Room data and provides it as context to Gemma.
  • User: "Turn on the Entrance light" → Assist detects the intent and performs the action → Gemma gets the result as context.

Since intent detection still goes through Assist, it would also be possible to control the house this way. I know this probably isn't real "AI", or rather it's a kind of AI team setup, but it could help keep everything local and get performance similar to the current OpenAI implementation even on limited systems. When the conversation agent gets a "general topic" query, such as User: "Why is the sky blue?", Assist won't find a device called "sky", so it should pass the user input directly to Gemma. I could invest some time playing around with it; for now I just wanted to share the idea with you all. Thanks once again for the great work.
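A minimal sketch of the routing idea above, assuming Ollama's HTTP API on its default port (`POST /api/generate` with `"stream": false`, whose JSON reply carries the generated text in `response`). `match_intent` and `run_intent` are hypothetical stand-ins for Assist's sentence matcher and intent handler, not Home Assistant's actual API. The point is the control flow: a match goes through Assist and the LLM only phrases the result, while no match goes to the LLM as a plain question.

```python
import json
import urllib.request
from typing import Callable, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def ollama_generate(prompt: str, model: str = "gemma:2b") -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


def handle(
    text: str,
    match_intent: Callable[[str], Optional[dict]],  # hypothetical Assist matcher
    run_intent: Callable[[dict], str],              # hypothetical intent executor
    llm: Callable[[str], str] = ollama_generate,
) -> str:
    match = match_intent(text)
    if match is None:
        # No device/intent matched (e.g. "why is the sky blue"): ask the LLM directly.
        return llm(text)
    # Assist did the real work; the LLM only gets a small, targeted context.
    result = run_intent(match)
    prompt = (
        "Answer the user briefly using only this smart home result.\n"
        f"User: {text}\nResult: {result}"
    )
    return llm(prompt)
```

Keeping the prompt limited to the matched intent's result, rather than all sensor data, is what should keep the context small on CPU-only hardware.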

ivory vessel
ivory vessel
#

I'm very interested in these sorts of experiments πŸ™‚
Assist could definitely be used to detect the intent, though (as everyone knows) it's fairly rigid. Some ideas I've had for doing this differently:

  • Use the LLM via text to categorize the intent (slow)
  • Use the LLM to get an embedding of the user's sentence and compare it with pre-computed embeddings of the various intents (faster)
  • Use a tiny BERT model to train an intent classifier (should be even faster)
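A rough sketch of the second idea (nearest intent by embedding), assuming each intent is represented by one pre-computed vector, for example the mean embedding of its example sentences. The 3-dimensional vectors below are toy values standing in for real model embeddings, and the 0.7 threshold is an arbitrary assumption; below it, the sentence would fall through to the LLM instead.

```python
import math
from typing import Dict, List, Optional, Tuple

# Toy stand-ins for pre-computed intent embeddings (real ones come from a model).
INTENT_VECTORS: Dict[str, List[float]] = {
    "HassTurnOn": [1.0, 0.0, 0.0],
    "HassTurnOff": [0.0, 1.0, 0.0],
    "HassGetWeather": [0.0, 0.0, 1.0],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(sentence_vec: List[float], intent_vecs: Dict[str, List[float]],
             threshold: float = 0.7) -> Tuple[Optional[str], float]:
    """Return (best intent, score), or (None, score) if nothing is close enough."""
    best_intent, best_score = None, -1.0
    for intent, vec in intent_vecs.items():
        score = cosine(sentence_vec, vec)
        if score > best_score:
            best_intent, best_score = intent, score
    if best_score >= threshold:
        return best_intent, best_score
    return None, best_score
```

Only the user's sentence needs embedding at request time, which is why this should be faster than asking the LLM to categorize the text.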

What kind of server are you running?

blissful cave
#

The "media_content_type" attribute of media_player might be a good axis for that TTS conversation from earlier.

#

It might provide an angle to extend some TTS optimizations into that integration, at least. It would probably make sense from an end-user perspective as well to offer an intended mode for voice, since media_player will increasingly be used for TTS.

delicate pike
#

I'm running it on a Fujitsu Server Primergy Tx140 S1 (an old, cheap server). HA runs on Docker (it's a Supervised install) and the OS is Debian Server 11 (amd64). Ollama is installed directly on the OS, so I connect to it via localhost. If you don't mind, I plan to open a branch soon and start working on it.
I've also ordered an AMD GPU just in case, but the goal is to get it working smoothly. I can assure you (and can make a demo video to show what I mentioned) that when Ollama gets only the stats, for example of the sun position, the response is handled in a few milliseconds. I'm currently studying the LLM.py docs.