# Using Assist to control DLNA music playback?


iron root
#

Hello everybody,
I never tried working with Assist, but I'd like to ask if it has the flexibility to control media playback on multiple speakers using DLNA-delivered music.
We have 5 Alexa devices around our home... but those don't support DLNA playback. I am now thinking about investing in new speakers that do have DLNA support OR those small intermediate devices that have DLNA power and that I can connect to a speaker. I'd probably still place an "alexa car" device next to it so we still have the Alexa support.

My question is, is assist good enough to use voice commands to actually find the stuff you are looking for using DLNA?

Thanks in advance

rain gale
#

"find the stuff" means finding media or finding devices?

iron root
#

Media. I am under the assumption that DLNA-capable clients will simply be found by HA 😄

#

It seems Alexa Echo 4th Gen devices have a 3.5 mm audio-in. I could grab a couple of those DLNA AudioCast receivers and plug 'em into that. Those should show up in HA as clients. This way I'd have both DLNA and Alexa support. Just have to find a way to switch between audio-in and Alexa "native" playback... it's all just theory for now though xD

#

On the other hand... I'd gladly move away from Alexa completely. The voice assistant they use seems like it never evolved... also, it's reliant on an active internet connection. AFAIK HA has built-in (running on device) support for their voice assistant... or does that still rely on the cloud? I am also thinking about grabbing (at least one) M5Stack Atom Echo for audio input... maybe HA Assist is actually good and can do whatever my Alexa can do?

#

Oh, that M5Stack thingy doesn't have a 3.5 mm audio-out... only BT :/ not sure how I feel about that. Had hoped I could simply plug it into an audio-in

rain gale
#

Your question seems to be based on a shallow understanding of what Assist is and what it does, or it's misphrased. No offense, either way 🙂

Assist in itself does not find anything media related. Assist simply provides one or more pipelines, each consisting of a wake word engine (optional), a Speech-To-Text (STT) engine (optional), a "conversation agent" (a tool which decides how to respond to text with text) and a Text-To-Speech (TTS) engine (optional). You can pick each component, using either local or cloud options.

Finding media comes with the following challenges (as far as I can tell):

  • the name of the media needs to be in the same language as the pipeline, otherwise the single-language STT engine will probably not understand (e.g. "play Feuer Frei by Rammstein on the kitchen speaker"). This does not apply to text-only pipelines
  • even if it's in the same language, the STT might not be robust enough to recognize the actual thing you tell it to play due to made up words in the media title or simply for mishearing you (e.g. "play Fuu-Gee-La by The Fugees")
  • even if all these were properly transcribed, Assist does not handle the searching of media in a library. That's a job for another integration, which you can leverage via service calls (actions). If you do have this integration (e.g. Music Assistant), then the conversation agent in Assist can use it, but...
  • there's no built-in intent in the default conversation agent for searching or playing media. You'd have to create your own custom intent or use a conversation agent which can do that
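
To make the pipeline description in this post concrete, here is a minimal Python sketch of the idea: a required conversation agent with optional STT and TTS stages around it. It is a toy model, not Home Assistant code. `ToyPipeline` and the `fake_*` functions are hypothetical names invented for illustration, and the wake word stage is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class ToyPipeline:
    """Illustrative stand-in for an Assist pipeline (hypothetical, not HA's API)."""

    conversation_agent: Callable[[str], str]      # text in, text out (required)
    stt: Optional[Callable[[bytes], str]] = None  # audio in, text out (optional)
    tts: Optional[Callable[[str], bytes]] = None  # text in, audio out (optional)

    def handle_text(self, text: str) -> str:
        # Text-only use: the STT and TTS stages are skipped entirely.
        return self.conversation_agent(text)

    def handle_audio(self, audio: bytes) -> Union[str, bytes]:
        if self.stt is None:
            raise ValueError("this pipeline has no STT stage")
        reply = self.conversation_agent(self.stt(audio))
        # Without a TTS stage the reply stays plain text.
        return self.tts(reply) if self.tts is not None else reply


def fake_stt(audio: bytes) -> str:
    # Pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    # Knows exactly one sentence; media requests fall through unhandled.
    if text.lower() == "turn on the kitchen light":
        return "Turned on the kitchen light."
    return "Sorry, I don't know how to do that."


def fake_tts(text: str) -> bytes:
    # Pretend the encoded text is synthesized audio.
    return text.encode("utf-8")
```

The point of the sketch is that searching a media library is not a stage anywhere in the pipeline: unless the conversation agent is taught what to do with "play ...", the request just comes back unhandled.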
iron root
#

Thanks for the clarification. Yeah, my assumption was that it's a full-fledged system that can govern over all entities.

Now that I understand its actual intended use I see the limitations.
I bet I could at least define some static sentences for it to pick up to execute some automations. This way I should be able to at least get it to play some static playlists that I would pre-define. Like "Play my NES soundtrack collection"... which would just execute the correct automation. Sure, having some more flexibility in that regard would be nice: "Play my NES soundtrack collection in bedroom", "...in bathroom".

#

Another thing though: How would I even get Assist to respond? I mean, would it generate an audio file with the voice result and I'd then have to relay that to one of the speakers in my home? Being able to issue commands is great, but the response should be heard too... and I really don't want to connect a speaker directly to my RPi and have that as the only speaker that plays the reply.

#

I guess I'll have to find some tutorials on this. If I don't get rid of my Alexa devices, I could play back the result using Alexa.

rain gale
# iron root Thanks for clarification. Yeah, my assumption was its a full-fledged system that...

You can use placeholders in custom sentences (they're called "slots"), and they could use the aliases you define in the HA interface. That's a rabbit hole in itself, so I'd advise you to try something and come back with more on-point questions if you can't get it working. Until then, just see the docs I've linked above. What I can tell you is that you DO have flexibility at hand. You just have to use it carefully and wisely.
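
As a rough illustration of what slots buy you, the Python sketch below matches a spoken sentence against a template and pulls out the slot values. It borrows the `{slot}` and `[optional]` notation from HA's custom sentence templates, but it is only a toy matcher, not how Assist is implemented. `compile_template`, `match_sentence`, the template and the allowed area names are all hypothetical.

```python
import re
from typing import Dict, Optional, Set


def compile_template(template: str) -> "re.Pattern[str]":
    """Turn e.g. 'play my {playlist} in [the] {area}' into a regex."""
    pieces = []
    for word in template.split():
        if word.startswith("{") and word.endswith("}"):
            # Slot: capture one or more words under the slot's name.
            pieces.append(rf"(?P<{word[1:-1]}>.+?)\s+")
        elif word.startswith("[") and word.endswith("]"):
            # Optional word: may be present or absent.
            pieces.append(rf"(?:{re.escape(word[1:-1])}\s+)?")
        else:
            pieces.append(rf"{re.escape(word)}\s+")
    return re.compile(r"^\s*" + "".join(pieces) + "$", re.IGNORECASE)


def match_sentence(
    text: str, template: str, allowed: Dict[str, Set[str]]
) -> Optional[Dict[str, str]]:
    """Return lowercased slot values, or None if the sentence does not fit."""
    # A trailing space lets the final '\s+' in the pattern match.
    match = compile_template(template).match(text.strip() + " ")
    if match is None:
        return None
    slots = {name: value.lower() for name, value in match.groupdict().items()}
    for name, value in slots.items():
        # Slots with a value list (e.g. known areas) must use a listed value.
        if name in allowed and value not in allowed[name]:
            return None
    return slots


TEMPLATE = "play my {playlist} in [the] {area}"
ALLOWED = {"area": {"bedroom", "bathroom"}}  # assumed area names
```

With this, `match_sentence("Play my NES soundtrack collection in the bedroom", TEMPLATE, ALLOWED)` yields the playlist and the area as separate values, so one sentence template can serve every room instead of one static sentence per room.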

iron root
#

Alright. I'll start with some simple stuff then. Do you have any recommendation on what microphone-powered devices I should get? The M5Stack ATOM Echo seems to be BT powered only... which I am not a huge fan of. Certainly the community found some good input devices already?

rain gale
#

Every single turn-key device out there at the moment has its drawbacks. You can look at the ESP32-S3-Box-3, the Raspiaudio speakers or other stuff like that, but none of them is without compromises. HA themselves are working on a voice satellite device which seems to solve most problems, but as it's not out there, I can't say anything about how it works yet.
There are also replacement PCBs for acoustically decent devices, such as the Onju Voice for the Nest Mini gen2, which might interest you.
The other way is to make your own device and tailor it to your exact needs.

iron root
#

ESP32-S3-Box-3 is unavailable everywhere... also pricey. I think for now I'll just play around using my smartphone for input... if I get it all to run then I'll look into adding more devices. We are also planning to get a dedicated tablet for controlling HA... so that has a built-in mic... might work just fine

#

Next limitation right now seems to be that HA custom wake words can only be English... which might be a problem here.

iron root
#

Yeah, really exciting and definitely a great start. For me it might be usable... for my significant other and our 3yo daughter it might not be as easy to use though. I'll give it a try for sure and I'll keep watching development closely. If I structure my automations cleanly I should be able to use 'em with Assist easily later on

#

still, I'll do some simple tests with assist later just to get a first impression

#

I never really felt like I needed Assist since I can just make everything from HA available to Alexa and then use that... but now I want DLNA playback... and we are used to using voice commands to control music playback. Alexa doesn't support DLNA... so I will probably transition away from Alexa... but that introduces the problem of needing another input device... and so on

rocky shadow
#

Hey!
Looks like you're trying to solve the problem, but still don't know that there are no ideal tools for that so far. 🙂
I wanted to address some tricky points:

  1. Playing music directly on Alexa from HA is, m-m-m, almost impossible and definitely very hard to implement. It involves the unofficial (yet brilliant) AMP integration, which breaks often because it uses unofficial APIs. And even then it'd involve issuing commands to Alexa like "play smoke on the water on Spotify", just programmatically. Yuck.
  2. In terms of M5Stack - it is a WiFi client. It has its own speaker (teeny-tiny), but with some software quirk you can transfer the text-to-speech response to any other HA-supported media player (e.g. Chromecast, Snapcast, DLNA etc.). Still, the microphone quality is not good enough. The same goes for pretty much any device so far, excluding maybe S3 boxes (relatively) and the Respeaker Lite which I'm using.
  3. I have a pipeline with a custom script that converts user input (like "play the best song by Nirvana") into data with song and artist using ChatGPT. Then I use that to play through Music Assistant. Works okay with popular music... But every LLM has hallucinations, so sometimes it's glitching.
  4. About Music Assistant: for anything involving music in HA I recommend taking a closer look at it. It's a brilliant service, binding all your libraries to all your players in one place, and with seamless integration into HA.
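
For the "user input into song and artist" step in point 3, the sketch below shows the shape of the data such a script might hand on to a music integration. It deliberately swaps the ChatGPT call for a plain regular expression, so it is not the script described above. `parse_play_request` and the optional "on <player>" part are assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# "play <song> by <artist>", optionally followed by "on [the] <player>".
_PLAY_RE = re.compile(
    r"^\s*play\s+(?P<song>.+?)\s+by\s+(?P<artist>.+?)"
    r"(?:\s+on\s+(?:the\s+)?(?P<player>.+?))?\s*$",
    re.IGNORECASE,
)


def parse_play_request(text: str) -> Optional[Dict[str, str]]:
    """Split a transcribed request into song, artist and (if given) player."""
    match = _PLAY_RE.match(text)
    if match is None:
        return None
    # Drop the player key when the request did not name one.
    return {key: value for key, value in match.groupdict().items() if value is not None}
```

A pattern like this handles literal titles such as "play Feuer Frei by Rammstein on the kitchen speaker", but for "play the best song by Nirvana" it can only return the song as the literal text "the best song". Resolving that kind of request is where an LLM earns its place, hallucinations and all.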
ornate crow
#

Another vote for Music Assistant. I use that with HA Assist and it works just like a Google or Alexa: I can say "Play music by X" and it plays on the speaker assigned to the area. I can specify a speaker, I can specify an album, artist, etc.

iron root
#

But for Music Assistant, the server shouldn't run on the same device as HA, right? I mean, HA is running on a RPi 4... from what I could gather, Music Assistant is quite resource hungry.

I tried adding Music Assistant server yesterday and was unable to figure out how to add music to it. I'll probably get back to it again though