speechio + vosk function get_command(1) - I'm probably using it wrong | Viam Robotics

harsh vigil Apr 4, 2024, 4:45 PM

#

On an Ubuntu laptop I am running speechio with vosk as the speech recogniser, but get_command(1) is returning an empty string.

I have tested the microphone using aplay and arecord defaults, so I think that is OK.

Should I debug my code or the audio?

The setup for all services and the code are attached:

📎 speechio.json 📎 speechtest.py

uneven tiger Apr 5, 2024, 1:51 PM

#

At the moment, we’re having issues with using vosk as the “listen_provider” in the speech module. The SDK team is working on a fix with how modules communicate with each other. Apologies for the inconvenience. 🙏

#

I will add a note to the speech module README until we have that issue resolved.

harsh vigil Apr 5, 2024, 2:04 PM

#

uneven tiger I will add a note to the speech module README until we have that issue resolved.

Thanks - are you aware of a decent "How to do Google voice from scratch" by any chance? I have to set up identities etc for the challenge. If not, no worries, I will try to muddle through 😄

harsh vigil Apr 5, 2024, 2:04 PM

#

uneven tiger At the moment, we’re having issues with using vosk as the “listen_provider” in t...

ps - if VOSK eventually works as a remote that will be super super cool! Please pass to the devs

uneven tiger Apr 5, 2024, 2:06 PM

#

harsh vigil ps - if VOSK eventually works as a remote that will be super super cool! Please...

That’s the plan! 😊 we also have work in progress to make Piper TTS available for local speech soon too.

uneven tiger Apr 5, 2024, 2:07 PM

#

harsh vigil Thanks - are you aware of a decent "How to do Google voice from scratch" by any ...

I can do a little research and pass that along 👍

outer bough Apr 5, 2024, 3:10 PM

#

@harsh vigil in the meantime you can use the vosk and piper modules independently, also with the local-llm module if you want. the speechio module basically is a module that binds this all together, but you could do so in your user code
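
For anyone reading later, binding the two services together in user code might look roughly like the sketch below. It assumes only the `to_text(speech=..., format=...)` and `to_speech(text=...)` calls that appear elsewhere in this thread; `listen_and_reply`, `SpeechLike` and `FakeSpeech` are hypothetical names made up for illustration, not part of speechio or the Viam SDK.

```python
import asyncio
from typing import Callable, Protocol


class SpeechLike(Protocol):
    """Minimal interface assumed from the calls shown in this thread."""

    async def to_text(self, speech: bytes, format: str) -> str: ...

    async def to_speech(self, text: str) -> bytes: ...


async def listen_and_reply(
    stt: SpeechLike,
    tts: SpeechLike,
    audio: bytes,
    make_reply: Callable[[str], str],
) -> bytes:
    """Transcribe audio with one service, then synthesise a reply with another."""
    heard = await stt.to_text(speech=audio, format="wav")
    if not heard:
        # Empty transcript: nothing to answer, so return no audio.
        return b""
    return await tts.to_speech(text=make_reply(heard))


class FakeSpeech:
    """Stand-in used to exercise the glue code without a robot (hypothetical)."""

    async def to_text(self, speech: bytes, format: str) -> str:
        return speech.decode()

    async def to_speech(self, text: str) -> bytes:
        return text.encode()


def demo(audio: bytes) -> bytes:
    fake = FakeSpeech()
    return asyncio.run(listen_and_reply(fake, fake, audio, lambda t: f"OK: {t}"))
```

With a real robot, `stt` and `tts` would be the vosk and piper services obtained via `SpeechService.from_robot(...)`, and playing the returned audio is left to the caller.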

harsh vigil Apr 5, 2024, 3:45 PM

#

outer bough @harsh vigil in the meantime you can use the vosk and piper modules ind...

I found this https://cloud.google.com/text-to-speech/docs/authentication and so set up Google Cloud, but I am getting an unusual log message:

2024-04-05T15:38:41.213Z   warn robot_server.speechio     logger named %q sent a log over gRPC with an invalid level %qspeechioWARNING  
2024-04-05T15:38:18.832Z   warn robot_server.speechio     logger named %q sent a log over gRPC with an invalid level %qspeechioWARNING  
2024-04-05T15:38:13.854Z   warn robot_server.speechio     logger named %q sent a log over gRPC with an invalid level %qspeechioWARNING  
2024-04-05T15:38:06.587Z   warn robot_server.speechio     logger named %q sent a log over gRPC with an invalid level %qspeechioWARNING  
2024-04-05T15:38:04.037Z   warn robot_server.speechio     logger named %q sent a log over gRPC with an invalid level %qspeechioWARNING  
2024-04-05T15:37:58.930Z   warn robot_server.speechio     logger named %q sent a log over gRPC with an invalid level %qspeechioWARNING  
2024-04-05T15:37:43.154Z   warn robot_server.speechio     logger named %q sent a log over gRPC with an invalid level %qspeechioWARNING  
2024-04-05T15:37:24.396Z   warn robot_server.speechio     logger named %q sent a log over gRPC with an invalid level %qspeechioWARNING  
2024-04-05T15:37:18.928Z   warn robot_server.speechio     logger named %q sent a log over gRPC with an invalid level %qspeechioWARNING

Config is attached:

📎 speechio2.json

harsh vigil Apr 5, 2024, 3:46 PM

#

outer bough @harsh vigil in the meantime you can use the vosk and piper modules ind...

Thanks Matt - Piper is just for Raspberry Pi?

uneven tiger Apr 5, 2024, 5:32 PM

#

For any Linux single board computer or MacOS computer 🙂

harsh vigil Apr 6, 2024, 7:08 AM

#

uneven tiger For any Linux single board computer or MacOS computer 🙂

Cool - no speech output yet, but debugging it. Vosk seems to delay a long time - is there a setting for how long to listen before processing? Also, am I right in assuming vosk currently has to be on the same device as the microphone?

#

ps - debugging on a Linux Laptop lesson 1 - don't be logged into the laptop or it will hog the mic and speaker

harsh vigil Apr 6, 2024, 8:30 AM

#

Ahhh - I think vosk might work remotely if I use audio input locally and send the audio. Short on time, but plenty of things to do 😄
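
A rough sketch of the "capture locally, send the audio" idea: wrap raw PCM samples in a WAV container in memory, so the bytes can be passed to `to_text(speech=..., format="wav")`. It uses only the Python standard library. The 16 kHz, mono, 16-bit defaults are an assumption (a common choice for Vosk models), and `pcm_to_wav_bytes` is a hypothetical helper name; how the PCM is captured from the microphone is not shown.

```python
import io
import wave


def pcm_to_wav_bytes(
    pcm: bytes,
    sample_rate: int = 16000,  # assumed; match whatever the recogniser expects
    channels: int = 1,         # assumed mono
    sample_width: int = 2,     # bytes per sample, i.e. 16-bit audio
) -> bytes:
    """Wrap raw little-endian PCM frames in a WAV header, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def wav_info(wav_bytes: bytes) -> tuple[int, int, int, int]:
    """Read back (channels, sample_width, sample_rate, n_frames) as a sanity check."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return (
            wav.getnchannels(),
            wav.getsampwidth(),
            wav.getframerate(),
            wav.getnframes(),
        )
```

The resulting bytes would then be sent as `audio_bytes` to whichever machine runs vosk.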

harsh vigil Apr 6, 2024, 10:40 AM

#

OK, splitting and adding a remote vosk from the raspberry pi 4 to the laptop running vosk is generating a "not implemented" error:

File "/home/blm/python/viam_sdk/remote_check.py", line 68, in <module>
asyncio.run(main())

File "/home/blm/python/viam_sdk/remote_check.py", line 53, in main
speech_text = await speech_vosk.to_text(speech=audio_bytes, format="wav")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/blm/viam_env/lib/python3.11/site-packages/speech_service_api/api.py", line 174, in to_text
response: ToTextResponse = await self.client.ToText(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

📎 remote_check_wocreds.py

harsh vigil Apr 6, 2024, 10:41 AM

#

harsh vigil OK, splitting and adding a remote vosk from the raspberry pi 4 to the laptop run...

It seems as if the "to_text" function gets changed to "ToText" but I could be wrong.
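
On the renaming: going by the traceback, the Python method `to_text` in `api.py` itself calls `self.client.ToText(request)`, so the rename looks like the usual convention of snake_case Python wrappers around CamelCase gRPC method names rather than something going wrong. The tiny sketch below only illustrates that naming pattern; `rpc_name` is a hypothetical helper, not how the SDK derives names.

```python
def rpc_name(method: str) -> str:
    """Convert a snake_case Python method name to a CamelCase RPC-style name."""
    return "".join(part.capitalize() for part in method.split("_"))


# The two pairs seen in the tracebacks in this thread.
pairs = {name: rpc_name(name) for name in ("to_text", "to_speech")}
```

If that reading is right, the "not implemented" error would more likely come from the side receiving the `ToText` call than from the rename itself.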

#

ps @outer bough - typo on https://github.com/viam-labs/tts-piper it implements "say" and "to_speech", not "to_text"

#

Am getting the same "not implemented" error on piper:

File "/home/blm/python/viam_sdk/remote_check.py", line 58, in main
audio_to_output = await speech_piper.to_speech(text="Implmenting your command now")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/blm/viam_env/lib/python3.11/site-packages/speech_service_api/api.py", line 179, in to_speech
response: ToSpeechResponse = await self.client.ToSpeech(request)

with

    speech_piper = SpeechService.from_robot(robot, name="blah-blah-blah:piper")
    audio_to_output = await speech_piper.to_speech(text="Implmenting your command now")
    print(f"Audio back received, length: {len(audio_to_output)}, type: {type(audio_to_output)}")

harsh vigil Apr 6, 2024, 11:07 AM

#

harsh vigil Am getting the same not implemented on piper File "/home/blm/python/viam_sd...

I can run the code locally on the laptop

    # speech
    speech_vosk = SpeechService.from_robot(robot, name="vosk")
    speech_piper = SpeechService.from_robot(robot, name="piper")
    

    # Quick test speech
    print("Testing speech output")
    piper_response = await speech_piper.to_speech("Hello, How can I help you?")
    print(f"Piper response:- length: {len(piper_response)} type: {type(piper_response)} details: {piper_response}")

    speech_text = await speech_vosk.to_text(speech=piper_response, format="wav")
    print(f"Vosk response:- length: {len(speech_text)} type: {type(speech_text)} details: {speech_text}")

But the same code fails when it is run against the remote:

    # Setup vosk and piper for speech
    speech_vosk = SpeechService.from_robot(robot, name="viamubuntulaptop2-main:vosk")
    speech_piper = SpeechService.from_robot(robot, name="viamubuntulaptop2-main:piper")

harsh vigil Apr 6, 2024, 12:24 PM

#

ps uform works fine as a remote

rpi_cam3 get_image return value: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1920x1080 at 0x7F8D6A9D10>
uform get_classifications_from_camera return value: [class_name: "A man and a woman are standing close together in a room. The man is wearing glasses and a brown jacket with a zipper. The woman is wearing a white sweater and has a necklace. They both appear to be smiling. Behind them, there is a white board with red dots and a bulletin board with papers."
confidence: 1
