Speech recognition and Speech synthesis using neural networks

Hello All,

Today I want to show you how Joker can be used for speech recognition and speech synthesis using neural networks and the Joker Empathy module.

Joker Empathy module

I have brewed two Docker containers for super simple usage. Just one command is required to run the neural network and obtain the results. This tutorial should work on any Linux or macOS system. No GPU is required, only a CPU.

This funny video shows voice interaction with Joker:

Speech recognition (speech-to-text)

This service is based on the Kaldi ASR project and uses Kaldi’s ‘chain’ models (a type of DNN-HMM model). The actual trained model was released by the api.ai team. The model contains 127,847 words. For comparison, the Oxford English Dictionary contains 171,476 words, and an average English-speaking adult knows between 20,000 and 30,000 words. The model shows an 11.2% word error rate (WER), which is a very good result! “Old” speech recognition methods (GMM-HMM) show a WER of 21% or more.
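To make the WER figure concrete, here is a minimal sketch of the usual definition: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognized text, divided by the number of reference words. The function name `word_error_rate` is my own illustration and is not part of Kaldi or the aospan/stt container; the numbers quoted above were not produced with this code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # deletion: reference word is missing
                cur[j - 1] + 1,      # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "HELLO THIS IS SPEECH TO TEXT RECOGNITION FOR JOKER PROJECT"
    hypothesis = "HELLO THIS IS SPEECH TO TEXT RECOGNITION FOR POKER PROJECT"
    print(word_error_rate(reference, hypothesis))
```

In this example one word out of ten is substituted, so the WER is 10%. A WER of 11.2% means roughly one word error per nine reference words.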

To run the test, just issue the following command in a console:

docker run -it aospan/stt

The built-in file will be processed, and the output should contain the following text:
/opt/in/in.wav HELLO THIS IS SPEECH TO TEXT RECOGNITION FOR JOKER PROJECT

That is what the system actually recognized from the audio file. Here is the audio file:

Supply your own audio file

If you want to use your own audio file, run the following command in a console:

docker run -it -v `pwd`/in:/opt/in aospan/stt

The input file must be in the format ‘WAV, 16 bit, mono, 16000 Hz’ and must be located at ‘in/in.wav’.
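Before running the container on your own recording, it can help to confirm that the file really matches the required format. The following is a minimal sketch using only Python's standard `wave` module; the function name `wav_format_problems` is my own and is not part of the container.

```python
import wave


def wav_format_problems(path):
    """Return a list of mismatches against 'WAV, 16 bit, mono, 16000 Hz'."""
    problems = []
    # wave.open raises wave.Error if the file is not an uncompressed PCM WAV.
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("expected mono, got %d channels" % wav.getnchannels())
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append("expected 16 bit, got %d bit" % (wav.getsampwidth() * 8))
        if wav.getframerate() != 16000:
            problems.append("expected 16000 Hz, got %d Hz" % wav.getframerate())
    return problems


if __name__ == "__main__":
    issues = wav_format_problems("in/in.wav")
    print("OK" if not issues else "\n".join(issues))
```

An empty list means the file matches the stated format. If problems are reported, convert the file with your audio tool of choice before mounting the ‘in’ directory.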

Performance

Joker can process speech-to-text in real time with 25% CPU usage. This is more than enough for “real world” use cases like voice control, voice assistance, text dictation, smart home and many more.

Speech synthesis (text-to-speech)

This service is based on the Merlin project. I have trained a neural network on the ‘cmu_us_bdl_arctic’ dataset (male voice) prepared by Carnegie Mellon University.

To run the test, just issue the following command in a console:

docker run -it -v `pwd`/out:/opt/out aospan/tts

The resulting audio file is written to out/tts.wav. Here is the audio file:

The default phrase ‘Hello, my name is Joker. Today is a great day because it’s my birthday’ was used. To supply your own phrase, run the following command:

docker run -it -v `pwd`/out:/opt/out aospan/tts "your phrase here"

Conclusions

Now we can build very user-friendly systems with natural voice control, like Amazon Alexa or Google Home. But Joker doesn’t need online connectivity: all speech processing is done locally. This improves privacy and security, since no audio data is shared with third parties. And we can use voice control when no internet connection is configured (for example, on fresh installations).

Please check the Joker Walker module for a voice control use case.

Abylay Ospan

Published by
Abylay Ospan
