
Speech recognition and Speech synthesis using neural networks

Hello All,

Today I want to show you how Joker can be used for speech recognition and speech synthesis using neural networks and the Joker Empathy module.

Joker Empathy module

I have built two Docker containers to make usage very simple. Just one command is required to run the neural network and obtain the results. This tutorial should work on any Linux system and on OS X. No GPU is required, only a CPU.

This funny video shows voice interaction with Joker:

Speech recognition (speech-to-text)

This service is based on the Kaldi ASR project. It uses Kaldi’s ‘chain’ models (a type of DNN-HMM model). The trained model itself was released by the api.ai team. The model contains 127,847 words. For comparison, the Oxford English Dictionary contains 171,476 words, and an average English-speaking adult knows between 20,000 and 30,000 words. This model shows an 11.2% word error rate (WER), which is a very good result! “Old” speech recognition methods (GMM-HMM) show a WER of 21% or more.
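For readers who have not met WER before: it is the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference text, divided by the number of words in the reference. Here is a minimal sketch of that calculation. The function name `word_error_rate` is mine and is not part of Kaldi or of the containers, and real scoring tools usually normalize case and punctuation first, which this sketch does not do.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    Illustrative helper only (hypothetical name, not a Kaldi API).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "HELLO THIS IS SPEECH TO TEXT RECOGNITION FOR JOKER PROJECT"
hypothesis = "HELLO THIS IS SPEECH TO TEXT RECOGNITION FOR POKER PROJECT"
print(word_error_rate(reference, hypothesis))  # one wrong word out of ten: 0.1
```

An 11.2% WER therefore means roughly one word-level error per nine reference words on the test set the model was scored against.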

To run the test, just issue the following command in a console:

docker run -it aospan/stt

The built-in file will be processed, and the output should contain the following text:
/opt/in/in.wav HELLO THIS IS SPEECH TO TEXT RECOGNITION FOR JOKER PROJECT

This is what the system actually recognized from the audio file. Here is the audio file:

Supply your own audio file

If you want to use your own audio file, run the following command in a console:

docker run -it -v `pwd`/in:/opt/in aospan/stt

The input file must be in the format ‘wav, 16 bit, mono 16000 Hz’ and must be located at ‘in/in.wav’.
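Before starting the container, it can be handy to confirm that your file really matches this format. The sketch below uses Python's standard `wave` module to do that. The helper name `check_stt_input` is only for illustration, and I am assuming the file is plain PCM WAV (for other encodings `wave.open` raises `wave.Error`).

```python
import wave


def check_stt_input(path):
    """Return a list of format problems; an empty list means the file looks fine.

    Illustrative helper (hypothetical name), checking for 16-bit, mono, 16000 Hz.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:  # sample width in bytes, 2 bytes = 16 bit
            problems.append("expected 16 bit samples, got %d bit" % (wav.getsampwidth() * 8))
        if wav.getnchannels() != 1:
            problems.append("expected mono, got %d channels" % wav.getnchannels())
        if wav.getframerate() != 16000:
            problems.append("expected 16000 Hz, got %d Hz" % wav.getframerate())
    return problems
```

Calling `check_stt_input("in/in.wav")` should return an empty list for a file the container can process. If it does not, convert the file with your audio tool of choice before running the command above.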

Performance

Joker can process speech-to-text in real time with 25% CPU usage. This is more than enough for “real world” use cases such as voice control, voice assistance, text dictation, smart home and many more.

Speech synthesis (text-to-speech)

This service is based on the Merlin project. I have trained a neural network on the ‘cmu_us_bdl_arctic’ dataset (male voice) prepared by Carnegie Mellon University.

To run the test, just issue the following command in a console:

docker run -it -v `pwd`/out:/opt/out aospan/tts

The resulting audio file is located at out/tts.wav. Here is the audio file:

The default phrase ‘Hello, my name is Joker. Today is a great day because it’s my birthday’ was used. To supply your own phrase, run the following command:

docker run -it -v `pwd`/out:/opt/out aospan/tts "your phrase here"
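If you want to call the text-to-speech container from your own program rather than from the console, the same command can be assembled as an argument list. This is only a sketch: the helper name `tts_command` is mine, the image name and the /opt/out mount point are taken from the commands above, and I am assuming the script is started from a terminal, since the `-it` flags need one.

```python
import os


def tts_command(out_dir="out", phrase=None):
    """Build the 'docker run' argument list for the aospan/tts image.

    Illustrative helper (hypothetical name). os.path.abspath replaces the
    `pwd` shell substitution used in the console commands.
    """
    cmd = [
        "docker", "run", "-it",
        "-v", "%s:/opt/out" % os.path.abspath(out_dir),
        "aospan/tts",
    ]
    if phrase is not None:
        # Passed as one argument, so no shell quoting is needed.
        cmd.append(phrase)
    return cmd


# To actually run it (needs Docker installed and a terminal):
#   import subprocess
#   subprocess.run(tts_command("out", "your phrase here"), check=True)
```

Passing the phrase as a separate list item avoids quoting problems with apostrophes or quotation marks inside the phrase.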

Conclusions

Now we can build very user-friendly systems with natural voice control, like Amazon Alexa or Google Home. But Joker doesn’t need online connectivity: all speech processing is done locally. This improves privacy and security, because no audio data is shared with a third party. We can also use voice control when no internet connection is configured (for example, on fresh installations).

Please check the Joker Walker module for a voice control use case.

Abylay Ospan


Published by

Abylay Ospan

7 years ago

