Hello All,
Today I want to show you how Joker can be used for speech recognition and speech synthesis using neural networks and the Joker Empathy module.
I have brewed two Docker containers for super simple usage: just one command is required to run the neural network and obtain the results. This tutorial should work on any Linux or OS X system. No GPU is required, only a CPU.
This funny video shows voice interaction with Joker:
The speech recognition (speech-to-text) service is based on the Kaldi ASR project. It uses Kaldi's 'chain' models (a type of DNN-HMM model). The trained model itself was released by the api.ai team. The model's vocabulary contains 127,847 words. Compare this number with the Oxford English Dictionary, which contains 171,476 words, or with the average English-speaking adult, who knows between 20,000 and 30,000 words. The model shows an 11.2% word error rate (WER), which is a very good result! "Old" speech recognition methods (GMM-HMM) show a WER of 21% or more.
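For readers new to the metric, here is a minimal sketch of how WER is usually computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. This is an illustrative helper and not the scoring tool used to obtain the 11.2% figure; the function name `word_error_rate` and the sample phrases are assumptions made for this example.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = word-level edit distance / number of reference words.

    Illustrative helper only; not part of Kaldi or the Joker containers.
    """
    ref = reference.upper().split()
    hyp = hypothesis.upper().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
wer = word_error_rate("turn on the kitchen light please",
                      "turn on the kitchen night please")
print(f"WER: {wer:.1%}")
```

Because insertions count as errors too, WER can exceed 100% when the hypothesis is much longer than the reference.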
To run a test, just issue the following command in the console:
docker run -it aospan/stt
The built-in file will be processed, and the output should contain the following text:
/opt/in/in.wav HELLO THIS IS SPEECH TO TEXT RECOGNITION FOR JOKER PROJECT
That is what the system actually recognized from the audio file. Here is the audio file:
If you want to use your own audio file, run the following command in the console:
docker run -it -v `pwd`/in:/opt/in aospan/stt
The input file must be in the 'wav, 16 bit, mono 16000 Hz' format and located at 'in/in.wav'.
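Before starting the container, it can help to verify that your recording really matches this format. The following is a small sketch using Python's standard `wave` module; the function name `check_stt_input` is an assumption made for this example and is not part of the container.

```python
import wave


def check_stt_input(path):
    """Return a list of format problems; an empty list means the file looks OK.

    Checks the format stated in this post: WAV, 16 bit, mono, 16000 Hz.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"expected 16 bit, got {wav.getsampwidth() * 8} bit")
        if wav.getframerate() != 16000:
            problems.append(f"expected 16000 Hz, got {wav.getframerate()} Hz")
    return problems
```

For example, `check_stt_input("in/in.wav")` returns an empty list for a correctly prepared file. Note that `wave.open` raises `wave.Error` if the file is not an uncompressed PCM WAV at all.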
Joker can process speech-to-text in real time with 25% CPU usage. This is more than enough for "real world" use cases like voice control, voice assistance, text dictation, smart home and many more.
The speech synthesis (text-to-speech) service is based on the Merlin project. I have trained the neural network on the 'cmu_us_bdl_arctic' dataset (male voice) prepared by Carnegie Mellon University.
To run a test, just issue the following command in the console:
docker run -it -v `pwd`/out:/opt/out aospan/tts
The resulting audio file is written to out/tts.wav. Here is the audio file:
The default phrase 'Hello, my name is Joker. Today is a great day because it's my birthday' was used. To supply your own phrase, run the following command:
docker run -it -v `pwd`/out:/opt/out aospan/tts "your phrase here"
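If you want to call the synthesizer from your own program, a thin wrapper around the same command is enough. The sketch below is only an illustration: the function names `build_tts_command` and `synthesize` are assumptions, and the `-it` flags are dropped on the assumption that the container does not need an interactive terminal when started from a script.

```python
import os
import subprocess


def build_tts_command(phrase, out_dir="out"):
    """Build the argument list for the aospan/tts container.

    Passing the phrase as one list element avoids shell quoting problems.
    Dropping '-it' is an assumption: a script usually has no TTY to attach.
    """
    mount = f"{os.path.abspath(out_dir)}:/opt/out"
    return ["docker", "run", "-v", mount, "aospan/tts", phrase]


def synthesize(phrase, out_dir="out"):
    """Run the container and return the path of the generated WAV file."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(build_tts_command(phrase, out_dir), check=True)
    # The container writes its result to /opt/out/tts.wav (out/tts.wav on the host).
    return os.path.join(out_dir, "tts.wav")
```

Calling `synthesize("your phrase here")` should then leave the audio in out/tts.wav, just like the command above; `check=True` makes the call raise `subprocess.CalledProcessError` if Docker exits with an error.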
Now we can build very user-friendly systems with natural voice control, like Amazon Alexa or Google Home. But Joker doesn't need online connectivity: all speech processing is done locally. This improves privacy and security because no audio data is shared with a third party. And we can do voice control when no internet connection is configured (for example, on fresh installations).
Please check the Joker Walker module for a voice control use case.