Saturday, November 1, 2008

Continuous Speech Recognition for Linux

Hello people! Please help me out with this project!!! It'll benefit a lot of people.

Please go straight to 'Current Problem' near the end!

Objective
  • To create a FOSS Continuous Speech Recognition engine
  • Initially for Linux, ultimately crossplatform
  • A smartly designed GUI that finds the optimal balance between voice and hand input
This is a big project. Why am I pushing it? I have severe RSI. I _need_ speech recognition. And I _hate_ Windows. So I'm going to work towards creating it for Linux. I cannot do it on my own. My hands are fucked and besides I am a little stupid on Linux. Please muck in! We need more hands on board!

Why Vista Speech is shit
I hate the Vista speech recognition software. It makes me scream and yell. It was starting to damage my brain. The engine is great, but the interface is unbearable. Commands are intermingled with dictation. You dictate 'Fred was going to close the window'. And your window has gone! Designed by corporate fucking monkeys. I asked the Vista Speech team why it was so shit, and they said 'it's designed as a keyboard/mouse replacement. Not complement. Our specifications are _not_ to make an HCI optimally balanced between voice and hand.' Well this is my specification.

So, how to make continuous speech for Linux?
The FOSS community has CMU-Sphinx. It's the original speech engine. 20+ years ago, DARPA funded a department at CMU (Carnegie Mellon University) to do it. Vista Speech is accurate because they found and hired all the current Sphinx developers. Vista Speech is working off the Sphinx engine. Go to irc.freenode.net#cmusphinx and say hi to dhdfoo or nshm who seem to be the two most active maintainers right now. They're not always on, just hang awhile. I'm on as ohmu.

So we've got the engine... Why don't we have continuous speech?
Because it needs thousands of hours of training data. E.g. you need to record yourself saying 'Mary had a little lamb' and feed it into the database together with the text. Do this for 1000 people at 10 hours each. Now Sphinx can chew on that data and make a decent engine. Nobody's done it. There's a current attempt at www.voxforge.com
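
To make the scale concrete, here is a minimal Python sketch of what one training example looks like (a recording paired with its text) and how the target adds up. The `Utterance` class, the file name and the 3-second duration are made up for illustration; this is not the format Sphinx or VoxForge actually use.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One training example: a recording plus the text that was spoken."""
    audio_path: str   # hypothetical file name
    transcript: str
    seconds: float    # length of the recording


def total_hours(utterances):
    """Sum the recorded time of a corpus, in hours."""
    return sum(u.seconds for u in utterances) / 3600.0


# The target from the paragraph above: 1000 speakers at 10 hours each.
SPEAKERS = 1000
HOURS_PER_SPEAKER = 10
target_hours = SPEAKERS * HOURS_PER_SPEAKER

# A one-item corpus, just to show the shape of the data.
sample = [Utterance("mary.wav", "Mary had a little lamb", 3.0)]
remaining = target_hours - total_hours(sample)
print(f"target: {target_hours} hours, still needed: {remaining:.3f} hours")
```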

The plan
  1. Use WINE to get Vista's Speech Engine operating in Linux
  2. Create a GUI that'll interface with this engine.
    The GUI will sporadically (unless the user disables the feature) send phrase-data to a central database (say VoxForge - I have contacted the maintainer and he is friendly)
  3. Once we have enough data, throw out the WINE-wrapped Vista Engine, and replace it with our own FOSS engine.
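
The data collection in step 2 could look something like the sketch below: the GUI only queues a phrase for upload if the user has not disabled sharing, and even then only some of the time. Everything here is an assumption for illustration: the `PhraseCollector` name, the `share_probability` parameter and the record fields are invented, and no upload protocol has been agreed with VoxForge.

```python
import json
import random


class PhraseCollector:
    """Queues (audio file, recognized text) pairs for a later upload."""

    def __init__(self, sharing_enabled=True, share_probability=0.1, rng=None):
        self.sharing_enabled = sharing_enabled      # the user can switch this off
        self.share_probability = share_probability  # "sporadically": hypothetical rate
        self.rng = rng or random.Random()
        self.queue = []

    def offer(self, audio_path, text):
        """Maybe queue a phrase. Returns True if it was queued."""
        if not self.sharing_enabled:
            return False
        if self.rng.random() >= self.share_probability:
            return False
        self.queue.append({"audio": audio_path, "text": text})
        return True

    def export(self):
        """Serialize the queued phrases; the actual upload is left out."""
        return json.dumps(self.queue)


# With sharing disabled nothing is ever queued.
collector = PhraseCollector(sharing_enabled=False)
collector.offer("phrase001.wav", "Fred was going to close the window")
print(collector.export())
```

Keeping the opt-out check inside `offer` means no other part of the GUI can queue data by accident.
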
Where are we?
I have to contact the VoxForge maintainer again, and check that it's ok to pipe data to him. Shouldn't be a problem.

I've contacted RMS (Stallman), who says FSF will provide servers as long as they are not tarnished by non-free software. From the VoxForge guy: We may not need this.

I've contacted Nickolai (nshm on irc - Russian Sphinx guru) who is willing to adapt the Sphinx engine to accomplish stage 3.

Current Problem
Right now I need help with stage 1. Stage 1 is transferring Vista's speech recognition engine to Linux via wine. I have
  • Run Vista, examined the commandline behind the 'Speech Recognition' icon:
    %SystemRoot%\Speech\Common\sapisvr.exe -SpeechUX
    (I really just need the engine ported over, but if I can get this working in wine, that's engine + GUI - a good test that the engine has ported OK)

  • Installed wine

  • Run winecfg, set the global Windows version (Applications tab) to 'Vista'

  • Copied relevant folders from VistaBox:
    /WINDOWS/Speech/* -> /(wine's C-Drive)/WINDOWS/Speech/*
    /WINDOWS/System32/Speech/* -> /(wine's C-Drive)/WINDOWS/System32/Speech/*

  • spud@spud-laptop:~/.wine/drive_c$ gedit SetPaths.bat
    ...and put in the following:
    set PATH=C:\WINDOWS

    set PATH=%PATH%;C:\WINDOWS\system32
    set PATH=%PATH%;C:\WINDOWS\system32\Speech\Engines\SR
    set PATH=%PATH%;C:\WINDOWS\system32\Speech\Engines\SR\en-US
    set PATH=%PATH%;C:\WINDOWS\system32\Speech\SpeechUX
    set PATH=%PATH%;C:\WINDOWS\system32\Speech\SpeechUX\en-gb
    set PATH=%PATH%;C:\WINDOWS\system32\Speech\SpeechUX\en-us

    set PATH=%PATH%;C:\WINDOWS\Speech\Common
    set PATH=%PATH%;C:\WINDOWS\Speech\Common\en-US
    set PATH=%PATH%;C:\WINDOWS\Speech\Engines\SR
    set PATH=%PATH%;C:\WINDOWS\Speech\Engines\SR\en-GB
    set PATH=%PATH%;C:\WINDOWS\Speech\Engines\SR\en-US
    set PATH=%PATH%;C:\WINDOWS\Speech\Engines\Lexicon\en-GB
    set PATH=%PATH%;C:\WINDOWS\Speech\Engines\Lexicon\en-US

    echo 'Paths set! Have a look!'
    PATH
  • Launch a DOS prompt
    spud@spud-laptop:~/.wine/drive_c$ wine cmd
    CMD Version 1.0

    C:\>SetPaths
    'Path set to:'
    PATH=C:\WINDOWS;C:\WINDOWS\system32;C:\WINDOWS\system32\Speech\Engines\SR;C:\WINDOWS\system32\Speech\Engines\SR\en-US;C:\WINDOWS\system32\Speech\SpeechUX;C:\WINDOWS\system32\Speech\SpeechUX\en-gb;C:\WINDOWS\system32\Speech\SpeechUX\en-us;C:\WINDOWS\Speech\Common;C:\WINDOWS\Speech\Common\en-US;C:\WINDOWS\Speech\Engines\SR;C:\WINDOWS\Speech\Engines\SR\en-GB;C:\WINDOWS\Speech\Engines\SR\en-US;C:\WINDOWS\Speech\Engines\Lexicon\en-GB;C:\WINDOWS\Speech\Engines\Lexicon\en-US

  • C:\>sapisvr -fish
    C:\>fixme:heap:HeapSetInformation (nil) 1 (nil) 0
    err:ole:CoUninitialize Mismatched CoUninitialize

    Good! Should fail - param is wrong.

  • C:\>sapisvr -SpeechUX
    C:\>fixme:heap:HeapSetInformation (nil) 1 (nil) 0
    err:ole:CoGetClassObject class {1b2afb92-0b5e-4a30-b5cc-353db4f9e150} not registered
    err:ole:CoGetClassObject class {1b2afb92-0b5e-4a30-b5cc-353db4f9e150} not registered
    err:ole:create_server class {1b2afb92-0b5e-4a30-b5cc-353db4f9e150} not registered
    fixme:ole:CoGetClassObject CLSCTX_REMOTE_SERVER not supported
    err:ole:CoGetClassObject no class object {1b2afb92-0b5e-4a30-b5cc-353db4f9e150} could be created for context 0x17

  • Googling {1b2afb92-0b5e-4a30-b5cc-353db4f9e150} gives
    SpSapiServer Class
    C:\Program Files\Common Files\Microsoft Shared\Speech\sapi.dll
    thx to http://www.myplugins.info/guids/guid.php?guid=1B
    There's no \Speech in \Microsoft Shared\, but there's a sapi.dll in the files I copied:
    c:/windows/system32/Speech/Common/sapi.dll
    so I guess I should register it.
    spud@spud-laptop:~/.wine/drive_c$ wine regsvr32 c:/windows/system32/Speech/Common/sapi.dll
    fixme:advapi:RegisterTraceGuidsW 0x34b0f75c 0x34b952d8 0x34ae1c2c 1 0x32f934 (null) (null) 0x34b952e0
    Successfully registered DLL c:/windows/system32/Speech/Common/sapi.dll
    Now try again:
    spud@spud-laptop:~/.wine/drive_c$ wine ./windows/Speech/Common/sapisvr -SpeechUX
    fixme:heap:HeapSetInformation (nil) 1 (nil) 0
    fixme:advapi:RegisterTraceGuidsW 0x34b0f75c 0x34b952d8 0x34ae1c2c 1 0x32f4e8 (null) (null) 0x34b952e0
    err:ntdll:NtQueryInformationToken Unhandled Token Information class 26!
    fixme:ole:CoCreateInstance no instance created for interface {31e99ed0-6ad8-431b-ae3c-652d9e8c7832} of class {1b2afb92-0b5e-4a30-b5cc-353db4f9e150}, hres is 0x80070001
  • Seems to be looking better. No idea how to proceed from here though! I tried registering all the DLLs I imported, but I've got tangled and I'm not sure it's even the way to go. 3 succeed. Several fail. Here's the sticking point:
    spud@spud-laptop:~/.wine/drive_c$ wine cmd
    CMD Version 1.0

    C:\>regsvr32 c:/windows/system32/Speech/Common/sapi.dll
    fixme:advapi:RegisterTraceGuidsW 0x34b0f75c 0x34b952d8 0x34ae1c2c 1 0x33f934 (null) (null) 0x34b952e0
    Successfully registered DLL c:/windows/system32/Speech/Common/sapi.dll

    C:\>regsvr32 c:/windows/system32/Speech/SpeechUX/speechuxcpl.dll
    err:module:import_dll Library msvcrt.dll (which is needed by L"C:\\windows\\system32\\Speech\\SpeechUX\\speechuxcpl.dll") not found
    err:module:import_dll Library msvcrt.dll (which is needed by L"C:\\windows\\system32\\DUser.dll") not found
    err:module:import_dll Library DUser.dll (which is needed by L"C:\\windows\\system32\\Speech\\SpeechUX\\speechuxcpl.dll") not found
    Failed to load DLL c:/windows/system32/Speech/SpeechUX/speechuxcpl.dll

    C:\>regsvr32 c:/windows/system32/Speech/SpeechUX/SpeechUXPS.dll
    err:module:import_dll Library msvcrt.dll (which is needed by L"C:\\windows\\system32\\Speech\\SpeechUX\\SpeechUXPS.dll") not found
    Failed to load DLL c:/windows/system32/Speech/SpeechUX/SpeechUXPS.dll

    C:\>regsvr32 c:/windows/Speech/Common/sqmapi.dll
    wine: Call from 0x70d22a1e to unimplemented function KERNEL32.dll.InitializeCriticalSectionEx, aborting
    fixme:ntdll:RtlNtStatusToDosErrorNoTeb no mapping for 80000100
    Failed to load DLL c:/windows/Speech/Common/sqmapi.dll

    C:\>regsvr32 c:/windows/Speech/Common/DUser.dll
    wine: Call from 0x70d22a1e to unimplemented function KERNEL32.dll.InitializeCriticalSectionEx, aborting
    fixme:ntdll:RtlNtStatusToDosErrorNoTeb no mapping for 80000100
    Failed to load DLL c:/windows/Speech/Common/DUser.dll

    C:\>regsvr32 c:/windows/system32/Speech/SpeechUX/en-gb/SpeechUXres.dll
    DllRegisterServer not implemented in DLL c:/windows/system32/Speech/SpeechUX/en-gb/SpeechUXres.dll

    C:\>regsvr32 c:/windows/system32/Speech/SpeechUX/en-us/SpeechUXres.dll
    DllRegisterServer not implemented in DLL c:/windows/system32/Speech/SpeechUX/en-us/SpeechUXres.dll

    C:\>regsvr32 c:/windows/system32/Speech/Engines/SR/spsreng.dll
    fixme:heap:HeapSetInformation 0x560000 1 (nil) 0
    Failed to register DLL c:/windows/system32/Speech/Engines/SR/spsreng.dll

    C:\>regsvr32 c:/windows/system32/Speech/Engines/SR/spsrx.dll
    fixme:heap:HeapSetInformation 0x560000 1 (nil) 0
    Failed to register DLL c:/windows/system32/Speech/Engines/SR/spsrx.dll

    C:\>regsvr32 c:/windows/system32/Speech/Engines/SR/srloc.dll
    fixme:heap:HeapSetInformation 0x560000 1 (nil) 0
    Failed to register DLL c:/windows/system32/Speech/Engines/SR/srloc.dll

    C:\>regsvr32 c:/windows/system32/Speech/SpeechUX/SpeechUX.dll
    fixme:advapi:RegisterTraceGuidsW 0x6cd15f38 0x6cd20180 0x6cd019f4 1 0x32f8d0 (null) (null) 0x6cd20188
    fixme:advapi:RegisterTraceGuidsA 0x6ec16eb9 0x6ec265e8 0x6ec026b0 1 0x32f8cc (null) (null) 0x6ec265f0
    fixme:advapi:RegisterTraceGuidsA 0x6ec16eb9 0x6ec26608 0x6ec026c0 1 0x32f8cc (null) (null) 0x6ec26610
    wine: Call from 0x4b4775ab to unimplemented function USER32.dll.ChangeWindowMessageFilter, aborting
    wine: Call from 0x4b46d706 to unimplemented function msvcrt.dll._except_handler4_common, aborting
    :
    (this line about 700 times)
    :
    wine: Call from 0x4b46d706 to unimplemented function msvcrt.dll._except_handler4_common, aborting
    wine: Call from 0x4b46d706 to unimplemented function msvcrt.dll._except_handler4_common, aborting
    err:seh:setup_exception_record stack overflow 1968 bytes in thread 0027 eip b7d271e3 esp 00230b80 stack 0x230000-0x231000-0x330000
    This is becoming a hydra. First time round it complained about sqmapi.dll and DUser.dll not being present. So I've copied them across from Vista's /system32. I've placed a copy in wine's /system32 as well as in the folder where sapisvr resides. Yet on registering speechuxcpl.dll it's still complaining that msvcrt.dll and DUser.dll cannot be found. msvcrt.dll is there! And I have copied DUser.dll there! wtf?

    As for SpeechUX.dll - I reckon I really need to get this registered, as the command line I'm trying to execute is 'sapisvr -SpeechUX'. It is complaining about msvcrt.dll. So I'm copying a native one over from Vista's /system32 into the same folder as sapisvr.exe. I go to winecfg global app settings and add it as a native dll.

    OK just realized I have to exit and reenter the DosShell to effect settings from winecfg. Here's the new output.
    C:\> regsvr32 c:/windows/system32/Speech/SpeechUX/SpeechUX.dll

    err:module:import_dll Library msvcrt.dll (which is needed by L"C:\\windows\\system32\\Speech\\SpeechUX\\SpeechUX.dll") not found
    err:module:import_dll Library msvcrt.dll (which is needed by L"C:\\windows\\system32\\sqmapi.dll") not found
    err:module:import_dll Library sqmapi.dll (which is needed by L"C:\\windows\\system32\\Speech\\SpeechUX\\SpeechUX.dll") not found
    err:module:import_dll Library msvcrt.dll (which is needed by L"C:\\windows\\system32\\DUser.dll") not found
    err:module:import_dll Library DUser.dll (which is needed by L"C:\\windows\\system32\\Speech\\SpeechUX\\SpeechUX.dll") not found
    Failed to load DLL c:/windows/system32/Speech/SpeechUX/SpeechUX.dll

    Maybe permissions are the problem?
    spud@spud-laptop:~/.wine/drive_c/windows/system32$ ls -l sqmapi.dll
    -rw-r--r-- 1 spud spud 134144 2008-11-02 22:06 sqmapi.dll
    spud@spud-laptop:~/.wine/drive_c/windows/system32$ chmod a+r sqmapi.dll
    I did the same for all the other copied files. No luck. Same readout. Stuck.

  • OK, today I found out my wine is 1.0, so I upgraded. Now on 1.1.7: slightly different error:
    spud@spud-laptop:~/.wine/drive_c/windows/system32/Speech/SpeechUX$ wine regsvr32 ./SpeechUX.dll
    wine: Call from 0x70d22a1e to unimplemented function KERNEL32.dll.InitializeCriticalSectionEx, aborting
    fixme:ntdll:RtlNtStatusToDosErrorNoTeb no mapping for 80000100
    Failed to load DLL ./SpeechUX.dll
    Then I upgraded wine to 1.1.7 and tried again:
    spud@spud-laptop:~/.wine/drive_c/windows/system32/Speech/SpeechUX$ wine regsvr32 ./SpeechUX.dll
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae0d0,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae1c0,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae2f0,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae498,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae2c8,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae1d8,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae1f0,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae130,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae288,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae070,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae1a0,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae398,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae148,4000,0x04000000) semi-stub
    fixme:ntdll:RtlInitializeCriticalSectionEx (0x70dae3b0,4000,0x04000000) semi-stub
    fixme:advapi:RegisterTraceGuidsW 0x6cd15f38 0x6cd20180 0x6cd019f4 1 0x32f8d0 (null) (null) 0x6cd20188
    fixme:advapi:RegisterTraceGuidsA 0x6ec16eb9 0x6ec265e8 0x6ec026b0 1 0x32f8cc (null) (null) 0x6ec265f0
    fixme:advapi:RegisterTraceGuidsA 0x6ec16eb9 0x6ec26608 0x6ec026c0 1 0x32f8cc (null) (null) 0x6ec26610
    wine: Call from 0x4b4775ab to unimplemented function USER32.dll.ChangeWindowMessageFilter, aborting
    fixme:ntdll:RtlNtStatusToDosErrorNoTeb no mapping for 80000100
    Failed to load DLL ./SpeechUX.dll

    Looked up ChangeWindowMessageFilter on MSDN. No way am I gonna be able to implement that. This is the realm of upper-echelon wine core devs. Dayum.
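
With this much output it is easy to lose track of which blockers are actually distinct. Below is a rough Python sketch that boils a chunk of wine output down to three lists: unimplemented functions, libraries the loader could not find, and COM classes that are not registered. The regular expressions are modelled only on the messages quoted in this post, so treat them as assumptions; other wine versions may word things differently. `summarize` is a name I made up.

```python
import re
from collections import Counter

# Patterns modelled on the wine messages quoted in this post.
UNIMPLEMENTED = re.compile(r"unimplemented function (\S+?), aborting")
MISSING_LIB = re.compile(r"import_dll Library (\S+) \(which is needed by")
UNREGISTERED = re.compile(r"class (\{[0-9a-fA-F-]+\}) not registered")


def summarize(log_text):
    """Count each distinct blocker mentioned in a chunk of wine output."""
    return {
        "unimplemented": Counter(UNIMPLEMENTED.findall(log_text)),
        "missing_library": Counter(MISSING_LIB.findall(log_text)),
        "unregistered_class": Counter(UNREGISTERED.findall(log_text)),
    }


# A few lines taken from the transcripts above.
sample = "\n".join([
    "wine: Call from 0x70d22a1e to unimplemented function KERNEL32.dll.InitializeCriticalSectionEx, aborting",
    "wine: Call from 0x4b4775ab to unimplemented function USER32.dll.ChangeWindowMessageFilter, aborting",
    r'err:module:import_dll Library msvcrt.dll (which is needed by L"C:\\windows\\system32\\DUser.dll") not found',
    "err:ole:CoGetClassObject class {1b2afb92-0b5e-4a30-b5cc-353db4f9e150} not registered",
])

for kind, counts in summarize(sample).items():
    for name, count in counts.most_common():
        print(f"{kind}: {name} (seen {count}x)")
```

Feeding it the saved output of each regsvr32 attempt would give a short to-do list to take to the wine devs, instead of 700 repeated lines.
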
Can anyone help me get to the next level?

Sam

PS If you're interested in taking over this project, joining in or helping, many thanks! Please find me in #cmusphinx on freenode. Or email me sunfish7@gmail.com

Peace out

Sam
