Winthorpe - Let Your Applications Listen... And Talk
====================================================

What is Winthorpe?
------------------

Winthorpe is a platform service for speech recognition and synthesis.
Its main goal is to provide a framework for speech/voice enabling
applications. It aims to provide an easy-to-use but versatile API to
liberate developers from the low-level implementation details of speech
recognition and synthesis, allowing them to instead focus their full
attention on how to improve the usability of their applications by
utilizing these new interaction mechanisms with the end user.

Winthorpe does not contain a recognition or synthesis engine as such.
We stand on the shoulders of giants and rely for these tasks on the
excellent work other folks have done, and keep doing, in these areas.
Winthorpe provides the necessary mechanisms that allow existing engines
to be plugged in as recognition and synthesis backends. Winthorpe already
contains plugins for using CMU pocketsphinx for recognition and espeak
and/or festival for synthesis.

Winthorpe can also utilize Murphy, the scriptable resource policy
framework, to arbitrate between and bring context-awareness to
speech-enabled applications. If configured so, Winthorpe will let Murphy
decide which applications are allowed to actively use the speech
services, and when. The Murphy policy configuration can then be used to
resolve conflicts between applications that are otherwise completely
unaware of each other. Moreover, the policies can be used to dynamically
enable and disable speech services, either globally or just for a subset
of applications, adapting the speech subsystem intelligently to the
context and overall state of the system.

Our idealistic long-term goal for Winthorpe is to build, step by step,
first the framework and then, using the framework, a context-aware
voice-enabled personal assistant that allows relatively straightforward
integration of new applications.

We realize that these goals are ambitious and far from straightforward,
and that we have only taken the first few tiptoeing baby steps in
experimenting with our ideas to achieve some of them. If you are
interested in speech recognition, synthesis, or speech-enabling your
application and would like to help us, please don't hesitate to contact us.


Getting Winthorpe Up And Running
--------------------------------

Winthorpe itself is hosted on GitHub at http://github.com/01org/winthorpe.
You can clone Winthorpe using git with the following command:

    git clone git@github.com:01org/winthorpe.git

Additionally, you will need the following prerequisites to get Winthorpe
up and running with a reasonable set of plugins:

Murphy

Winthorpe reuses parts of Murphy for much of its basic infrastructure, so
you will need Murphy to compile Winthorpe. Murphy is hosted on GitHub at
http://github.com/01org/murphy. You can clone it with

    git clone git@github.com:01org/murphy.git
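
Both clone commands above use SSH-style URLs, which only work if you have
an SSH key registered with GitHub. A minimal sketch of deriving the
equivalent HTTPS URLs follows. The `ssh_to_https` helper is hypothetical
(it is not part of Winthorpe, Murphy, or git), and the sketch assumes the
repositories are still reachable under the same paths. It prints the clone
commands instead of running them, so you can review them first.

```shell
# Hypothetical helper: rewrite git@host:owner/repo.git
# as https://host/owner/repo.git
ssh_to_https() {
    printf '%s\n' "$1" | sed 's#^git@\([^:]*\):#https://\1/#'
}

for url in git@github.com:01org/winthorpe.git git@github.com:01org/murphy.git; do
    # Print the command instead of running it, so nothing is cloned by accident.
    echo "git clone $(ssh_to_https "$url")"
done
```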

See the (arguably sparse) Murphy documentation on how to compile it.

PulseAudio

The existing Winthorpe recognizer and synthesizer backends use PulseAudio
for recording and rendering audio. Your distribution should provide packages
for both the daemon and the necessary client libraries.

For instance, on Fedora you'd need pulseaudio, pulseaudio-libs, and
pulseaudio-libs-devel (along with their dependencies).

CMU Pocketsphinx

Winthorpe provides a plugin that uses CMU Pocketsphinx as a speech
recognition backend. Most desktop Linux distributions provide packages
for sphinxbase, pocketsphinx, and some language models and dictionaries.
Winthorpe should work with both version 0.7 and version 0.8 of pocketsphinx.

For instance, on Fedora you'd need sphinxbase-libs, sphinxbase-devel,
pocketsphinx, pocketsphinx-devel, and pocketsphinx-models (along with
their dependencies).

Espeak/flite

Winthorpe has a synthesizer backend based on espeak. If you want to play
around with synthesis, you will need either this or festival. Espeak
usually provides voices for more languages than festival, and the espeak
backend is also a bit more versatile than the festival-based one (which
does not support voice pitch or rate control). Most desktop Linux
distributions provide packages for espeak.

For instance, on Fedora you'd need espeak and espeak-devel (along with
their dependencies).

Festival

Winthorpe has a synthesizer plugin that uses festival as the backend.
If you want to play around with synthesis, you might want to install
this. You can load several synthesizer backends into Winthorpe
simultaneously. They usually have different levels of support for
different languages, so if you want support for as many languages as
possible, you should enable and load as many of them as you can (in
other words, both as of now). Most desktop distributions provide
packages for festival.

For instance, on Fedora you'd need festival, festival-lib, and
festival-devel (along with their dependencies).

libdbus

If you plan to use the D-Bus client API, you'll need libdbus. For
instance, on Fedora you'd need dbus-devel (along with its dependencies).
If you choose to enable D-Bus support, don't forget to enable it in
Murphy as well.

GLib

Currently you need to install GLib (glib-2.0 and gobject-2.0) to compile
Winthorpe, although neither Winthorpe's core nor its core plugins use
GLib in any way. The only plugin utilizing GLib (via gdbus) is the demo
Tizen WRT media player client plugin. The current build system checks for
glib and gobject regardless of whether this plugin is enabled. This will
be fixed in the future. However, since many of Winthorpe's dependencies
pull in GLib on desktop distros anyway, this should not be an
insurmountable problem for now.

For instance, on Fedora you'd need glib2-devel (along with its dependencies).

Systemd

Additionally, if you feel adventurous and plan to install Winthorpe and
use systemd's socket-based activation as the mechanism to start Winthorpe,
you will need libsystemd-daemon from systemd.

For instance, on Fedora you'd need systemd-devel (along with its dependencies).
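
The Fedora package names mentioned in the subsections above can be
collected into a single install command. This is a minimal sketch, assuming
a Fedora release where these package names are still current and where
`dnf` (or `yum` on older releases) is the package manager. The
`WINTHORPE_FEDORA_PKGS` variable name is purely illustrative. Murphy is not
included, since it is built from source as described above. The command is
printed rather than executed, so you can review it and then run it as root.

```shell
# Package names as listed in the prerequisite subsections (Fedora naming).
WINTHORPE_FEDORA_PKGS="pulseaudio pulseaudio-libs pulseaudio-libs-devel \
sphinxbase-libs sphinxbase-devel pocketsphinx pocketsphinx-devel pocketsphinx-models \
espeak espeak-devel festival festival-lib festival-devel \
dbus-devel glib2-devel systemd-devel"

# Print the command rather than running it; the choice of dnf is an assumption.
echo "dnf install $WINTHORPE_FEDORA_PKGS"
```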


Configuring And Compiling
-------------------------

Once you have all the prerequisites installed you should compile Winthorpe.
If you have installed all of the above, you can do this by running the
following sequence of commands in the top Winthorpe directory:

    ./bootstrap --prefix=/usr --sysconfdir=/etc --enable-gpl --enable-dbus \
                --enable-sphinx --enable-espeak --enable-festival \
                --enable-systemd

    make
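
If the configure step fails on a missing dependency, it can help to check
which development libraries are visible before retrying. The sketch below
assumes that the build locates these libraries through pkg-config, which
this README does not state, and that they ship pkg-config files under the
module names shown. The names and the `check_modules` helper are
assumptions for illustration, not taken from the Winthorpe build system.
Espeak and festival are left out because they may not provide pkg-config
files at all.

```shell
# Hypothetical helper: report, one line per module, whether pkg-config
# can find it. A missing pkg-config binary is reported as "missing" too.
check_modules() {
    for mod in "$@"; do
        if pkg-config --exists "$mod" 2>/dev/null; then
            echo "found    $mod"
        else
            echo "missing  $mod"
        fi
    done
}

# Module names are assumptions; adjust them to match your distribution.
check_modules libpulse sphinxbase pocketsphinx dbus-1 glib-2.0 gobject-2.0 \
              libsystemd-daemon
```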

If everything goes well, you should end up with the Winthorpe daemon and a
set of Winthorpe plugins successfully compiled. You can start up and test
the daemon without installing it with the following command:

    ./src/srs-daemon -P `pwd`/src/.libs -c speech-recognition.conf -f -vvv \
        -s sphinx.pulsesrc=alsa_input.pci-0000_00_1b.0.analog-stereo
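
The sphinx.pulsesrc value in the daemon command above is specific to one
machine. A minimal sketch for listing the candidates on yours follows. It
assumes the `pactl` utility from PulseAudio is installed and that
`pactl list short sources` prints one tab-separated line per source with
the source name in the second column. The `source_names` helper is
hypothetical, and skipping names that end in `.monitor` is an assumption
that you want a capture device rather than a loopback of an output. Any
name it prints can be passed as `-s sphinx.pulsesrc=<name>`.

```shell
# Hypothetical helper: read `pactl list short sources` output on stdin and
# print the source names, skipping monitors of output sinks.
source_names() {
    cut -f2 | grep -v '\.monitor$'
}

# Guarded so the snippet does nothing on machines without PulseAudio tools.
if command -v pactl >/dev/null 2>&1; then
    pactl list short sources 2>/dev/null | source_names || true
fi
```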

Just replace the value of sphinx.pulsesrc with the name of the
PulseAudio source corresponding to the microphone you want to use. That's
it, now you should have the daemon up and running. You can try synthesis,
for instance, with the native client like this:

    ./src/srs-native-client
    Using pa_manloop...
    disconnected>
    Connection to server established.
    connected> list voices
    ...
    connected> render tts "Is this able to speak now ?" -events
    0.000000 % (0 msec) of TTS #3 rendered...
    13.874767 % (249 msec) of TTS #3 rendered...
    19.852016 % (357 msec) of TTS #3 rendered...
    ...
    100.000000 % (1801 msec) of TTS #3 rendered...
    Rendering of TTS #3 completed.
    connected> render tts "Is this able to speak now ?" -events -voice:english-british-male-1
    Rendering of TTS #4 started...
    0.000000 % (0 msec) of TTS #4 rendered...
    13.870927 % (249 msec) of TTS #4 rendered...
    ...
    98.981004 % (1783 msec) of TTS #4 rendered...
    100.000000 % (1802 msec) of TTS #4 rendered...
    Rendering of TTS #4 completed.
    connected>