From: Jimmy Huang Date: Fri, 31 Aug 2012 21:12:33 +0000 (-0700) Subject: Initial import to Gerrit. X-Git-Tag: submit/trunk/20120831.211401^0 X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=70f92eeaa16426f888e59e8273ed580ebe849d38;p=profile%2Fivi%2Ffestival.git Initial import to Gerrit. Signed-off-by: Jimmy Huang --- 70f92eeaa16426f888e59e8273ed580ebe849d38 diff --git a/ACKNOWLEDGMENTS b/ACKNOWLEDGMENTS new file mode 100644 index 0000000..b6f320a --- /dev/null +++ b/ACKNOWLEDGMENTS @@ -0,0 +1,83 @@ +Festival is currently actively developed by: + + Alan W Black (Carnegie Mellon University) + Rob Clark (Edinburgh University) + Junichi Yamagishi (Edinburgh University) + Keiichiro Oura (Nagoya Institute of Technology) + +The following people and organisations have contributed to the +development of Festival in various ways. It is their work that makes +it all possible. + +Alan W Black Overall design, most of the front end and software control +Paul Taylor Overall design, most of the back end +Richard Caley for doing lots of difficult and boring bits +Rob Clark Intonation, multisyn voice building, general developement and + maintenance. +Keiichiro Oura Updated HTS engine and API +Junichi Yamagishi + HTS voices +Korin Richmond Multisyn engine, swig wrappers and general developement. +Heiga Zen HTS engine +Brian Foley Mac OSX support +Kevin Lenzo for speaking a bunch of different nonsense words, + design and improvements to the clunits module, + and co-author of the whole festvox project +Alistair Conkie various low level code points and some design work + Spanish synthesis, recording Roger +Steve Isard design of diphone schema, LPC diphone code, and + directorship +EPSRC who funded awb and pault +Carnegie Mellon University + who fund awb +David Huggins Daines (Cepstral, LLC) + configure, and lots of Linux associated bugs +Sun Microsystems Laboratories + For believing in us and their generosity. 
+AT&T Research Labs + For providing funding and using our work +Paradigm Assoc. and George Carrett + For Scheme In One Defun +CNET, France Telecom + for use of Donovan diphones and some code in + modules/donovan (used with permission) +The beta testers + Thanks for wanting to use the system, you make it + worth doing. (And thanks for helping me debug my code.) + You all responded to my requests fast and accurately + thanks, even when I dumped last minute changes on you +Andy Donovan for speaking a bunch of nonsense words +Roger Burroughes for speaking another bunch of nonsense words +Kurt Dusterhoff for speaking another bunch of nonsense words +Amy Isard for her SSML project and related synthesizer +Mike Macon for signal processing advice +Richard Tobin for answering all those difficult questions, + and the socket code, and rxp the XML parser +Simmule Turner and Rich Salz + command line editor: editline +Borja Etxebarria + For Spanish synthesis and answering signal processing + questions +Briony Williams Welsh synthesis +Jacques H. de Villiers + from CSLU at OGI, for the TCL interface. +ATR and Nick Campbell + for first allowing Paul and Alan to work together +Oxford Text Archive + For the computer users version of Oxford Advanced + Learners' Dictionary redistributed with permission +Reading University + for access to MARSEC from which the phrase break + model was trained. +Mari Ostendorf For giving access to the FM Radio Corpus from which + some models were trained. +LDC & Penn Tree Bank + from which the POS tagger was trained, redistribution + of the models is with permission from the LDC. +Grady Ward for the MOBY pronunciation lexicon +FSF for G++, make, .... + +and others too. 
+ + + diff --git a/COPYING b/COPYING new file mode 100644 index 0000000..82dc313 --- /dev/null +++ b/COPYING @@ -0,0 +1,103 @@ +The system as a whole and most of the files in it are distributed +under the following copyright and conditions + + The Festival Speech Synthesis System + Centre for Speech Technology Research + University of Edinburgh, UK + Copyright (c) 1996-2004 + All Rights Reserved. + + Permission is hereby granted, free of charge, to use and distribute + this software and its documentation without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of this work, and to + permit persons to whom this work is furnished to do so, subject to + the following conditions: + 1. The code must retain the above copyright notice, this list of + conditions and the following disclaimer. + 2. Any modifications must be clearly marked as such. + 3. Original authors' names are not deleted. + 4. The authors' names are not used to endorse or promote products + derived from this software without specific prior written + permission. + + THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK + DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING + ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT + SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE + FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN + AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, + ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF + THIS SOFTWARE. + +Some further comments: + +Every effort has been made to ensure that Festival does not contain +any violation of intellectual property rights through disclosure of +trade secrets, copyright or patent violation. Considerable time and +effort has been spent to ensure that this is the case. 
However, +especially with patent problems, it is not always within our control +to know what has or has not been restricted. If you do suspect that +some part of Festival cannot be legally distributed please inform us +so that an alternative may be sought. Festival is only useful if it +is truly free to distribute. + +As of 1.4.0 the core distribution (and speech tools) is free, unlike +previous versions, which had a commercial restriction. You are free to +incorporate Festival in commercial (and of course non-commercial) +systems, without any further communication or licence from us. +However if you are seriously using Festival within a commercial +application we would like to know, both so we know we are contributing +and so we can keep you informed of future developments. Also if you +require maintenance, support or wish us to provide consultancy feel +free to contact us. + +The voices however aren't all free. At present the US voices, kal and +ked are free. Our British voices are free themselves but they use OALD +which is restricted for non-commercial use. Our Spanish voice is also +so restricted. + +Note that other modules that festival supports, e.g. MBROLA and OGI +extensions, may have different licencing; please take care when using +the system to understand what you are actually using. + +-------------------------------------------------- + +A number of individual files in the system fall under a different +copyright from the above. All however are termed "free software" +by most people. + +./src/arch/festival/tcl.c + * Copyright (C)1997 Jacques H. de Villiers + * Copyright (C)1997 Center for Spoken Language Understanding, + * Oregon Graduate Institute of Science & Technology + See conditions in file. This is the standard TCL licence and hence + shouldn't cause problems for most people. + +./examples/festival_client.pl +# Copyright (C) 1997 +# Kevin A. 
Lenzo (lenzo@cs.cmu.edu) 7/97 + See condition in file + +./src/modules/clunits/* +./lib/*clunits* + Joint copyright University of Edinburgh and Carnegie Mellon University + Conditions remain as free software like the rest of distribution + +./src/modules/hts_engine/* +./lib/hts.scm + The HMM-based speech synthesis system (HTS) + hts_engine API version 1.04 (http://hts-engine.sourceforge.net/) + Copyright (C) 2001-2010 Nagoya Institute of Technology + 2001-2008 Tokyo Institute of Technology + All rights reserved. + distributed under a New and Simplified BSD licence. + +./lib/festival.el +;;; Copyright (C) Alan W Black 1996 +copyright under FSF General Public Licence + +Please also read the COPYING section of speech_tools/README for the +conditions on those files. + diff --git a/INSTALL b/INSTALL new file mode 100644 index 0000000..862a2b7 --- /dev/null +++ b/INSTALL @@ -0,0 +1,454 @@ +Installation +************ + + This section describes how to install Festival from source in a new +location and customize that installation. + +* Menu: + +* Requirements:: Software/Hardware requirements for Festival +* Configuration:: Setting up compilation +* Site initialization:: Settings for your particular site +* Checking an installation:: But does it work ... + +Requirements +============ + + In order to compile Festival you first need the following source +packages + +`festival-2.0-release.tar.gz' + Festival Speech Synthesis System source + +`speech_tools-1.2.4-release.tar.gz' + The Edinburgh Speech Tools Library + +`festlex_NAME.tar.gz' + The lexicon distribution, where possible, includes the lexicon + input file as well as the compiled form, for your convenience. + The lexicons have varying distribution policies, but are all free + except OALD, which is only free for non-commercial use (we are + working on a free replacement). In some cases only a pointer to + an ftp'able file plus a program to convert that file to the + Festival format is included. 
+ +`festvox_NAME.tar.gz' + You'll need a speech database. A number are available (with + varying distribution policies). Each voice may have other + dependencies such as requiring particular lexicons + +`festdoc_2.0.tar.gz' + Full postscript, info and html documentation for Festival and the + Speech Tools. The source of the documentation is available in the + standard distributions but for your conveniences it has been + pre-generated. + + In addition to Festival specific sources you will also need + +_A UNIX machine_ + Currently we have compiled and tested the system under Solaris + (2.5(.1), 2.6, 2.7, 2.8 and 2.9), SunOS (4.1.3), FreeBSD 3.x, 4.x Linux + (Redhat 4.1, 5.0, 5.1, 5.2, 6.[012], 7.[01], 8.0, 9, FC1 and other Linux + distributions), and it should work under OSF (Dec Alphas) SGI + (Irix), HPs (HPUX). But any standard UNIX machine should be + acceptable. We have now successfully ported this version to + Windows XP, Windows NT and Windows 95 (using the Cygnus GNU win32 + environment). This is still a young port but seems to work. + +_A C++ compiler_ + Note that C++ is not very portable even between different versions + of the compiler from the same vendor. Although we've tried very + hard to make the system portable, we know it is very unlikely to + compile without change except with compilers that have already + been tested. The currently tested systems are + * Sun Sparc Solaris 2.5, 2.5.1, 2.6, 2.7, 2.9: GCC 2.95.1, GCC + 3.2 + + * FreeBSD for Intel 3.x and 4.x GCC 2.95.1, GCC 3.0 + + * Linux for Intel (RedHat 4.1/5.0/5.1/5.2/6.0/7.x/8.0): GCC + 2.7.2, GCC 2.7.2/egcs-1.0.2, egcs 1.1.1, egcs-1.1.2, GCC + 2.95.[123], GCC "2.96", GCC 3.0, GCC 3.0.1, GCC 3.2, GCC 3.2.1 + GCC 3.2.3, GCC 3.3.2 + + * Windows NT 4.0: GCC 2.7.2 plus egcs (from Cygnus GNU win32 + b19), Visual C++ PRO v5.0, Visual C++ v6.0 + Note if GCC works on one version of Unix it usually works on + others. 
+ + We have compiled both the speech tools and Festival under Windows + NT 4.0 and Windows 95 using the GNU tools available from Cygnus. + http://www.cygwin.com/ + +_GNU make_ + Due to there being too many different `make' programs out there we + have tested the system using GNU make on all systems we use. + Others may work but we know GNU make does. + +_Audio hardware_ + You can use Festival without audio output hardware but it doesn't + sound very good (though admittedly you can hear less problems with + it). A number of audio systems are supported (directly inherited + from the audio support in the Edinburgh Speech Tools Library): + NCD's NAS (formerly called netaudio) a network transparent audio + system (which can be found at + `ftp://ftp.x.org/contrib/audio/nas/'); `/dev/audio' (at 8k ulaw + and 8/16bit linear), found on Suns, Linux machines and FreeBSD; + and a method allowing arbitrary UNIX commands. *Note Audio + output::. + + Earlier versions of Festival mistakenly offered a command line editor +interface to the GNU package readline, but due to conflicts with the GNU +Public Licence and Festival's licence this interface was removed in +version 1.3.1. Even Festival's new free licence would cause problems as +readline support would restrict Festival linking with non-free code. A +new command line interface based on editline was provided that offers +similar functionality. Editline remains a compilation option as it is +probably not yet as portable as we would like it to be. + + In addition to the above, in order to process the documentation you +will need `TeX', `dvips' (or similar), GNU's `makeinfo' (part of the +texinfo package) and `texi2html' which is available from +`http://wwwcn.cern.ch/dci/texi2html/'. + + However the document files are also available pre-processed into, +postscript, DVI, info and html as part of the distribution in +`festdoc-1.4.X.tar.gz'. + + Ensure you have a fully installed and working version of your C++ +compiler. 
Most of the problems people have had in installing Festival +have been due to incomplete or bad compiler installation. It might be +worth checking if the following program works if you don't know if +anyone has used your C++ installation before. + #include <iostream> + int main (int argc, char **argv) + { + std::cout << "Hello world\n"; + return 0; + } + + Unpack all the source files in a new directory. The directory will +then contain two subdirectories + speech_tools/ + festival/ + +Configuration +============= + + First ensure you have a compiled version of the Edinburgh Speech +Tools Library. See `speech_tools/INSTALL' for instructions. + + The system now supports the standard GNU `configure' method for set +up. In most cases this will automatically configure festival for your +particular system. In most cases you need only type + gmake + and the system will configure itself and compile (note you need to +have compiled the Edinburgh Speech Tools `speech_tools-1.2.4' first). + + In some cases hand configuration is required. All of the configuration +choices are held in the file `config/config'. + + For the most part Festival configuration inherits the configuration +from your speech tools config file (`../speech_tools/config/config'). +Additional optional modules may be added by adding them to the end of +your config file e.g. + ALSO_INCLUDE += clunits + Adding a new module here will treat it as a new directory in +`src/modules/' and compile it into the system in the same way the +`OTHER_DIRS' feature was used in previous versions. + + If the compilation directory is being accessed by NFS or if you use an +automounter (e.g. amd) it is recommended to explicitly set the variable +`FESTIVAL_HOME' in `config/config'. The command `pwd' is not reliable +when a directory may have multiple names. + + There is a simple test suite with Festival but it requires the three +basic voices and their respective lexicons to be installed before it will work. 
+Thus you need to install + festlex_CMU.tar.gz + festlex_OALD.tar.gz + festlex_POSLEX.tar.gz + festvox_don.tar.gz + festvox_kedlpc16k.tar.gz + festvox_rablpc16k.tar.gz + If these are installed you can test the installation with + gmake test + + To simply make it run with a male US English voice it is sufficient +to install just + festlex_CMU.tar.gz + festlex_POSLEX.tar.gz + festvox_kallpc16k.tar.gz + + Note that the single most common reason for problems in compilation +and linking found amongst the beta testers was a bad installation of GNU +C++. If you get many strange errors in G++ library header files or link +errors it is worth checking that your system has the compiler, header +files and runtime libraries properly installed. This may be checked by +compiling a simple program under C++ and also finding out if anyone at +your site has ever used the installation. Most of these installation +problems are caused by upgrading to a newer version of libg++ without +removing the older version so a mixed version of the `.h' files exist. + + Although we have tried very hard to ensure that Festival compiles +with no warnings this is not possible under some systems. + + Under SunOS the system include files do not declare a number of +system provided functions. This is a bug in Sun's include files. This +will cause warnings like "implicit definition of fprintf". These are +harmless. + + Under Linux a warning at link time about reducing the size of some +symbols is often produced. This is harmless. There are also +occasional warnings about some socket system function having an +incorrect argument type, this is also harmless. + + The speech tools and festival compile under Windows95 or Windows NT +with Visual C++ v5.0 using the Microsoft `nmake' make program. We've +only done this with the Professional edition, but have no reason to +believe that it relies on anything not in the standard edition. 
+ + In accordance with VC++ conventions, object files are created with +extension .obj, executables with extension .exe and libraries with +extension .lib. This may mean that both unix and Win32 versions can be +built in the same directory tree, but I wouldn't rely on it. + + To do this you require nmake Makefiles for the system. These can be +generated from the gnumake Makefiles, using the command + gnumake VCMakefile + in the speech_tools and festival directories. I have only done this +under unix, it's possible it would work under the cygnus gnuwin32 +system. + + If `make.depend' files exist (i.e. if you have done `gnumake depend' +in unix) equivalent `vc_make.depend' files will be created, if not the +VCMakefiles will not contain dependency information for the `.cc' +files. The result will be that you can compile the system once, but +changes will not cause the correct things to be rebuilt. + + In order to compile from the DOS command line using Visual C++ you +need to have a collection of environment variables set. In Windows NT +there is an installation option for Visual C++ which sets these +globally. Under Windows95 or if you don't ask for them to be set +globally under NT you need to run + vcvars32.bat + See the VC++ documentation for more details. + + Once you have the source trees with VCMakefiles somewhere visible +from Windows, you need to copy `speech_tools\config\vc_config-dist' to +`speech_tools\config\vc_config' and edit it to suit your local +situation. Then do the same with `festival\config\vc_config-dist'. + + The thing most likely to need changing is the definition of +`FESTIVAL_HOME' in `festival\config\vc_config_make_rules' which needs +to point to where you have put festival. + + Now you can compile. cd to the speech_tools directory and do + nmake /nologo /fVCMakefile +and the library, the programs in main and the test programs should be + compiled. + + The tests can't be run automatically under Windows. 
A simple test to +check that things are probably OK is: + main\na_play testsuite\data\ch_wave.wav +which reads and plays a waveform. + Next go into the festival directory and do + nmake /nologo /fVCMakefile +to build festival. When it's finished, and assuming you have the + voices and lexicons unpacked in the right place, festival should run +just as under unix. + + We should remind you that the NT/95 ports are still young and there +may yet be problems that we've not found yet. We only recommend the +use of the speech tools and Festival under Windows if you have significant +experience in C++ under those platforms. + + Most of the modules in `src/modules' are actually optional and the +system could be compiled without them. The basic set could be reduced +further if certain facilities are not desired. Particularly: `donovan' +which is only required if the donovan voice is used; `rxp' if no XML +parsing is required (e.g. Sable); and `parser' if no stochastic parsing +is required (this parser isn't used for any of our currently released +voices). Actually even `UniSyn' and `UniSyn_diphone' could be removed +if some external waveform synthesizer is being used (e.g. MBROLA) or +some alternative one like `OGIresLPC'. Removing unused modules will +make the festival binary smaller and (potentially) start up faster but +don't expect too much. You can delete these by changing the +`BASE_DIRS' variable in `src/modules/Makefile'. + +Site initialization +=================== + + Once compiled Festival may be further customized for particular +sites. At start up time Festival loads the file `init.scm' from its +library directory. This file further loads other necessary files such +as phoneset descriptions, duration parameters, intonation parameters, +definitions of voices etc. It will also load the files `sitevars.scm' +and `siteinit.scm' if they exist. 
`sitevars.scm' is loaded after the +basic Scheme library functions are loaded but before any of the +festival related functions are loaded. This file is intended to set +various path names before various subsystems are loaded. Typically +variables such as `lexdir' (the directory where the lexicons are held), +and `voices_dir' (pointing to voice directories) should be reset here +if necessary. + + The default installation will try to find its lexicons and voices +automatically based on the value of `load-path' (this is derived from +`FESTIVAL_HOME' at compilation time or by using the `--libdir' at +run-time). If the voices and lexicons have been unpacked into +subdirectories of the library directory (the default) then no site +specific initialization of the above pathnames will be necessary. + + The second site specific file is `siteinit.scm'. Typical examples +of local initialization are as follows. The default audio output method +is NCD's NAS system if that is supported as that's what we use normally +in CSTR. If it is not supported, any hardware specific mode is the +default (e.g. sun16audio, freebsd16audio, linux16audio or mplayeraudio). +But that default is just a setting in `init.scm'. If, for example, in +your environment you wish the default audio output method to be 8k +mulaw through `/dev/audio' you should add the following line to your +`siteinit.scm' file + (Parameter.set 'Audio_Method 'sunaudio) + Note the use of `Parameter.set' rather than `Parameter.def'; the +second function will not reset the value if it is already set. +Remember that you may use the audio methods `sun16audio', +`linux16audio' or `freebsd16audio' only if `NATIVE_AUDIO' was selected +in `speech_tools/config/config' and you are on such a machine. The +Festival variable `*modules*' contains a list of all supported +functions/modules in a particular installation including audio support. +Check the value of that variable if things aren't what you expect. 
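The check against `*modules*' described above can be sketched as a `siteinit.scm' fragment. This is a minimal sketch, not the shipped default: it assumes that native audio support appears in `*modules*' under the same symbol as the audio method name (here `linux16audio'), and that `Parameter.get' is available to read the value back.

```scheme
;; Hypothetical siteinit.scm fragment (a sketch under the assumptions above).
;; If this build lists native Linux audio in *modules*, select it;
;; otherwise fall back to 8k mulaw through /dev/audio.
(if (member 'linux16audio *modules*)
    (Parameter.set 'Audio_Method 'linux16audio)
    (Parameter.set 'Audio_Method 'sunaudio))

;; At the festival> prompt, these two expressions show what the
;; build supports and which method ended up selected.
*modules*
(Parameter.get 'Audio_Method)
```

Using `Parameter.set' here rather than `Parameter.def' means the site choice overrides whatever default `init.scm' has already picked.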
+ + If you are installing on a machine whose audio is not directly +supported by the speech tools library, an external command may be +executed to play a waveform. The following example is for an imaginary +machine that can play audio files through a program called `adplay' +with arguments for sample rate and file type. When playing waveforms, +Festival, by default, outputs an unheadered waveform in native byte +order. In this example you would set up the default audio playing +mechanism in `siteinit.scm' as follows + (Parameter.set 'Audio_Method 'Audio_Command) + (Parameter.set 'Audio_Command "adplay -raw -r $SR $FILE") + For `Audio_Command' method of playing waveforms Festival supports +two additional audio parameters. `Audio_Required_Rate' allows you to +use Festival's internal sample rate conversion function to any desired +rate. Note this may not be as good as playing the waveform at the +sample rate it is originally created in, but as some hardware devices +are restrictive in what sample rates they support, or have naive +resample functions this could be optimal. The second additional audio +parameter is `Audio_Required_Format' which can be used to specify the +desired output format of the file. The default is unheadered raw, but +this may be any of the values supported by the speech tools (including +nist, esps, snd, riff, aiff, audlab, raw and, if you really want it, +ascii). + + For example suppose you run Festival on a remote machine and are not +running any network audio system and want Festival to copy files back to +your local machine and simply cat them to `/dev/audio'. The following +would do that (assuming permissions for rsh are allowed). 
+ (Parameter.set 'Audio_Method 'Audio_Command) + ;; Make output file ulaw 8k (format ulaw implies 8k) + (Parameter.set 'Audio_Required_Format 'ulaw) + (Parameter.set 'Audio_Command + "userhost=`echo $DISPLAY | sed 's/:.*$//'`; rcp $FILE $userhost:$FILE; \ + rsh $userhost \"cat $FILE >/dev/audio\" ; rsh $userhost \"rm $FILE\"") + Note there are limits on how complex a command you want to put in the +`Audio_Command' string directly. It can get very confusing with respect +to quoting. It is therefore recommended that once you get past a +certain complexity consider writing a simple shell script and calling +it from the `Audio_Command' string. + + A second typical customization is setting the default speaker. +Speakers depend on many things but due to various licence (and resource) +restrictions you may only have some diphone/nphone databases available +in your installation. The function name that is the value of +`voice_default' is called immediately after `siteinit.scm' is loaded +offering the opportunity for you to change it. In the standard +distribution no change should be required. If you download all the +distributed voices `voice_rab_diphone' is the default voice. You may +change this for a site by adding the following to `siteinit.scm' or per +person by changing your `.festivalrc'. For example if you wish to +change the default voice to the American one `voice_ked_diphone' + (set! voice_default 'voice_ked_diphone) + Note the single quote, and note that unlike in early versions +`voice_default' is not a function you can call directly. + + A second level of customization is on a per user basis. After +loading `init.scm', which includes `sitevars.scm' and `siteinit.scm' +for local installation, Festival loads the file `.festivalrc' from the +user's home directory (if it exists). This file may contain arbitrary +Festival commands. 
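As an illustration of the per-user level, a `.festivalrc' might combine the settings discussed in this section. This is only a sketch: it reuses the imaginary `adplay' player from the earlier example, and the voice named must actually be installed at your site.

```scheme
;; Hypothetical ~/.festivalrc (a sketch; adjust to your installation).
;; Per-user default voice, using the form shown for siteinit.scm.
(set! voice_default 'voice_ked_diphone)

;; Per-user audio output through an external command. `adplay' is the
;; imaginary player used earlier, not a real program.
(Parameter.set 'Audio_Method 'Audio_Command)
(Parameter.set 'Audio_Command "adplay -raw -r $SR $FILE")
```

Keeping personal preferences in `.festivalrc' leaves `siteinit.scm' for settings that should apply to every user of the installation.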
+ +Checking an installation +======================== + + Once compiled and site initialization is set up you should test to +see if Festival can speak or not. + + Start the system + $ bin/festival + Festival Speech Synthesis System 2.0:release July 2004 + Copyright (C) University of Edinburgh, 1996-2004. All rights reserved. + For details type `(festival_warranty)' + festival> ^D + If errors occur at this stage they are most likely to do with +pathname problems. If any error messages are printed about +non-existent files check that those pathnames point to where you +intended them to be. Most of the (default) pathnames are dependent on +the basic library path. Ensure that is correct. To find out what it +has been set to, start the system without loading the init files. + $ bin/festival -q + Festival Speech Synthesis System 1.4.3:release Jan 2003 + Copyright (C) University of Edinburgh, 1996-2003. All rights reserved. + For details type `(festival_warranty)' + festival> libdir + "/projects/festival/lib/" + festival> ^D + This should show the pathname you set in your `config/config'. + + If the system starts with no errors try to synthesize something + festival> (SayText "hello world") + Some files are only accessed at synthesis time so this may show up +other problem pathnames. If it talks, you're in business, if it +doesn't, here are some possible problems. + + If you get the error message + Can't access NAS server + You have selected NAS as the audio output but have no server running +on that machine or your `DISPLAY' or `AUDIOSERVER' environment variable +is not set properly for your output device. Either set these properly +or change the audio output device in `lib/siteinit.scm' as described +above. + + Ensure your audio device actually works the way you think it does. +On Suns, the audio output device can be switched into a number of +different output modes, speaker, jack, headphones. If this is set to +the wrong one you may not hear the output. 
Use one of Sun's tools to +change this (try `/usr/demo/SOUND/bin/soundtool'). Try to find an audio +file independent of Festival and get it to play on your audio. Once +you have done that ensure that the audio output method set in Festival +matches that. + + Once you have got it talking, test the audio spooling device. + festival> (intro) + This plays a short introduction of two sentences, spooling the audio +output. + + Finally exit from Festival (by end of file or `(quit)') and test the +script mode with. + $ examples/saytime + + A test suite is included with Festival but it makes certain +assumptions about which voices are installed. It assumes that +`voice_rab_diphone' (`festvox_rabxxxx.tar.gz') is the default voice and +that `voice_ked_diphone' and `voice_don_diphone' +(`festvox_kedxxxx.tar.gz' and `festvox_don.tar.gz') are installed. +Also local settings in your `festival/lib/siteinit.scm' may affect +these tests. However, after installation it may be worth trying + gnumake test + from the `festival/' directory. This will do various tests +including basic utterance tests and tokenization tests. It also checks +that voices are installed and that they don't interfere with each other. +These tests are primarily regression tests for the developers of +Festival, to ensure new enhancements don't mess up existing supported +features. They are not designed to test an installation is successful, +though if they run correctly it is most probable the installation has +worked. diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..992877b --- /dev/null +++ b/Makefile @@ -0,0 +1,85 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996-2002 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## The Festival Speech Synthesis System ## +## ## +## Authors: Alan W Black, Paul Taylor, Richard Caley and others ## +## Date: January 2003 ## +## ## +########################################################################### +TOP=. +DIRNAME=. 
+BUILD_DIRS = src lib examples bin doc +ALL_DIRS=config $(BUILD_DIRS) testsuite +CONFIG=configure configure.in config.sub config.guess \ + missing install-sh mkinstalldirs +FILES = Makefile README ACKNOWLEDGMENTS NEWS COPYING INSTALL $(CONFIG) +VERSION=$(PROJECT_VERSION) + +LOCAL_CLEAN= Templates.DB + +ALL = .config_error $(BUILD_DIRS) + +# Try and say if config hasn't been created +config_dummy := $(shell test -f config/config || ( echo '*** '; echo '*** Making default config file ***'; echo '*** '; ./configure; ) >&2) + +# force a check on the system file +system_dummy := $(shell $(MAKE) -C $(TOP)/config -f make_system.mak TOP=.. system.mak) + +include $(TOP)/config/common_make_rules + +backup: time-stamp + @ $(RM) -f $(TOP)/FileList + @ $(MAKE) file-list + @ sed 's/^\.\///' .file-list-all + @ (cd ..; tar cvf - `cat festival/.file-list-all` festival/.time-stamp | gzip > festival/festival-$(VERSION)-$(PROJECT_STATE).tar.gz ) + @ $(RM) -f $(TOP)/.file-list-all + @ ls -l festival-$(VERSION)-$(PROJECT_STATE).tar.gz + +time-stamp : + @ echo festival $(VERSION) >.time-stamp +# Too many randomly different unices out there .... +# @ echo `id -u -n`@`hostname`.`domainname` >>.time-stamp + @ date >>.time-stamp + +test: + @ $(MAKE) --no-print-directory -C testsuite test + +config/config: config/config.in config.status + ./config.status + +configure: configure.in + autoconf + +include $(EST)/config/rules/top_level.mak +include $(EST)/config/rules/install.mak diff --git a/NEWS b/NEWS new file mode 100644 index 0000000..57eeb31 --- /dev/null +++ b/NEWS @@ -0,0 +1,219 @@ + +Note that not all features discussed in this file are included in +the standard distribution. 
+ +HISTORY + +June 21st 2001 1.4.2 Release + Various new gcc's support + Visual C++ 6.0 support + uses configure (though could do so even more) + substantial updates to the clunits unit selection module + lots of wee bugs fixed + a few very hard bugs fixed + (client/server race condition) + (dropped bytes in reading files when machine overloaded) + (FreeBSD memory/gc problem) + default waverform type is now RIFF. + +Nov 21st 1999 1.4.1 Release + SSFF (for emulabel) track support + AIX support + Java fixes + various minor bug fixes + WFST with proper quoting + Wagon sample counts + gcc-2.95.1 support + +June 20th 1999 1.4.0 Release + becomes free software + +June 6th 1999 1.3.95 Beta + size/speed/memory leak overhaul (no memory leaks) + XML support for relation loading (for SOLE support) + JSAPI initial support + GalaxyCommunicator architecture interface + ked_mttilt_diphone voice built + Parser trained on MARSEC (prosodic) brackets rather than syntax + Unisyn_selection fully integrated + Unisyn_phonology fully integrated + viterbi cart/ngram/wfst base LTS prediction (did improve but BIG) + viterbi cart/ngram based accent prediction (didn't improve) + tilt working (again) + audioin (na_record) for many architectures + viterbi from Scheme (with cart, ngram, wfst models) + +January 26th 1999 1.3.1 Release + egcs-1.1.1 support + tobi_rules update (GM) + replace readline with editline (+ extensions) + Lots of little bug fixes + cluster code tidied up + kal voice + ked power normalization + updated lexicons with addenda for US and UK + New LTS models for US and UK English + "Building Voices in Festival" document + +August 24th 1998 1.3.0 Release + UniSun/groupfile optimizations + Java client support + Fixed ESPS so both track and wave output works + Retraining of most modules with new architecture (durations improved) + rxp, (Richard's XML parser) integrated and Sable XMLified + Fringe display program for labels and utterances + Metrical tree synthesis + A new utterance 
architecture (Relations and Items) + utterance save and load work properly now + Trainable LTS system + Lexicon cache system + Substantial optimization of front end (twice the speed) + UniSyn, new signal processing and generic waveform synthesis module + OLS code added + WFST support for kk rules, regular grammars etc, simple English morphology + +November 30th 1997 1.2.4 BETA + Tilt analysis and Tilt intonation modules added. + make_utts substantially improved (> 100 times faster) + text2wave script added + Pitch synchronous lpc analysis and support + rab consonant clusters labelled + New duration tree (wagon stepwise) much smaller if not better + SCFG grammar and parser (scfg_parse_text added as festival script) + change config stuff (again) + +October 1st 1997 1.2.1 RELEASE + preliminary support for Visual C++ + Use path-append rather than string-append (in buckets of places) + Minor bugs fixes throughout the code (end silences are now *always* + inserted in tts) + Linux socket bug fixed (get_url didn't work) + native irix audio support + +September 5th 1997 1.2.0 RELEASE + Proclaim modules and voices + automatic detection of voices + Phonset, lexicon, ltsrules listing and printing + 16 bit linear native support for Solaris i386 (sb16) + Update Festival Tutorial to 1.2.0 + +Aygust 15th 1997 1.1.99 Beta release + Win NT (and 95) initial support Cygnus win32 and Visual C++ + 100 more pages of documentation + LPC analysis for voices now ESPS independent + Spanish el voices tidy up (Borja) + ToBI by rule implementation + Confirmed support for gcc-2.7.2, gcc-2.6.3, Linux, FreeBSD, SunOS + Alpha and SunCC port on Solaris + reference card added + return s-expressions in server/client mode + OGI markup mode added. 
+ Native support for sun16, linux16 and freebsd (compile time option) + Changed names of .C files to .cc files for bILL + wagon integrated into speech tools (plus docs) + auto-text-mode-alist for automatic selection of text mode from file name + Associated token tests added + Many more tokens dealt with (numbers, money, roman, phone, etc.) + (analysed databases to see what coverage is like) + A probablistic chart parser (no significant grammars though) + RJC's new database/units/join/modify modules taking shape + Some more examples added to the tutorial (with answers) + Integrated CSLU changes for OGItoolkit including TCL support + stml support for phrase types and words inline + ssml -> stml + Postlexical rules done in Scheme rather than C++ + Rest of functions to allow any manipulation of utterance from scheme + New duration models trained for both English and American + New lexicon (CMU based) + Consonant cluster support (for kd) + American diphone set + Cluster unit selection algorithm more robust + Ngram backoff smoothing + Token pos, for numbers (97.5%) but does poor on phone numbers + New lexicon with final Rs and r deletion as postlex rule + Update pos prediction (ts39) and phrase break ngrams (faster to load) + New ngram format (binary files, and smoothing) + Vowel reduction module + Sun CC port + New string class (rjc) remove dependence on libg++ + Update of course notes and new section on building models from dbs + Yarowsky homograph disambiguation + +Jan 24th 1997 1.1.1 release (first public release) + a number of configuration and INSTALL documentation bugs fixed + SSML tidied up and a festival script provided for it. 
+ Diphones, again, checked and copyright explicitly added + +Jan 6th 1997 1.1.0 release + Roger diphones now default speaker + A new unit clustering algorithm with acoustic costs and + optimal coupling + BSD socket client/server support + A format function in Scheme (fprintf-like) + A short course on Speech Synthesis in Festival + (with course notes and exercises) + A programmable form of text modes including externally customizable + token to word rules. + Fully programmable intonation module (for ToBI-like theories) + Backtrace facility in Lisp + Externally specified Utterance end (for all tts modes) with lookahead + Roger diphones, first draft + +Nov 8th 1.0.0 release + Substantial bug fixes, stabilization and documentation updates + Added residual excited LPC synthesizer and removed PSOLA code. + Made sucs and taylor optional modules, new modules + can be added without modifying the base code + MOBY lexicon (not as good as cuvoald but free) + New diphone grouping software + A new diphone database module (free from adc) + +Sept 30th 0.1 release + MBROLA support (good example of external module) + latest news: read out the latest news (from Time Warner, Pathfinder) + audio spooler + --language option on command line + Spanish synthesis + Letter to sound rules as external system (replacing all the NRL code) + Welsh synthesis, making the whole system more language independent + sucs spoke in reasonably way (gsw_450 and f2b dbs) + document strings for functions (built in and user) and variables + access from command line and dumped automatically into texinfo + cleaned up SSML implementation + break prediction integrated using viterbi and pos + sucs module started (selection of units for concatenative synthesis) + a part-of-speech tagging system (ngram/viterbi based) + viterbi code added + fixes in SIOD for running batch and stdin, also sub_prompts added + saytime example + Memory leaks fixed, no leaks for tts + +July 30th 0.0 release (just for the sake of it) + a 
significiant start at documentation (texinfo -> info & html) + festival scripts using #! on first line + donovan diphone support + can compile (with too many warnings) under g++ 2.7.2 + copyrights on all files + memory leak checks (only 8 bytes for "unknown" words) + SSML (and tts file modes) + cuvoald cmu and beep lexicons + lexicon compilation + web page, emacs interface +June 2nd + Klatt duration module + syllabification in phones from letter to sound rules + Linear Regression model for F0 prediction (from ToBI labels) + CART (wagon) built trees for duration (zscores), phrase boundaries, + accent and endtone prediction. + ffeatures allowing specification of features of an utterance +May + integrated Taylor diphone module + US Naval Research letter to sound rules + CSTR lexicon + +12th April first words "hello" + + start with speech_tools library, scheme-in-one-defun and readline + and external CSTR diphone synthesizer + +7th April 1996 work started diff --git a/README b/README new file mode 100644 index 0000000..f47376e --- /dev/null +++ b/README @@ -0,0 +1,54 @@ + + The Festival Speech Synthesis System + version 2.1 RELEASE November 2010 + +This directory contains the Festival Speech Synthesis System, +developed at CSTR, University of Edinburgh. The project was originally +started by Alan W Black and Paul Taylor but many others have been +involved (see ACKNOWLEDGEMENTS file for full list). + +Festival offers a general framework for building speech synthesis +systems as well as including examples of various modules. As a whole +it offers full text to speech through a number APIs: from shell level, +though a Scheme command interpreter, as a C++ library, and an Emacs +interface. Festival is multi-lingual (currently English (US and UK) +and Spanish are distributed but a host of other voices have been +developed by others) though English is the most advanced. 
+ +The system is written in C++ and uses the Edinburgh Speech Tools +for low level architecture and has a Scheme (SIOD) based command +interpreter for control. Documentation is given in the FSF texinfo +format which can generate, a printed manual, info files and HTML. + +COPYING + +Festival is free. Earlier versions were restricted to non-commercial +use but we have now relaxed those conditions. The licence is an X11 +style licence thus it can be incorporated in commercial products +and free source products without restriction. See COPYING for the +actual details. + +INSTALL + +Festival should run on any standard Unix platform. It has already run +on Solaris, SunOS, Linux and FreeBSD. It requires a C++ compiler (GCC +2.7.2, 2.8.1, 2.95.[123], 3.2.3 3.3.2 RedHat "gcc-2.96", gcc 3.3, gcc +4.4.x and gcc-4.5.x are our standard compilers) to install. A port to +Windows XP/NT/95/98 and 2000 using either Cygnus GNUWIN32, this is +still new but many people are successfully using it. + +A detailed description of installation and requirements for the whole +system is given in the file INSTALL read that for details. + +NEWS + +Keep abreast of Festival News by regularly checking the Festival homepage + http://www.cstr.ed.ac.uk/projects/festival/ +or the US site + http://festvox.org/festival/ + +New in Festival 2.1 + Support for various new GCC compilers + Improved support for hts, clustergen, clunits and multisyn voices + lots of wee bugs fixed + diff --git a/bin/Makefile b/bin/Makefile new file mode 100644 index 0000000..f788344 --- /dev/null +++ b/bin/Makefile @@ -0,0 +1,57 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1994,1995,1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: June 1997 ## + ## --------------------------------------------------------------------- ## + ## Makefile for bin directory. ## + ## ## + ## If we are staticly linked, we link to things in main, otherwise we ## + ## write scripts which set LD_LIBRARY_PATH. ## + ## ## + ## Things in scripts are preprocessed. ## + ## ## + ########################################################################### + +TOP=.. 
+DIRNAME=bin +FILES= Makefile VCLocalRules + +ALL = .remove_links .link_main .process_scripts text2wave + +include $(TOP)/config/common_make_rules +include $(EST)/config/rules/bin_process.mak + +text2wave: + @ cp -p $(TOP)/examples/text2wave . + diff --git a/bin/VCLocalRules b/bin/VCLocalRules new file mode 100644 index 0000000..c3ae385 --- /dev/null +++ b/bin/VCLocalRules @@ -0,0 +1,4 @@ +# don't make text2wave + +text2wave: + @echo "no text to wave" diff --git a/config.guess b/config.guess new file mode 100755 index 0000000..83c544d --- /dev/null +++ b/config.guess @@ -0,0 +1,1327 @@ +#! /bin/sh +# Attempt to guess a canonical system name. +# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, +# 2000, 2001, 2002 Free Software Foundation, Inc. + +timestamp='2002-01-30' + +# This file is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. +# +# As a special exception to the GNU General Public License, if you +# distribute this file as part of a program that contains a +# configuration script generated by Autoconf, you may include it under +# the same distribution terms that you use for the rest of that program. + +# Originally written by Per Bothner . +# Please send patches to . Submit a context +# diff and a properly formatted ChangeLog entry. 
+# +# This script attempts to guess a canonical system name similar to +# config.sub. If it succeeds, it prints the system name on stdout, and +# exits with 0. Otherwise, it exits with 1. +# +# The plan is that this can be called by configure scripts if you +# don't specify an explicit build system type. + +me=`echo "$0" | sed -e 's,.*/,,'` + +usage="\ +Usage: $0 [OPTION] + +Output the configuration name of the system \`$me' is run on. + +Operation modes: + -h, --help print this help, then exit + -t, --time-stamp print date of last modification, then exit + -v, --version print version number, then exit + +Report bugs and patches to ." + +version="\ +GNU config.guess ($timestamp) + +Originally written by Per Bothner. +Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001 +Free Software Foundation, Inc. + +This is free software; see the source for copying conditions. There is NO +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." + +help=" +Try \`$me --help' for more information." + +# Parse command line +while test $# -gt 0 ; do + case $1 in + --time-stamp | --time* | -t ) + echo "$timestamp" ; exit 0 ;; + --version | -v ) + echo "$version" ; exit 0 ;; + --help | --h* | -h ) + echo "$usage"; exit 0 ;; + -- ) # Stop option processing + shift; break ;; + - ) # Use stdin as input. + break ;; + -* ) + echo "$me: invalid option $1$help" >&2 + exit 1 ;; + * ) + break ;; + esac +done + +if test $# != 0; then + echo "$me: too many arguments$help" >&2 + exit 1 +fi + + +dummy=dummy-$$ +trap 'rm -f $dummy.c $dummy.o $dummy.rel $dummy; exit 1' 1 2 15 + +# CC_FOR_BUILD -- compiler used by this script. +# Historically, `CC_FOR_BUILD' used to be named `HOST_CC'. We still +# use `HOST_CC' if defined, but it is deprecated. + +set_cc_for_build='case $CC_FOR_BUILD,$HOST_CC,$CC in + ,,) echo "int dummy(){}" > $dummy.c ; + for c in cc gcc c89 ; do + ($c $dummy.c -c -o $dummy.o) >/dev/null 2>&1 ; + if test $? 
= 0 ; then + CC_FOR_BUILD="$c"; break ; + fi ; + done ; + rm -f $dummy.c $dummy.o $dummy.rel ; + if test x"$CC_FOR_BUILD" = x ; then + CC_FOR_BUILD=no_compiler_found ; + fi + ;; + ,,*) CC_FOR_BUILD=$CC ;; + ,*,*) CC_FOR_BUILD=$HOST_CC ;; +esac' + +# This is needed to find uname on a Pyramid OSx when run in the BSD universe. +# (ghazi@noc.rutgers.edu 1994-08-24) +if (test -f /.attbin/uname) >/dev/null 2>&1 ; then + PATH=$PATH:/.attbin ; export PATH +fi + +UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown +UNAME_RELEASE=`(uname -r) 2>/dev/null` || UNAME_RELEASE=unknown +UNAME_SYSTEM=`(uname -s) 2>/dev/null` || UNAME_SYSTEM=unknown +UNAME_VERSION=`(uname -v) 2>/dev/null` || UNAME_VERSION=unknown + +# Note: order is significant - the case branches are not exclusive. + +case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in + *:NetBSD:*:*) + # NetBSD (nbsd) targets should (where applicable) match one or + # more of the tupples: *-*-netbsdelf*, *-*-netbsdaout*, + # *-*-netbsdecoff* and *-*-netbsd*. For targets that recently + # switched to ELF, *-*-netbsd* would select the old + # object file format. This provides both forward + # compatibility and a consistent mechanism for selecting the + # object file format. + # + # Note: NetBSD doesn't particularly care about the vendor + # portion of the name. We always set it to "unknown". + UNAME_MACHINE_ARCH=`(uname -p) 2>/dev/null` || \ + UNAME_MACHINE_ARCH=unknown + case "${UNAME_MACHINE_ARCH}" in + arm*) machine=arm-unknown ;; + sh3el) machine=shl-unknown ;; + sh3eb) machine=sh-unknown ;; + *) machine=${UNAME_MACHINE_ARCH}-unknown ;; + esac + # The Operating System including object format, if it has switched + # to ELF recently, or will in the future. 
+ case "${UNAME_MACHINE_ARCH}" in + arm*|i386|m68k|ns32k|sh3*|sparc|vax) + eval $set_cc_for_build + if echo __ELF__ | $CC_FOR_BUILD -E - 2>/dev/null \ + | grep __ELF__ >/dev/null + then + # Once all utilities can be ECOFF (netbsdecoff) or a.out (netbsdaout). + # Return netbsd for either. FIX? + os=netbsd + else + os=netbsdelf + fi + ;; + *) + os=netbsd + ;; + esac + # The OS release + release=`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'` + # Since CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM: + # contains redundant information, the shorter form: + # CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM is used. + echo "${machine}-${os}${release}" + exit 0 ;; + amiga:OpenBSD:*:*) + echo m68k-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + arc:OpenBSD:*:*) + echo mipsel-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + hp300:OpenBSD:*:*) + echo m68k-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + mac68k:OpenBSD:*:*) + echo m68k-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + macppc:OpenBSD:*:*) + echo powerpc-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + mvme68k:OpenBSD:*:*) + echo m68k-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + mvme88k:OpenBSD:*:*) + echo m88k-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + mvmeppc:OpenBSD:*:*) + echo powerpc-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + pmax:OpenBSD:*:*) + echo mipsel-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + sgi:OpenBSD:*:*) + echo mipseb-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + sun3:OpenBSD:*:*) + echo m68k-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + wgrisc:OpenBSD:*:*) + echo mipsel-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + *:OpenBSD:*:*) + echo ${UNAME_MACHINE}-unknown-openbsd${UNAME_RELEASE} + exit 0 ;; + alpha:OSF1:*:*) + if test $UNAME_RELEASE = "V4.0"; then + UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $3}'` + fi + # A Vn.n version is a released version. + # A Tn.n version is a released field test version. + # A Xn.n version is an unreleased experimental baselevel. + # 1.2 uses "1.2" for uname -r. 
+ cat <$dummy.s + .data +\$Lformat: + .byte 37,100,45,37,120,10,0 # "%d-%x\n" + + .text + .globl main + .align 4 + .ent main +main: + .frame \$30,16,\$26,0 + ldgp \$29,0(\$27) + .prologue 1 + .long 0x47e03d80 # implver \$0 + lda \$2,-1 + .long 0x47e20c21 # amask \$2,\$1 + lda \$16,\$Lformat + mov \$0,\$17 + not \$1,\$18 + jsr \$26,printf + ldgp \$29,0(\$26) + mov 0,\$16 + jsr \$26,exit + .end main +EOF + eval $set_cc_for_build + $CC_FOR_BUILD $dummy.s -o $dummy 2>/dev/null + if test "$?" = 0 ; then + case `./$dummy` in + 0-0) + UNAME_MACHINE="alpha" + ;; + 1-0) + UNAME_MACHINE="alphaev5" + ;; + 1-1) + UNAME_MACHINE="alphaev56" + ;; + 1-101) + UNAME_MACHINE="alphapca56" + ;; + 2-303) + UNAME_MACHINE="alphaev6" + ;; + 2-307) + UNAME_MACHINE="alphaev67" + ;; + 2-1307) + UNAME_MACHINE="alphaev68" + ;; + esac + fi + rm -f $dummy.s $dummy + echo ${UNAME_MACHINE}-dec-osf`echo ${UNAME_RELEASE} | sed -e 's/^[VTX]//' | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'` + exit 0 ;; + Alpha\ *:Windows_NT*:*) + # How do we know it's Interix rather than the generic POSIX subsystem? + # Should we change UNAME_MACHINE based on the output of uname instead + # of the specific Alpha model? + echo alpha-pc-interix + exit 0 ;; + 21064:Windows_NT:50:3) + echo alpha-dec-winnt3.5 + exit 0 ;; + Amiga*:UNIX_System_V:4.0:*) + echo m68k-unknown-sysv4 + exit 0;; + *:[Aa]miga[Oo][Ss]:*:*) + echo ${UNAME_MACHINE}-unknown-amigaos + exit 0 ;; + *:[Mm]orph[Oo][Ss]:*:*) + echo ${UNAME_MACHINE}-unknown-morphos + exit 0 ;; + *:OS/390:*:*) + echo i370-ibm-openedition + exit 0 ;; + arm:RISC*:1.[012]*:*|arm:riscix:1.[012]*:*) + echo arm-acorn-riscix${UNAME_RELEASE} + exit 0;; + SR2?01:HI-UX/MPP:*:* | SR8000:HI-UX/MPP:*:*) + echo hppa1.1-hitachi-hiuxmpp + exit 0;; + Pyramid*:OSx*:*:* | MIS*:OSx*:*:* | MIS*:SMP_DC-OSx*:*:*) + # akee@wpdis03.wpafb.af.mil (Earle F. Ake) contributed MIS and NILE. 
+ if test "`(/bin/universe) 2>/dev/null`" = att ; then + echo pyramid-pyramid-sysv3 + else + echo pyramid-pyramid-bsd + fi + exit 0 ;; + NILE*:*:*:dcosx) + echo pyramid-pyramid-svr4 + exit 0 ;; + sun4H:SunOS:5.*:*) + echo sparc-hal-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` + exit 0 ;; + sun4*:SunOS:5.*:* | tadpole*:SunOS:5.*:*) + echo sparc-sun-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` + exit 0 ;; + i86pc:SunOS:5.*:*) + echo i386-pc-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` + exit 0 ;; + sun4*:SunOS:6*:*) + # According to config.sub, this is the proper way to canonicalize + # SunOS6. Hard to guess exactly what SunOS6 will be like, but + # it's likely to be more like Solaris than SunOS4. + echo sparc-sun-solaris3`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` + exit 0 ;; + sun4*:SunOS:*:*) + case "`/usr/bin/arch -k`" in + Series*|S4*) + UNAME_RELEASE=`uname -v` + ;; + esac + # Japanese Language versions have a version number like `4.1.3-JL'. + echo sparc-sun-sunos`echo ${UNAME_RELEASE}|sed -e 's/-/_/'` + exit 0 ;; + sun3*:SunOS:*:*) + echo m68k-sun-sunos${UNAME_RELEASE} + exit 0 ;; + sun*:*:4.2BSD:*) + UNAME_RELEASE=`(head -1 /etc/motd | awk '{print substr($5,1,3)}') 2>/dev/null` + test "x${UNAME_RELEASE}" = "x" && UNAME_RELEASE=3 + case "`/bin/arch`" in + sun3) + echo m68k-sun-sunos${UNAME_RELEASE} + ;; + sun4) + echo sparc-sun-sunos${UNAME_RELEASE} + ;; + esac + exit 0 ;; + aushp:SunOS:*:*) + echo sparc-auspex-sunos${UNAME_RELEASE} + exit 0 ;; + # The situation for MiNT is a little confusing. The machine name + # can be virtually everything (everything which is not + # "atarist" or "atariste" at least should have a processor + # > m68000). The system name ranges from "MiNT" over "FreeMiNT" + # to the lowercase version "mint" (or "freemint"). Finally + # the system name "TOS" denotes a system which is actually not + # MiNT. But MiNT is downward compatible to TOS, so this should + # be no problem. 
+ atarist[e]:*MiNT:*:* | atarist[e]:*mint:*:* | atarist[e]:*TOS:*:*) + echo m68k-atari-mint${UNAME_RELEASE} + exit 0 ;; + atari*:*MiNT:*:* | atari*:*mint:*:* | atarist[e]:*TOS:*:*) + echo m68k-atari-mint${UNAME_RELEASE} + exit 0 ;; + *falcon*:*MiNT:*:* | *falcon*:*mint:*:* | *falcon*:*TOS:*:*) + echo m68k-atari-mint${UNAME_RELEASE} + exit 0 ;; + milan*:*MiNT:*:* | milan*:*mint:*:* | *milan*:*TOS:*:*) + echo m68k-milan-mint${UNAME_RELEASE} + exit 0 ;; + hades*:*MiNT:*:* | hades*:*mint:*:* | *hades*:*TOS:*:*) + echo m68k-hades-mint${UNAME_RELEASE} + exit 0 ;; + *:*MiNT:*:* | *:*mint:*:* | *:*TOS:*:*) + echo m68k-unknown-mint${UNAME_RELEASE} + exit 0 ;; + powerpc:machten:*:*) + echo powerpc-apple-machten${UNAME_RELEASE} + exit 0 ;; + RISC*:Mach:*:*) + echo mips-dec-mach_bsd4.3 + exit 0 ;; + RISC*:ULTRIX:*:*) + echo mips-dec-ultrix${UNAME_RELEASE} + exit 0 ;; + VAX*:ULTRIX*:*:*) + echo vax-dec-ultrix${UNAME_RELEASE} + exit 0 ;; + 2020:CLIX:*:* | 2430:CLIX:*:*) + echo clipper-intergraph-clix${UNAME_RELEASE} + exit 0 ;; + mips:*:*:UMIPS | mips:*:*:RISCos) + eval $set_cc_for_build + sed 's/^ //' << EOF >$dummy.c +#ifdef __cplusplus +#include /* for printf() prototype */ + int main (int argc, char *argv[]) { +#else + int main (argc, argv) int argc; char *argv[]; { +#endif + #if defined (host_mips) && defined (MIPSEB) + #if defined (SYSTYPE_SYSV) + printf ("mips-mips-riscos%ssysv\n", argv[1]); exit (0); + #endif + #if defined (SYSTYPE_SVR4) + printf ("mips-mips-riscos%ssvr4\n", argv[1]); exit (0); + #endif + #if defined (SYSTYPE_BSD43) || defined(SYSTYPE_BSD) + printf ("mips-mips-riscos%sbsd\n", argv[1]); exit (0); + #endif + #endif + exit (-1); + } +EOF + $CC_FOR_BUILD $dummy.c -o $dummy \ + && ./$dummy `echo "${UNAME_RELEASE}" | sed -n 's/\([0-9]*\).*/\1/p'` \ + && rm -f $dummy.c $dummy && exit 0 + rm -f $dummy.c $dummy + echo mips-mips-riscos${UNAME_RELEASE} + exit 0 ;; + Motorola:PowerMAX_OS:*:*) + echo powerpc-motorola-powermax + exit 0 ;; + Night_Hawk:Power_UNIX:*:*) 
+ echo powerpc-harris-powerunix + exit 0 ;; + m88k:CX/UX:7*:*) + echo m88k-harris-cxux7 + exit 0 ;; + m88k:*:4*:R4*) + echo m88k-motorola-sysv4 + exit 0 ;; + m88k:*:3*:R3*) + echo m88k-motorola-sysv3 + exit 0 ;; + AViiON:dgux:*:*) + # DG/UX returns AViiON for all architectures + UNAME_PROCESSOR=`/usr/bin/uname -p` + if [ $UNAME_PROCESSOR = mc88100 ] || [ $UNAME_PROCESSOR = mc88110 ] + then + if [ ${TARGET_BINARY_INTERFACE}x = m88kdguxelfx ] || \ + [ ${TARGET_BINARY_INTERFACE}x = x ] + then + echo m88k-dg-dgux${UNAME_RELEASE} + else + echo m88k-dg-dguxbcs${UNAME_RELEASE} + fi + else + echo i586-dg-dgux${UNAME_RELEASE} + fi + exit 0 ;; + M88*:DolphinOS:*:*) # DolphinOS (SVR3) + echo m88k-dolphin-sysv3 + exit 0 ;; + M88*:*:R3*:*) + # Delta 88k system running SVR3 + echo m88k-motorola-sysv3 + exit 0 ;; + XD88*:*:*:*) # Tektronix XD88 system running UTekV (SVR3) + echo m88k-tektronix-sysv3 + exit 0 ;; + Tek43[0-9][0-9]:UTek:*:*) # Tektronix 4300 system running UTek (BSD) + echo m68k-tektronix-bsd + exit 0 ;; + *:IRIX*:*:*) + echo mips-sgi-irix`echo ${UNAME_RELEASE}|sed -e 's/-/_/g'` + exit 0 ;; + ????????:AIX?:[12].1:2) # AIX 2.2.1 or AIX 2.1.1 is RT/PC AIX. 
+ echo romp-ibm-aix # uname -m gives an 8 hex-code CPU id + exit 0 ;; # Note that: echo "'`uname -s`'" gives 'AIX ' + i*86:AIX:*:*) + echo i386-ibm-aix + exit 0 ;; + ia64:AIX:*:*) + if [ -x /usr/bin/oslevel ] ; then + IBM_REV=`/usr/bin/oslevel` + else + IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE} + fi + echo ${UNAME_MACHINE}-ibm-aix${IBM_REV} + exit 0 ;; + *:AIX:2:3) + if grep bos325 /usr/include/stdio.h >/dev/null 2>&1; then + eval $set_cc_for_build + sed 's/^ //' << EOF >$dummy.c + #include + + main() + { + if (!__power_pc()) + exit(1); + puts("powerpc-ibm-aix3.2.5"); + exit(0); + } +EOF + $CC_FOR_BUILD $dummy.c -o $dummy && ./$dummy && rm -f $dummy.c $dummy && exit 0 + rm -f $dummy.c $dummy + echo rs6000-ibm-aix3.2.5 + elif grep bos324 /usr/include/stdio.h >/dev/null 2>&1; then + echo rs6000-ibm-aix3.2.4 + else + echo rs6000-ibm-aix3.2 + fi + exit 0 ;; + *:AIX:*:[45]) + IBM_CPU_ID=`/usr/sbin/lsdev -C -c processor -S available | head -1 | awk '{ print $1 }'` + if /usr/sbin/lsattr -El ${IBM_CPU_ID} | grep ' POWER' >/dev/null 2>&1; then + IBM_ARCH=rs6000 + else + IBM_ARCH=powerpc + fi + if [ -x /usr/bin/oslevel ] ; then + IBM_REV=`/usr/bin/oslevel` + else + IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE} + fi + echo ${IBM_ARCH}-ibm-aix${IBM_REV} + exit 0 ;; + *:AIX:*:*) + echo rs6000-ibm-aix + exit 0 ;; + ibmrt:4.4BSD:*|romp-ibm:BSD:*) + echo romp-ibm-bsd4.4 + exit 0 ;; + ibmrt:*BSD:*|romp-ibm:BSD:*) # covers RT/PC BSD and + echo romp-ibm-bsd${UNAME_RELEASE} # 4.3 with uname added to + exit 0 ;; # report: romp-ibm BSD 4.3 + *:BOSX:*:*) + echo rs6000-bull-bosx + exit 0 ;; + DPX/2?00:B.O.S.:*:*) + echo m68k-bull-sysv3 + exit 0 ;; + 9000/[34]??:4.3bsd:1.*:*) + echo m68k-hp-bsd + exit 0 ;; + hp300:4.4BSD:*:* | 9000/[34]??:4.3bsd:2.*:*) + echo m68k-hp-bsd4.4 + exit 0 ;; + 9000/[34678]??:HP-UX:*:*) + HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'` + case "${UNAME_MACHINE}" in + 9000/31? ) HP_ARCH=m68000 ;; + 9000/[34]?? 
) HP_ARCH=m68k ;; + 9000/[678][0-9][0-9]) + if [ -x /usr/bin/getconf ]; then + sc_cpu_version=`/usr/bin/getconf SC_CPU_VERSION 2>/dev/null` + sc_kernel_bits=`/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null` + case "${sc_cpu_version}" in + 523) HP_ARCH="hppa1.0" ;; # CPU_PA_RISC1_0 + 528) HP_ARCH="hppa1.1" ;; # CPU_PA_RISC1_1 + 532) # CPU_PA_RISC2_0 + case "${sc_kernel_bits}" in + 32) HP_ARCH="hppa2.0n" ;; + 64) HP_ARCH="hppa2.0w" ;; + '') HP_ARCH="hppa2.0" ;; # HP-UX 10.20 + esac ;; + esac + fi + if [ "${HP_ARCH}" = "" ]; then + eval $set_cc_for_build + sed 's/^ //' << EOF >$dummy.c + + #define _HPUX_SOURCE + #include + #include + + int main () + { + #if defined(_SC_KERNEL_BITS) + long bits = sysconf(_SC_KERNEL_BITS); + #endif + long cpu = sysconf (_SC_CPU_VERSION); + + switch (cpu) + { + case CPU_PA_RISC1_0: puts ("hppa1.0"); break; + case CPU_PA_RISC1_1: puts ("hppa1.1"); break; + case CPU_PA_RISC2_0: + #if defined(_SC_KERNEL_BITS) + switch (bits) + { + case 64: puts ("hppa2.0w"); break; + case 32: puts ("hppa2.0n"); break; + default: puts ("hppa2.0"); break; + } break; + #else /* !defined(_SC_KERNEL_BITS) */ + puts ("hppa2.0"); break; + #endif + default: puts ("hppa1.0"); break; + } + exit (0); + } +EOF + (CCOPTS= $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null) && HP_ARCH=`./$dummy` + if test -z "$HP_ARCH"; then HP_ARCH=hppa; fi + rm -f $dummy.c $dummy + fi ;; + esac + echo ${HP_ARCH}-hp-hpux${HPUX_REV} + exit 0 ;; + ia64:HP-UX:*:*) + HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'` + echo ia64-hp-hpux${HPUX_REV} + exit 0 ;; + 3050*:HI-UX:*:*) + eval $set_cc_for_build + sed 's/^ //' << EOF >$dummy.c + #include + int + main () + { + long cpu = sysconf (_SC_CPU_VERSION); + /* The order matters, because CPU_IS_HP_MC68K erroneously returns + true for CPU_PA_RISC1_0. CPU_IS_PA_RISC returns correct + results, however. 
*/ + if (CPU_IS_PA_RISC (cpu)) + { + switch (cpu) + { + case CPU_PA_RISC1_0: puts ("hppa1.0-hitachi-hiuxwe2"); break; + case CPU_PA_RISC1_1: puts ("hppa1.1-hitachi-hiuxwe2"); break; + case CPU_PA_RISC2_0: puts ("hppa2.0-hitachi-hiuxwe2"); break; + default: puts ("hppa-hitachi-hiuxwe2"); break; + } + } + else if (CPU_IS_HP_MC68K (cpu)) + puts ("m68k-hitachi-hiuxwe2"); + else puts ("unknown-hitachi-hiuxwe2"); + exit (0); + } +EOF + $CC_FOR_BUILD $dummy.c -o $dummy && ./$dummy && rm -f $dummy.c $dummy && exit 0 + rm -f $dummy.c $dummy + echo unknown-hitachi-hiuxwe2 + exit 0 ;; + 9000/7??:4.3bsd:*:* | 9000/8?[79]:4.3bsd:*:* ) + echo hppa1.1-hp-bsd + exit 0 ;; + 9000/8??:4.3bsd:*:*) + echo hppa1.0-hp-bsd + exit 0 ;; + *9??*:MPE/iX:*:* | *3000*:MPE/iX:*:*) + echo hppa1.0-hp-mpeix + exit 0 ;; + hp7??:OSF1:*:* | hp8?[79]:OSF1:*:* ) + echo hppa1.1-hp-osf + exit 0 ;; + hp8??:OSF1:*:*) + echo hppa1.0-hp-osf + exit 0 ;; + i*86:OSF1:*:*) + if [ -x /usr/sbin/sysversion ] ; then + echo ${UNAME_MACHINE}-unknown-osf1mk + else + echo ${UNAME_MACHINE}-unknown-osf1 + fi + exit 0 ;; + parisc*:Lites*:*:*) + echo hppa1.1-hp-lites + exit 0 ;; + C1*:ConvexOS:*:* | convex:ConvexOS:C1*:*) + echo c1-convex-bsd + exit 0 ;; + C2*:ConvexOS:*:* | convex:ConvexOS:C2*:*) + if getsysinfo -f scalar_acc + then echo c32-convex-bsd + else echo c2-convex-bsd + fi + exit 0 ;; + C34*:ConvexOS:*:* | convex:ConvexOS:C34*:*) + echo c34-convex-bsd + exit 0 ;; + C38*:ConvexOS:*:* | convex:ConvexOS:C38*:*) + echo c38-convex-bsd + exit 0 ;; + C4*:ConvexOS:*:* | convex:ConvexOS:C4*:*) + echo c4-convex-bsd + exit 0 ;; + CRAY*X-MP:*:*:*) + echo xmp-cray-unicos + exit 0 ;; + CRAY*Y-MP:*:*:*) + echo ymp-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' + exit 0 ;; + CRAY*[A-Z]90:*:*:*) + echo ${UNAME_MACHINE}-cray-unicos${UNAME_RELEASE} \ + | sed -e 's/CRAY.*\([A-Z]90\)/\1/' \ + -e y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/ \ + -e 's/\.[^.]*$/.X/' + exit 0 ;; + CRAY*TS:*:*:*) + echo 
t90-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' + exit 0 ;; + CRAY*T3D:*:*:*) + echo alpha-cray-unicosmk${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' + exit 0 ;; + CRAY*T3E:*:*:*) + echo alphaev5-cray-unicosmk${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' + exit 0 ;; + CRAY*SV1:*:*:*) + echo sv1-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' + exit 0 ;; + CRAY-2:*:*:*) + echo cray2-cray-unicos + exit 0 ;; + F30[01]:UNIX_System_V:*:* | F700:UNIX_System_V:*:*) + FUJITSU_PROC=`uname -m | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'` + FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'` + FUJITSU_REL=`echo ${UNAME_RELEASE} | sed -e 's/ /_/'` + echo "${FUJITSU_PROC}-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}" + exit 0 ;; + i*86:BSD/386:*:* | i*86:BSD/OS:*:* | *:Ascend\ Embedded/OS:*:*) + echo ${UNAME_MACHINE}-pc-bsdi${UNAME_RELEASE} + exit 0 ;; + sparc*:BSD/OS:*:*) + echo sparc-unknown-bsdi${UNAME_RELEASE} + exit 0 ;; + *:BSD/OS:*:*) + echo ${UNAME_MACHINE}-unknown-bsdi${UNAME_RELEASE} + exit 0 ;; + *:FreeBSD:*:*) + echo ${UNAME_MACHINE}-unknown-freebsd`echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'` + exit 0 ;; + i*:CYGWIN*:*) + echo ${UNAME_MACHINE}-pc-cygwin + exit 0 ;; + i*:MINGW*:*) + echo ${UNAME_MACHINE}-pc-mingw32 + exit 0 ;; + i*:PW*:*) + echo ${UNAME_MACHINE}-pc-pw32 + exit 0 ;; + x86:Interix*:3*) + echo i386-pc-interix3 + exit 0 ;; + i*:Windows_NT*:* | Pentium*:Windows_NT*:*) + # How do we know it's Interix rather than the generic POSIX subsystem? + # It also conflicts with pre-2.0 versions of AT&T UWIN. Should we + # UNAME_MACHINE based on the output of uname instead of i386? 
+ echo i386-pc-interix + exit 0 ;; + i*:UWIN*:*) + echo ${UNAME_MACHINE}-pc-uwin + exit 0 ;; + p*:CYGWIN*:*) + echo powerpcle-unknown-cygwin + exit 0 ;; + prep*:SunOS:5.*:*) + echo powerpcle-unknown-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` + exit 0 ;; + *:GNU:*:*) + echo `echo ${UNAME_MACHINE}|sed -e 's,[-/].*$,,'`-unknown-gnu`echo ${UNAME_RELEASE}|sed -e 's,/.*$,,'` + exit 0 ;; + i*86:Minix:*:*) + echo ${UNAME_MACHINE}-pc-minix + exit 0 ;; + arm*:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux-gnu + exit 0 ;; + ia64:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux + exit 0 ;; + m68*:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux-gnu + exit 0 ;; + mips:Linux:*:*) + eval $set_cc_for_build + sed 's/^ //' << EOF >$dummy.c + #undef CPU + #undef mips + #undef mipsel + #if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL) || defined(MIPSEL) + CPU=mipsel + #else + #if defined(__MIPSEB__) || defined(__MIPSEB) || defined(_MIPSEB) || defined(MIPSEB) + CPU=mips + #else + CPU= + #endif + #endif +EOF + eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep ^CPU=` + rm -f $dummy.c + test x"${CPU}" != x && echo "${CPU}-pc-linux-gnu" && exit 0 + ;; + ppc:Linux:*:*) + echo powerpc-unknown-linux-gnu + exit 0 ;; + ppc64:Linux:*:*) + echo powerpc64-unknown-linux-gnu + exit 0 ;; + alpha:Linux:*:*) + case `sed -n '/^cpu model/s/^.*: \(.*\)/\1/p' < /proc/cpuinfo` in + EV5) UNAME_MACHINE=alphaev5 ;; + EV56) UNAME_MACHINE=alphaev56 ;; + PCA56) UNAME_MACHINE=alphapca56 ;; + PCA57) UNAME_MACHINE=alphapca56 ;; + EV6) UNAME_MACHINE=alphaev6 ;; + EV67) UNAME_MACHINE=alphaev67 ;; + EV68*) UNAME_MACHINE=alphaev68 ;; + esac + objdump --private-headers /bin/sh | grep ld.so.1 >/dev/null + if test "$?" 
= 0 ; then LIBC="libc1" ; else LIBC="" ; fi + echo ${UNAME_MACHINE}-unknown-linux-gnu${LIBC} + exit 0 ;; + parisc:Linux:*:* | hppa:Linux:*:*) + # Look for CPU level + case `grep '^cpu[^a-z]*:' /proc/cpuinfo 2>/dev/null | cut -d' ' -f2` in + PA7*) echo hppa1.1-unknown-linux-gnu ;; + PA8*) echo hppa2.0-unknown-linux-gnu ;; + *) echo hppa-unknown-linux-gnu ;; + esac + exit 0 ;; + parisc64:Linux:*:* | hppa64:Linux:*:*) + echo hppa64-unknown-linux-gnu + exit 0 ;; + s390:Linux:*:* | s390x:Linux:*:*) + echo ${UNAME_MACHINE}-ibm-linux + exit 0 ;; + sh*:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux-gnu + exit 0 ;; + sparc:Linux:*:* | sparc64:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux-gnu + exit 0 ;; + x86_64:Linux:*:*) + echo x86_64-unknown-linux-gnu + exit 0 ;; + i*86:Linux:*:*) + # The BFD linker knows what the default object file format is, so + # first see if it will tell us. cd to the root directory to prevent + # problems with other programs or directories called `ld' in the path. + # Export LANG=C to prevent ld from outputting information in other + # languages. + ld_supported_targets=`LANG=C; export LANG; cd /; ld --help 2>&1 \ + | sed -ne '/supported targets:/!d + s/[ ][ ]*/ /g + s/.*supported targets: *// + s/ .*// + p'` + case "$ld_supported_targets" in + elf32-i386) + TENTATIVE="${UNAME_MACHINE}-pc-linux-gnu" + ;; + a.out-i386-linux) + echo "${UNAME_MACHINE}-pc-linux-gnuaout" + exit 0 ;; + coff-i386) + echo "${UNAME_MACHINE}-pc-linux-gnucoff" + exit 0 ;; + "") + # Either a pre-BFD a.out linker (linux-gnuoldld) or + # one that does not give us useful --help. 
+ echo "${UNAME_MACHINE}-pc-linux-gnuoldld" + exit 0 ;; + esac + # Determine whether the default compiler is a.out or elf + eval $set_cc_for_build + sed 's/^ //' << EOF >$dummy.c + #include + #ifdef __ELF__ + # ifdef __GLIBC__ + # if __GLIBC__ >= 2 + LIBC=gnu + # else + LIBC=gnulibc1 + # endif + # else + LIBC=gnulibc1 + # endif + #else + #ifdef __INTEL_COMPILER + LIBC=gnu + #else + LIBC=gnuaout + #endif + #endif +EOF + eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep ^LIBC=` + rm -f $dummy.c + test x"${LIBC}" != x && echo "${UNAME_MACHINE}-pc-linux-${LIBC}" && exit 0 + test x"${TENTATIVE}" != x && echo "${TENTATIVE}" && exit 0 + ;; + i*86:DYNIX/ptx:4*:*) + # ptx 4.0 does uname -s correctly, with DYNIX/ptx in there. + # earlier versions are messed up and put the nodename in both + # sysname and nodename. + echo i386-sequent-sysv4 + exit 0 ;; + i*86:UNIX_SV:4.2MP:2.*) + # Unixware is an offshoot of SVR4, but it has its own version + # number series starting with 2... + # I am not positive that other SVR4 systems won't match this, + # I just have to hope. -- rms. + # Use sysv4.2uw... so that sysv4* matches it. 
+ echo ${UNAME_MACHINE}-pc-sysv4.2uw${UNAME_VERSION} + exit 0 ;; + i*86:*:4.*:* | i*86:SYSTEM_V:4.*:*) + UNAME_REL=`echo ${UNAME_RELEASE} | sed 's/\/MP$//'` + if grep Novell /usr/include/link.h >/dev/null 2>/dev/null; then + echo ${UNAME_MACHINE}-univel-sysv${UNAME_REL} + else + echo ${UNAME_MACHINE}-pc-sysv${UNAME_REL} + fi + exit 0 ;; + i*86:*:5:[78]*) + case `/bin/uname -X | grep "^Machine"` in + *486*) UNAME_MACHINE=i486 ;; + *Pentium) UNAME_MACHINE=i586 ;; + *Pent*|*Celeron) UNAME_MACHINE=i686 ;; + esac + echo ${UNAME_MACHINE}-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}${UNAME_VERSION} + exit 0 ;; + i*86:*:3.2:*) + if test -f /usr/options/cb.name; then + UNAME_REL=`sed -n 's/.*Version //p' /dev/null >/dev/null ; then + UNAME_REL=`(/bin/uname -X|egrep Release|sed -e 's/.*= //')` + (/bin/uname -X|egrep i80486 >/dev/null) && UNAME_MACHINE=i486 + (/bin/uname -X|egrep '^Machine.*Pentium' >/dev/null) \ + && UNAME_MACHINE=i586 + (/bin/uname -X|egrep '^Machine.*Pent ?II' >/dev/null) \ + && UNAME_MACHINE=i686 + (/bin/uname -X|egrep '^Machine.*Pentium Pro' >/dev/null) \ + && UNAME_MACHINE=i686 + echo ${UNAME_MACHINE}-pc-sco$UNAME_REL + else + echo ${UNAME_MACHINE}-pc-sysv32 + fi + exit 0 ;; + i*86:*DOS:*:*) + echo ${UNAME_MACHINE}-pc-msdosdjgpp + exit 0 ;; + pc:*:*:*) + # Left here for compatibility: + # uname -m prints for DJGPP always 'pc', but it prints nothing about + # the processor, so we play safe by assuming i386. + echo i386-pc-msdosdjgpp + exit 0 ;; + Intel:Mach:3*:*) + echo i386-pc-mach3 + exit 0 ;; + paragon:*:*:*) + echo i860-intel-osf1 + exit 0 ;; + i860:*:4.*:*) # i860-SVR4 + if grep Stardent /usr/include/sys/uadmin.h >/dev/null 2>&1 ; then + echo i860-stardent-sysv${UNAME_RELEASE} # Stardent Vistra i860-SVR4 + else # Add other i860-SVR4 vendors below as they are discovered. 
+ echo i860-unknown-sysv${UNAME_RELEASE} # Unknown i860-SVR4 + fi + exit 0 ;; + mini*:CTIX:SYS*5:*) + # "miniframe" + echo m68010-convergent-sysv + exit 0 ;; + M68*:*:R3V[567]*:*) + test -r /sysV68 && echo 'm68k-motorola-sysv' && exit 0 ;; + 3[34]??:*:4.0:3.0 | 3[34]??A:*:4.0:3.0 | 3[34]??,*:*:4.0:3.0 | 3[34]??/*:*:4.0:3.0 | 4850:*:4.0:3.0 | SKA40:*:4.0:3.0) + OS_REL='' + test -r /etc/.relid \ + && OS_REL=.`sed -n 's/[^ ]* [^ ]* \([0-9][0-9]\).*/\1/p' < /etc/.relid` + /bin/uname -p 2>/dev/null | grep 86 >/dev/null \ + && echo i486-ncr-sysv4.3${OS_REL} && exit 0 + /bin/uname -p 2>/dev/null | /bin/grep entium >/dev/null \ + && echo i586-ncr-sysv4.3${OS_REL} && exit 0 ;; + 3[34]??:*:4.0:* | 3[34]??,*:*:4.0:*) + /bin/uname -p 2>/dev/null | grep 86 >/dev/null \ + && echo i486-ncr-sysv4 && exit 0 ;; + m68*:LynxOS:2.*:* | m68*:LynxOS:3.0*:*) + echo m68k-unknown-lynxos${UNAME_RELEASE} + exit 0 ;; + mc68030:UNIX_System_V:4.*:*) + echo m68k-atari-sysv4 + exit 0 ;; + i*86:LynxOS:2.*:* | i*86:LynxOS:3.[01]*:* | i*86:LynxOS:4.0*:*) + echo i386-unknown-lynxos${UNAME_RELEASE} + exit 0 ;; + TSUNAMI:LynxOS:2.*:*) + echo sparc-unknown-lynxos${UNAME_RELEASE} + exit 0 ;; + rs6000:LynxOS:2.*:*) + echo rs6000-unknown-lynxos${UNAME_RELEASE} + exit 0 ;; + PowerPC:LynxOS:2.*:* | PowerPC:LynxOS:3.[01]*:* | PowerPC:LynxOS:4.0*:*) + echo powerpc-unknown-lynxos${UNAME_RELEASE} + exit 0 ;; + SM[BE]S:UNIX_SV:*:*) + echo mips-dde-sysv${UNAME_RELEASE} + exit 0 ;; + RM*:ReliantUNIX-*:*:*) + echo mips-sni-sysv4 + exit 0 ;; + RM*:SINIX-*:*:*) + echo mips-sni-sysv4 + exit 0 ;; + *:SINIX-*:*:*) + if uname -p 2>/dev/null >/dev/null ; then + UNAME_MACHINE=`(uname -p) 2>/dev/null` + echo ${UNAME_MACHINE}-sni-sysv4 + else + echo ns32k-sni-sysv + fi + exit 0 ;; + PENTIUM:*:4.0*:*) # Unisys `ClearPath HMP IX 4000' SVR4/MP effort + # says + echo i586-unisys-sysv4 + exit 0 ;; + *:UNIX_System_V:4*:FTX*) + # From Gerald Hewes . + # How about differentiating between stratus architectures? 
-djm + echo hppa1.1-stratus-sysv4 + exit 0 ;; + *:*:*:FTX*) + # From seanf@swdc.stratus.com. + echo i860-stratus-sysv4 + exit 0 ;; + *:VOS:*:*) + # From Paul.Green@stratus.com. + echo hppa1.1-stratus-vos + exit 0 ;; + mc68*:A/UX:*:*) + echo m68k-apple-aux${UNAME_RELEASE} + exit 0 ;; + news*:NEWS-OS:6*:*) + echo mips-sony-newsos6 + exit 0 ;; + R[34]000:*System_V*:*:* | R4000:UNIX_SYSV:*:* | R*000:UNIX_SV:*:*) + if [ -d /usr/nec ]; then + echo mips-nec-sysv${UNAME_RELEASE} + else + echo mips-unknown-sysv${UNAME_RELEASE} + fi + exit 0 ;; + BeBox:BeOS:*:*) # BeOS running on hardware made by Be, PPC only. + echo powerpc-be-beos + exit 0 ;; + BeMac:BeOS:*:*) # BeOS running on Mac or Mac clone, PPC only. + echo powerpc-apple-beos + exit 0 ;; + BePC:BeOS:*:*) # BeOS running on Intel PC compatible. + echo i586-pc-beos + exit 0 ;; + SX-4:SUPER-UX:*:*) + echo sx4-nec-superux${UNAME_RELEASE} + exit 0 ;; + SX-5:SUPER-UX:*:*) + echo sx5-nec-superux${UNAME_RELEASE} + exit 0 ;; + Power*:Rhapsody:*:*) + echo powerpc-apple-rhapsody${UNAME_RELEASE} + exit 0 ;; + *:Rhapsody:*:*) + echo ${UNAME_MACHINE}-apple-rhapsody${UNAME_RELEASE} + exit 0 ;; + *:Darwin:*:*) + echo `uname -p`-apple-darwin${UNAME_RELEASE} + exit 0 ;; + *:procnto*:*:* | *:QNX:[0123456789]*:*) + if test "${UNAME_MACHINE}" = "x86pc"; then + UNAME_MACHINE=pc + echo i386-${UNAME_MACHINE}-nto-qnx + else + echo `uname -p`-${UNAME_MACHINE}-nto-qnx + fi + exit 0 ;; + *:QNX:*:4*) + echo i386-pc-qnx + exit 0 ;; + NSR-[GKLNPTVW]:NONSTOP_KERNEL:*:*) + echo nsr-tandem-nsk${UNAME_RELEASE} + exit 0 ;; + *:NonStop-UX:*:*) + echo mips-compaq-nonstopux + exit 0 ;; + BS2000:POSIX*:*:*) + echo bs2000-siemens-sysv + exit 0 ;; + DS/*:UNIX_System_V:*:*) + echo ${UNAME_MACHINE}-${UNAME_SYSTEM}-${UNAME_RELEASE} + exit 0 ;; + *:Plan9:*:*) + # "uname -m" is not consistent, so use $cputype instead. 386 + # is converted to i386 for consistency with other x86 + # operating systems. 
+ if test "$cputype" = "386"; then + UNAME_MACHINE=i386 + else + UNAME_MACHINE="$cputype" + fi + echo ${UNAME_MACHINE}-unknown-plan9 + exit 0 ;; + i*86:OS/2:*:*) + # If we were able to find `uname', then EMX Unix compatibility + # is probably installed. + echo ${UNAME_MACHINE}-pc-os2-emx + exit 0 ;; + *:TOPS-10:*:*) + echo pdp10-unknown-tops10 + exit 0 ;; + *:TENEX:*:*) + echo pdp10-unknown-tenex + exit 0 ;; + KS10:TOPS-20:*:* | KL10:TOPS-20:*:* | TYPE4:TOPS-20:*:*) + echo pdp10-dec-tops20 + exit 0 ;; + XKL-1:TOPS-20:*:* | TYPE5:TOPS-20:*:*) + echo pdp10-xkl-tops20 + exit 0 ;; + *:TOPS-20:*:*) + echo pdp10-unknown-tops20 + exit 0 ;; + *:ITS:*:*) + echo pdp10-unknown-its + exit 0 ;; + i*86:XTS-300:*:STOP) + echo ${UNAME_MACHINE}-unknown-stop + exit 0 ;; + i*86:atheos:*:*) + echo ${UNAME_MACHINE}-unknown-atheos + exit 0 ;; +esac + +#echo '(No uname command or uname output not recognized.)' 1>&2 +#echo "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" 1>&2 + +eval $set_cc_for_build +cat >$dummy.c < +# include +#endif +main () +{ +#if defined (sony) +#if defined (MIPSEB) + /* BFD wants "bsd" instead of "newsos". Perhaps BFD should be changed, + I don't know.... 
*/ + printf ("mips-sony-bsd\n"); exit (0); +#else +#include + printf ("m68k-sony-newsos%s\n", +#ifdef NEWSOS4 + "4" +#else + "" +#endif + ); exit (0); +#endif +#endif + +#if defined (__arm) && defined (__acorn) && defined (__unix) + printf ("arm-acorn-riscix"); exit (0); +#endif + +#if defined (hp300) && !defined (hpux) + printf ("m68k-hp-bsd\n"); exit (0); +#endif + +#if defined (NeXT) +#if !defined (__ARCHITECTURE__) +#define __ARCHITECTURE__ "m68k" +#endif + int version; + version=`(hostinfo | sed -n 's/.*NeXT Mach \([0-9]*\).*/\1/p') 2>/dev/null`; + if (version < 4) + printf ("%s-next-nextstep%d\n", __ARCHITECTURE__, version); + else + printf ("%s-next-openstep%d\n", __ARCHITECTURE__, version); + exit (0); +#endif + +#if defined (MULTIMAX) || defined (n16) +#if defined (UMAXV) + printf ("ns32k-encore-sysv\n"); exit (0); +#else +#if defined (CMU) + printf ("ns32k-encore-mach\n"); exit (0); +#else + printf ("ns32k-encore-bsd\n"); exit (0); +#endif +#endif +#endif + +#if defined (__386BSD__) + printf ("i386-pc-bsd\n"); exit (0); +#endif + +#if defined (sequent) +#if defined (i386) + printf ("i386-sequent-dynix\n"); exit (0); +#endif +#if defined (ns32000) + printf ("ns32k-sequent-dynix\n"); exit (0); +#endif +#endif + +#if defined (_SEQUENT_) + struct utsname un; + + uname(&un); + + if (strncmp(un.version, "V2", 2) == 0) { + printf ("i386-sequent-ptx2\n"); exit (0); + } + if (strncmp(un.version, "V1", 2) == 0) { /* XXX is V1 correct? 
*/ + printf ("i386-sequent-ptx1\n"); exit (0); + } + printf ("i386-sequent-ptx\n"); exit (0); + +#endif + +#if defined (vax) +# if !defined (ultrix) +# include +# if defined (BSD) +# if BSD == 43 + printf ("vax-dec-bsd4.3\n"); exit (0); +# else +# if BSD == 199006 + printf ("vax-dec-bsd4.3reno\n"); exit (0); +# else + printf ("vax-dec-bsd\n"); exit (0); +# endif +# endif +# else + printf ("vax-dec-bsd\n"); exit (0); +# endif +# else + printf ("vax-dec-ultrix\n"); exit (0); +# endif +#endif + +#if defined (alliant) && defined (i860) + printf ("i860-alliant-bsd\n"); exit (0); +#endif + + exit (1); +} +EOF + +$CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null && ./$dummy && rm -f $dummy.c $dummy && exit 0 +rm -f $dummy.c $dummy + +# Apollos put the system type in the environment. + +test -d /usr/apollo && { echo ${ISP}-apollo-${SYSTYPE}; exit 0; } + +# Convex versions that predate uname can use getsysinfo(1) + +if [ -x /usr/convex/getsysinfo ] +then + case `getsysinfo -f cpu_type` in + c1*) + echo c1-convex-bsd + exit 0 ;; + c2*) + if getsysinfo -f scalar_acc + then echo c32-convex-bsd + else echo c2-convex-bsd + fi + exit 0 ;; + c34*) + echo c34-convex-bsd + exit 0 ;; + c38*) + echo c38-convex-bsd + exit 0 ;; + c4*) + echo c4-convex-bsd + exit 0 ;; + esac +fi + +cat >&2 < in order to provide the needed +information to handle your system. 
+ +config.guess timestamp = $timestamp + +uname -m = `(uname -m) 2>/dev/null || echo unknown` +uname -r = `(uname -r) 2>/dev/null || echo unknown` +uname -s = `(uname -s) 2>/dev/null || echo unknown` +uname -v = `(uname -v) 2>/dev/null || echo unknown` + +/usr/bin/uname -p = `(/usr/bin/uname -p) 2>/dev/null` +/bin/uname -X = `(/bin/uname -X) 2>/dev/null` + +hostinfo = `(hostinfo) 2>/dev/null` +/bin/universe = `(/bin/universe) 2>/dev/null` +/usr/bin/arch -k = `(/usr/bin/arch -k) 2>/dev/null` +/bin/arch = `(/bin/arch) 2>/dev/null` +/usr/bin/oslevel = `(/usr/bin/oslevel) 2>/dev/null` +/usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null` + +UNAME_MACHINE = ${UNAME_MACHINE} +UNAME_RELEASE = ${UNAME_RELEASE} +UNAME_SYSTEM = ${UNAME_SYSTEM} +UNAME_VERSION = ${UNAME_VERSION} +EOF + +exit 1 + +# Local variables: +# eval: (add-hook 'write-file-hooks 'time-stamp) +# time-stamp-start: "timestamp='" +# time-stamp-format: "%:y-%02m-%02d" +# time-stamp-end: "'" +# End: diff --git a/config.sub b/config.sub new file mode 100755 index 0000000..c840398 --- /dev/null +++ b/config.sub @@ -0,0 +1,1450 @@ +#! /bin/sh +# Configuration validation subroutine script. +# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, +# 2000, 2001, 2002 Free Software Foundation, Inc. + +timestamp='2002-02-01' + +# This file is (in principle) common to ALL GNU software. +# The presence of a machine in this file suggests that SOME GNU software +# can handle that machine. It does not imply ALL GNU software can. +# +# This file is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, +# Boston, MA 02111-1307, USA. + +# As a special exception to the GNU General Public License, if you +# distribute this file as part of a program that contains a +# configuration script generated by Autoconf, you may include it under +# the same distribution terms that you use for the rest of that program. + +# Please send patches to . Submit a context +# diff and a properly formatted ChangeLog entry. +# +# Configuration subroutine to validate and canonicalize a configuration type. +# Supply the specified configuration type as an argument. +# If it is invalid, we print an error message on stderr and exit with code 1. +# Otherwise, we print the canonical config type on stdout and succeed. + +# This file is supposed to be the same for all GNU packages +# and recognize all the CPU types, system types and aliases +# that are meaningful with *any* GNU software. +# Each package is responsible for reporting which valid configurations +# it does not support. The user should be able to distinguish +# a failure to support a valid configuration from a meaningless +# configuration. + +# The goal of this file is to map all the various variations of a given +# machine specification into a single specification in the form: +# CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM +# or in some cases, the newer four-part form: +# CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM +# It is wrong to echo any other type of specification. + +me=`echo "$0" | sed -e 's,.*/,,'` + +usage="\ +Usage: $0 [OPTION] CPU-MFR-OPSYS + $0 [OPTION] ALIAS + +Canonicalize a configuration name. 
+ +Operation modes: + -h, --help print this help, then exit + -t, --time-stamp print date of last modification, then exit + -v, --version print version number, then exit + +Report bugs and patches to ." + +version="\ +GNU config.sub ($timestamp) + +Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001 +Free Software Foundation, Inc. + +This is free software; see the source for copying conditions. There is NO +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." + +help=" +Try \`$me --help' for more information." + +# Parse command line +while test $# -gt 0 ; do + case $1 in + --time-stamp | --time* | -t ) + echo "$timestamp" ; exit 0 ;; + --version | -v ) + echo "$version" ; exit 0 ;; + --help | --h* | -h ) + echo "$usage"; exit 0 ;; + -- ) # Stop option processing + shift; break ;; + - ) # Use stdin as input. + break ;; + -* ) + echo "$me: invalid option $1$help" + exit 1 ;; + + *local*) + # First pass through any local machine types. + echo $1 + exit 0;; + + * ) + break ;; + esac +done + +case $# in + 0) echo "$me: missing argument$help" >&2 + exit 1;; + 1) ;; + *) echo "$me: too many arguments$help" >&2 + exit 1;; +esac + +# Separate what the user gave into CPU-COMPANY and OS or KERNEL-OS (if any). +# Here we must recognize all the valid KERNEL-OS combinations. +maybe_os=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\2/'` +case $maybe_os in + nto-qnx* | linux-gnu* | storm-chaos* | os2-emx* | windows32-*) + os=-$maybe_os + basic_machine=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\1/'` + ;; + *) + basic_machine=`echo $1 | sed 's/-[^-]*$//'` + if [ $basic_machine != $1 ] + then os=`echo $1 | sed 's/.*-/-/'` + else os=; fi + ;; +esac + +### Let's recognize common machines as not being operating systems so +### that things like config.sub decstation-3100 work. We also +### recognize some manufacturers as not being operating systems, so we +### can provide default operating systems below. 
+case $os in + -sun*os*) + # Prevent following clause from handling this invalid input. + ;; + -dec* | -mips* | -sequent* | -encore* | -pc532* | -sgi* | -sony* | \ + -att* | -7300* | -3300* | -delta* | -motorola* | -sun[234]* | \ + -unicom* | -ibm* | -next | -hp | -isi* | -apollo | -altos* | \ + -convergent* | -ncr* | -news | -32* | -3600* | -3100* | -hitachi* |\ + -c[123]* | -convex* | -sun | -crds | -omron* | -dg | -ultra | -tti* | \ + -harris | -dolphin | -highlevel | -gould | -cbm | -ns | -masscomp | \ + -apple | -axis) + os= + basic_machine=$1 + ;; + -sim | -cisco | -oki | -wec | -winbond) + os= + basic_machine=$1 + ;; + -scout) + ;; + -wrs) + os=-vxworks + basic_machine=$1 + ;; + -chorusos*) + os=-chorusos + basic_machine=$1 + ;; + -chorusrdb) + os=-chorusrdb + basic_machine=$1 + ;; + -hiux*) + os=-hiuxwe2 + ;; + -sco5) + os=-sco3.2v5 + basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + ;; + -sco4) + os=-sco3.2v4 + basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + ;; + -sco3.2.[4-9]*) + os=`echo $os | sed -e 's/sco3.2./sco3.2v/'` + basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + ;; + -sco3.2v[4-9]*) + # Don't forget version if it is 3.2v4 or newer. + basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + ;; + -sco*) + os=-sco3.2v2 + basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + ;; + -udk*) + basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + ;; + -isc) + os=-isc2.2 + basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + ;; + -clix*) + basic_machine=clipper-intergraph + ;; + -isc*) + basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + ;; + -lynx*) + os=-lynxos + ;; + -ptx*) + basic_machine=`echo $1 | sed -e 's/86-.*/86-sequent/'` + ;; + -windowsnt*) + os=`echo $os | sed -e 's/windowsnt/winnt/'` + ;; + -psos*) + os=-psos + ;; + -mint | -mint[0-9]*) + basic_machine=m68k-atari + os=-mint + ;; +esac + +# Decode aliases for certain CPU-COMPANY combinations. +case $basic_machine in + # Recognize the basic CPU types without company name. 
+ # Some are omitted here because they have special meanings below. + 1750a | 580 \ + | a29k \ + | alpha | alphaev[4-8] | alphaev56 | alphaev6[78] | alphapca5[67] \ + | alpha64 | alpha64ev[4-8] | alpha64ev56 | alpha64ev6[78] | alpha64pca5[67] \ + | arc | arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb] | avr \ + | c4x | clipper \ + | d10v | d30v | dsp16xx \ + | fr30 \ + | h8300 | h8500 | hppa | hppa1.[01] | hppa2.0 | hppa2.0[nw] | hppa64 \ + | i370 | i860 | i960 | ia64 \ + | m32r | m68000 | m68k | m88k | mcore \ + | mips16 | mips64 | mips64el | mips64orion | mips64orionel \ + | mips64vr4100 | mips64vr4100el | mips64vr4300 \ + | mips64vr4300el | mips64vr5000 | mips64vr5000el \ + | mipsbe | mipseb | mipsel | mipsle | mipstx39 | mipstx39el \ + | mipsisa32 \ + | mn10200 | mn10300 \ + | ns16k | ns32k \ + | openrisc | or32 \ + | pdp10 | pdp11 | pj | pjl \ + | powerpc | powerpc64 | powerpc64le | powerpcle | ppcbe \ + | pyramid \ + | sh | sh[34] | sh[34]eb | shbe | shle | sh64 \ + | sparc | sparc64 | sparclet | sparclite | sparcv9 | sparcv9b \ + | strongarm \ + | tahoe | thumb | tic80 | tron \ + | v850 | v850e \ + | we32k \ + | x86 | xscale | xstormy16 | xtensa \ + | z8k) + basic_machine=$basic_machine-unknown + ;; + m6811 | m68hc11 | m6812 | m68hc12) + # Motorola 68HC11/12. + basic_machine=$basic_machine-unknown + os=-none + ;; + m88110 | m680[12346]0 | m683?2 | m68360 | m5200 | v70 | w65 | z8k) + ;; + + # We use `pc' rather than `unknown' + # because (1) that's what they normally are, and + # (2) the word "unknown" tends to confuse beginning users. + i*86 | x86_64) + basic_machine=$basic_machine-pc + ;; + # Object if more than one company name word. + *-*-*) + echo Invalid configuration \`$1\': machine \`$basic_machine\' not recognized 1>&2 + exit 1 + ;; + # Recognize the basic CPU types with company name. 
+ 580-* \ + | a29k-* \ + | alpha-* | alphaev[4-8]-* | alphaev56-* | alphaev6[78]-* \ + | alpha64-* | alpha64ev[4-8]-* | alpha64ev56-* | alpha64ev6[78]-* \ + | alphapca5[67]-* | alpha64pca5[67]-* | arc-* \ + | arm-* | armbe-* | armle-* | armv*-* \ + | avr-* \ + | bs2000-* \ + | c[123]* | c30-* | [cjt]90-* | c54x-* \ + | clipper-* | cray2-* | cydra-* \ + | d10v-* | d30v-* \ + | elxsi-* \ + | f30[01]-* | f700-* | fr30-* | fx80-* \ + | h8300-* | h8500-* \ + | hppa-* | hppa1.[01]-* | hppa2.0-* | hppa2.0[nw]-* | hppa64-* \ + | i*86-* | i860-* | i960-* | ia64-* \ + | m32r-* \ + | m68000-* | m680[01234]0-* | m68360-* | m683?2-* | m68k-* \ + | m88110-* | m88k-* | mcore-* \ + | mips-* | mips16-* | mips64-* | mips64el-* | mips64orion-* \ + | mips64orionel-* | mips64vr4100-* | mips64vr4100el-* \ + | mips64vr4300-* | mips64vr4300el-* | mipsbe-* | mipseb-* \ + | mipsle-* | mipsel-* | mipstx39-* | mipstx39el-* \ + | none-* | np1-* | ns16k-* | ns32k-* \ + | orion-* \ + | pdp10-* | pdp11-* | pj-* | pjl-* | pn-* | power-* \ + | powerpc-* | powerpc64-* | powerpc64le-* | powerpcle-* | ppcbe-* \ + | pyramid-* \ + | romp-* | rs6000-* \ + | sh-* | sh[34]-* | sh[34]eb-* | shbe-* | shle-* | sh64-* \ + | sparc-* | sparc64-* | sparc86x-* | sparclite-* \ + | sparcv9-* | sparcv9b-* | strongarm-* | sv1-* \ + | t3e-* | tahoe-* | thumb-* | tic30-* | tic54x-* | tic80-* | tron-* \ + | v850-* | v850e-* | vax-* \ + | we32k-* \ + | x86-* | x86_64-* | xmp-* | xps100-* | xscale-* | xstormy16-* \ + | xtensa-* \ + | ymp-* \ + | z8k-*) + ;; + # Recognize the various machine names and aliases which stand + # for a CPU type and a company and sometimes even an OS. 
+ 386bsd) + basic_machine=i386-unknown + os=-bsd + ;; + 3b1 | 7300 | 7300-att | att-7300 | pc7300 | safari | unixpc) + basic_machine=m68000-att + ;; + 3b*) + basic_machine=we32k-att + ;; + a29khif) + basic_machine=a29k-amd + os=-udi + ;; + adobe68k) + basic_machine=m68010-adobe + os=-scout + ;; + alliant | fx80) + basic_machine=fx80-alliant + ;; + altos | altos3068) + basic_machine=m68k-altos + ;; + am29k) + basic_machine=a29k-none + os=-bsd + ;; + amdahl) + basic_machine=580-amdahl + os=-sysv + ;; + amiga | amiga-*) + basic_machine=m68k-unknown + ;; + amigaos | amigados) + basic_machine=m68k-unknown + os=-amigaos + ;; + amigaunix | amix) + basic_machine=m68k-unknown + os=-sysv4 + ;; + apollo68) + basic_machine=m68k-apollo + os=-sysv + ;; + apollo68bsd) + basic_machine=m68k-apollo + os=-bsd + ;; + aux) + basic_machine=m68k-apple + os=-aux + ;; + balance) + basic_machine=ns32k-sequent + os=-dynix + ;; + convex-c1) + basic_machine=c1-convex + os=-bsd + ;; + convex-c2) + basic_machine=c2-convex + os=-bsd + ;; + convex-c32) + basic_machine=c32-convex + os=-bsd + ;; + convex-c34) + basic_machine=c34-convex + os=-bsd + ;; + convex-c38) + basic_machine=c38-convex + os=-bsd + ;; + cray | ymp) + basic_machine=ymp-cray + os=-unicos + ;; + cray2) + basic_machine=cray2-cray + os=-unicos + ;; + [cjt]90) + basic_machine=${basic_machine}-cray + os=-unicos + ;; + crds | unos) + basic_machine=m68k-crds + ;; + cris | cris-* | etrax*) + basic_machine=cris-axis + ;; + da30 | da30-*) + basic_machine=m68k-da30 + ;; + decstation | decstation-3100 | pmax | pmax-* | pmin | dec3100 | decstatn) + basic_machine=mips-dec + ;; + decsystem10* | dec10*) + basic_machine=pdp10-dec + os=-tops10 + ;; + decsystem20* | dec20*) + basic_machine=pdp10-dec + os=-tops20 + ;; + delta | 3300 | motorola-3300 | motorola-delta \ + | 3300-motorola | delta-motorola) + basic_machine=m68k-motorola + ;; + delta88) + basic_machine=m88k-motorola + os=-sysv3 + ;; + dpx20 | dpx20-*) + basic_machine=rs6000-bull + os=-bosx 
+ ;; + dpx2* | dpx2*-bull) + basic_machine=m68k-bull + os=-sysv3 + ;; + ebmon29k) + basic_machine=a29k-amd + os=-ebmon + ;; + elxsi) + basic_machine=elxsi-elxsi + os=-bsd + ;; + encore | umax | mmax) + basic_machine=ns32k-encore + ;; + es1800 | OSE68k | ose68k | ose | OSE) + basic_machine=m68k-ericsson + os=-ose + ;; + fx2800) + basic_machine=i860-alliant + ;; + genix) + basic_machine=ns32k-ns + ;; + gmicro) + basic_machine=tron-gmicro + os=-sysv + ;; + go32) + basic_machine=i386-pc + os=-go32 + ;; + h3050r* | hiux*) + basic_machine=hppa1.1-hitachi + os=-hiuxwe2 + ;; + h8300hms) + basic_machine=h8300-hitachi + os=-hms + ;; + h8300xray) + basic_machine=h8300-hitachi + os=-xray + ;; + h8500hms) + basic_machine=h8500-hitachi + os=-hms + ;; + harris) + basic_machine=m88k-harris + os=-sysv3 + ;; + hp300-*) + basic_machine=m68k-hp + ;; + hp300bsd) + basic_machine=m68k-hp + os=-bsd + ;; + hp300hpux) + basic_machine=m68k-hp + os=-hpux + ;; + hp3k9[0-9][0-9] | hp9[0-9][0-9]) + basic_machine=hppa1.0-hp + ;; + hp9k2[0-9][0-9] | hp9k31[0-9]) + basic_machine=m68000-hp + ;; + hp9k3[2-9][0-9]) + basic_machine=m68k-hp + ;; + hp9k6[0-9][0-9] | hp6[0-9][0-9]) + basic_machine=hppa1.0-hp + ;; + hp9k7[0-79][0-9] | hp7[0-79][0-9]) + basic_machine=hppa1.1-hp + ;; + hp9k78[0-9] | hp78[0-9]) + # FIXME: really hppa2.0-hp + basic_machine=hppa1.1-hp + ;; + hp9k8[67]1 | hp8[67]1 | hp9k80[24] | hp80[24] | hp9k8[78]9 | hp8[78]9 | hp9k893 | hp893) + # FIXME: really hppa2.0-hp + basic_machine=hppa1.1-hp + ;; + hp9k8[0-9][13679] | hp8[0-9][13679]) + basic_machine=hppa1.1-hp + ;; + hp9k8[0-9][0-9] | hp8[0-9][0-9]) + basic_machine=hppa1.0-hp + ;; + hppa-next) + os=-nextstep3 + ;; + hppaosf) + basic_machine=hppa1.1-hp + os=-osf + ;; + hppro) + basic_machine=hppa1.1-hp + os=-proelf + ;; + i370-ibm* | ibm*) + basic_machine=i370-ibm + ;; +# I'm not sure what "Sysv32" means. Should this be sysv3.2? 
+ i*86v32) + basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` + os=-sysv32 + ;; + i*86v4*) + basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` + os=-sysv4 + ;; + i*86v) + basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` + os=-sysv + ;; + i*86sol2) + basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` + os=-solaris2 + ;; + i386mach) + basic_machine=i386-mach + os=-mach + ;; + i386-vsta | vsta) + basic_machine=i386-unknown + os=-vsta + ;; + iris | iris4d) + basic_machine=mips-sgi + case $os in + -irix*) + ;; + *) + os=-irix4 + ;; + esac + ;; + isi68 | isi) + basic_machine=m68k-isi + os=-sysv + ;; + m88k-omron*) + basic_machine=m88k-omron + ;; + magnum | m3230) + basic_machine=mips-mips + os=-sysv + ;; + merlin) + basic_machine=ns32k-utek + os=-sysv + ;; + mingw32) + basic_machine=i386-pc + os=-mingw32 + ;; + miniframe) + basic_machine=m68000-convergent + ;; + *mint | -mint[0-9]* | *MiNT | *MiNT[0-9]*) + basic_machine=m68k-atari + os=-mint + ;; + mipsel*-linux*) + basic_machine=mipsel-unknown + os=-linux-gnu + ;; + mips*-linux*) + basic_machine=mips-unknown + os=-linux-gnu + ;; + mips3*-*) + basic_machine=`echo $basic_machine | sed -e 's/mips3/mips64/'` + ;; + mips3*) + basic_machine=`echo $basic_machine | sed -e 's/mips3/mips64/'`-unknown + ;; + mmix*) + basic_machine=mmix-knuth + os=-mmixware + ;; + monitor) + basic_machine=m68k-rom68k + os=-coff + ;; + morphos) + basic_machine=powerpc-unknown + os=-morphos + ;; + msdos) + basic_machine=i386-pc + os=-msdos + ;; + mvs) + basic_machine=i370-ibm + os=-mvs + ;; + ncr3000) + basic_machine=i486-ncr + os=-sysv4 + ;; + netbsd386) + basic_machine=i386-unknown + os=-netbsd + ;; + netwinder) + basic_machine=armv4l-rebel + os=-linux + ;; + news | news700 | news800 | news900) + basic_machine=m68k-sony + os=-newsos + ;; + news1000) + basic_machine=m68030-sony + os=-newsos + ;; + news-3600 | risc-news) + basic_machine=mips-sony + os=-newsos + ;; + necv70) + basic_machine=v70-nec + os=-sysv + ;; + next | m*-next ) + 
basic_machine=m68k-next + case $os in + -nextstep* ) + ;; + -ns2*) + os=-nextstep2 + ;; + *) + os=-nextstep3 + ;; + esac + ;; + nh3000) + basic_machine=m68k-harris + os=-cxux + ;; + nh[45]000) + basic_machine=m88k-harris + os=-cxux + ;; + nindy960) + basic_machine=i960-intel + os=-nindy + ;; + mon960) + basic_machine=i960-intel + os=-mon960 + ;; + nonstopux) + basic_machine=mips-compaq + os=-nonstopux + ;; + np1) + basic_machine=np1-gould + ;; + nsr-tandem) + basic_machine=nsr-tandem + ;; + op50n-* | op60c-*) + basic_machine=hppa1.1-oki + os=-proelf + ;; + or32 | or32-*) + basic_machine=or32-unknown + os=-coff + ;; + OSE68000 | ose68000) + basic_machine=m68000-ericsson + os=-ose + ;; + os68k) + basic_machine=m68k-none + os=-os68k + ;; + pa-hitachi) + basic_machine=hppa1.1-hitachi + os=-hiuxwe2 + ;; + paragon) + basic_machine=i860-intel + os=-osf + ;; + pbd) + basic_machine=sparc-tti + ;; + pbb) + basic_machine=m68k-tti + ;; + pc532 | pc532-*) + basic_machine=ns32k-pc532 + ;; + pentium | p5 | k5 | k6 | nexgen | viac3) + basic_machine=i586-pc + ;; + pentiumpro | p6 | 6x86 | athlon) + basic_machine=i686-pc + ;; + pentiumii | pentium2) + basic_machine=i686-pc + ;; + pentium-* | p5-* | k5-* | k6-* | nexgen-* | viac3-*) + basic_machine=i586-`echo $basic_machine | sed 's/^[^-]*-//'` + ;; + pentiumpro-* | p6-* | 6x86-* | athlon-*) + basic_machine=i686-`echo $basic_machine | sed 's/^[^-]*-//'` + ;; + pentiumii-* | pentium2-*) + basic_machine=i686-`echo $basic_machine | sed 's/^[^-]*-//'` + ;; + pn) + basic_machine=pn-gould + ;; + power) basic_machine=power-ibm + ;; + ppc) basic_machine=powerpc-unknown + ;; + ppc-*) basic_machine=powerpc-`echo $basic_machine | sed 's/^[^-]*-//'` + ;; + ppcle | powerpclittle | ppc-le | powerpc-little) + basic_machine=powerpcle-unknown + ;; + ppcle-* | powerpclittle-*) + basic_machine=powerpcle-`echo $basic_machine | sed 's/^[^-]*-//'` + ;; + ppc64) basic_machine=powerpc64-unknown + ;; + ppc64-*) basic_machine=powerpc64-`echo $basic_machine | 
sed 's/^[^-]*-//'` + ;; + ppc64le | powerpc64little | ppc64-le | powerpc64-little) + basic_machine=powerpc64le-unknown + ;; + ppc64le-* | powerpc64little-*) + basic_machine=powerpc64le-`echo $basic_machine | sed 's/^[^-]*-//'` + ;; + ps2) + basic_machine=i386-ibm + ;; + pw32) + basic_machine=i586-unknown + os=-pw32 + ;; + rom68k) + basic_machine=m68k-rom68k + os=-coff + ;; + rm[46]00) + basic_machine=mips-siemens + ;; + rtpc | rtpc-*) + basic_machine=romp-ibm + ;; + s390 | s390-*) + basic_machine=s390-ibm + ;; + s390x | s390x-*) + basic_machine=s390x-ibm + ;; + sa29200) + basic_machine=a29k-amd + os=-udi + ;; + sequent) + basic_machine=i386-sequent + ;; + sh) + basic_machine=sh-hitachi + os=-hms + ;; + sparclite-wrs | simso-wrs) + basic_machine=sparclite-wrs + os=-vxworks + ;; + sps7) + basic_machine=m68k-bull + os=-sysv2 + ;; + spur) + basic_machine=spur-unknown + ;; + st2000) + basic_machine=m68k-tandem + ;; + stratus) + basic_machine=i860-stratus + os=-sysv4 + ;; + sun2) + basic_machine=m68000-sun + ;; + sun2os3) + basic_machine=m68000-sun + os=-sunos3 + ;; + sun2os4) + basic_machine=m68000-sun + os=-sunos4 + ;; + sun3os3) + basic_machine=m68k-sun + os=-sunos3 + ;; + sun3os4) + basic_machine=m68k-sun + os=-sunos4 + ;; + sun4os3) + basic_machine=sparc-sun + os=-sunos3 + ;; + sun4os4) + basic_machine=sparc-sun + os=-sunos4 + ;; + sun4sol2) + basic_machine=sparc-sun + os=-solaris2 + ;; + sun3 | sun3-*) + basic_machine=m68k-sun + ;; + sun4) + basic_machine=sparc-sun + ;; + sun386 | sun386i | roadrunner) + basic_machine=i386-sun + ;; + sv1) + basic_machine=sv1-cray + os=-unicos + ;; + symmetry) + basic_machine=i386-sequent + os=-dynix + ;; + t3e) + basic_machine=t3e-cray + os=-unicos + ;; + tic54x | c54x*) + basic_machine=tic54x-unknown + os=-coff + ;; + tx39) + basic_machine=mipstx39-unknown + ;; + tx39el) + basic_machine=mipstx39el-unknown + ;; + toad1) + basic_machine=pdp10-xkl + os=-tops20 + ;; + tower | tower-32) + basic_machine=m68k-ncr + ;; + udi29k) + 
basic_machine=a29k-amd + os=-udi + ;; + ultra3) + basic_machine=a29k-nyu + os=-sym1 + ;; + v810 | necv810) + basic_machine=v810-nec + os=-none + ;; + vaxv) + basic_machine=vax-dec + os=-sysv + ;; + vms) + basic_machine=vax-dec + os=-vms + ;; + vpp*|vx|vx-*) + basic_machine=f301-fujitsu + ;; + vxworks960) + basic_machine=i960-wrs + os=-vxworks + ;; + vxworks68) + basic_machine=m68k-wrs + os=-vxworks + ;; + vxworks29k) + basic_machine=a29k-wrs + os=-vxworks + ;; + w65*) + basic_machine=w65-wdc + os=-none + ;; + w89k-*) + basic_machine=hppa1.1-winbond + os=-proelf + ;; + windows32) + basic_machine=i386-pc + os=-windows32-msvcrt + ;; + xmp) + basic_machine=xmp-cray + os=-unicos + ;; + xps | xps100) + basic_machine=xps100-honeywell + ;; + z8k-*-coff) + basic_machine=z8k-unknown + os=-sim + ;; + none) + basic_machine=none-none + os=-none + ;; + +# Here we handle the default manufacturer of certain CPU types. It is in +# some cases the only manufacturer, in others, it is the most popular. + w89k) + basic_machine=hppa1.1-winbond + ;; + op50n) + basic_machine=hppa1.1-oki + ;; + op60c) + basic_machine=hppa1.1-oki + ;; + mips) + if [ x$os = x-linux-gnu ]; then + basic_machine=mips-unknown + else + basic_machine=mips-mips + fi + ;; + romp) + basic_machine=romp-ibm + ;; + rs6000) + basic_machine=rs6000-ibm + ;; + vax) + basic_machine=vax-dec + ;; + pdp10) + # there are many clones, so DEC is not a safe bet + basic_machine=pdp10-unknown + ;; + pdp11) + basic_machine=pdp11-dec + ;; + we32k) + basic_machine=we32k-att + ;; + sh3 | sh4 | sh3eb | sh4eb) + basic_machine=sh-unknown + ;; + sh64) + basic_machine=sh64-unknown + ;; + sparc | sparcv9 | sparcv9b) + basic_machine=sparc-sun + ;; + cydra) + basic_machine=cydra-cydrome + ;; + orion) + basic_machine=orion-highlevel + ;; + orion105) + basic_machine=clipper-highlevel + ;; + mac | mpw | mac-mpw) + basic_machine=m68k-apple + ;; + pmac | pmac-mpw) + basic_machine=powerpc-apple + ;; + c4x*) + basic_machine=c4x-none + os=-coff + ;; + 
*-unknown) + # Make sure to match an already-canonicalized machine name. + ;; + *) + echo Invalid configuration \`$1\': machine \`$basic_machine\' not recognized 1>&2 + exit 1 + ;; +esac + +# Here we canonicalize certain aliases for manufacturers. +case $basic_machine in + *-digital*) + basic_machine=`echo $basic_machine | sed 's/digital.*/dec/'` + ;; + *-commodore*) + basic_machine=`echo $basic_machine | sed 's/commodore.*/cbm/'` + ;; + *) + ;; +esac + +# Decode manufacturer-specific aliases for certain operating systems. + +if [ x"$os" != x"" ] +then +case $os in + # First match some system type aliases + # that might get confused with valid system types. + # -solaris* is a basic system type, with this one exception. + -solaris1 | -solaris1.*) + os=`echo $os | sed -e 's|solaris1|sunos4|'` + ;; + -solaris) + os=-solaris2 + ;; + -svr4*) + os=-sysv4 + ;; + -unixware*) + os=-sysv4.2uw + ;; + -gnu/linux*) + os=`echo $os | sed -e 's|gnu/linux|linux-gnu|'` + ;; + # First accept the basic system types. + # The portable systems comes first. + # Each alternative MUST END IN A *, to match a version number. + # -sysv* is not here because it comes later, after sysvr4. 
+ -gnu* | -bsd* | -mach* | -minix* | -genix* | -ultrix* | -irix* \ + | -*vms* | -sco* | -esix* | -isc* | -aix* | -sunos | -sunos[34]*\ + | -hpux* | -unos* | -osf* | -luna* | -dgux* | -solaris* | -sym* \ + | -amigaos* | -amigados* | -msdos* | -newsos* | -unicos* | -aof* \ + | -aos* \ + | -nindy* | -vxsim* | -vxworks* | -ebmon* | -hms* | -mvs* \ + | -clix* | -riscos* | -uniplus* | -iris* | -rtu* | -xenix* \ + | -hiux* | -386bsd* | -netbsd* | -openbsd* | -freebsd* | -riscix* \ + | -lynxos* | -bosx* | -nextstep* | -cxux* | -aout* | -elf* | -oabi* \ + | -ptx* | -coff* | -ecoff* | -winnt* | -domain* | -vsta* \ + | -udi* | -eabi* | -lites* | -ieee* | -go32* | -aux* \ + | -chorusos* | -chorusrdb* \ + | -cygwin* | -pe* | -psos* | -moss* | -proelf* | -rtems* \ + | -mingw32* | -linux-gnu* | -uxpv* | -beos* | -mpeix* | -udk* \ + | -interix* | -uwin* | -rhapsody* | -darwin* | -opened* \ + | -openstep* | -oskit* | -conix* | -pw32* | -nonstopux* \ + | -storm-chaos* | -tops10* | -tenex* | -tops20* | -its* \ + | -os2* | -vos* | -palmos* | -uclinux* | -nucleus* | -morphos*) + # Remember, each alternative MUST END IN *, to match a version number. 
+ ;; + -qnx*) + case $basic_machine in + x86-* | i*86-*) + ;; + *) + os=-nto$os + ;; + esac + ;; + -nto*) + os=-nto-qnx + ;; + -sim | -es1800* | -hms* | -xray | -os68k* | -none* | -v88r* \ + | -windows* | -osx | -abug | -netware* | -os9* | -beos* \ + | -macos* | -mpw* | -magic* | -mmixware* | -mon960* | -lnews*) + ;; + -mac*) + os=`echo $os | sed -e 's|mac|macos|'` + ;; + -linux*) + os=`echo $os | sed -e 's|linux|linux-gnu|'` + ;; + -sunos5*) + os=`echo $os | sed -e 's|sunos5|solaris2|'` + ;; + -sunos6*) + os=`echo $os | sed -e 's|sunos6|solaris3|'` + ;; + -opened*) + os=-openedition + ;; + -wince*) + os=-wince + ;; + -osfrose*) + os=-osfrose + ;; + -osf*) + os=-osf + ;; + -utek*) + os=-bsd + ;; + -dynix*) + os=-bsd + ;; + -acis*) + os=-aos + ;; + -atheos*) + os=-atheos + ;; + -386bsd) + os=-bsd + ;; + -ctix* | -uts*) + os=-sysv + ;; + -ns2 ) + os=-nextstep2 + ;; + -nsk*) + os=-nsk + ;; + # Preserve the version number of sinix5. + -sinix5.*) + os=`echo $os | sed -e 's|sinix|sysv|'` + ;; + -sinix*) + os=-sysv4 + ;; + -triton*) + os=-sysv3 + ;; + -oss*) + os=-sysv3 + ;; + -svr4) + os=-sysv4 + ;; + -svr3) + os=-sysv3 + ;; + -sysvr4) + os=-sysv4 + ;; + # This must come after -sysvr4. + -sysv*) + ;; + -ose*) + os=-ose + ;; + -es1800*) + os=-ose + ;; + -xenix) + os=-xenix + ;; + -*mint | -mint[0-9]* | -*MiNT | -MiNT[0-9]*) + os=-mint + ;; + -none) + ;; + *) + # Get rid of the `-' at the beginning of $os. + os=`echo $os | sed 's/[^-]*-//'` + echo Invalid configuration \`$1\': system \`$os\' not recognized 1>&2 + exit 1 + ;; +esac +else + +# Here we handle the default operating systems that come with various machines. +# The value should be what the vendor currently ships out the door with their +# machine or put another way, the most popular os provided with the machine. + +# Note that if you're going to try to match "-MANUFACTURER" here (say, +# "-sun"), then you have to tell the case statement up towards the top +# that MANUFACTURER isn't an operating system. 
Otherwise, code above +# will signal an error saying that MANUFACTURER isn't an operating +# system, and we'll never get to this point. + +case $basic_machine in + *-acorn) + os=-riscix1.2 + ;; + arm*-rebel) + os=-linux + ;; + arm*-semi) + os=-aout + ;; + # This must come before the *-dec entry. + pdp10-*) + os=-tops20 + ;; + pdp11-*) + os=-none + ;; + *-dec | vax-*) + os=-ultrix4.2 + ;; + m68*-apollo) + os=-domain + ;; + i386-sun) + os=-sunos4.0.2 + ;; + m68000-sun) + os=-sunos3 + # This also exists in the configure program, but was not the + # default. + # os=-sunos4 + ;; + m68*-cisco) + os=-aout + ;; + mips*-cisco) + os=-elf + ;; + mips*-*) + os=-elf + ;; + or32-*) + os=-coff + ;; + *-tti) # must be before sparc entry or we get the wrong os. + os=-sysv3 + ;; + sparc-* | *-sun) + os=-sunos4.1.1 + ;; + *-be) + os=-beos + ;; + *-ibm) + os=-aix + ;; + *-wec) + os=-proelf + ;; + *-winbond) + os=-proelf + ;; + *-oki) + os=-proelf + ;; + *-hp) + os=-hpux + ;; + *-hitachi) + os=-hiux + ;; + i860-* | *-att | *-ncr | *-altos | *-motorola | *-convergent) + os=-sysv + ;; + *-cbm) + os=-amigaos + ;; + *-dg) + os=-dgux + ;; + *-dolphin) + os=-sysv3 + ;; + m68k-ccur) + os=-rtu + ;; + m88k-omron*) + os=-luna + ;; + *-next ) + os=-nextstep + ;; + *-sequent) + os=-ptx + ;; + *-crds) + os=-unos + ;; + *-ns) + os=-genix + ;; + i370-*) + os=-mvs + ;; + *-next) + os=-nextstep3 + ;; + *-gould) + os=-sysv + ;; + *-highlevel) + os=-bsd + ;; + *-encore) + os=-bsd + ;; + *-sgi) + os=-irix + ;; + *-siemens) + os=-sysv4 + ;; + *-masscomp) + os=-rtu + ;; + f30[01]-fujitsu | f700-fujitsu) + os=-uxpv + ;; + *-rom68k) + os=-coff + ;; + *-*bug) + os=-coff + ;; + *-apple) + os=-macos + ;; + *-atari*) + os=-mint + ;; + *) + os=-none + ;; +esac +fi + +# Here we handle the case where we know the os, and the CPU type, but not the +# manufacturer. We pick the logical manufacturer. 
+vendor=unknown +case $basic_machine in + *-unknown) + case $os in + -riscix*) + vendor=acorn + ;; + -sunos*) + vendor=sun + ;; + -aix*) + vendor=ibm + ;; + -beos*) + vendor=be + ;; + -hpux*) + vendor=hp + ;; + -mpeix*) + vendor=hp + ;; + -hiux*) + vendor=hitachi + ;; + -unos*) + vendor=crds + ;; + -dgux*) + vendor=dg + ;; + -luna*) + vendor=omron + ;; + -genix*) + vendor=ns + ;; + -mvs* | -opened*) + vendor=ibm + ;; + -ptx*) + vendor=sequent + ;; + -vxsim* | -vxworks*) + vendor=wrs + ;; + -aux*) + vendor=apple + ;; + -hms*) + vendor=hitachi + ;; + -mpw* | -macos*) + vendor=apple + ;; + -*mint | -mint[0-9]* | -*MiNT | -MiNT[0-9]*) + vendor=atari + ;; + -vos*) + vendor=stratus + ;; + esac + basic_machine=`echo $basic_machine | sed "s/unknown/$vendor/"` + ;; +esac + +echo $basic_machine$os +exit 0 + +# Local variables: +# eval: (add-hook 'write-file-hooks 'time-stamp) +# time-stamp-start: "timestamp='" +# time-stamp-format: "%:y-%02m-%02d" +# time-stamp-end: "'" +# End: diff --git a/config/Makefile b/config/Makefile new file mode 100644 index 0000000..977fff9 --- /dev/null +++ b/config/Makefile @@ -0,0 +1,48 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. 
The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## Makefile for config directory ## +## ## +########################################################################### +TOP=.. +DIRNAME=config + +CONFIGS = config.in vc_config_make_rules-dist +FILES=Makefile common_make_rules vc_common_make_rules \ + test_make_rules project.mak project_config_check.mak \ + system.sh make_system.mak \ + $(CONFIGS) +ALL_DIRS = modules systems + +include $(TOP)/config/common_make_rules + diff --git a/config/common_make_rules b/config/common_make_rules new file mode 100644 index 0000000..ef2775c --- /dev/null +++ b/config/common_make_rules @@ -0,0 +1,67 @@ + ########################################################-*-mode:Makefile-*- + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Various People ## + ## : Reorganised (and probably broken) ## + ## : by Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: June 1997 ## + ## --------------------------------------------------------------------- ## + ## Default Makefile rules included everywhere. 
## + ## ## + ########################################################################### + +# This is the default rule +all: $(ALL) .sub_directories + @ : Do nothing but shut up make + +# Include generated system description + +include $(TOP)/config/system.mak + +ifeq ($(SYSTEM_LOADED),) + MACHINETYPE=unknown + OSTYPE=unknown + OSREV= +endif + +# Include project specific rules + +-include $(TOP)/config/project.mak + +# Include installation specific information + +-include $(TOP)/config/config + +# indirect to shared rule sets directory + +include $(EST)/config/rules/common_make_rules.mak diff --git a/config/config.in b/config/config.in new file mode 100644 index 0000000..d7a28e5 --- /dev/null +++ b/config/config.in @@ -0,0 +1,76 @@ +########################################################-*-mode:Makefile-*- +## ## +## Festival: local configuration file ## +## ## +########################################################################### +## +## Specific config file for local installation +## + +########################################################################### +## Which speech tools to use + +EST=$(TOP)/../speech_tools + +########################################################################### +## Where the festival tree will be installed. +## +## The default is that festival will remain where it is compiled. +## +## You may need to set this explicitly if automounter or NFS +## side effects cause problems + +FESTIVAL_HOME := $(shell (cd $(TOP); pwd)) + +########################################################################### +## Feature selection. +## +## Select modules to include. + +## Non Free PSOLA synthesis. This isn't distributed with festival because +## of a patent, if you have src/modules/diphone/di_psolaTM.cc you can +## include this feature. +# INCLUDE_PSOLA_TM=1 + +## Support for TCL. So that festival may eval TCL commands and TCL may eval +## festival commands. This was added to support the CSLU toolkit but +## others may want it too. 
+# INCLUDE_TCL=1 + +########################################################################### +## Take most settings from speech tools. + +include $(EST)/config/config + +########################################################################### +## Add any extra modules you wish to include + +## These sub modules are *optional* and unless you know what they are +## you probably don't want them or need them. They are typically +## new code that isn't yet stable and is being used for research or +## old code left in for compatibility for some users + +## Experimental UniSyn, metrical tree, phonological structure matching +## code +# ALSO_INCLUDE += UniSyn_phonology UniSyn_selection +## Cluster unit selection code as described in "Building Voices in +## Festival", again experimental and suitable for research purposes only. +ALSO_INCLUDE += clunits clustergen MultiSyn + +## NITECH and Tokyo Institute of Technologies HTS support +ALSO_INCLUDE += hts_engine + +## Old diphone code that will be deleted, left in only for some +## compatibility +# ALSO_INCLUDE += diphone + +## Other (non-Edinburgh) modules may also be specified here (e.g. OGI code), + +ALSO_INCLUDE += + +########################################################################### +## +## Describe your local system below by redefining things defined +## in config/configs/default.mak. + + diff --git a/config/make_system.mak b/config/make_system.mak new file mode 100644 index 0000000..5acf40a --- /dev/null +++ b/config/make_system.mak @@ -0,0 +1,44 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Thu Oct 2 1997 ## + ## -------------------------------------------------------------------- ## + ## Guess what kind of system we are on. 
## + ## ## + ########################################################################### + +system.mak : config + @echo Check system type >&2 + @/bin/sh $(TOP)/config/system.sh $(TOP)/config/systems > system.mak + diff --git a/config/modules/Makefile b/config/modules/Makefile new file mode 100644 index 0000000..b950f75 --- /dev/null +++ b/config/modules/Makefile @@ -0,0 +1,50 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. 
## +## ## +########################################################################### +## ## +## Makefile for config directory ## +## ## +########################################################################### +TOP=../.. +DIRNAME=config/modules + +RULESETS = efence.mak dmalloc.mak \ + psola_tm.mak editline.mak tcl.mak \ + freebsd16_audio.mak irix_audio.mak linux16_audio.mak \ + sun16_audio.mak win32_audio.mak macosx_audio.mak \ + mplayer_audio.mak nas_audio.mak esd_audio.mak native_audio.mak \ + siod.mak wagon.mak scfg.mak wfst.mak ols.mak debugging.mak + +FILES = Makefile descriptions $(RULESETS) + +include $(TOP)/config/common_make_rules + diff --git a/config/modules/debugging.mak b/config/modules/debugging.mak new file mode 100644 index 0000000..30f5db7 --- /dev/null +++ b/config/modules/debugging.mak @@ -0,0 +1,50 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Turn on various debugging facilities. ## + ## ## + ########################################################################### + +ifndef INCLUDE_DEBUGGING + INCLUDE_DEBUGGING=1 +endif + + +MOD_DESC_DEBUGGING=Compile in some debugging facilities + + +MODULE_DEFINES += -DEST_DEBUGGING + + diff --git a/config/modules/descriptions b/config/modules/descriptions new file mode 100644 index 0000000..5b60ed4 --- /dev/null +++ b/config/modules/descriptions @@ -0,0 +1,14 @@ + + ########################################################################### + ## ## + ## Descriptions of uninteresting modules. 
## + ## ## + ########################################################################### + +desc_asr="(From EST) Speech recognition code" +desc_native_audio="(from EST) Native audio module for your system" +desc_ols="(from EST) Ordinary Least Squares support" +desc_scfg="(from EST) Stochastic context free grammars" +desc_siod="(from EST) Scheme In One Defun" +desc_wagon="(from EST) Wagon CART tree system" +desc_wfst="(from EST) Weighted Finite State Automata" diff --git a/config/modules/dmalloc.mak b/config/modules/dmalloc.mak new file mode 100644 index 0000000..63e4ada --- /dev/null +++ b/config/modules/dmalloc.mak @@ -0,0 +1,54 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed Jun 3 1998 ## + ## -------------------------------------------------------------------- ## + ## Link in the dmalloc library and include the header. ## + ## ## + ########################################################################### + +ifndef INCLUDE_DMALLOC + INCLUDE_DMALLOC=1 +endif + +MOD_DESC_DMALLOC=Compile with debugging malloc library + +DEBUG_LIBS += -L$(DMALLOC_LIB) -ldmalloc + +ifdef DMALLOC_INCLUDE + DEBUG_DEFINES += -I $(DMALLOC_INCLUDE) +endif + +DEBUG_DEFINES += -DINCLUDE_DMALLOC + diff --git a/config/modules/editline.mak b/config/modules/editline.mak new file mode 100644 index 0000000..fe83c37 --- /dev/null +++ b/config/modules/editline.mak @@ -0,0 +1,52 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: December 1998 ## + ## -------------------------------------------------------------------- ## + ## Command line editor based on editline ## + ## ## + ########################################################################### + +INCLUDE_EDITLINE=1 + +MOD_DESC_EDITLINE=Use editline for command line editing and history + +IO_DEFINES += -DSUPPORT_EDITLINE $(MODULE_EDITLINE_OPTIONS:%=-DEDITLINE_%) +MODULE_LIBS += $(TERMCAPLIB) + +ifeq ($(DIRNAME),siod) + CSRCS := $(CSRCS) el_complete.c editline.c el_sys_unix.c +endif + + diff --git a/config/modules/efence.mak b/config/modules/efence.mak new file mode 100644 index 0000000..0a01e4f --- /dev/null +++ b/config/modules/efence.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Defenitions for whan compiling with efence. ## + ## ## + ########################################################################### + +INCLUDE_EFENCE=1 + +MOD_DESC_EFENCE=Compile with efence memory error detection package + +DEBUG_LIBS += -L$(EFENCE_LIB) -lefence + diff --git a/config/modules/esd_audio.mak b/config/modules/esd_audio.mak new file mode 100644 index 0000000..e113ffc --- /dev/null +++ b/config/modules/esd_audio.mak @@ -0,0 +1,49 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Defenitions for ESD audio support. ## + ## ## + ########################################################################### + +INCLUDE_ESD_AUDIO=1 + +MOD_DESC_ESD_AUDIO=(from EST) Use ESD Audio + +AUDIO_DEFINES += -DSUPPORT_ESD +AUDIO_INCLUDES += -I$(ESD_INCLUDE) +MODULE_LIBS += -L$(ESD_LIB) -lesd -laudiofile + + diff --git a/config/modules/freebsd16_audio.mak b/config/modules/freebsd16_audio.mak new file mode 100644 index 0000000..e1bfc99 --- /dev/null +++ b/config/modules/freebsd16_audio.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Defenitions for Freebsd 16 bit audio support. 
## + ## ## + ########################################################################### + +INCLUDE_FREEBSD16_AUDIO=1 + +MOD_DESC_FREEBSD16_AUDIO=(from EST) Native audio module for FreeBSD systems + +AUDIO_DEFINES += -DSUPPORT_FREEBSD16 + diff --git a/config/modules/irix_audio.mak b/config/modules/irix_audio.mak new file mode 100644 index 0000000..6d08ed0 --- /dev/null +++ b/config/modules/irix_audio.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Defenitions for Irix audio support. ## + ## ## + ########################################################################### + +INCLUDE_IRIX_AUDIO=1 + +MOD_DESC_IRIX_AUDIO=(from EST) Native audio module for Irix systems + +AUDIO_DEFINES += -DSUPPORT_IRIX + diff --git a/config/modules/linux16_audio.mak b/config/modules/linux16_audio.mak new file mode 100644 index 0000000..e19ea82 --- /dev/null +++ b/config/modules/linux16_audio.mak @@ -0,0 +1,57 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Defenitions for Linux 16 bit audio support. 
## + ## ## + ########################################################################### + + +INCLUDE_LINUX16_AUDIO=1 + +MOD_DESC_LINUX16_AUDIO=(from EST) Native audio module for Linux systems + +ifeq ($(LINUXAUDIO),alsa) + AUDIO_DEFINES += -DSUPPORT_ALSALINUX + MODULE_LIBS += -lasound +endif + +ifeq ($(LINUXAUDIO),none) + AUDIO_DEFINES += -DSUPPORT_VOXWARE +endif + +ifdef INCLUDE_JAVA_CPP + MODULE_LIBS += -lpthread +endif diff --git a/config/modules/macosx_audio.mak b/config/modules/macosx_audio.mak new file mode 100644 index 0000000..3c6b48d --- /dev/null +++ b/config/modules/macosx_audio.mak @@ -0,0 +1,16 @@ + ########################################################################### + ## ## + ## Author: Brian Foley (bfoley@compsoc.nuigalway.ie) ## + ## Date: Wed Feb 17 2004 ## + ## -------------------------------------------------------------------- ## + ## Definitions for MacOS X audio support. ## + ## ## + ########################################################################### + +INCLUDE_MACOSX_AUDIO=1 + +MOD_DESC_MACOSX_AUDIO=(from EST) CoreAudio audio module for MacOS X systems + +AUDIO_DEFINES += -DSUPPORT_MACOSX_AUDIO + +MODULE_LIBS += -framework CoreAudio -framework AudioUnit -framework AudioToolbox -framework Carbon diff --git a/config/modules/mplayer_audio.mak b/config/modules/mplayer_audio.mak new file mode 100644 index 0000000..cfd20b1 --- /dev/null +++ b/config/modules/mplayer_audio.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Defenitions for Mplayer audio support. 
## + ## ## + ########################################################################### + +INCLUDE_MPLAYER_AUDIO=1 + +MOD_DESC_MPLAYER_AUDIO=(from EST) Audio module for calling windows mplayer + +AUDIO_DEFINES += -DSUPPORT_MPLAYER + diff --git a/config/modules/nas_audio.mak b/config/modules/nas_audio.mak new file mode 100644 index 0000000..04740e3 --- /dev/null +++ b/config/modules/nas_audio.mak @@ -0,0 +1,49 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Defenitions for NAS audio support. ## + ## ## + ########################################################################### + +INCLUDE_NAS_AUDIO=1 + +MOD_DESC_NAS_AUDIO=(from EST) Use Network Audio + +AUDIO_DEFINES += -DSUPPORT_NAS +AUDIO_INCLUDES += -I$(NAS_INCLUDE) +MODULE_LIBS += -L$(NAS_LIB) -laudio -L$(X11_LIB) -lX11 -lXt + + diff --git a/config/modules/native_audio.mak b/config/modules/native_audio.mak new file mode 100644 index 0000000..f893963 --- /dev/null +++ b/config/modules/native_audio.mak @@ -0,0 +1,45 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Dummy module to document native audio selection. 
## + ## ## + ########################################################################### + + +INCLUDE_NATIVE_AUDIO=1 + +MOD_DESC_NATIVE_AUDIO=(from EST) Native audio module for your system + + diff --git a/config/modules/ols.mak b/config/modules/ols.mak new file mode 100644 index 0000000..9dae697 --- /dev/null +++ b/config/modules/ols.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Mon Jun 1 1998 ## + ## -------------------------------------------------------------------- ## + ## NOthing special to do. ## + ## ## + ########################################################################### + +INCLUDE_OLS=1 + +MOD_DESC_OLS=(from EST) Ordinary Least Squares support + + + diff --git a/config/modules/psola_tm.mak b/config/modules/psola_tm.mak new file mode 100644 index 0000000..748f575 --- /dev/null +++ b/config/modules/psola_tm.mak @@ -0,0 +1,47 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Defenitions for psola(tm) audio support. ## + ## ## + ########################################################################### + +INCLUDE_PSOLA_TM=1 + +MOD_DESC_PSOLA_TM=Include PSOLA(tm) synthesis code. + +MODULE_DIPHONE_DEFINES += -DSUPPORT_PSOLA_TM $(MODULE_PSOLA_TM_OPTIONS:%=-DPSOLA_TM_%) + + diff --git a/config/modules/scfg.mak b/config/modules/scfg.mak new file mode 100644 index 0000000..5e5602c --- /dev/null +++ b/config/modules/scfg.mak @@ -0,0 +1,43 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Mon Jun 1 1998 ## + ## -------------------------------------------------------------------- ## + ## NOthing special to do. ## + ## ## + ########################################################################### + +INCLUDE_SCFG=1 + +MOD_DESC_SCFG=(from EST) Stochastic context free grammars diff --git a/config/modules/siod.mak b/config/modules/siod.mak new file mode 100644 index 0000000..9a74c39 --- /dev/null +++ b/config/modules/siod.mak @@ -0,0 +1,44 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Mon Jun 1 1998 ## + ## -------------------------------------------------------------------- ## + ## NOthing special to do. 
## + ## ## + ########################################################################### + +INCLUDE_SIOD=1 + +MOD_DESC_SIOD=(from EST) Scheme In One Defun + diff --git a/config/modules/sun16_audio.mak b/config/modules/sun16_audio.mak new file mode 100644 index 0000000..f694552 --- /dev/null +++ b/config/modules/sun16_audio.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Definitions for Sun 16 bit audio support. ## + ## ## + ########################################################################### + +INCLUDE_SUN16_AUDIO=1 + +MOD_DESC_SUN16_AUDIO=(from EST) Native audio module for Solaris systems + +AUDIO_DEFINES += -DSUPPORT_SUN16 + diff --git a/config/modules/tcl.mak b/config/modules/tcl.mak new file mode 100644 index 0000000..2778751 --- /dev/null +++ b/config/modules/tcl.mak @@ -0,0 +1,49 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Definitions for TCL interface support. ## + ## ## + ########################################################################### + +INCLUDE_TCL=1 + +MOD_DESC_TCL=(from EST) Include TCL interface + +FESTIVAL_DEFINES += -DSUPPORT_TCL +FESTIVAL_INCLUDES += -I$(TCL_INCLUDE) +MODULE_LIBS += -L$(TCL_LIB) -ltcl7.6 + + diff --git a/config/modules/wagon.mak b/config/modules/wagon.mak new file mode 100644 index 0000000..fb52d6b --- /dev/null +++ b/config/modules/wagon.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Mon Jun 1 1998 ## + ## -------------------------------------------------------------------- ## + ## NOthing special to do. ## + ## ## + ########################################################################### + +INCLUDE_WAGON=1 + +MOD_DESC_WAGON=(from EST) Wagon CART tree system + + + diff --git a/config/modules/wfst.mak b/config/modules/wfst.mak new file mode 100644 index 0000000..be89651 --- /dev/null +++ b/config/modules/wfst.mak @@ -0,0 +1,44 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Mon Jun 1 1998 ## + ## -------------------------------------------------------------------- ## + ## NOthing special to do. 
## + ## ## + ########################################################################### + +INCLUDE_WFST=1 + +MOD_DESC_WFST=(from EST) Weighted Finite State Automata + diff --git a/config/modules/win32_audio.mak b/config/modules/win32_audio.mak new file mode 100644 index 0000000..5f0536a --- /dev/null +++ b/config/modules/win32_audio.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Wed May 27 1998 ## + ## -------------------------------------------------------------------- ## + ## Definitions for Win32 audio support. ## + ## ## + ########################################################################### + +INCLUDE_WIN32_AUDIO=1 + +MOD_DESC_WIN32_AUDIO=(from EST) Audio for Win32 systems + +AUDIO_DEFINES += -DSUPPORT_WIN32AUDIO + diff --git a/config/project.mak b/config/project.mak new file mode 100644 index 0000000..de0cee3 --- /dev/null +++ b/config/project.mak @@ -0,0 +1,118 @@ + ########################################################-*-mode:Makefile-*- + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Tue Oct 7 1997 ## + ## -------------------------------------------------------------------- ## + ## Description of festival. ## + ## ## + ########################################################################### + +PROJECT_NAME = Festival Speech Synthesis System +PROJECT_PREFIX = FESTIVAL +PROJECT_VERSION = 2.1 +PROJECT_DATE = November 2010 +PROJECT_STATE = release + + +# config files of projects we depend on + +PROJECT_OTHER_CONFIGS = $(EST)/config/config + +# Place to find the optional modules for this project. 
+ +MODULE_DIRECTORY = $(TOP)/src/modules + +DISTRIBUTED_MODULES = \ + JAVA + +DEVELOPMENT_MODULES = RJC_SYNTHESIS \ + UNISYN_PHONOLOGY + +UTILITY_MODULES = + +ALL_REAL_MODULES = \ + $(DISTRIBUTED_MODULES) \ + $(DEVELOPMENT_MODULES) + +ALL_MODULES = \ + $(ALL_REAL_MODULES) \ + $(UTILITY_MODULES) + +# Place where programs are compiled + +PROJECT_MAIN_DIR=$(FESTIVAL_HOME)/src/main +PROJECT_SCRIPTS_DIR=$(FESTIVAL_HOME)/src/scripts + +# Where the main RCS tree is, probably only used within CSTR + +CENTRAL_DIR = $(LOCAL_REPOSITORY)/festival/code_base/festival + +# Libraries defined in this project + +PROJECT_LIBRARIES = Festival +PROJECT_LIBRARY_DIR_Festival = $(TOP)/src/lib +PROJECT_DEFAULT_LIBRARY = Festival + +# Libraries used from other projects + +REQUIRED_LIBRARIES = estools estbase eststring +REQUIRED_LIBRARY_DIR_estools = $(EST)/lib +REQUIRED_LIBRARY_DIR_estbase = $(EST)/lib +REQUIRED_LIBRARY_DIR_eststring = $(EST)/lib + +REQUIRED_MAKE_INCLUDE = $(EST)/make.include + +# Includes for this and related projects + +PROJECT_INCLUDES = -I$(TOP)/src/include -I$(EST)/include + +PROJECT_TEMPLATE_DIRS = src/arch/festival +PROJECT_TEMPLATE_DBS = $(TOP) $(EST) + +LIBRARY_TEMPLATE_DIRS_estools = $(LIBRARY_TEMPLATE_DIRS:%=$(EST)/%) + +JAVA_CLASS_LIBRARY = $(TOP)/src/lib/festival.jar + +JAVA_CLASSPATH= $(TOP)/lib/festival.jar:$(EST_HOME)/lib/est_$(EST_JAVA_VERSION).jar:$(SYSTEM_JAVA_CLASSPATH) + +PROJECT_JAVA_ROOT=$(TOP)/src/modules/java + +# Places to look for documentation + +DOCXX_DIRS = $(TOP)/src +MODULE_TO_DOCXX = perl $(TOP)/src/modules/utilities/extract_module_doc++.prl + +FTLIBDIR = $(FESTIVAL_HOME)/lib + + diff --git a/config/project_config_check.mak b/config/project_config_check.mak new file mode 100644 index 0000000..82c0041 --- /dev/null +++ b/config/project_config_check.mak @@ -0,0 +1,41 @@ + +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## 
Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. 
## +## ## +########################################################################### + +ifndef INCLUDE_SIOD +.config_error:: FORCE + @echo "+--------------------------------------------------" + @echo "| Must compile speech tools with SIOD support" + @echo "+--------------------------------------------------" + @exit 1 +endif diff --git a/config/system.sh b/config/system.sh new file mode 100644 index 0000000..af07781 --- /dev/null +++ b/config/system.sh @@ -0,0 +1,115 @@ +#!/bin/sh + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Guess what kind of system we are on. ## + ## ## + ########################################################################### + +# Where the Makefile fragments live +SYSTEMS=$1 + +# Drop any _xxx from the end +OSTYPE=`uname -s | + sed -e '/^\([^_]*\).*/s//\1/' -e '/\//s///g'` + +# CPU, downcased, /s and some uninteresting details eliminated +MACHINETYPE=`{ mach || uname -m || echo unknown ; } 2>/dev/null | + tr ABCDEFGHIJKLMNOPQRSTUVWXYZ/ abcdefghijklmnopqrstuvwxyz_ | + sed -e 's/i[0-9]86/ix86/' \ + -e 's/sun4/sparc/' \ + -e 's/ip[0-9]*/ip/'\ + -e 's/ /_/g'\ + -e 's/9000_7../hp9000/' + ` + +# OS revision, only take first two numbers. +OSREV=`{ uname -r || echo ""; } 2> /dev/null | + sed -e 's/^\([^.]*\)\(\.[^-. ]*\).*/\1\2/'` + +# Sort out various flavours of Linux +if [ "$OSTYPE" = Linux ] + then + if [ -f "/etc/redhat-release" ] + then + OSTYPE=RedHatLinux + OSREV=`cat /etc/redhat-release | sed -e 's/[^0-9]*\([0-9.]*\).*/\1/'` + elif [ -f "/etc/debian_version" ] + then + OSTYPE=DebianGNULinux + OSREV=`cat /etc/debian_version` + else + # Generic unknown GNU/Linux system. 
+ OSTYPE=Linux + fi +fi + +# Make sure we actually have a .mak file for it, otherwise fall back +# to sensible defaults (for example, kernel version and architecture +# are completely irrelevant on Linux) +if [ ! -f "${SYSTEMS}/${MACHINETYPE}_${OSTYPE}${OSREV}.mak" ]; then + if [ -f "${SYSTEMS}/${OSTYPE}${OSREV}.mak" ]; then + MACHINETYPE=unknown + elif [ -f "${SYSTEMS}/${MACHINETYPE}_${OSTYPE}.mak" ]; then + OSREV= + elif [ -f "${SYSTEMS}/unknown_${OSTYPE}.mak" ]; then + MACHINETYPE=unknown + OSREV= + elif [ "$OSTYPE" = "RedHatLinux" -o "$OSTYPE" = "DebianGNULinux" ]; then + MACHINETYPE=unknown + OSTYPE=Linux + OSREV= + elif [ "$OSTYPE" = "Darwin" ]; then + OSREV= + else + OSTYPE=unknown + OSREV= + fi +fi + +echo ' ###########################################################################' +echo ' ## This file is created automatically from your config file.' +echo ' ## Do not hand edit.' +echo ' ## Created:'`date` +echo ' ###########################################################################' + +echo '' + +echo "OSTYPE:=$OSTYPE" +echo "MACHINETYPE:=$MACHINETYPE" +echo "OSREV:=$OSREV" +echo "SYSTEM_LOADED:=1" + +exit 0 diff --git a/config/systems/DebianGNULinux.mak b/config/systems/DebianGNULinux.mak new file mode 100644 index 0000000..9a0a23e --- /dev/null +++ b/config/systems/DebianGNULinux.mak @@ -0,0 +1,41 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: David Huggins-Daines ## + ## -------------------------------------------------------------------- ## + ## Settings for Debian GNU/Linux distributions. ## + ## ## + ########################################################################### + +# Debian does not use termcap +OS_LIBS = -ldl -lncurses diff --git a/config/systems/Linux.mak b/config/systems/Linux.mak new file mode 100644 index 0000000..923c9a7 --- /dev/null +++ b/config/systems/Linux.mak @@ -0,0 +1,63 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Thu Oct 2 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for Linux. 
## + ## ## + ########################################################################### + +include $(EST)/config/systems/default.mak + +DEFAULT_JAVA_HOME=/usr/lib/jdk-1.1.6 +JAVA=/usr/bin/java +JAVAC=/usr/bin/javac +JAVAH=/usr/bin/javah + +TCL_LIBRARY = -ltcl +OS_LIBS = -ldl + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = LINUX16 + +## echo -n doesn't work +ECHO_N = /usr/bin/printf "%s" + +GNUTEST=test + +## awk is gawk, so it does all we could desire and then more. +NAWK=awk + + + diff --git a/config/systems/Makefile b/config/systems/Makefile new file mode 100644 index 0000000..6b9fdaf --- /dev/null +++ b/config/systems/Makefile @@ -0,0 +1,107 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## Makefile for config directory ## +## ## +########################################################################### +TOP=../.. +DIRNAME=config/systems + +SYSTEMS = \ + Linux.mak \ + RedHatLinux.mak \ + DebianGNULinux.mak \ + alpha_Linux.mak \ + alpha_OSF1V4.0.mak \ + alpha_RedHatLinux.mak \ + hp9000_HP-UX.mak \ + hp9000_HP-UXB.10.mak \ + ip_IRIX.mak \ + ip_IRIX5.3.mak \ + ip_IRIX6.3.mak \ + ip_IRIX6.4.mak \ + ip_IRIX646.4.mak \ + ix86_CYGWIN1.0.mak \ + ix86_CYGWIN1.1.mak \ + ix86_CYGWIN1.3.mak \ + ix86_CYGWIN1.4.mak \ + ix86_CYGWIN1.5.mak \ + ix86_CYGWIN1.7.mak \ + ix86_CYGWIN20.1.mak \ + ix86_CYGWIN32.mak \ + ix86_CYGWIN324.0.mak \ + ix86_Darwin.mak \ + ix86_FreeBSD.mak \ + ix86_FreeBSD2.1.mak \ + ix86_FreeBSD2.2.mak \ + ix86_FreeBSD3.0.mak \ + ix86_FreeBSD3.1.mak \ + ix86_FreeBSD3.2.mak \ + ix86_FreeBSD3.3.mak \ + ix86_FreeBSD4.0.mak \ + ix86_OS22.mak \ + ix86_RedHatLinux4.0.mak \ + ix86_RedHatLinux4.1.mak \ + ix86_RedHatLinux4.2.mak \ + ix86_RedHatLinux5.0.mak \ + ix86_RedHatLinux5.1.mak \ + ix86_RedHatLinux5.2.mak \ + ix86_RedHatLinux6.0.mak \ + ix86_RedHatLinux6.1.mak \ + ix86_RedHatLinux6.2.mak \ + ix86_RedHatLinux7.0.mak \ + ix86_SunOS5.5.mak \ + ix86_SunOS5.6.mak \ + ix86_SunOS5.7.mak \ + ix86_SunOS5.8.mak \ + ix86_SunOS5.mak \ + rs6000_AIX4.1.mak \ + sparc_SunOS4.1.mak \ 
+ sparc_SunOS4.mak \ + sparc_SunOS5.5.mak \ + sparc_SunOS5.6.mak \ + sparc_SunOS5.7.mak \ + sparc_SunOS5.8.mak \ + sparc_SunOS5.mak \ + unknown_DebianGNULinux.mak \ + unknown_Linux.mak \ + unknown_RedHatLinux.mak \ + power_macintosh_Darwin.mak \ + unknown_unknown.mak \ + x86_64_Darwin.mak \ + + +FILES = Makefile default.mak $(SYSTEMS) + +include $(TOP)/config/common_make_rules + diff --git a/config/systems/RedHatLinux.mak b/config/systems/RedHatLinux.mak new file mode 100644 index 0000000..a751a67 --- /dev/null +++ b/config/systems/RedHatLinux.mak @@ -0,0 +1,42 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux distributions. ## + ## ## + ########################################################################### + + + + diff --git a/config/systems/alpha_Linux.mak b/config/systems/alpha_Linux.mak new file mode 100644 index 0000000..ffccade --- /dev/null +++ b/config/systems/alpha_Linux.mak @@ -0,0 +1,45 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. 
Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux distributions. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak + +CFLAGS += -mieee + + + diff --git a/config/systems/alpha_OSF1V4.0.mak b/config/systems/alpha_OSF1V4.0.mak new file mode 100644 index 0000000..2b61df6 --- /dev/null +++ b/config/systems/alpha_OSF1V4.0.mak @@ -0,0 +1,48 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: Sat Oct 18 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for DEC Alpha OSF1 V4.0 ## + ## ## + ########################################################################### + +include $(EST)/config/systems/default.mak + +## echo -n doesn't work (well only sometimes ?) 
+ECHO_N = /bin/printf "%s" + + + + diff --git a/config/systems/alpha_RedHatLinux.mak b/config/systems/alpha_RedHatLinux.mak new file mode 100644 index 0000000..9c45ebf --- /dev/null +++ b/config/systems/alpha_RedHatLinux.mak @@ -0,0 +1,48 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux on Alpha. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/alpha_Linux.mak + +ifndef GCC + GCC=egcs +endif +EGCS_CC=gcc +EGCS_CXX=g++ + + diff --git a/config/systems/default.mak b/config/systems/default.mak new file mode 100644 index 0000000..69e8835 --- /dev/null +++ b/config/systems/default.mak @@ -0,0 +1,144 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Very common settings to avoid repetition. ## + ## ## + ########################################################################### + +########################################################################### +## Installation directories + +INSTALL_PREFIX=/usr/local + +BINDIR=$(INSTALL_PREFIX)/bin +LIBDIR=$(INSTALL_PREFIX)/lib +INCDIR=$(INSTALL_PREFIX)/include +MANDIR=$(INSTALL_PREFIX)/man + +########################################################################### +## Where the central RCS masters are stored. +## +## Used for development at CSTR, you can probably ignore it. 
+ +LOCAL_REPOSITORY = + +########################################################################### +## Where to find Network Audio + +NAS_INCLUDE = /usr/X11R6/include +NAS_LIB = /usr/X11R6/lib + +########################################################################### +## Where to find Enlightened Sound Daemon + +ESD_INCLUDE = /usr/local/include +ESD_LIB = /usr/local/lib + +########################################################################### +## Where to find X11 + +X11_INCLUDE = /usr/X11R6/include +X11_LIB = /usr/X11R6/lib + +########################################################################### +## TCL support + +TCL_INCLUDE = /usr/local/include +TCL_LIB = /usr/local/lib +TCL_LIBRARY = -ltcl7.6 + +########################################################################### +## Efence library for malloc debugging + +EFENCE_LIB = /usr/local/lib + +########################################################################### +## Commands. + +## Must support -nt +GNUTEST = gnutest + +## +INSTALL_PROG = install + +## Used to index libraries +RANLIB = ranlib + +## echo without a newline +ECHO_N = echo -n + +## make depend for when we haven't specified a compiler +MAKE_DEPEND = makedepend $(INCLUDES) $(TEMPLATES) $(TEMPLATE_SPECIFIC) + +## Generic library building +BUILD_LIB =$(AR) cruv + +## generic library indexing +INDEX_LIB = $(RANLIB) + +## shrink executables +STRIP = strip + +## Useful sloth +DO_NOTHING = true +DO_NOTHING_ARGS = : + +## different types of awk. For our purposes gawk can be used for nawk +AWK = awk +NAWK = nawk + +## Perl. Not used in build, but we have some perl scripts.
+PERL=/usr/bin/perl + +## Just in case someone has a broken test +TEST = test + +## Must understand -nt +GNUTEST = gnutest + +## Avoid clever RMs people may have on their path +RM = /bin/rm + +########################################################################### +## Arguments for DOC++ for creating documentation + +DOCXX = doc++_sane +DOCXX_ARGS = -a -f -B banner.inc -M sane -D 'SYSTEM "$(EST_HOME)/doc/sane.dtd"' + + +COMPILER_VERSION_COMMAND=true +JAVA_COMPILER_VERSION_COMMAND=true + +JAVA_SYSTEM_INCLUDES = -I$(JAVA_HOME)/include/genunix diff --git a/config/systems/hp9000_HP-UX.mak b/config/systems/hp9000_HP-UX.mak new file mode 100644 index 0000000..336ce20 --- /dev/null +++ b/config/systems/hp9000_HP-UX.mak @@ -0,0 +1,62 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## HP 9000 series. ## + ## ## + ########################################################################### + + +include $(EST)/config/systems/default.mak + +## Just guesses for what people are likely to have +GCC=gcc27 + +## Libraries needed for sockets based programs. +OS_LIBS = -lsocket -lnsl + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = + +## Official location for java +DEFAULT_JAVA_HOME=/usr/java1.1 + +GCC_SYSTEM_OPTIONS = + +## specific java files +JAVA_SYSTEM_INCLUDES = -I$(JAVA_HOME)/include/SOMETHING_HP_HERE + +## echo -n doesn't work +ECHO_N = /bin/printf "%s" + diff --git a/config/systems/hp9000_HP-UXB.10.mak b/config/systems/hp9000_HP-UXB.10.mak new file mode 100644 index 0000000..bf4ce46 --- /dev/null +++ b/config/systems/hp9000_HP-UXB.10.mak @@ -0,0 +1,41 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Specifics for this version of HP-UX. 
## + ## ## + ########################################################################### + +include $(EST)/config/systems/hp9000_HP-UX.mak + diff --git a/config/systems/ip_IRIX.mak b/config/systems/ip_IRIX.mak new file mode 100644 index 0000000..3f458dc --- /dev/null +++ b/config/systems/ip_IRIX.mak @@ -0,0 +1,53 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh,UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. 
## +## ## +########################################################################### +## ## +## Author: Alan W Black ## +## Date: Nov 1997 ## +########################################################################### +## Settings for Irix ## +## ## +########################################################################### + +include $(EST)/config/systems/default.mak + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = IRIX + +## echo -n doesn't work (well only sometimes ?) +ECHO_N = /bin/printf "%s" + +## Doesn't have or need RANLIB +RANLIB = true + +## IRIX specific java include files. +JAVA_SYSTEM_INCLUDES = -I$(JAVA_HOME)/include/irix diff --git a/config/systems/ip_IRIX5.3.mak b/config/systems/ip_IRIX5.3.mak new file mode 100644 index 0000000..56ab4d2 --- /dev/null +++ b/config/systems/ip_IRIX5.3.mak @@ -0,0 +1,46 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh,UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## Author: Alan W Black ## +## Date: Nov 1997 ## +########################################################################### +## Settings for Irix 5.3. ## +## ## +########################################################################### + +include $(EST)/config/systems/ip_IRIX.mak + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = IRIX53 + + diff --git a/config/systems/ip_IRIX6.3.mak b/config/systems/ip_IRIX6.3.mak new file mode 100644 index 0000000..90d2514 --- /dev/null +++ b/config/systems/ip_IRIX6.3.mak @@ -0,0 +1,44 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh,UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. 
## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## Author: Alan W Black ## +## Date: Nov 1997 ## +########################################################################### +## Settings for Irix 6.3. ## +## ## +########################################################################### + +include $(EST)/config/systems/ip_IRIX.mak + + + diff --git a/config/systems/ip_IRIX6.4.mak b/config/systems/ip_IRIX6.4.mak new file mode 100644 index 0000000..b453eb6 --- /dev/null +++ b/config/systems/ip_IRIX6.4.mak @@ -0,0 +1,44 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh,UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## Author: Alan W Black ## +## Date: Nov 1997 ## +########################################################################### +## Settings for Irix 6.4. 
## +## ## +########################################################################### + +include $(EST)/config/systems/ip_IRIX.mak + + + diff --git a/config/systems/ip_IRIX646.4.mak b/config/systems/ip_IRIX646.4.mak new file mode 100644 index 0000000..b453eb6 --- /dev/null +++ b/config/systems/ip_IRIX646.4.mak @@ -0,0 +1,44 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh,UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. 
## +## ## +########################################################################### +## ## +## Author: Alan W Black ## +## Date: Nov 1997 ## +########################################################################### +## Settings for Irix 6.4. ## +## ## +########################################################################### + +include $(EST)/config/systems/ip_IRIX.mak + + + diff --git a/config/systems/ix86_CYGWIN1.0.mak b/config/systems/ix86_CYGWIN1.0.mak new file mode 100644 index 0000000..e71b4c8 --- /dev/null +++ b/config/systems/ix86_CYGWIN1.0.mak @@ -0,0 +1,47 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Cygnus gnuwin1.0 + ## ## + ########################################################################### + + +include $(EST)/config/systems/ix86_CYGWIN32.mak + +## Cygwin version of egcs has optimisation problems with some files. + +HONOUR_NOOPT=1 + + diff --git a/config/systems/ix86_CYGWIN1.1.mak b/config/systems/ix86_CYGWIN1.1.mak new file mode 100644 index 0000000..d129bae --- /dev/null +++ b/config/systems/ix86_CYGWIN1.1.mak @@ -0,0 +1,47 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1.
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Cygnus gnuwin1.1 + ## ## + ########################################################################### + + +include $(EST)/config/systems/ix86_CYGWIN32.mak + +## Cygwin version of egcs has optimisation problems with some files. + +HONOUR_NOOPT=1 + + diff --git a/config/systems/ix86_CYGWIN1.3.mak b/config/systems/ix86_CYGWIN1.3.mak new file mode 100644 index 0000000..413dc3d --- /dev/null +++ b/config/systems/ix86_CYGWIN1.3.mak @@ -0,0 +1,47 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved.
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Cygnus gnuwin1.3 + ## ## + ########################################################################### + + +include $(EST)/config/systems/ix86_CYGWIN32.mak + +## Cygwin version of egcs has optimisation problems with some files.
+ +HONOUR_NOOPT=1 + + diff --git a/config/systems/ix86_CYGWIN1.4.mak b/config/systems/ix86_CYGWIN1.4.mak new file mode 100644 index 0000000..1254815 --- /dev/null +++ b/config/systems/ix86_CYGWIN1.4.mak @@ -0,0 +1,44 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Cygnus gnuwin32 v 4.0 (b19). ## + ## ## + ########################################################################### + + +include $(EST)/config/systems/ix86_CYGWIN32.mak + + + diff --git a/config/systems/ix86_CYGWIN1.5.mak b/config/systems/ix86_CYGWIN1.5.mak new file mode 100644 index 0000000..1254815 --- /dev/null +++ b/config/systems/ix86_CYGWIN1.5.mak @@ -0,0 +1,44 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Cygnus gnuwin32 v 4.0 (b19). ## + ## ## + ########################################################################### + + +include $(EST)/config/systems/ix86_CYGWIN32.mak + + + diff --git a/config/systems/ix86_CYGWIN1.7.mak b/config/systems/ix86_CYGWIN1.7.mak new file mode 100644 index 0000000..1254815 --- /dev/null +++ b/config/systems/ix86_CYGWIN1.7.mak @@ -0,0 +1,44 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. 
Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Cygnus gnuwin32 v 4.0 (b19). ## + ## ## + ########################################################################### + + +include $(EST)/config/systems/ix86_CYGWIN32.mak + + + diff --git a/config/systems/ix86_CYGWIN20.1.mak b/config/systems/ix86_CYGWIN20.1.mak new file mode 100644 index 0000000..c3466fa --- /dev/null +++ b/config/systems/ix86_CYGWIN20.1.mak @@ -0,0 +1,47 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Cygnus gnuwin32 v 4.0 (b19). ## + ## ## + ########################################################################### + + +include $(EST)/config/systems/ix86_CYGWIN32.mak + +## Cygwin version of egcs has optimisation problems with some files.
+ +HONOUR_NOOPT=1 + + diff --git a/config/systems/ix86_CYGWIN32.mak b/config/systems/ix86_CYGWIN32.mak new file mode 100644 index 0000000..3799244 --- /dev/null +++ b/config/systems/ix86_CYGWIN32.mak @@ -0,0 +1,61 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Cygwin32. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/default.mak + +DEFAULT_JAVA_HOME=/usr/lib/jdk-1.1.6 + +gcc=gcc27 + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = WIN32 + +GNUTEST=test + +RM=rm + +## awk is gawk, so it does all we could desire and then more. +NAWK=awk + +## EGCS installs as gcc +EGCS_CC=gcc +EGCS_CXX=gcc + +OS_LIBS = -lwinmm -luser32 + diff --git a/config/systems/ix86_CYGWIN324.0.mak b/config/systems/ix86_CYGWIN324.0.mak new file mode 100644 index 0000000..1254815 --- /dev/null +++ b/config/systems/ix86_CYGWIN324.0.mak @@ -0,0 +1,44 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Cygnus gnuwin32 v 4.0 (b19). ## + ## ## + ########################################################################### + + +include $(EST)/config/systems/ix86_CYGWIN32.mak + + + diff --git a/config/systems/ix86_Darwin.mak b/config/systems/ix86_Darwin.mak new file mode 100644 index 0000000..74a5a7e --- /dev/null +++ b/config/systems/ix86_Darwin.mak @@ -0,0 +1,41 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. 
Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: Aug 3 2006 ## + ## -------------------------------------------------------------------- ## + ## Settings for Apple Darwin. ## + ## Thanks to Brian West ## + ########################################################################### + +include $(EST)/config/systems/ix86_Darwin.mak diff --git a/config/systems/ix86_FreeBSD.mak b/config/systems/ix86_FreeBSD.mak new file mode 100644 index 0000000..7c18a8b --- /dev/null +++ b/config/systems/ix86_FreeBSD.mak @@ -0,0 +1,60 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: Fri Oct 3 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for FreeBSD. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/default.mak + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = FREEBSD16 + +## echo -n doesn't work (well only sometimes ?) 
+ECHO_N = /usr/bin/printf "%s" + +NAWK=awk + +# GCC_MAKE_SHARED_LIB = ld -Bshareable -x -o XXX + +DEFAULT_JAVA_HOME=/usr/local/jdk + +JAVA=$(JAVA_HOME)/bin/java +JAVAC=$(JAVA_HOME)/bin/javac +JAVAH=$(JAVA_HOME)/bin/javah -jni +JAR=$(JAVA_HOME)/bin/jar cf0v + + diff --git a/config/systems/ix86_FreeBSD2.1.mak b/config/systems/ix86_FreeBSD2.1.mak new file mode 100644 index 0000000..c6ba80c --- /dev/null +++ b/config/systems/ix86_FreeBSD2.1.mak @@ -0,0 +1,47 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: Sat Oct 11 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for FreeBSD 2.1 ## + ## ## + ########################################################################### + + +include $(EST)/config/systems/ix86_FreeBSD.mak + +GCC=gcc26 + + + diff --git a/config/systems/ix86_FreeBSD2.2.mak b/config/systems/ix86_FreeBSD2.2.mak new file mode 100644 index 0000000..0630da7 --- /dev/null +++ b/config/systems/ix86_FreeBSD2.2.mak @@ -0,0 +1,47 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1.
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: Sat Oct 11 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for FreeBSD 2.2 ## + ## ## + ########################################################################### + + +include $(EST)/config/systems/ix86_FreeBSD.mak + +GCC=gcc27 + + + diff --git a/config/systems/ix86_FreeBSD3.0.mak b/config/systems/ix86_FreeBSD3.0.mak new file mode 100644 index 0000000..cd3e488 --- /dev/null +++ b/config/systems/ix86_FreeBSD3.0.mak @@ -0,0 +1,53 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: Wed Mar 10 1999 ## + ## -------------------------------------------------------------------- ## + ## Settings for FreeBSD 3.0 ## + ## ## + ########################################################################### + +include $(EST)/config/systems/ix86_FreeBSD.mak + +GCC=gcc27 + +JAVA_SYSTEM_INCLUDES = -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/freebsd +GCC=gcc27 + +JAVA=$(JAVA_HOME)/bin/java +JAVAC=$(JAVA_HOME)/bin/javac +JAVAH=$(JAVA_HOME)/bin/javah -jni +JAR=$(JAVA_HOME)/bin/jar cf0v + + diff --git a/config/systems/ix86_FreeBSD3.1.mak b/config/systems/ix86_FreeBSD3.1.mak new file mode 100644 index 0000000..d61f73b --- /dev/null +++ b/config/systems/ix86_FreeBSD3.1.mak @@ -0,0 +1,55 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: Wed Mar 10 1999 ## + ## -------------------------------------------------------------------- ## + ## Settings for FreeBSD 3.1 ## + ## ## + ########################################################################### + +# They went to ELF at this point, but everything seems fine as is + +include $(EST)/config/systems/ix86_FreeBSD.mak + +JAVA_SYSTEM_INCLUDES = -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/freebsd +GCC=gcc27 + +JAVA=$(JAVA_HOME)/bin/java +JAVAC=$(JAVA_HOME)/bin/javac +JAVAH=$(JAVA_HOME)/bin/javah -jni +JAR=$(JAVA_HOME)/bin/jar cf0v + +GCC_MAKE_SHARED_LIB = gcc -shared -o XXX + + diff --git a/config/systems/ix86_FreeBSD3.2.mak b/config/systems/ix86_FreeBSD3.2.mak new file mode 100644 index 0000000..9006aa6 --- /dev/null +++ b/config/systems/ix86_FreeBSD3.2.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for FreeBSD 3.2 ## + ## ## + ########################################################################### + +# They went to ELF at this point, but everything seems fine as is + +include $(EST)/config/systems/ix86_FreeBSD3.1.mak + + + + diff --git a/config/systems/ix86_FreeBSD3.3.mak b/config/systems/ix86_FreeBSD3.3.mak new file mode 100644 index 0000000..9f278e6 --- /dev/null +++ b/config/systems/ix86_FreeBSD3.3.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for FreeBSD 3.3 ## + ## ## + ########################################################################### + +# They went to ELF at this point, but everything seems fine as is + +include $(EST)/config/systems/ix86_FreeBSD3.1.mak + + + + diff --git a/config/systems/ix86_FreeBSD4.0.mak b/config/systems/ix86_FreeBSD4.0.mak new file mode 100644 index 0000000..05de2ad --- /dev/null +++ b/config/systems/ix86_FreeBSD4.0.mak @@ -0,0 +1,48 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for FreeBSD 4.0 (not actually checked yet) ## + ## ## + ########################################################################### + +# They went to ELF at this point, but everything seems fine as is + +include $(EST)/config/systems/ix86_FreeBSD.mak + +JAVA_SYSTEM_INCLUDES = -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/freebsd +GCC=gcc27 + + + diff --git a/config/systems/ix86_OS22.mak b/config/systems/ix86_OS22.mak new file mode 100644 index 0000000..cf1cd66 --- /dev/null +++ b/config/systems/ix86_OS22.mak @@ -0,0 +1,67 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Samuel Audet ## + ## Date: Sat Aug 1 1998 ## + ## -------------------------------------------------------------------- ## + ## Settings for OS/2 Warp 3 and 4 using EMX/GCC ## + ## ## + ########################################################################### + + +include $(TOP)/config/systems/default.mak + +## custom settings needed +GCC=gcc27emx + +## Libraries needed for sockets based programs. 
+OS_LIBS = -lsocket + +## uses the path +RM = rm + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = OS2 + +## Must support -nt +GNUTEST = test + +## echo -n doesn't work (well only sometimes ?) +ECHO_N = echo -n + +## awk is gawk, so it does all we could desire and then more. +NAWK=awk + +## Used to index libraries +RANLIB = ar s + diff --git a/config/systems/ix86_RedHatLinux4.0.mak b/config/systems/ix86_RedHatLinux4.0.mak new file mode 100644 index 0000000..0c25c28 --- /dev/null +++ b/config/systems/ix86_RedHatLinux4.0.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux 4.0. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak +include $(EST)/config/systems/RedHatLinux.mak + +GCC=gcc27 + + + diff --git a/config/systems/ix86_RedHatLinux4.1.mak b/config/systems/ix86_RedHatLinux4.1.mak new file mode 100644 index 0000000..7904296 --- /dev/null +++ b/config/systems/ix86_RedHatLinux4.1.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux 4.1. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak +include $(EST)/config/systems/RedHatLinux.mak + +GCC=gcc27 + + + diff --git a/config/systems/ix86_RedHatLinux4.2.mak b/config/systems/ix86_RedHatLinux4.2.mak new file mode 100644 index 0000000..7904296 --- /dev/null +++ b/config/systems/ix86_RedHatLinux4.2.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux 4.1. 
## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak +include $(EST)/config/systems/RedHatLinux.mak + +GCC=gcc27 + + + diff --git a/config/systems/ix86_RedHatLinux5.0.mak b/config/systems/ix86_RedHatLinux5.0.mak new file mode 100644 index 0000000..f657065 --- /dev/null +++ b/config/systems/ix86_RedHatLinux5.0.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux 5.1. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak +include $(EST)/config/systems/RedHatLinux.mak + +GCC=gcc27 + + + diff --git a/config/systems/ix86_RedHatLinux5.1.mak b/config/systems/ix86_RedHatLinux5.1.mak new file mode 100644 index 0000000..ef3e54e --- /dev/null +++ b/config/systems/ix86_RedHatLinux5.1.mak @@ -0,0 +1,47 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux 5.1. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak +include $(EST)/config/systems/RedHatLinux.mak + +ifndef GCC +GCC=egcs +endif +EGCS_CC=gcc +EGCS_CXX=g++ diff --git a/config/systems/ix86_RedHatLinux5.2.mak b/config/systems/ix86_RedHatLinux5.2.mak new file mode 100644 index 0000000..1aa82be --- /dev/null +++ b/config/systems/ix86_RedHatLinux5.2.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux 5.1. 
## + ## ## + ########################################################################### + +include $(EST)/config/systems/ix86_RedHatLinux5.1.mak + +DEFAULT_JAVA_HOME=/usr/local/jdk-1.1.6 +JAVA=$(JAVA_HOME)/bin/java +JAVAC=$(JAVA_HOME)/bin/javac +JAVAH=$(JAVA_HOME)/bin/javah -jni +JAR=$(JAVA_HOME)/bin/jar cf0v diff --git a/config/systems/ix86_RedHatLinux6.0.mak b/config/systems/ix86_RedHatLinux6.0.mak new file mode 100644 index 0000000..d4d39c5 --- /dev/null +++ b/config/systems/ix86_RedHatLinux6.0.mak @@ -0,0 +1,53 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux 6.0. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak +include $(EST)/config/systems/RedHatLinux.mak + +ifndef GCC +GCC=egcs +endif +EGCS_CC=gcc +EGCS_CXX=g++ + +DEFAULT_JAVA_HOME=/usr/local/jdk1.1.6 +JAVA=$(JAVA_HOME)/bin/java +JAVAC=$(JAVA_HOME)/bin/javac +JAVAH=$(JAVA_HOME)/bin/javah -jni +JAR=$(JAVA_HOME)/bin/jar cf0v diff --git a/config/systems/ix86_RedHatLinux6.1.mak b/config/systems/ix86_RedHatLinux6.1.mak new file mode 100644 index 0000000..39d750b --- /dev/null +++ b/config/systems/ix86_RedHatLinux6.1.mak @@ -0,0 +1,54 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux 6.1. 
(guess) ## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak +include $(EST)/config/systems/RedHatLinux.mak + +ifndef GCC +GCC=egcs +endif +EGCS_CC=gcc +EGCS_CXX=g++ + +DEFAULT_JAVA_HOME=/usr/local/jdk1.1.6 +JAVA=$(JAVA_HOME)/bin/java +JAVAC=$(JAVA_HOME)/bin/javac +JAVAH=$(JAVA_HOME)/bin/javah -jni +JAR=$(JAVA_HOME)/bin/jar cf0v + diff --git a/config/systems/ix86_RedHatLinux6.2.mak b/config/systems/ix86_RedHatLinux6.2.mak new file mode 100644 index 0000000..5834f94 --- /dev/null +++ b/config/systems/ix86_RedHatLinux6.2.mak @@ -0,0 +1,41 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux 6.2. (guess) ## + ## ## + ########################################################################### + +include $(EST)/config/systems/ix86_RedHatLinux6.1.mak + diff --git a/config/systems/ix86_RedHatLinux7.0.mak b/config/systems/ix86_RedHatLinux7.0.mak new file mode 100644 index 0000000..2a85c2b --- /dev/null +++ b/config/systems/ix86_RedHatLinux7.0.mak @@ -0,0 +1,43 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. 
## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for Red Hat Linux 7.0. (guess) ## + ## ## + ########################################################################### + +include $(EST)/config/systems/ix86_RedHatLinux6.1.mak +GCC=gcc296 + + diff --git a/config/systems/ix86_SunOS5.5.mak b/config/systems/ix86_SunOS5.5.mak new file mode 100644 index 0000000..ed36800 --- /dev/null +++ b/config/systems/ix86_SunOS5.5.mak @@ -0,0 +1,43 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Settings for i386 SunOS 5.5 ## + ## ## + ## Like sparc except no mv8 ## + ## ## + ########################################################################### + +include $(EST)/config/systems/ix86_SunOS5.mak + + + diff --git a/config/systems/ix86_SunOS5.6.mak b/config/systems/ix86_SunOS5.6.mak new file mode 100644 index 0000000..41dd5bb --- /dev/null +++ b/config/systems/ix86_SunOS5.6.mak @@ -0,0 +1,43 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Settings for i386 SunOS 5.6 ## + ## ## + ## Like sparc but no mv8 ## + ## ## + ########################################################################### + +include $(EST)/config/systems/ix86_SunOS5.mak + + + diff --git a/config/systems/ix86_SunOS5.7.mak b/config/systems/ix86_SunOS5.7.mak new file mode 100644 index 0000000..c3ea801 --- /dev/null +++ b/config/systems/ix86_SunOS5.7.mak @@ -0,0 +1,43 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. 
The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Settings for i386 SunOS 5.7 ## + ## ## + ## Like sparc but no mv8 ## + ## ## + ########################################################################### + +include $(EST)/config/systems/ix86_SunOS5.mak + + + diff --git a/config/systems/ix86_SunOS5.8.mak b/config/systems/ix86_SunOS5.8.mak new file mode 100644 index 0000000..cef476a --- /dev/null +++ b/config/systems/ix86_SunOS5.8.mak @@ -0,0 +1,41 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for SunOS 5.8 ## + ## ## + ########################################################################### + +include $(EST)/config/systems/ix86_SunOS5.mak + diff --git a/config/systems/ix86_SunOS5.mak b/config/systems/ix86_SunOS5.mak new file mode 100644 index 0000000..d27e104 --- /dev/null +++ b/config/systems/ix86_SunOS5.mak @@ -0,0 +1,57 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Thu Oct 2 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for SunOS 5 on Intel platform. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/default.mak + +DEFAULT_JAVA_HOME=/usr/java1.1 + +## Libraries needed for sockets based programs. 
+OS_LIBS = -lsocket -lnsl + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = SUN16 + +## echo -n doesn't work (well only sometimes ?) +ECHO_N = /bin/printf "%s" + +SYSTEM_JAVA_INCLUDES = -I$(JAVA_HOME)/include/solaris + + + diff --git a/config/systems/power_macintosh_Darwin.mak b/config/systems/power_macintosh_Darwin.mak new file mode 100644 index 0000000..f098509 --- /dev/null +++ b/config/systems/power_macintosh_Darwin.mak @@ -0,0 +1,55 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: Fri Oct 3 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for Apple Darwin. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/default.mak + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = MACOSX + +## echo -n doesn't work (well only sometimes ?) +ECHO_N = /usr/bin/printf "%s" + +NAWK=awk + +GCC295=cc + +# GCC_MAKE_SHARED_LIB = ld -Bshareable -x -o XXX + + diff --git a/config/systems/rs6000_AIX4.1.mak b/config/systems/rs6000_AIX4.1.mak new file mode 100644 index 0000000..8dee60d --- /dev/null +++ b/config/systems/rs6000_AIX4.1.mak @@ -0,0 +1,42 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1999 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cs.cmu.edu) ## + ## -------------------------------------------------------------------- ## + ## For AIX 4.1 (from Stan Chen) ## + ## ## + ########################################################################### + +include $(EST)/config/systems/default.mak + +LINKFLAGS += -Wl,-bbigtoc diff --git a/config/systems/sparc_SunOS4.1.mak b/config/systems/sparc_SunOS4.1.mak new file mode 100644 index 0000000..9d35936 --- /dev/null +++ b/config/systems/sparc_SunOS4.1.mak @@ -0,0 +1,44 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Thu Oct 2 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for SunOS 4.1 ## + ## ## + ########################################################################### + +include $(EST)/config/systems/sparc_SunOS4.mak + + + diff --git a/config/systems/sparc_SunOS4.mak b/config/systems/sparc_SunOS4.mak new file mode 100644 index 0000000..cb202e5 --- /dev/null +++ b/config/systems/sparc_SunOS4.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. 
Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Thu Oct 2 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for SunOS 4. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/default.mak + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = SUN16 + + diff --git a/config/systems/sparc_SunOS5.5.mak b/config/systems/sparc_SunOS5.5.mak new file mode 100644 index 0000000..1b25879 --- /dev/null +++ b/config/systems/sparc_SunOS5.5.mak @@ -0,0 +1,46 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Thu Oct 2 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for SunOS 5.5 ## + ## ## + ########################################################################### + +CC_OTHER_FLAGS += -Dolder_solaris + +include $(EST)/config/systems/sparc_SunOS5.mak + + + diff --git a/config/systems/sparc_SunOS5.6.mak b/config/systems/sparc_SunOS5.6.mak new file mode 100644 index 0000000..c4714c5 --- /dev/null +++ b/config/systems/sparc_SunOS5.6.mak @@ -0,0 +1,44 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: Tue Nov 4 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for SunOS 5.6 ## + ## ## + ########################################################################### + +CC_OTHER_FLAGS += -Dolder_solaris + +include $(EST)/config/systems/sparc_SunOS5.mak + diff --git a/config/systems/sparc_SunOS5.7.mak b/config/systems/sparc_SunOS5.7.mak new file mode 100644 index 0000000..fc3bc95 --- /dev/null +++ b/config/systems/sparc_SunOS5.7.mak @@ -0,0 +1,41 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for SunOS 5.7 ## + ## ## + ########################################################################### + +include $(EST)/config/systems/sparc_SunOS5.mak + diff --git a/config/systems/sparc_SunOS5.8.mak b/config/systems/sparc_SunOS5.8.mak new file mode 100644 index 0000000..64202ae --- /dev/null +++ b/config/systems/sparc_SunOS5.8.mak @@ -0,0 +1,41 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for SunOS 5.8 ## + ## ## + ########################################################################### + +include $(EST)/config/systems/sparc_SunOS5.mak + diff --git a/config/systems/sparc_SunOS5.mak b/config/systems/sparc_SunOS5.mak new file mode 100644 index 0000000..131b34f --- /dev/null +++ b/config/systems/sparc_SunOS5.mak @@ -0,0 +1,65 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Thu Oct 2 1997 ## + ## -------------------------------------------------------------------- ## + ## Settings for SunOS 5.* (aka Solaris 2, aka 'Solaris') ## + ## ## + ########################################################################### + +include $(EST)/config/systems/default.mak + +## Just guesses for what people are likely to have +GCC=gcc27 +SUNCC=suncc40 + +## Libraries needed for sockets based programs. +OS_LIBS = -lsocket -lnsl + +## the native audio module for this type of system +NATIVE_AUDIO_MODULE = SUN16 + +## Official location for java +DEFAULT_JAVA_HOME=/usr/java1.1 + +## Tell gcc we are a v8 sparc or better. Any legacy machines lose. 
+GCC_SYSTEM_OPTIONS = -mv8 + +## echo -n doesn't work +ECHO_N = /bin/printf "%s" + +JAVA_SYSTEM_INCLUDES = -I$(JAVA_HOME)/include/solaris + +## Force use of nawk +AWK=nawk diff --git a/config/systems/unknown_DebianGNULinux.mak b/config/systems/unknown_DebianGNULinux.mak new file mode 100644 index 0000000..079eb43 --- /dev/null +++ b/config/systems/unknown_DebianGNULinux.mak @@ -0,0 +1,41 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: David Huggins-Daines ## + ## -------------------------------------------------------------------- ## + ## Settings for generic/unknown Debian GNU/Linux systems ## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak +include $(EST)/config/systems/DebianGNULinux.mak diff --git a/config/systems/unknown_Linux.mak b/config/systems/unknown_Linux.mak new file mode 100644 index 0000000..d3b3654 --- /dev/null +++ b/config/systems/unknown_Linux.mak @@ -0,0 +1,40 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. 
The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Linux v2. ## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak diff --git a/config/systems/unknown_RedHatLinux.mak b/config/systems/unknown_RedHatLinux.mak new file mode 100644 index 0000000..9307e08 --- /dev/null +++ b/config/systems/unknown_RedHatLinux.mak @@ -0,0 +1,42 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + ## ## + ## Author: Robert Clark (robert@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Settings for unknown Red Hat Linux versions ## + ## ## + ########################################################################### + +include $(EST)/config/systems/Linux.mak +include $(EST)/config/systems/RedHatLinux.mak + diff --git a/config/systems/unknown_unknown.mak b/config/systems/unknown_unknown.mak new file mode 100644 index 0000000..06e26a5 --- /dev/null +++ b/config/systems/unknown_unknown.mak @@ -0,0 +1,54 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## Date: Tue Oct 7 1997 ## + ## -------------------------------------------------------------------- ## + ## A default description used to bootstrap. ## + ## ## + ########################################################################### + + +include $(EST)/config/systems/default.mak + +## Must support -nt +GNUTEST = gnutest + +## echo -n doesn't work (well only sometimes ?) +ECHO_N = echo + + + + + + diff --git a/config/systems/x86_64_Darwin.mak b/config/systems/x86_64_Darwin.mak new file mode 100644 index 0000000..a3dda3a --- /dev/null +++ b/config/systems/x86_64_Darwin.mak @@ -0,0 +1,41 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. 
## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black (awb@cstr.ed.ac.uk) ## + ## Date: Aug 3 2006 ## + ## -------------------------------------------------------------------- ## + ## Settings for Apple Darwin. 
## + ## Thanks to Brian West ## + ########################################################################### + +include $(EST)/config/systems/x86_64_Darwin.mak diff --git a/config/test_make_rules b/config/test_make_rules new file mode 100644 index 0000000..099b627 --- /dev/null +++ b/config/test_make_rules @@ -0,0 +1,113 @@ + ########################################################-*-mode:Makefile-*- + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996,1997 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + # Makefile rules for testing things # + ########################################################################### + +TEST_PROGRAMS = $(TEST_MODULES:%=%_example) $(TEST_MODULES:%=%_regression) +#TSRCS = $(TEST_PROGRAMS:%=%.C) +#SRCS = $(TSRCS) +#OBJS = $(SRCS:%.C=%.o) + +include $(TOP)/config/common_make_rules + +test_scripts: $(TEST_SCRIPTS:%=%_script_test) + +test_modules: $(TEST_MODULES:%=%_module_build_and_test) + +$(TEST_MODULES:%=%_module_build_and_test) : %_module_build_and_test : %_module_rebuild %_module_test + +$(TEST_MODULES:%=%_module_rebuild) : %_module_rebuild : + @echo 'build $* (module)' + @/bin/rm -f $(OBJS) + @if $(MAKE) --no-print-directory OPTIMISE=$(TEST_OPTIMISE) WARN=1 $*_example $*_regression ;\ + then \ + : ;\ + else \ + echo $* example status: FAILED ; exit 1 ;\ + fi + +$(TEST_MODULES:%=%_module_test) : %_module_test : correct/%_example.out correct/%_regression.out + @echo 'test $* (module)' + @if ./$*_example $($(*:=_example_args)) > $*_example.out ;\ + then \ + echo $*_example completed ;\ + if [ ! -f $*_example.out ] || diff $*_example.out correct/$*_example.out ;\ + then \ + echo $* example status: CORRECT ;\ + else \ + echo $* example status: INCORRECT ;\ + fi ;\ + else \ + echo $* example status: FAILED ;\ + fi + @if ./$*_regression $($(*:=_regression_args)) > $*_regression.out ;\ + then \ + echo $*_regression completed ;\ + if [ ! 
-f $*_regression.out ] || diff $*_regression.out correct/$*_regression.out ;\ + then \ + echo $* regression status: CORRECT ;\ + else \ + echo $* regression status: INCORRECT ;\ + fi ;\ + else \ + echo $* regression status: FAILED ;\ + fi + @echo + @echo + +$(TEST_SCRIPTS:%=%_script_test) : %_script_test : %.sh correct/%_script.out + @echo 'test $* (script)' + @OUTPUT='$*_script.out' ;\ + TOP='$(TOP)' ;\ + export TOP OUTPUT ;\ + if /bin/sh $*.sh $($(*:=_script_args)) ;\ + then \ + echo $* script completed ;\ + if [ ! -f $*_script.out ] || diff $*_script.out correct/$*_script.out ;\ + then \ + echo $* script status: CORRECT ;\ + else \ + echo $* script status: INCORRECT ;\ + fi ;\ + else \ + echo $* script status: FAILED ;\ + fi + @echo + @echo + +$(SRCS:%.C=%.o) : %.o : %.C + +% : %.o + $(CXX) $(CXXFLAGS) $(TEMPLATES) -o $@ $@.o $(ESTLIB) $($(@:=_LIBS)) $(LIBS) + + diff --git a/config/vc_common_make_rules b/config/vc_common_make_rules new file mode 100644 index 0000000..26e09bf --- /dev/null +++ b/config/vc_common_make_rules @@ -0,0 +1,71 @@ + +###################################################################### +# # +# Make rules for MicroCruft Visual C++ # +# # +###################################################################### + +!include $(TOP)\config\vc_config_make_rules +!include $(TOP)\config\project.mak + +.SUFFIXES: .cc .obj +CPP=cl /nologo /DSYSTEM_IS_WIN32=1 /DINSTANTIATE_TEMPLATES=1 $(MODULEFLAGS) +CC=cl /nologo /DSYSTEM_IS_WIN32=1 $(MODULEFLAGS) + +default_target: $(DIRS) all + @echo done $(DIRNAME) + +all: $(ALL) + +$(DIRS) x1 : FORCE + @echo building in $(DIRNAME)\$@ + @cd $@ + @nmake /nologo /fVCMakefile + @cd .. 
+
+FORCE:
+
+.vcbuildlib: $(OBJS)
+	@echo add to $(INLIB) $(OBJS)
+	@if EXIST $(INLIB) lib/nologo $(INLIB) $(OBJS)
+	@if NOT EXIST $(INLIB) lib/nologo /out:$(INLIB) $(OBJS)
+	@echo built > .vcbuildlib
+
+.vc_add_to_lib: $(TOADD)
+	lib/nologo $(ADDLIB) $(TOADD)
+	@echo built > .vc_add_to_lib
+
+.libraries:
+	@echo Libraries not touched for VC++
+
+relink:
+	@echo Links not made for Visual C++
+
+.vc_build_scripts:
+	@echo Scripts not built for Visual C++
+
+.vc_build_manpages:
+	@echo manual pages not built for Visual C++
+
+.config_error::
+	@echo Config OK
+
+.sub_directories: $(BUILD_DIRS)
+
+.remove_links:
+	@echo Links not made for Visual C++
+.process_scripts:
+	@echo Scripts not created for VC
+.process_docs:
+	@echo Documentation not created for VC
+.link_main:
+	@echo Links not made for Visual C++
+
+.cc.obj:
+	$(CPP) $(CFLAGS) /c /Tp$*.cc /Fo$*.obj
+.c.obj:
+	$(CC) $(CFLAGS) /c /Tc$*.c /Fo$*.obj
+
+# this dummy rule stops the comment in make.depend getting
+# interpreted as a command...
YEUCH
+hack_dummy_target:
diff --git a/config/vc_config_make_rules-dist b/config/vc_config_make_rules-dist
new file mode 100644
index 0000000..b1d0f38
--- /dev/null
+++ b/config/vc_config_make_rules-dist
@@ -0,0 +1,37 @@
+
+ ######################################################################
+ # #
+ # Configuration variable make settings for MicroCruft Visual C++ #
+ # #
+ ######################################################################
+
+EST=$(TOP)\..\speech_tools
+
+SYSTEM_LIB=c:\\festival\\lib
+
+MODULEFLAGS=/DSUPPORT_EDITLINE=1
+AUDIOFLAGS=
+
+DEBUGFLAGS= /Zi
+LINKDEBUGFLAGS = /debug
+
+OPTFLAGS= /EHsc /wd4675 /GR
+
+DEFINES=/DFTLIBDIRC=$(SYSTEM_LIB) "/DFTOSTYPEC=win32_vc" "/DFTNAME=$(PROJECT_NAME)" "/DFTVERSION=$(PROJECT_VERSION)" "/DFTSTATE=$(PROJECT_STATE)" "/DFTDATE=$(PROJECT_DATE)"
+
+INCLUDEFLAGS= /I$(TOP)/src/include /I$(EST)\include $(LOCAL_INCLUDES) /DNO_READLINE=1 /DNO_SPOOLER=1 $(DEFINES)
+
+LINKFLAGS=$(LINKDEBUGFLAGS)
+
+LIB_DIR=src\lib
+
+FESTLIBS = $(TOP)\src\lib\libfestival.lib
+
+ESTLIBS = $(EST)\lib\libestools.lib $(EST)\lib\libestbase.lib $(EST)\lib\libeststring.lib
+
+WINLIBS = wsock32.lib winmm.lib
+
+!ifndef VCLIBS
+VCLIBS = $(FESTLIBS) $(ESTLIBS)
+!endif
+
diff --git a/configure b/configure
new file mode 100755
index 0000000..95d29da
--- /dev/null
+++ b/configure
@@ -0,0 +1,1286 @@
+#! /bin/sh
+
+# Guess values for system-dependent variables and create Makefiles.
+# Generated automatically using autoconf version 2.13
+# Copyright (C) 1992, 93, 94, 95, 96 Free Software Foundation, Inc.
+#
+# This configure script is free software; the Free Software Foundation
+# gives unlimited permission to copy, distribute and modify it.
+
+# Defaults:
+ac_help=
+ac_default_prefix=/usr/local
+# Any additions from configure.in:
+
+# Initialize some variables set by options.
+# The variables have the same names as the options, with
+# dashes changed to underlines.
+build=NONE +cache_file=./config.cache +exec_prefix=NONE +host=NONE +no_create= +nonopt=NONE +no_recursion= +prefix=NONE +program_prefix=NONE +program_suffix=NONE +program_transform_name=s,x,x, +silent= +site= +srcdir= +target=NONE +verbose= +x_includes=NONE +x_libraries=NONE +bindir='${exec_prefix}/bin' +sbindir='${exec_prefix}/sbin' +libexecdir='${exec_prefix}/libexec' +datadir='${prefix}/share' +sysconfdir='${prefix}/etc' +sharedstatedir='${prefix}/com' +localstatedir='${prefix}/var' +libdir='${exec_prefix}/lib' +includedir='${prefix}/include' +oldincludedir='/usr/include' +infodir='${prefix}/info' +mandir='${prefix}/man' + +# Initialize some other variables. +subdirs= +MFLAGS= MAKEFLAGS= +SHELL=${CONFIG_SHELL-/bin/sh} +# Maximum number of lines to put in a shell here document. +ac_max_here_lines=12 + +ac_prev= +for ac_option +do + + # If the previous option needs an argument, assign it. + if test -n "$ac_prev"; then + eval "$ac_prev=\$ac_option" + ac_prev= + continue + fi + + case "$ac_option" in + -*=*) ac_optarg=`echo "$ac_option" | sed 's/[-_a-zA-Z0-9]*=//'` ;; + *) ac_optarg= ;; + esac + + # Accept the important Cygnus configure options, so we can diagnose typos. 
+ + case "$ac_option" in + + -bindir | --bindir | --bindi | --bind | --bin | --bi) + ac_prev=bindir ;; + -bindir=* | --bindir=* | --bindi=* | --bind=* | --bin=* | --bi=*) + bindir="$ac_optarg" ;; + + -build | --build | --buil | --bui | --bu) + ac_prev=build ;; + -build=* | --build=* | --buil=* | --bui=* | --bu=*) + build="$ac_optarg" ;; + + -cache-file | --cache-file | --cache-fil | --cache-fi \ + | --cache-f | --cache- | --cache | --cach | --cac | --ca | --c) + ac_prev=cache_file ;; + -cache-file=* | --cache-file=* | --cache-fil=* | --cache-fi=* \ + | --cache-f=* | --cache-=* | --cache=* | --cach=* | --cac=* | --ca=* | --c=*) + cache_file="$ac_optarg" ;; + + -datadir | --datadir | --datadi | --datad | --data | --dat | --da) + ac_prev=datadir ;; + -datadir=* | --datadir=* | --datadi=* | --datad=* | --data=* | --dat=* \ + | --da=*) + datadir="$ac_optarg" ;; + + -disable-* | --disable-*) + ac_feature=`echo $ac_option|sed -e 's/-*disable-//'` + # Reject names that are not valid shell variable names. + if test -n "`echo $ac_feature| sed 's/[-a-zA-Z0-9_]//g'`"; then + { echo "configure: error: $ac_feature: invalid feature name" 1>&2; exit 1; } + fi + ac_feature=`echo $ac_feature| sed 's/-/_/g'` + eval "enable_${ac_feature}=no" ;; + + -enable-* | --enable-*) + ac_feature=`echo $ac_option|sed -e 's/-*enable-//' -e 's/=.*//'` + # Reject names that are not valid shell variable names. 
+ if test -n "`echo $ac_feature| sed 's/[-_a-zA-Z0-9]//g'`"; then + { echo "configure: error: $ac_feature: invalid feature name" 1>&2; exit 1; } + fi + ac_feature=`echo $ac_feature| sed 's/-/_/g'` + case "$ac_option" in + *=*) ;; + *) ac_optarg=yes ;; + esac + eval "enable_${ac_feature}='$ac_optarg'" ;; + + -exec-prefix | --exec_prefix | --exec-prefix | --exec-prefi \ + | --exec-pref | --exec-pre | --exec-pr | --exec-p | --exec- \ + | --exec | --exe | --ex) + ac_prev=exec_prefix ;; + -exec-prefix=* | --exec_prefix=* | --exec-prefix=* | --exec-prefi=* \ + | --exec-pref=* | --exec-pre=* | --exec-pr=* | --exec-p=* | --exec-=* \ + | --exec=* | --exe=* | --ex=*) + exec_prefix="$ac_optarg" ;; + + -gas | --gas | --ga | --g) + # Obsolete; use --with-gas. + with_gas=yes ;; + + -help | --help | --hel | --he) + # Omit some internal or obsolete options to make the list less imposing. + # This message is too long to be a string in the A/UX 3.1 sh. + cat << EOF +Usage: configure [options] [host] +Options: [defaults in brackets after descriptions] +Configuration: + --cache-file=FILE cache test results in FILE + --help print this message + --no-create do not create output files + --quiet, --silent do not print \`checking...' 
messages + --version print the version of autoconf that created configure +Directory and file names: + --prefix=PREFIX install architecture-independent files in PREFIX + [$ac_default_prefix] + --exec-prefix=EPREFIX install architecture-dependent files in EPREFIX + [same as prefix] + --bindir=DIR user executables in DIR [EPREFIX/bin] + --sbindir=DIR system admin executables in DIR [EPREFIX/sbin] + --libexecdir=DIR program executables in DIR [EPREFIX/libexec] + --datadir=DIR read-only architecture-independent data in DIR + [PREFIX/share] + --sysconfdir=DIR read-only single-machine data in DIR [PREFIX/etc] + --sharedstatedir=DIR modifiable architecture-independent data in DIR + [PREFIX/com] + --localstatedir=DIR modifiable single-machine data in DIR [PREFIX/var] + --libdir=DIR object code libraries in DIR [EPREFIX/lib] + --includedir=DIR C header files in DIR [PREFIX/include] + --oldincludedir=DIR C header files for non-gcc in DIR [/usr/include] + --infodir=DIR info documentation in DIR [PREFIX/info] + --mandir=DIR man documentation in DIR [PREFIX/man] + --srcdir=DIR find the sources in DIR [configure dir or ..] 
+ --program-prefix=PREFIX prepend PREFIX to installed program names + --program-suffix=SUFFIX append SUFFIX to installed program names + --program-transform-name=PROGRAM + run sed PROGRAM on installed program names +EOF + cat << EOF +Host type: + --build=BUILD configure for building on BUILD [BUILD=HOST] + --host=HOST configure for HOST [guessed] + --target=TARGET configure for TARGET [TARGET=HOST] +Features and packages: + --disable-FEATURE do not include FEATURE (same as --enable-FEATURE=no) + --enable-FEATURE[=ARG] include FEATURE [ARG=yes] + --with-PACKAGE[=ARG] use PACKAGE [ARG=yes] + --without-PACKAGE do not use PACKAGE (same as --with-PACKAGE=no) + --x-includes=DIR X include files are in DIR + --x-libraries=DIR X library files are in DIR +EOF + if test -n "$ac_help"; then + echo "--enable and --with options recognized:$ac_help" + fi + exit 0 ;; + + -host | --host | --hos | --ho) + ac_prev=host ;; + -host=* | --host=* | --hos=* | --ho=*) + host="$ac_optarg" ;; + + -includedir | --includedir | --includedi | --included | --include \ + | --includ | --inclu | --incl | --inc) + ac_prev=includedir ;; + -includedir=* | --includedir=* | --includedi=* | --included=* | --include=* \ + | --includ=* | --inclu=* | --incl=* | --inc=*) + includedir="$ac_optarg" ;; + + -infodir | --infodir | --infodi | --infod | --info | --inf) + ac_prev=infodir ;; + -infodir=* | --infodir=* | --infodi=* | --infod=* | --info=* | --inf=*) + infodir="$ac_optarg" ;; + + -libdir | --libdir | --libdi | --libd) + ac_prev=libdir ;; + -libdir=* | --libdir=* | --libdi=* | --libd=*) + libdir="$ac_optarg" ;; + + -libexecdir | --libexecdir | --libexecdi | --libexecd | --libexec \ + | --libexe | --libex | --libe) + ac_prev=libexecdir ;; + -libexecdir=* | --libexecdir=* | --libexecdi=* | --libexecd=* | --libexec=* \ + | --libexe=* | --libex=* | --libe=*) + libexecdir="$ac_optarg" ;; + + -localstatedir | --localstatedir | --localstatedi | --localstated \ + | --localstate | --localstat | --localsta | 
--localst \ + | --locals | --local | --loca | --loc | --lo) + ac_prev=localstatedir ;; + -localstatedir=* | --localstatedir=* | --localstatedi=* | --localstated=* \ + | --localstate=* | --localstat=* | --localsta=* | --localst=* \ + | --locals=* | --local=* | --loca=* | --loc=* | --lo=*) + localstatedir="$ac_optarg" ;; + + -mandir | --mandir | --mandi | --mand | --man | --ma | --m) + ac_prev=mandir ;; + -mandir=* | --mandir=* | --mandi=* | --mand=* | --man=* | --ma=* | --m=*) + mandir="$ac_optarg" ;; + + -nfp | --nfp | --nf) + # Obsolete; use --without-fp. + with_fp=no ;; + + -no-create | --no-create | --no-creat | --no-crea | --no-cre \ + | --no-cr | --no-c) + no_create=yes ;; + + -no-recursion | --no-recursion | --no-recursio | --no-recursi \ + | --no-recurs | --no-recur | --no-recu | --no-rec | --no-re | --no-r) + no_recursion=yes ;; + + -oldincludedir | --oldincludedir | --oldincludedi | --oldincluded \ + | --oldinclude | --oldinclud | --oldinclu | --oldincl | --oldinc \ + | --oldin | --oldi | --old | --ol | --o) + ac_prev=oldincludedir ;; + -oldincludedir=* | --oldincludedir=* | --oldincludedi=* | --oldincluded=* \ + | --oldinclude=* | --oldinclud=* | --oldinclu=* | --oldincl=* | --oldinc=* \ + | --oldin=* | --oldi=* | --old=* | --ol=* | --o=*) + oldincludedir="$ac_optarg" ;; + + -prefix | --prefix | --prefi | --pref | --pre | --pr | --p) + ac_prev=prefix ;; + -prefix=* | --prefix=* | --prefi=* | --pref=* | --pre=* | --pr=* | --p=*) + prefix="$ac_optarg" ;; + + -program-prefix | --program-prefix | --program-prefi | --program-pref \ + | --program-pre | --program-pr | --program-p) + ac_prev=program_prefix ;; + -program-prefix=* | --program-prefix=* | --program-prefi=* \ + | --program-pref=* | --program-pre=* | --program-pr=* | --program-p=*) + program_prefix="$ac_optarg" ;; + + -program-suffix | --program-suffix | --program-suffi | --program-suff \ + | --program-suf | --program-su | --program-s) + ac_prev=program_suffix ;; + -program-suffix=* | 
--program-suffix=* | --program-suffi=* \ + | --program-suff=* | --program-suf=* | --program-su=* | --program-s=*) + program_suffix="$ac_optarg" ;; + + -program-transform-name | --program-transform-name \ + | --program-transform-nam | --program-transform-na \ + | --program-transform-n | --program-transform- \ + | --program-transform | --program-transfor \ + | --program-transfo | --program-transf \ + | --program-trans | --program-tran \ + | --progr-tra | --program-tr | --program-t) + ac_prev=program_transform_name ;; + -program-transform-name=* | --program-transform-name=* \ + | --program-transform-nam=* | --program-transform-na=* \ + | --program-transform-n=* | --program-transform-=* \ + | --program-transform=* | --program-transfor=* \ + | --program-transfo=* | --program-transf=* \ + | --program-trans=* | --program-tran=* \ + | --progr-tra=* | --program-tr=* | --program-t=*) + program_transform_name="$ac_optarg" ;; + + -q | -quiet | --quiet | --quie | --qui | --qu | --q \ + | -silent | --silent | --silen | --sile | --sil) + silent=yes ;; + + -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb) + ac_prev=sbindir ;; + -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \ + | --sbi=* | --sb=*) + sbindir="$ac_optarg" ;; + + -sharedstatedir | --sharedstatedir | --sharedstatedi \ + | --sharedstated | --sharedstate | --sharedstat | --sharedsta \ + | --sharedst | --shareds | --shared | --share | --shar \ + | --sha | --sh) + ac_prev=sharedstatedir ;; + -sharedstatedir=* | --sharedstatedir=* | --sharedstatedi=* \ + | --sharedstated=* | --sharedstate=* | --sharedstat=* | --sharedsta=* \ + | --sharedst=* | --shareds=* | --shared=* | --share=* | --shar=* \ + | --sha=* | --sh=*) + sharedstatedir="$ac_optarg" ;; + + -site | --site | --sit) + ac_prev=site ;; + -site=* | --site=* | --sit=*) + site="$ac_optarg" ;; + + -srcdir | --srcdir | --srcdi | --srcd | --src | --sr) + ac_prev=srcdir ;; + -srcdir=* | --srcdir=* | --srcdi=* | --srcd=* | --src=* | --sr=*) + 
srcdir="$ac_optarg" ;; + + -sysconfdir | --sysconfdir | --sysconfdi | --sysconfd | --sysconf \ + | --syscon | --sysco | --sysc | --sys | --sy) + ac_prev=sysconfdir ;; + -sysconfdir=* | --sysconfdir=* | --sysconfdi=* | --sysconfd=* | --sysconf=* \ + | --syscon=* | --sysco=* | --sysc=* | --sys=* | --sy=*) + sysconfdir="$ac_optarg" ;; + + -target | --target | --targe | --targ | --tar | --ta | --t) + ac_prev=target ;; + -target=* | --target=* | --targe=* | --targ=* | --tar=* | --ta=* | --t=*) + target="$ac_optarg" ;; + + -v | -verbose | --verbose | --verbos | --verbo | --verb) + verbose=yes ;; + + -version | --version | --versio | --versi | --vers) + echo "configure generated by autoconf version 2.13" + exit 0 ;; + + -with-* | --with-*) + ac_package=`echo $ac_option|sed -e 's/-*with-//' -e 's/=.*//'` + # Reject names that are not valid shell variable names. + if test -n "`echo $ac_package| sed 's/[-_a-zA-Z0-9]//g'`"; then + { echo "configure: error: $ac_package: invalid package name" 1>&2; exit 1; } + fi + ac_package=`echo $ac_package| sed 's/-/_/g'` + case "$ac_option" in + *=*) ;; + *) ac_optarg=yes ;; + esac + eval "with_${ac_package}='$ac_optarg'" ;; + + -without-* | --without-*) + ac_package=`echo $ac_option|sed -e 's/-*without-//'` + # Reject names that are not valid shell variable names. + if test -n "`echo $ac_package| sed 's/[-a-zA-Z0-9_]//g'`"; then + { echo "configure: error: $ac_package: invalid package name" 1>&2; exit 1; } + fi + ac_package=`echo $ac_package| sed 's/-/_/g'` + eval "with_${ac_package}=no" ;; + + --x) + # Obsolete; use --with-x. 
+ with_x=yes ;; + + -x-includes | --x-includes | --x-include | --x-includ | --x-inclu \ + | --x-incl | --x-inc | --x-in | --x-i) + ac_prev=x_includes ;; + -x-includes=* | --x-includes=* | --x-include=* | --x-includ=* | --x-inclu=* \ + | --x-incl=* | --x-inc=* | --x-in=* | --x-i=*) + x_includes="$ac_optarg" ;; + + -x-libraries | --x-libraries | --x-librarie | --x-librari \ + | --x-librar | --x-libra | --x-libr | --x-lib | --x-li | --x-l) + ac_prev=x_libraries ;; + -x-libraries=* | --x-libraries=* | --x-librarie=* | --x-librari=* \ + | --x-librar=* | --x-libra=* | --x-libr=* | --x-lib=* | --x-li=* | --x-l=*) + x_libraries="$ac_optarg" ;; + + -*) { echo "configure: error: $ac_option: invalid option; use --help to show usage" 1>&2; exit 1; } + ;; + + *) + if test -n "`echo $ac_option| sed 's/[-a-z0-9.]//g'`"; then + echo "configure: warning: $ac_option: invalid host type" 1>&2 + fi + if test "x$nonopt" != xNONE; then + { echo "configure: error: can only configure for one host and one target at a time" 1>&2; exit 1; } + fi + nonopt="$ac_option" + ;; + + esac +done + +if test -n "$ac_prev"; then + { echo "configure: error: missing argument to --`echo $ac_prev | sed 's/_/-/g'`" 1>&2; exit 1; } +fi + +trap 'rm -fr conftest* confdefs* core core.* *.core $ac_clean_files; exit 1' 1 2 15 + +# File descriptor usage: +# 0 standard input +# 1 file creation +# 2 errors and warnings +# 3 some systems may open it to /dev/tty +# 4 used on the Kubota Titan +# 6 checking for... messages and results +# 5 compiler messages saved in config.log +if test "$silent" = yes; then + exec 6>/dev/null +else + exec 6>&1 +fi +exec 5>./config.log + +echo "\ +This file contains any messages produced by compilers while +running configure, to aid debugging if configure makes a mistake. +" 1>&5 + +# Strip out --no-create and --no-recursion so they do not pile up. +# Also quote any args containing shell metacharacters. 
+ac_configure_args= +for ac_arg +do + case "$ac_arg" in + -no-create | --no-create | --no-creat | --no-crea | --no-cre \ + | --no-cr | --no-c) ;; + -no-recursion | --no-recursion | --no-recursio | --no-recursi \ + | --no-recurs | --no-recur | --no-recu | --no-rec | --no-re | --no-r) ;; + *" "*|*" "*|*[\[\]\~\#\$\^\&\*\(\)\{\}\\\|\;\<\>\?]*) + ac_configure_args="$ac_configure_args '$ac_arg'" ;; + *) ac_configure_args="$ac_configure_args $ac_arg" ;; + esac +done + +# NLS nuisances. +# Only set these to C if already set. These must not be set unconditionally +# because not all systems understand e.g. LANG=C (notably SCO). +# Fixing LC_MESSAGES prevents Solaris sh from translating var values in `set'! +# Non-C LC_CTYPE values break the ctype check. +if test "${LANG+set}" = set; then LANG=C; export LANG; fi +if test "${LC_ALL+set}" = set; then LC_ALL=C; export LC_ALL; fi +if test "${LC_MESSAGES+set}" = set; then LC_MESSAGES=C; export LC_MESSAGES; fi +if test "${LC_CTYPE+set}" = set; then LC_CTYPE=C; export LC_CTYPE; fi + +# confdefs.h avoids OS command line length limits that DEFS can exceed. +rm -rf conftest* confdefs.h +# AIX cpp loses on an empty file, so make sure it contains at least a newline. +echo > confdefs.h + +# A filename unique to this package, relative to the directory that +# configure is in, which we can look for to find out if srcdir is correct. +ac_unique_file=src/include/festival.h + +# Find the source files, if location was not specified. +if test -z "$srcdir"; then + ac_srcdir_defaulted=yes + # Try the directory containing this script, then its parent. + ac_prog=$0 + ac_confdir=`echo $ac_prog|sed 's%/[^/][^/]*$%%'` + test "x$ac_confdir" = "x$ac_prog" && ac_confdir=. + srcdir=$ac_confdir + if test ! -r $srcdir/$ac_unique_file; then + srcdir=.. + fi +else + ac_srcdir_defaulted=no +fi +if test ! -r $srcdir/$ac_unique_file; then + if test "$ac_srcdir_defaulted" = yes; then + { echo "configure: error: can not find sources in $ac_confdir or .." 
1>&2; exit 1; } + else + { echo "configure: error: can not find sources in $srcdir" 1>&2; exit 1; } + fi +fi +srcdir=`echo "${srcdir}" | sed 's%\([^/]\)/*$%\1%'` + +# Prefer explicitly selected file to automatically selected ones. +if test -z "$CONFIG_SITE"; then + if test "x$prefix" != xNONE; then + CONFIG_SITE="$prefix/share/config.site $prefix/etc/config.site" + else + CONFIG_SITE="$ac_default_prefix/share/config.site $ac_default_prefix/etc/config.site" + fi +fi +for ac_site_file in $CONFIG_SITE; do + if test -r "$ac_site_file"; then + echo "loading site script $ac_site_file" + . "$ac_site_file" + fi +done + +if test -r "$cache_file"; then + echo "loading cache $cache_file" + . $cache_file +else + echo "creating cache $cache_file" + > $cache_file +fi + +ac_ext=c +# CFLAGS is not in ac_cpp because -g, -O, etc. are not valid cpp options. +ac_cpp='$CPP $CPPFLAGS' +ac_compile='${CC-cc} -c $CFLAGS $CPPFLAGS conftest.$ac_ext 1>&5' +ac_link='${CC-cc} -o conftest${ac_exeext} $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS 1>&5' +cross_compiling=$ac_cv_prog_cc_cross + +ac_exeext= +ac_objext=o +if (echo "testing\c"; echo 1,2,3) | grep c >/dev/null; then + # Stardent Vistra SVR4 grep lacks -e, says ghazi@caip.rutgers.edu. + if (echo -n testing; echo 1,2,3) | sed s/-n/xn/ | grep xn >/dev/null; then + ac_n= ac_c=' +' ac_t=' ' + else + ac_n=-n ac_c= ac_t= + fi +else + ac_n= ac_c='\c' ac_t= +fi + + + +ac_aux_dir= +for ac_dir in $srcdir $srcdir/.. $srcdir/../..; do + if test -f $ac_dir/install-sh; then + ac_aux_dir=$ac_dir + ac_install_sh="$ac_aux_dir/install-sh -c" + break + elif test -f $ac_dir/install.sh; then + ac_aux_dir=$ac_dir + ac_install_sh="$ac_aux_dir/install.sh -c" + break + fi +done +if test -z "$ac_aux_dir"; then + { echo "configure: error: can not find install-sh or install.sh in $srcdir $srcdir/.. $srcdir/../.." 
1>&2; exit 1; } +fi +ac_config_guess=$ac_aux_dir/config.guess +ac_config_sub=$ac_aux_dir/config.sub +ac_configure=$ac_aux_dir/configure # This should be Cygnus configure. + + +# Do some error checking and defaulting for the host and target type. +# The inputs are: +# configure --host=HOST --target=TARGET --build=BUILD NONOPT +# +# The rules are: +# 1. You are not allowed to specify --host, --target, and nonopt at the +# same time. +# 2. Host defaults to nonopt. +# 3. If nonopt is not specified, then host defaults to the current host, +# as determined by config.guess. +# 4. Target and build default to nonopt. +# 5. If nonopt is not specified, then target and build default to host. + +# The aliases save the names the user supplied, while $host etc. +# will get canonicalized. +case $host---$target---$nonopt in +NONE---*---* | *---NONE---* | *---*---NONE) ;; +*) { echo "configure: error: can only configure for one host and one target at a time" 1>&2; exit 1; } ;; +esac + + +# Make sure we can run config.sub. +if ${CONFIG_SHELL-/bin/sh} $ac_config_sub sun4 >/dev/null 2>&1; then : +else { echo "configure: error: can not run $ac_config_sub" 1>&2; exit 1; } +fi + +echo $ac_n "checking host system type""... $ac_c" 1>&6 +echo "configure:573: checking host system type" >&5 + +host_alias=$host +case "$host_alias" in +NONE) + case $nonopt in + NONE) + if host_alias=`${CONFIG_SHELL-/bin/sh} $ac_config_guess`; then : + else { echo "configure: error: can not guess host type; you must specify one" 1>&2; exit 1; } + fi ;; + *) host_alias=$nonopt ;; + esac ;; +esac + +host=`${CONFIG_SHELL-/bin/sh} $ac_config_sub $host_alias` +host_cpu=`echo $host | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\1/'` +host_vendor=`echo $host | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\2/'` +host_os=`echo $host | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\3/'` +echo "$ac_t""$host" 1>&6 + +echo $ac_n "checking target system type""... 
$ac_c" 1>&6 +echo "configure:594: checking target system type" >&5 + +target_alias=$target +case "$target_alias" in +NONE) + case $nonopt in + NONE) target_alias=$host_alias ;; + *) target_alias=$nonopt ;; + esac ;; +esac + +target=`${CONFIG_SHELL-/bin/sh} $ac_config_sub $target_alias` +target_cpu=`echo $target | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\1/'` +target_vendor=`echo $target | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\2/'` +target_os=`echo $target | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\3/'` +echo "$ac_t""$target" 1>&6 + +echo $ac_n "checking build system type""... $ac_c" 1>&6 +echo "configure:612: checking build system type" >&5 + +build_alias=$build +case "$build_alias" in +NONE) + case $nonopt in + NONE) build_alias=$host_alias ;; + *) build_alias=$nonopt ;; + esac ;; +esac + +build=`${CONFIG_SHELL-/bin/sh} $ac_config_sub $build_alias` +build_cpu=`echo $build | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\1/'` +build_vendor=`echo $build | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\2/'` +build_os=`echo $build | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\3/'` +echo "$ac_t""$build" 1>&6 + +test "$host_alias" != "$target_alias" && + test "$program_prefix$program_suffix$program_transform_name" = \ + NONENONEs,x,x, && + program_prefix=${target_alias}- + +# Extract the first word of "gcc", so it can be a program name with args. +set dummy gcc; ac_word=$2 +echo $ac_n "checking for $ac_word""... $ac_c" 1>&6 +echo "configure:637: checking for $ac_word" >&5 +if eval "test \"`echo '$''{'ac_cv_prog_CC'+set}'`\" = set"; then + echo $ac_n "(cached) $ac_c" 1>&6 +else + if test -n "$CC"; then + ac_cv_prog_CC="$CC" # Let the user override the test. +else + IFS="${IFS= }"; ac_save_ifs="$IFS"; IFS=":" + ac_dummy="$PATH" + for ac_dir in $ac_dummy; do + test -z "$ac_dir" && ac_dir=. 
+ if test -f $ac_dir/$ac_word; then + ac_cv_prog_CC="gcc" + break + fi + done + IFS="$ac_save_ifs" +fi +fi +CC="$ac_cv_prog_CC" +if test -n "$CC"; then + echo "$ac_t""$CC" 1>&6 +else + echo "$ac_t""no" 1>&6 +fi + +if test -z "$CC"; then + # Extract the first word of "cc", so it can be a program name with args. +set dummy cc; ac_word=$2 +echo $ac_n "checking for $ac_word""... $ac_c" 1>&6 +echo "configure:667: checking for $ac_word" >&5 +if eval "test \"`echo '$''{'ac_cv_prog_CC'+set}'`\" = set"; then + echo $ac_n "(cached) $ac_c" 1>&6 +else + if test -n "$CC"; then + ac_cv_prog_CC="$CC" # Let the user override the test. +else + IFS="${IFS= }"; ac_save_ifs="$IFS"; IFS=":" + ac_prog_rejected=no + ac_dummy="$PATH" + for ac_dir in $ac_dummy; do + test -z "$ac_dir" && ac_dir=. + if test -f $ac_dir/$ac_word; then + if test "$ac_dir/$ac_word" = "/usr/ucb/cc"; then + ac_prog_rejected=yes + continue + fi + ac_cv_prog_CC="cc" + break + fi + done + IFS="$ac_save_ifs" +if test $ac_prog_rejected = yes; then + # We found a bogon in the path, so make sure we never use it. + set dummy $ac_cv_prog_CC + shift + if test $# -gt 0; then + # We chose a different compiler from the bogus one. + # However, it has the same basename, so the bogon will be chosen + # first if we set CC to just the basename; use the full file name. + shift + set dummy "$ac_dir/$ac_word" "$@" + shift + ac_cv_prog_CC="$@" + fi +fi +fi +fi +CC="$ac_cv_prog_CC" +if test -n "$CC"; then + echo "$ac_t""$CC" 1>&6 +else + echo "$ac_t""no" 1>&6 +fi + + if test -z "$CC"; then + case "`uname -s`" in + *win32* | *WIN32*) + # Extract the first word of "cl", so it can be a program name with args. +set dummy cl; ac_word=$2 +echo $ac_n "checking for $ac_word""... $ac_c" 1>&6 +echo "configure:718: checking for $ac_word" >&5 +if eval "test \"`echo '$''{'ac_cv_prog_CC'+set}'`\" = set"; then + echo $ac_n "(cached) $ac_c" 1>&6 +else + if test -n "$CC"; then + ac_cv_prog_CC="$CC" # Let the user override the test. 
+else + IFS="${IFS= }"; ac_save_ifs="$IFS"; IFS=":" + ac_dummy="$PATH" + for ac_dir in $ac_dummy; do + test -z "$ac_dir" && ac_dir=. + if test -f $ac_dir/$ac_word; then + ac_cv_prog_CC="cl" + break + fi + done + IFS="$ac_save_ifs" +fi +fi +CC="$ac_cv_prog_CC" +if test -n "$CC"; then + echo "$ac_t""$CC" 1>&6 +else + echo "$ac_t""no" 1>&6 +fi + ;; + esac + fi + test -z "$CC" && { echo "configure: error: no acceptable cc found in \$PATH" 1>&2; exit 1; } +fi + +echo $ac_n "checking whether the C compiler ($CC $CFLAGS $LDFLAGS) works""... $ac_c" 1>&6 +echo "configure:750: checking whether the C compiler ($CC $CFLAGS $LDFLAGS) works" >&5 + +ac_ext=c +# CFLAGS is not in ac_cpp because -g, -O, etc. are not valid cpp options. +ac_cpp='$CPP $CPPFLAGS' +ac_compile='${CC-cc} -c $CFLAGS $CPPFLAGS conftest.$ac_ext 1>&5' +ac_link='${CC-cc} -o conftest${ac_exeext} $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS 1>&5' +cross_compiling=$ac_cv_prog_cc_cross + +cat > conftest.$ac_ext << EOF + +#line 761 "configure" +#include "confdefs.h" + +main(){return(0);} +EOF +if { (eval echo configure:766: \"$ac_link\") 1>&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext}; then + ac_cv_prog_cc_works=yes + # If we can't run a trivial program, we are probably using a cross compiler. + if (./conftest; exit) 2>/dev/null; then + ac_cv_prog_cc_cross=no + else + ac_cv_prog_cc_cross=yes + fi +else + echo "configure: failed program was:" >&5 + cat conftest.$ac_ext >&5 + ac_cv_prog_cc_works=no +fi +rm -fr conftest* +ac_ext=c +# CFLAGS is not in ac_cpp because -g, -O, etc. are not valid cpp options. 
+ac_cpp='$CPP $CPPFLAGS' +ac_compile='${CC-cc} -c $CFLAGS $CPPFLAGS conftest.$ac_ext 1>&5' +ac_link='${CC-cc} -o conftest${ac_exeext} $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS 1>&5' +cross_compiling=$ac_cv_prog_cc_cross + +echo "$ac_t""$ac_cv_prog_cc_works" 1>&6 +if test $ac_cv_prog_cc_works = no; then + { echo "configure: error: installation or configuration problem: C compiler cannot create executables." 1>&2; exit 1; } +fi +echo $ac_n "checking whether the C compiler ($CC $CFLAGS $LDFLAGS) is a cross-compiler""... $ac_c" 1>&6 +echo "configure:792: checking whether the C compiler ($CC $CFLAGS $LDFLAGS) is a cross-compiler" >&5 +echo "$ac_t""$ac_cv_prog_cc_cross" 1>&6 +cross_compiling=$ac_cv_prog_cc_cross + +echo $ac_n "checking whether we are using GNU C""... $ac_c" 1>&6 +echo "configure:797: checking whether we are using GNU C" >&5 +if eval "test \"`echo '$''{'ac_cv_prog_gcc'+set}'`\" = set"; then + echo $ac_n "(cached) $ac_c" 1>&6 +else + cat > conftest.c <<EOF +#ifdef __GNUC__ + yes; +#endif +EOF +if { ac_try='${CC-cc} -E conftest.c'; { (eval echo configure:806: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }; } | egrep yes >/dev/null 2>&1; then + ac_cv_prog_gcc=yes +else + ac_cv_prog_gcc=no +fi +fi + +echo "$ac_t""$ac_cv_prog_gcc" 1>&6 + +if test $ac_cv_prog_gcc = yes; then + GCC=yes +else + GCC= +fi + +ac_test_CFLAGS="${CFLAGS+set}" +ac_save_CFLAGS="$CFLAGS" +CFLAGS= +echo $ac_n "checking whether ${CC-cc} accepts -g""...
$ac_c" 1>&6 +echo "configure:825: checking whether ${CC-cc} accepts -g" >&5 +if eval "test \"`echo '$''{'ac_cv_prog_cc_g'+set}'`\" = set"; then + echo $ac_n "(cached) $ac_c" 1>&6 +else + echo 'void f(){}' > conftest.c +if test -z "`${CC-cc} -g -c conftest.c 2>&1`"; then + ac_cv_prog_cc_g=yes +else + ac_cv_prog_cc_g=no +fi +rm -f conftest* + +fi + +echo "$ac_t""$ac_cv_prog_cc_g" 1>&6 +if test "$ac_test_CFLAGS" = set; then + CFLAGS="$ac_save_CFLAGS" +elif test $ac_cv_prog_cc_g = yes; then + if test "$GCC" = yes; then + CFLAGS="-g -O2" + else + CFLAGS="-g" + fi +else + if test "$GCC" = yes; then + CFLAGS="-O2" + else + CFLAGS= + fi +fi + +if test "x$GCC" = "xyes"; then + CFLAGS="$CFLAGS -Wall" +fi +# Extract the first word of "ranlib", so it can be a program name with args. +set dummy ranlib; ac_word=$2 +echo $ac_n "checking for $ac_word""... $ac_c" 1>&6 +echo "configure:862: checking for $ac_word" >&5 +if eval "test \"`echo '$''{'ac_cv_prog_RANLIB'+set}'`\" = set"; then + echo $ac_n "(cached) $ac_c" 1>&6 +else + if test -n "$RANLIB"; then + ac_cv_prog_RANLIB="$RANLIB" # Let the user override the test. +else + IFS="${IFS= }"; ac_save_ifs="$IFS"; IFS=":" + ac_dummy="$PATH" + for ac_dir in $ac_dummy; do + test -z "$ac_dir" && ac_dir=. + if test -f $ac_dir/$ac_word; then + ac_cv_prog_RANLIB="ranlib" + break + fi + done + IFS="$ac_save_ifs" + test -z "$ac_cv_prog_RANLIB" && ac_cv_prog_RANLIB=":" +fi +fi +RANLIB="$ac_cv_prog_RANLIB" +if test -n "$RANLIB"; then + echo "$ac_t""$RANLIB" 1>&6 +else + echo "$ac_t""no" 1>&6 +fi + +if test $host != $build; then + ac_tool_prefix=${host_alias}- +else + ac_tool_prefix= +fi + +# Extract the first word of "${ac_tool_prefix}ar", so it can be a program name with args. +set dummy ${ac_tool_prefix}ar; ac_word=$2 +echo $ac_n "checking for $ac_word""... 
$ac_c" 1>&6 +echo "configure:898: checking for $ac_word" >&5 +if eval "test \"`echo '$''{'ac_cv_prog_AR'+set}'`\" = set"; then + echo $ac_n "(cached) $ac_c" 1>&6 +else + if test -n "$AR"; then + ac_cv_prog_AR="$AR" # Let the user override the test. +else + IFS="${IFS= }"; ac_save_ifs="$IFS"; IFS=":" + ac_dummy="$PATH" + for ac_dir in $ac_dummy; do + test -z "$ac_dir" && ac_dir=. + if test -f $ac_dir/$ac_word; then + ac_cv_prog_AR="${ac_tool_prefix}ar" + break + fi + done + IFS="$ac_save_ifs" + test -z "$ac_cv_prog_AR" && ac_cv_prog_AR="ar" +fi +fi +AR="$ac_cv_prog_AR" +if test -n "$AR"; then + echo "$ac_t""$AR" 1>&6 +else + echo "$ac_t""no" 1>&6 +fi + + + + +echo $ac_n "checking whether byte ordering is bigendian""... $ac_c" 1>&6 +echo "configure:929: checking whether byte ordering is bigendian" >&5 +if eval "test \"`echo '$''{'ac_cv_c_bigendian'+set}'`\" = set"; then + echo $ac_n "(cached) $ac_c" 1>&6 +else + ac_cv_c_bigendian=unknown +# See if sys/param.h defines the BYTE_ORDER macro. +cat > conftest.$ac_ext < +#include +int main() { + +#if !BYTE_ORDER || !BIG_ENDIAN || !LITTLE_ENDIAN + bogus endian macros +#endif +; return 0; } +EOF +if { (eval echo configure:947: \"$ac_compile\") 1>&5; (eval $ac_compile) 2>&5; }; then + rm -rf conftest* + # It does; now see whether it defined to BIG_ENDIAN or not. 
+cat > conftest.$ac_ext < +#include +int main() { + +#if BYTE_ORDER != BIG_ENDIAN + not big endian +#endif +; return 0; } +EOF +if { (eval echo configure:962: \"$ac_compile\") 1>&5; (eval $ac_compile) 2>&5; }; then + rm -rf conftest* + ac_cv_c_bigendian=yes +else + echo "configure: failed program was:" >&5 + cat conftest.$ac_ext >&5 + rm -rf conftest* + ac_cv_c_bigendian=no +fi +rm -f conftest* +else + echo "configure: failed program was:" >&5 + cat conftest.$ac_ext >&5 +fi +rm -f conftest* +if test $ac_cv_c_bigendian = unknown; then +if test "$cross_compiling" = yes; then + { echo "configure: error: can not run test program while cross compiling" 1>&2; exit 1; } +else + cat > conftest.$ac_ext <&5; (eval $ac_link) 2>&5; } && test -s conftest${ac_exeext} && (./conftest; exit) 2>/dev/null +then + ac_cv_c_bigendian=no +else + echo "configure: failed program was:" >&5 + cat conftest.$ac_ext >&5 + rm -fr conftest* + ac_cv_c_bigendian=yes +fi +rm -fr conftest* +fi + +fi +fi + +echo "$ac_t""$ac_cv_c_bigendian" 1>&6 +if test $ac_cv_c_bigendian = yes; then + cat >> confdefs.h <<\EOF +#define WORDS_BIGENDIAN 1 +EOF + +fi + + +trap '' 1 2 15 +cat > confcache <<\EOF +# This file is a shell script that caches the results of configure +# tests run on this system so they can be shared between configure +# scripts and configure runs. It is not useful on other systems. +# If it contains results you don't want to keep, you may remove or edit it. +# +# By default, configure uses ./config.cache as the cache file, +# creating it if it does not exist already. You can give configure +# the --cache-file=FILE option to use a different cache file; that is +# what configure does when it calls configure scripts in +# subdirectories, so they share the cache. +# Giving --cache-file=/dev/null disables caching, for debugging configure. +# config.status only pays attention to the cache file if you give it the +# --recheck option to rerun configure. 
+# +EOF +# The following way of writing the cache mishandles newlines in values, +# but we know of no workaround that is simple, portable, and efficient. +# So, don't put newlines in cache variables' values. +# Ultrix sh set writes to stderr and can't be redirected directly, +# and sets the high bit in the cache file unless we assign to the vars. +(set) 2>&1 | + case `(ac_space=' '; set | grep ac_space) 2>&1` in + *ac_space=\ *) + # `set' does not quote correctly, so add quotes (double-quote substitution + # turns \\\\ into \\, and sed turns \\ into \). + sed -n \ + -e "s/'/'\\\\''/g" \ + -e "s/^\\([a-zA-Z0-9_]*_cv_[a-zA-Z0-9_]*\\)=\\(.*\\)/\\1=\${\\1='\\2'}/p" + ;; + *) + # `set' quotes correctly as required by POSIX, so do not add quotes. + sed -n -e 's/^\([a-zA-Z0-9_]*_cv_[a-zA-Z0-9_]*\)=\(.*\)/\1=${\1=\2}/p' + ;; + esac >> confcache +if cmp -s $cache_file confcache; then + : +else + if test -w $cache_file; then + echo "updating cache $cache_file" + cat confcache > $cache_file + else + echo "not updating unwritable cache $cache_file" + fi +fi +rm -f confcache + +trap 'rm -fr conftest* confdefs* core core.* *.core $ac_clean_files; exit 1' 1 2 15 + +test "x$prefix" = xNONE && prefix=$ac_default_prefix +# Let make expand exec_prefix. +test "x$exec_prefix" = xNONE && exec_prefix='${prefix}' + +# Any assignment to VPATH causes Sun make to only execute +# the first set of double-colon rules, so remove it if not needed. +# If there is a colon in the path, we need to keep it. +if test "x$srcdir" = x.; then + ac_vpsub='/^[ ]*VPATH[ ]*=[^:]*$/d' +fi + +trap 'rm -f $CONFIG_STATUS conftest*; exit 1' 1 2 15 + +# Transform confdefs.h into DEFS. +# Protect against shell expansion while executing Makefile rules. +# Protect against Makefile macro expansion. 
+cat > conftest.defs <<\EOF +s%#define \([A-Za-z_][A-Za-z0-9_]*\) *\(.*\)%-D\1=\2%g +s%[ `~#$^&*(){}\\|;'"<>?]%\\&%g +s%\[%\\&%g +s%\]%\\&%g +s%\$%$$%g +EOF +DEFS=`sed -f conftest.defs confdefs.h | tr '\012' ' '` +rm -f conftest.defs + + +# Without the "./", some shells look in PATH for config.status. +: ${CONFIG_STATUS=./config.status} + +echo creating $CONFIG_STATUS +rm -f $CONFIG_STATUS +cat > $CONFIG_STATUS </dev/null | sed 1q`: +# +# $0 $ac_configure_args +# +# Compiler output produced by configure, useful for debugging +# configure, is in ./config.log if it exists. + +ac_cs_usage="Usage: $CONFIG_STATUS [--recheck] [--version] [--help]" +for ac_option +do + case "\$ac_option" in + -recheck | --recheck | --rechec | --reche | --rech | --rec | --re | --r) + echo "running \${CONFIG_SHELL-/bin/sh} $0 $ac_configure_args --no-create --no-recursion" + exec \${CONFIG_SHELL-/bin/sh} $0 $ac_configure_args --no-create --no-recursion ;; + -version | --version | --versio | --versi | --vers | --ver | --ve | --v) + echo "$CONFIG_STATUS generated by autoconf version 2.13" + exit 0 ;; + -help | --help | --hel | --he | --h) + echo "\$ac_cs_usage"; exit 0 ;; + *) echo "\$ac_cs_usage"; exit 1 ;; + esac +done + +ac_given_srcdir=$srcdir + +trap 'rm -fr `echo "config/config" | sed "s/:[^ ]*//g"` conftest*; exit 1' 1 2 15 +EOF +cat >> $CONFIG_STATUS < conftest.subs <<\\CEOF +$ac_vpsub +$extrasub +s%@SHELL@%$SHELL%g +s%@CFLAGS@%$CFLAGS%g +s%@CPPFLAGS@%$CPPFLAGS%g +s%@CXXFLAGS@%$CXXFLAGS%g +s%@FFLAGS@%$FFLAGS%g +s%@DEFS@%$DEFS%g +s%@LDFLAGS@%$LDFLAGS%g +s%@LIBS@%$LIBS%g +s%@exec_prefix@%$exec_prefix%g +s%@prefix@%$prefix%g +s%@program_transform_name@%$program_transform_name%g +s%@bindir@%$bindir%g +s%@sbindir@%$sbindir%g +s%@libexecdir@%$libexecdir%g +s%@datadir@%$datadir%g +s%@sysconfdir@%$sysconfdir%g +s%@sharedstatedir@%$sharedstatedir%g +s%@localstatedir@%$localstatedir%g +s%@libdir@%$libdir%g +s%@includedir@%$includedir%g +s%@oldincludedir@%$oldincludedir%g +s%@infodir@%$infodir%g 
+s%@mandir@%$mandir%g +s%@host@%$host%g +s%@host_alias@%$host_alias%g +s%@host_cpu@%$host_cpu%g +s%@host_vendor@%$host_vendor%g +s%@host_os@%$host_os%g +s%@target@%$target%g +s%@target_alias@%$target_alias%g +s%@target_cpu@%$target_cpu%g +s%@target_vendor@%$target_vendor%g +s%@target_os@%$target_os%g +s%@build@%$build%g +s%@build_alias@%$build_alias%g +s%@build_cpu@%$build_cpu%g +s%@build_vendor@%$build_vendor%g +s%@build_os@%$build_os%g +s%@CC@%$CC%g +s%@RANLIB@%$RANLIB%g +s%@AR@%$AR%g + +CEOF +EOF + +cat >> $CONFIG_STATUS <<\EOF + +# Split the substitutions into bite-sized pieces for seds with +# small command number limits, like on Digital OSF/1 and HP-UX. +ac_max_sed_cmds=90 # Maximum number of lines to put in a sed script. +ac_file=1 # Number of current file. +ac_beg=1 # First line for current file. +ac_end=$ac_max_sed_cmds # Line after last line for current file. +ac_more_lines=: +ac_sed_cmds="" +while $ac_more_lines; do + if test $ac_beg -gt 1; then + sed "1,${ac_beg}d; ${ac_end}q" conftest.subs > conftest.s$ac_file + else + sed "${ac_end}q" conftest.subs > conftest.s$ac_file + fi + if test ! -s conftest.s$ac_file; then + ac_more_lines=false + rm -f conftest.s$ac_file + else + if test -z "$ac_sed_cmds"; then + ac_sed_cmds="sed -f conftest.s$ac_file" + else + ac_sed_cmds="$ac_sed_cmds | sed -f conftest.s$ac_file" + fi + ac_file=`expr $ac_file + 1` + ac_beg=$ac_end + ac_end=`expr $ac_end + $ac_max_sed_cmds` + fi +done +if test -z "$ac_sed_cmds"; then + ac_sed_cmds=cat +fi +EOF + +cat >> $CONFIG_STATUS <> $CONFIG_STATUS <<\EOF +for ac_file in .. $CONFIG_FILES; do if test "x$ac_file" != x..; then + # Support "outfile[:infile[:infile...]]", defaulting infile="outfile.in". + case "$ac_file" in + *:*) ac_file_in=`echo "$ac_file"|sed 's%[^:]*:%%'` + ac_file=`echo "$ac_file"|sed 's%:.*%%'` ;; + *) ac_file_in="${ac_file}.in" ;; + esac + + # Adjust a relative srcdir, top_srcdir, and INSTALL for subdirectories. + + # Remove last slash and all that follows it. 
Not all systems have dirname. + ac_dir=`echo $ac_file|sed 's%/[^/][^/]*$%%'` + if test "$ac_dir" != "$ac_file" && test "$ac_dir" != .; then + # The file is in a subdirectory. + test ! -d "$ac_dir" && mkdir "$ac_dir" + ac_dir_suffix="/`echo $ac_dir|sed 's%^\./%%'`" + # A "../" for each directory in $ac_dir_suffix. + ac_dots=`echo $ac_dir_suffix|sed 's%/[^/]*%../%g'` + else + ac_dir_suffix= ac_dots= + fi + + case "$ac_given_srcdir" in + .) srcdir=. + if test -z "$ac_dots"; then top_srcdir=. + else top_srcdir=`echo $ac_dots|sed 's%/$%%'`; fi ;; + /*) srcdir="$ac_given_srcdir$ac_dir_suffix"; top_srcdir="$ac_given_srcdir" ;; + *) # Relative path. + srcdir="$ac_dots$ac_given_srcdir$ac_dir_suffix" + top_srcdir="$ac_dots$ac_given_srcdir" ;; + esac + + + echo creating "$ac_file" + rm -f "$ac_file" + configure_input="Generated automatically from `echo $ac_file_in|sed 's%.*/%%'` by configure." + case "$ac_file" in + *Makefile*) ac_comsub="1i\\ +# $configure_input" ;; + *) ac_comsub= ;; + esac + + ac_file_inputs=`echo $ac_file_in|sed -e "s%^%$ac_given_srcdir/%" -e "s%:% $ac_given_srcdir/%g"` + sed -e "$ac_comsub +s%@configure_input@%$configure_input%g +s%@srcdir@%$srcdir%g +s%@top_srcdir@%$top_srcdir%g +" $ac_file_inputs | (eval "$ac_sed_cmds") > $ac_file +fi; done +rm -f conftest.s* + +EOF +cat >> $CONFIG_STATUS <> $CONFIG_STATUS <<\EOF + +exit 0 +EOF +chmod +x $CONFIG_STATUS +rm -fr confdefs* $ac_clean_files +test "$no_create" = yes || ${CONFIG_SHELL-/bin/sh} $CONFIG_STATUS || exit 1 + diff --git a/configure.in b/configure.in new file mode 100644 index 0000000..76b3c5d --- /dev/null +++ b/configure.in @@ -0,0 +1,45 @@ +dnl######################################################################## +dnl ## +dnl Centre for Speech Technology Research ## +dnl University of Edinburgh, UK ## +dnl Copyright (c) 1996-2001 ## +dnl All Rights Reserved. 
## +dnl ## +dnl Permission is hereby granted, free of charge, to use and distribute ## +dnl this software and its documentation without restriction, including ## +dnl without limitation the rights to use, copy, modify, merge, publish, ## +dnl distribute, sublicense, and/or sell copies of this work, and to ## +dnl permit persons to whom this work is furnished to do so, subject to ## +dnl the following conditions: ## +dnl 1. The code must retain the above copyright notice, this list of ## +dnl conditions and the following disclaimer. ## +dnl 2. Any modifications must be clearly marked as such. ## +dnl 3. Original authors' names are not deleted. ## +dnl 4. The authors' names are not used to endorse or promote products ## +dnl derived from this software without specific prior written ## +dnl permission. ## +dnl ## +dnl THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +dnl DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +dnl ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +dnl SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +dnl FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +dnl WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +dnl AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +dnl ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +dnl THIS SOFTWARE. 
## +dnl ## +dnl######################################################################## +AC_INIT(src/include/festival.h) + +AC_CANONICAL_SYSTEM +AC_PROG_CC +if test "x$GCC" = "xyes"; then + CFLAGS="$CFLAGS -Wall" +fi +AC_PROG_RANLIB +AC_CHECK_TOOL(AR, ar) + +AC_C_BIGENDIAN + +AC_OUTPUT(config/config) diff --git a/doc/Makefile b/doc/Makefile new file mode 100644 index 0000000..e38e419 --- /dev/null +++ b/doc/Makefile @@ -0,0 +1,114 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +TOP=.. +DIRNAME=doc + +DOCNAME=festival +DSSL_SANE_DB=$(EST)/doc/sane_to_docbook.dsl +DSSSL_HTML=$(EST)/doc/cstr.dssl +DSSSL=$(DSSSL_HTML) + +# Temporarily we explicitly list the programs which have been documented + +MAIN_TO_DOCUMENT= + +EXAMPLE_TO_DOCUMENT= + +FESTIVAL=$(TOP)/bin/festival --libdir $(TOP)/lib + +# Include some of EST documentation. +DOCXX_EXTRA_FILES = + +DOCXXIMAGES = edcrest.gif cstr.gif \ + est.jpg est_small.jpg \ + festival.jpg festival_small.jpg festival_tiny.jpg +DOCXXFILES= classHeader.inc hierHeader.inc indexHeader.inc \ + banner.inc $(DOCXXIMAGES) + +MANPAGES = festival.head festival.tail \ + festival_client.head festival_client.tail + +#SGMLFILES = festival.sgml \ +# introductory.sgml basics.sgml core.sgml advanced.sgml programming.sgml + +FILES=Makefile $(MANPAGES) festival.texi $(SGMLFILES) refcard.tex $(DOCXXFILES) +LOCAL_CLEAN = *.aux *.cp *.fn *.ky *.log *.pg *.toc *.tp *.vr + +ALL = festival.1 festival_client.1 + +include $(TOP)/config/common_make_rules +include $(EST)/config/rules/doc.mak + +%.1 : %.head %.options %.tail + cat $^ >$@ +%.options : $(TOP)/src/main/% + $(TOP)/bin/$* -man_options >$@ +%.options : $(TOP)/src/main/%.exe + $(TOP)/bin/$* -man_options >$@ + +festival.info: festival.texi festfunc.texi festvars.texi festfeat.texi + @ if [ !
-d info ] ; \ + then mkdir -p info ; fi + @ sed 's/@url{/@file{/g' <festival.texi >info/festival.texi + @ cp festfunc.texi info/festfunc.texi + @ cp festvars.texi info/festvars.texi + @ cp festfeat.texi info/festfeat.texi + ( cd info; makeinfo festival.texi ) + @ rm info/*.texi +# texi2html is available from http://wwwcn.cern.ch/dci/texi2html/ +festival.html: festival.texi festfunc.texi festvars.texi + @ if [ ! -d html ] ; \ + then mkdir -p html ; fi + (cd html; texi2html -number -split_chapter ../festival.texi) +# give the html files background color of white + @ for i in html/*.html ; \ + do \ + sed 's/<BODY>/<BODY bgcolor="#ffffff">/' $$i >ttt.html; \ + mv ttt.html $$i ; \ + done +festival.ps: festival.dvi + dvips -f festival.dvi >festival.ps +festival.dvi: festival.texi festfunc.texi festvars.texi + tex festival.texi + texindex festival.cp + tex festival.texi +doc: festival.ps festival.html festival.info + +festfunc.texi festvars.texi festfeat.texi: $(TOP)/src/main/festival + echo "(load_library \"festdoc.scm\") (make-doc)" | $(FESTIVAL) + +refcard.dvi: refcard.tex + latex refcard.tex +refcard.ps: refcard.dvi + dvips -f -t landscape refcard.dvi >refcard.ps + diff --git a/doc/banner.inc b/doc/banner.inc new file mode 100644 index 0000000..fc23d8d --- /dev/null +++ b/doc/banner.inc @@ -0,0 +1,14 @@ + + + + +

This page is part of the + +Festival Text to Speech System documentation +
+Copyright University of Edinburgh 1997 +
+Contact: + festival@cstr.ed.ac.uk +

+
diff --git a/doc/classHeader.inc b/doc/classHeader.inc new file mode 100644 index 0000000..ef733c6 --- /dev/null +++ b/doc/classHeader.inc @@ -0,0 +1,14 @@ + + + + + + + +
+ + +
+

diff --git a/doc/cstr.gif b/doc/cstr.gif new file mode 100644 index 0000000..24dda8f Binary files /dev/null and b/doc/cstr.gif differ diff --git a/doc/edcrest.gif b/doc/edcrest.gif new file mode 100644 index 0000000..7a7ac89 Binary files /dev/null and b/doc/edcrest.gif differ diff --git a/doc/est.jpg b/doc/est.jpg new file mode 100644 index 0000000..ad99829 Binary files /dev/null and b/doc/est.jpg differ diff --git a/doc/est_small.jpg b/doc/est_small.jpg new file mode 100644 index 0000000..0738394 Binary files /dev/null and b/doc/est_small.jpg differ diff --git a/doc/festival.head b/doc/festival.head new file mode 100644 index 0000000..6e47152 --- /dev/null +++ b/doc/festival.head @@ -0,0 +1,25 @@ +.TH FESTIVAL 1 "6th Apr 1998" +.SH NAME +festival \- a text-to-speech system. +.SH SYNOPSIS +.B festival +.I [options] +.I [file0] +.I [file1] +.I ... + + +.SH DESCRIPTION +Festival is a general purpose text-to-speech system. As well as +simply rendering text as speech it can be used in an interactive +command mode for testing and developing various aspects of speech +synthesis technology. + +Festival has two major modes, command and tts (text-to-speech). +When in command mode input (from file or interactively) is interpreted +by the command interpreter. When in tts mode input is rendered as +speech. When in command mode filenames that start with a left +parenthesis are treated as literal commands and evaluated. + +.SH OPTIONS + diff --git a/doc/festival.jpg b/doc/festival.jpg new file mode 100644 index 0000000..e3a13b4 Binary files /dev/null and b/doc/festival.jpg differ diff --git a/doc/festival.tail b/doc/festival.tail new file mode 100644 index 0000000..2bcf499 --- /dev/null +++ b/doc/festival.tail @@ -0,0 +1,25 @@ +.SH BUGS +More than you can imagine.
+ +A manual with much detail (though not complete) is +distributed as part of the system and is also accessible at +.br +http://www.cstr.ed.ac.uk/projects/festival/manual/ + +Although we cannot guarantee the time required to fix bugs, we +would appreciate it if they were reported to +.br +festival-bug@cstr.ed.ac.uk + +.SH AUTHOR +Alan W Black, Richard Caley and Paul Taylor +.br +(C) Centre for Speech Technology Research, 1996-1998 +.br +University of Edinburgh +.br +80 South Bridge +.br +Edinburgh EH1 1HN +.br +http://www.cstr.ed.ac.uk/projects/festival.html diff --git a/doc/festival.texi b/doc/festival.texi new file mode 100644 index 0000000..94cfb8b --- /dev/null +++ b/doc/festival.texi @@ -0,0 +1,8576 @@ +\input texinfo @c -*-texinfo-*- +@c %**start of header +@setfilename festival.info +@settitle Festival Speech Synthesis System +@finalout +@setchapternewpage odd +@c %**end of header + +@c This document was modelled on the numerous examples of texinfo +@c documentation available with GNU software, primarily the hello +@c world example, but many others too. I happily acknowledge their +@c aid in producing this document -- awb + +@set EDITION 1.4 +@set VERSION 1.4.3 +@set UPDATED 27th December 2002 + +@ifinfo +This file documents the @code{Festival} Speech Synthesis System, a general +text to speech system for making your computer talk and developing +new synthesis techniques. + +Copyright (C) 1996-2004 University of Edinburgh + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +@ignore +Permission is granted to process this file through TeX, or otherwise and +print the results, provided the printed document carries copying +permission notice identical to this one except for the removal of this +paragraph (this paragraph not being relevant to the printed manual).
+ +@end ignore +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided that the entire +resulting derived work is distributed under the terms of a permission +notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions, +except that this permission notice may be stated in a translation approved +by the authors. +@end ifinfo + +@titlepage +@title The Festival Speech Synthesis System +@subtitle System documentation +@subtitle Edition @value{EDITION}, for Festival Version @value{VERSION} +@subtitle @value{UPDATED} +@author by Alan W Black, Paul Taylor and Richard Caley. + +@page +@vskip 0pt plus 1filll +Copyright @copyright{} 1996-2004 University of Edinburgh, all rights +reserved. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided that the entire +resulting derived work is distributed under the terms of a permission +notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions, +except that this permission notice may be stated in a translation approved +by the University of Edinburgh +@end titlepage + +@node Top, , , (dir) + +@ifinfo +This file documents the @emph{Festival Speech Synthesis System} +@value{VERSION}. This document contains many gaps and is still in the +process of being written. 
+@end ifinfo + +@menu +* Abstract:: initial comments +* Copying:: How you can copy and share the code +* Acknowledgements:: List of contributors +* What is new:: Enhancements since last public release + +* Overview:: Generalities and Philosophy +* Installation:: Compilation and Installation +* Quick start:: Just tell me what to type +* Scheme:: A quick introduction to Festival's scripting language + +Text methods for interfacing to Festival +* TTS:: Text to speech modes +* XML/SGML mark-up:: XML/SGML mark-up Language +* Emacs interface:: Using Festival within Emacs + +Internal functions +* Phonesets:: Defining and using phonesets +* Lexicons:: Building and compiling Lexicons +* Utterances:: Existing and defining new utterance types + +Modules +* Text analysis:: Tokenizing text +* POS tagging:: Part of speech tagging +* Phrase breaks:: Finding phrase breaks +* Intonation:: Intonations modules +* Duration:: Duration modules +* UniSyn synthesizer:: The UniSyn waveform synthesizer +* Diphone synthesizer:: Building and using diphone synthesizers +* Other synthesis methods:: other waveform synthesis methods +* Audio output:: Getting sound from Festival + +* Voices:: Adding new voices (and languages) + +* Tools:: CART, Ngrams etc + +* Building models from databases:: + +Adding new modules and writing C++ code +* Programming:: Programming in Festival (Lisp/C/C++) +* API:: Using Festival in other programs + +* Examples:: Some simple (and not so simple) examples + +* Problems:: Reporting bugs. +* References:: Other sources of information +* Feature functions:: List of builtin feature functions. +* Variable list:: Short descriptions of all variables +* Function list:: Short descriptions of all functions +* Index:: Index of concepts. +@end menu + +@node Abstract, Copying, , Top +@chapter Abstract + +This document provides a user manual for the Festival +Speech Synthesis System, version @value{VERSION}. 
+ +Festival offers a general framework for building speech synthesis +systems as well as including examples of various modules. As a whole it +offers full text to speech through a number of APIs: from shell level, +through a Scheme command interpreter, as a C++ library, and an Emacs +interface. Festival is multi-lingual; we have developed voices in many +languages including English (UK and US), Spanish and Welsh, though +English is the most advanced. + +The system is written in C++ and uses the Edinburgh Speech Tools +for low level architecture and has a Scheme (SIOD) based command +interpreter for control. Documentation is given in the FSF texinfo +format which can generate a printed manual, info files and HTML. + +The latest details and a full software distribution of the Festival Speech +Synthesis System are available through its home page which may be found +at +@example +@url{http://www.cstr.ed.ac.uk/projects/festival.html} +@end example + +@node Copying, Acknowledgements, Abstract, Top +@chapter Copying + +@cindex restrictions +@cindex redistribution +As we feel the core system has reached an acceptable level of maturity, +from 1.4.0 the basic system is released under a free licence, without the +commercial restrictions we imposed on early versions. The basic system +has been placed under an X11 type licence which as free licences go is +pretty free. No GPL code is included in festival or the speech tools +themselves (though some auxiliary files are GPL'd e.g. the Emacs mode +for Festival). We have deliberately chosen a licence that should be +compatible with our commercial partners and our free software users. + +However although the code is free, we still offer no warranties and no +maintenance. We will continue to endeavor to fix bugs and answer +queries when we can, but are not in a position to guarantee it. We will +consider maintenance contracts and consultancy if desired; please +contact us for details.
+ +Also note that not all the voices and lexicons we distribute with +festival are free. Particularly the British English lexicon derived +from Oxford Advanced Learners' Dictionary is free only for +non-commercial use (we will release an alternative soon). Also the +Spanish diphone voice we release is only free for non-commercial use. + +If you are using Festival or the speech tools in a commercial environment, +even though no licence is required, we would be grateful if you let us +know as it helps justify ourselves to our various sponsors. + +The current copyright on the core system is +@example + The Festival Speech Synthesis System: version 1.4.3 + Centre for Speech Technology Research + University of Edinburgh, UK + Copyright (c) 1996-2004 + All Rights Reserved. + + Permission is hereby granted, free of charge, to use and distribute + this software and its documentation without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of this work, and to + permit persons to whom this work is furnished to do so, subject to + the following conditions: + 1. The code must retain the above copyright notice, this list of + conditions and the following disclaimer. + 2. Any modifications must be clearly marked as such. + 3. Original authors' names are not deleted. + 4. The authors' names are not used to endorse or promote products + derived from this software without specific prior written + permission.
+ + THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK + DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING + ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT + SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE + FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN + AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, + ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF + THIS SOFTWARE. +@end example + +@node Acknowledgements, What is new, Copying, Top +@chapter Acknowledgements +@cindex acknowledgements +@cindex thanks + +The code in this system was primarily written by Alan W Black, Paul +Taylor and Richard Caley. Festival sits on top of the Edinburgh Speech +Tools Library, and uses much of its functionality. + +Amy Isard wrote a synthesizer for her MSc project in 1995, which first +used the Edinburgh Speech Tools Library. Although Festival doesn't +contain any code from that system, her system was used as a basic model. + +Much of the design and philosophy of Festival has been built on the +experience both Paul and Alan gained from the development of various +previous synthesizers and software systems, especially CSTR's Osprey and +Polyglot systems @cite{taylor91} and ATR's CHATR system @cite{black94}. + +However, it should be stated that Festival is fully developed at CSTR +and contains neither proprietary code or ideas. + +Festival contains a number of subsystems integrated from other sources +and we acknowledge those systems here. + +@section SIOD +@cindex SIOD +@cindex Scheme +@cindex Paradigm Associates + +The Scheme interpreter (SIOD -- Scheme In One Defun 3.0) was +written by George Carrett (gjc@@mitech.com, gjc@@paradigm.com) +and offers a basic small Scheme (Lisp) interpreter suitable +for embedding in applications such as Festival as a scripting +language. 
A number of changes and improvements have been added +in our development but it still remains that basic system. +We are grateful to George and Paradigm Associates Incorporated +for providing such a useful and well-written sub-system. +@example + Scheme In One Defun (SIOD) + COPYRIGHT (c) 1988-1994 BY + PARADIGM ASSOCIATES INCORPORATED, CAMBRIDGE, MASSACHUSETTS. + ALL RIGHTS RESERVED + +Permission to use, copy, modify, distribute and sell this software +and its documentation for any purpose and without fee is hereby +granted, provided that the above copyright notice appear in all copies +and that both that copyright notice and this permission notice appear +in supporting documentation, and that the name of Paradigm Associates +Inc not be used in advertising or publicity pertaining to distribution +of the software without specific, written prior permission. + +PARADIGM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING +ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL +PARADIGM BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR +ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, +WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, +ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS +SOFTWARE. +@end example + +@section editline + +Because of conflicts between the copyright for GNU readline, for which +an optional interface was included in earlier versions, we have replace +the interface with a complete command line editing system based on +@file{editline}. @file{Editline} was posted to the USENET newsgroup +@file{comp.sources.misc} in 1992. A number of modifications have been +made to make it more useful to us but the original code (contained +within the standard speech tools distribution) and our modifications +fall under the following licence. +@example +Copyright 1992 Simmule Turner and Rich Salz. All rights reserved. 
+ +This software is not subject to any license of the American Telephone +and Telegraph Company or of the Regents of the University of California. + +Permission is granted to anyone to use this software for any purpose on +any computer system, and to alter it and redistribute it freely, subject +to the following restrictions: +1. The authors are not responsible for the consequences of use of this + software, no matter how awful, even if they arise from flaws in it. +2. The origin of this software must not be misrepresented, either by + explicit claim or by omission. Since few users ever read sources, + credits must appear in the documentation. +3. Altered versions must be plainly marked as such, and must not be + misrepresented as being the original software. Since few users + ever read sources, credits must appear in the documentation. +4. This notice may not be removed or altered. +@end example + +@section Edinburgh Speech Tools Library + +@cindex Edinburgh Speech Tools Library +The Edinburgh Speech Tools lies at the core of Festival. Although +developed separately, much of the development of certain parts of the +Edinburgh Speech Tools has been directed by Festival's needs. In turn +those who have contributed to the Speech Tools make Festival +a more usable system. + +@xref{Acknowledgements, , Acknowledgements, speechtools, + Edinburgh Speech Tools Library Manual}. + +Online information about the Edinburgh Speech Tools library +is available through +@example +@url{http://www.cstr.ed.ac.uk/projects/speech_tools.html} +@end example + +@section Others + +Many others have provided actual code and support for Festival, +for which we are grateful. Specifically, + +@itemize @bullet +@item Alistair Conkie: +various low level code points and some design work, +Spanish synthesis, the old diphone synthesis code. +@item Steve Isard: +directorship and LPC diphone code, design of diphone schema. +@item EPSRC: +who fund Alan Black and Paul Taylor. 
+@item Sun Microsystems Laboratories: +for supporting the project and funding Richard. +@item AT&T Labs - Research: +for supporting the project. +@item Paradigm Associates and George Carrett: +for Scheme in one defun. +@item Mike Macon: +Improving the quality of the diphone synthesizer and LPC analysis. +@item Kurt Dusterhoff: +Tilt intonation training and modelling. +@item Amy Isard: +for her SSML project and related synthesizer. +@item Richard Tobin: +for answering all those difficult questions, the socket code, +and the XML parser. +@item Simmule Turner and Rich Salz: +command line editor (editline) +@item Borja Etxebarria: +Help with the Spanish synthesis +@item Briony Williams: +Welsh synthesis +@item Jacques H. de Villiers: @file{jacques@@cse.ogi.edu} from CSLU +at OGI, for the TCL interface, and other usability issues +@item Kevin Lenzo: @file{lenzo@@cs.cmu.edu} from CMU for the PERL +interface. +@item Rob Clarke: +for support under Linux. +@item Samuel Audet @file{guardia@@cam.org}: +OS/2 support +@item Mari Ostendorf: +For providing access to the BU FM Radio corpus from which some +modules were trained. +@item Melvin Hunt: +from whose work we based our residual LPC synthesis model on +@item Oxford Text Archive: +For the computer users version of Oxford Advanced +Learners' Dictionary (redistributed with permission). +@item Reading University: +for access to MARSEC from which the phrase break +model was trained. +@item LDC & Penn Tree Bank: +from which the POS tagger was trained, redistribution +of the models is with permission from the LDC. +@item Roger Burroughes and Kurt Dusterhoff: +For letting us capture their voices. +@item ATR and Nick Campbell: +for first getting Paul and Alan to work together and for the +experience we gained. +@item FSF: +for G++, make, .... 
+@item Center for Spoken Language Understanding: +CSLU at OGI, particularly Ron Cole and Mike Macon, have acted as +significant users for the system giving significant feedback and +allowing us to teach courses on Festival offering valuable real-use +feedback. +@item Our beta testers: +Thanks to all the people who put up with previous versions of the system +and reported bugs, both big and small. These comments are very important +to the constant improvements in the system. And thanks for your quick +responses when I had specific requests. +@item And our users ... +Many people have downloaded earlier versions of the system. Many have +found problems with installation and use and have reported them to us. +Many of you have put up with multiple compilations trying to fix bugs +remotely. We thank you for putting up with us and are pleased you've +taken the time to help us improve our system. Many of you have come up +with uses we hadn't thought of, which is always rewarding. + +Even if you haven't actively responded, the fact that you use the system +at all makes it worthwhile. +@end itemize + +@node What is new, Overview, Acknowledgements , Top +@chapter What is new + +Compared to the previous major release (1.3.0, released Aug 1998), +1.4.0 is not functionally very different from earlier versions. +This release is primarily a consolidation release fixing and tidying +up some of the lower level aspects of the system to allow better +modularity for some of our future planned modules. + +@itemize @bullet +@item Copyright change: +The system is now free and has no commercial restriction. Note that +currently only the US voices (ked and kal) are also unrestricted. The +UK English voices depend on the Oxford Advanced Learners' Dictionary of +Current English which cannot be used for commercial use without +permission from Oxford University Press.
+ +@item Architecture tidy up: +the interfaces to lower level parts of the system have been tidied +up, deleting some of the older code that was supported for +compatibility reasons. There is a much higher dependence on features +and easier (and safer) ways to register new objects as feature values +and Scheme objects. Scheme has been tidied up. It is no longer +"in one defun" but "in one directory". + +@item New documentation system for speech tools: +A new docbook based documentation system has been added to the +speech tools. Festival's documentation will move +over to this sometime soon too. + +@item initial JSAPI support: both JSAPI and JSML (somewhat +similar to Sable) now have initial implementations. They of course +depend on Java support which so far we have only (successfully) +investigated under Solaris and Linux. + +@item Generalization of statistical models: CART, ngrams, +and WFSTs are now fully supported from Lisp and can be used with a +generalized viterbi function. This makes adding quite complex statistical +models easy without adding new C++. + +@item Tilt Intonation modelling: +Full support is now included for the Tilt intonation models, +both training and use. + +@item Documentation on Building New Voices in Festival: +documentation, scripts etc. for building new voices and languages in +the system, see +@example +@url{http://www.cstr.ed.ac.uk/projects/festival/docs/festvox/} +@end example + +@end itemize + +@node Overview, Installation , What is new, Top +@chapter Overview + +Festival is designed as a speech synthesis system for at least three +levels of user. First, those who simply want high quality speech from +arbitrary text with the minimum of effort. Second, those who are +developing language systems and wish to include synthesis output. In +this case, a certain amount of customization is desired, such as +different voices, specific phrasing, dialog types etc. The third level +is in developing and testing new synthesis methods.
+ +This manual is not designed as a tutorial on converting text to speech +but for documenting the processes and use of our system. We do not +discuss the detailed algorithms involved in converting text to speech or +the relative merits of multiple methods, though we will often give +references to relevant papers when describing the use of each module. + +For more general information about text to speech we recommend Dutoit's +@file{An introduction to Text-to-Speech Synthesis} @cite{dutoit97}. For +more detailed research issues in TTS see @cite{sproat98} or +@cite{vansanten96}. + +@menu +* Philosophy:: Why we did it like it is +* Future:: How much better its going to get +@end menu + +@node Philosophy, Future, , Overview +@section Philosophy + +One of the biggest problems in the development of speech synthesis, and +other areas of speech and language processing systems, is that there are +a lot of simple well-known techniques lying around which can help you +realise your goal. But in order to improve some part of the whole +system it is necessary to have a whole system in which you can test and +improve your part. Festival is intended as that whole system in which +you may simply work on your small part to improve the whole. Without a +system like Festival, before you could even start to test your new +module you would need to spend significant effort to build a whole +system, or adapt an existing one before you could start working on your +improvements. + +Festival is specifically designed to allow the addition of new +modules, easily and efficiently, so that development need not +get bogged down in re-implementing the wheel. + +But there is another aspect of Festival which makes it more useful than +simply an environment for researching into new synthesis techniques. +It is a fully usable text-to-speech system suitable for embedding in +other projects that require speech output. 
 The provision of a fully +working easy-to-use speech synthesizer in addition to just a testing +environment is good for two specific reasons. First, it offers a conduit +for our research, in that our experiments can quickly and directly +benefit users of our synthesis system. And secondly, in ensuring we have +a fully working usable system we can immediately see what problems exist +and where our research should be directed rather than where our whims take +us. + +These concepts are not unique to Festival. ATR's CHATR system +(@cite{black94}) follows very much the same philosophy and Festival +benefits from the experiences gained in the development of that system. +Festival benefits from various pieces of previous work. As well as +CHATR, CSTR's previous synthesizers, Osprey and the Polyglot projects +influenced many design decisions. Also we are influenced by more +general programs in considering software engineering issues, especially +GNU Octave and Emacs on which the basic script model was based. + +Unlike in some other speech and language systems, software engineering is +considered very important to the development of Festival. Too often +research systems consist of random collections of hacky little scripts +and code. No one person can confidently describe the algorithms it +performs, as parameters are scattered throughout the system, with tricks +and hacks making it impossible to really evaluate why the system is good +(or bad). Such systems do not help the advancement of speech +technology, except perhaps in pointing at ideas that should be further +investigated. If the algorithms and techniques cannot be described +externally from the program @emph{such that} they can be reimplemented by +others, what is the point of doing the work? + +Festival offers a common framework where multiple techniques may be +implemented (by the same or different researchers) so that they may +be tested more fairly in the same environment.
+ +As a final word, we'd like to make two short statements which both +achieve the same end but unfortunately perhaps not for the same reasons: +@quotation +Good software engineering makes good research easier +@end quotation +But the following seems to be true also +@quotation +If you spend enough effort on something it can be shown to be better +than its competitors. +@end quotation + +@node Future, , Philosophy , Overview +@section Future + +Festival is still very much in development. Hopefully this state will +continue for a long time. It is never possible to complete software; +there are always new things that can make it better. However as time +goes on Festival's core architecture will stabilise and little or +no changes will be made. Other aspects of the system will gain +greater attention such as waveform synthesis modules, intonation +techniques, text type dependent analysers etc. + +Festival will improve, so don't expect it to be the same six months +from now. + +A number of new modules and enhancements are already under consideration +at various stages of implementation. The following is a non-exhaustive +list of what we may (or may not) add to Festival over the +next six months or so. +@itemize @bullet +@item Selection-based synthesis: +Moving away from diphone technology to more generalized selection +of units from a speech database. +@item New structure for linguistic content of utterances: +Using techniques from Metrical Phonology we are building more structured +representations of utterances reflecting their linguistic significance +better. This will allow improvements in prosody and unit selection. +@item Non-prosodic prosodic control: +For language generation systems and custom tasks where the speech +to be synthesized is being generated by some program, more information +about text structure will probably exist, such as phrasing, contrast, +key items etc.
We are investigating the relationship of high-level +tags to prosodic information through the Sole project +@url{http://www.cstr.ed.ac.uk/projects/sole.html} +@item Dialect independent lexicons: +Currently for each new dialect we need a new lexicon, we are currently +investigating a form of lexical specification that is dialect independent +that allows the core form to be mapped to different dialects. This +will make the generation of voices in different dialects much easier. +@end itemize + +@node Installation, Quick start, Overview, Top +@chapter Installation + +This section describes how to install Festival from source in a new +location and customize that installation. + +@menu +* Requirements:: Software/Hardware requirements for Festival +* Configuration:: Setting up compilation +* Site initialization:: Settings for your particular site +* Checking an installation:: But does it work ... +@end menu + +@node Requirements, Configuration, , Installation +@section Requirements + +@cindex requirements +In order to compile Festival you first need the following +source packages + +@table @code +@item festival-1.4.3-release.tar.gz +Festival Speech Synthesis System source +@item speech_tools-1.2.3-release.tar.gz +The Edinburgh Speech Tools Library +@item festlex_NAME.tar.gz +@cindex lexicon +The lexicon distribution, where possible, includes the lexicon input +file as well as the compiled form, for your convenience. The lexicons +have varying distribution policies, but are all free except OALD, which +is only free for non-commercial use (we are working on a free +replacement). In some cases only a pointer to an ftp'able file plus a +program to convert that file to the Festival format is included. +@item festvox_NAME.tar.gz +You'll need a speech database. A number are available (with varying +distribution policies). 
Each voice may have other dependencies such as +requiring particular lexicons +@item festdoc_1.4.3.tar.gz +Full postscript, info and html documentation for Festival and the +Speech Tools. The source of the documentation is available +in the standard distributions but for your conveniences it has +been pre-generated. +@end table + +In addition to Festival specific sources you will also need + +@table @emph +@item A UNIX machine +Currently we have compiled and tested the system under Solaris (2.5(.1), +2.6, 2.7 and 2.8), SunOS (4.1.3), FreeBSD (3.x, 4.x), Linux (Redhat 4.1, +5.0, 5.1, 5.2, 6.[012], 7.[01], 8.0 and other Linux distributions), and it +should work under OSF (Dec Alphas), SGI (Irix), HPs (HPUX). But any +standard UNIX machine should be acceptable. We have now successfully +ported this version to Windows NT and Windows 95 (using the Cygnus GNU +win32 environment). This is still a young port but seems to work. +@item A C++ compiler +@cindex GNU g++ +@cindex g++ +@cindex C++ +Note that C++ is not very portable even between different versions +of the compiler from the same vendor. Although we've tried very +hard to make the system portable, we know it is very unlikely to +compile without change except with compilers that have already been tested. +The currently tested systems are +@itemize @bullet +@item Sun Sparc Solaris 2.5, 2.5.1, 2.6, 2.7, 2.9: +GCC 2.95.1, GCC 3.2 +@item FreeBSD for Intel 3.x and 4.x: +GCC 2.95.1, GCC 3.0 +@item Linux for Intel (RedHat 4.1/5.0/5.1/5.2/6.0/7.x/8.0): +GCC 2.7.2, GCC 2.7.2/egcs-1.0.2, egcs 1.1.1, egcs-1.1.2, GCC 2.95.[123], +GCC "2.96", GCC 3.0, GCC 3.0.1 GCC 3.2 GCC 3.2.1 +@item Windows NT 4.0: +GCC 2.7.2 plus egcs (from Cygnus GNU win32 b19), Visual C++ PRO v5.0, +Visual C++ v6.0 +@end itemize +Note if GCC works on one version of Unix it usually works on +others. + +@cindex Windows NT/95 +We have compiled both the speech tools and Festival under Windows NT 4.0 +and Windows 95 using the GNU tools available from Cygnus. 
+@example +@url{ftp://ftp.cygnus.com/pub/gnu-win32/}. +@end example + +@item GNU make +Due to there being too many different @code{make} programs out there +we have tested the system using GNU make on all systems we use. +Others may work but we know GNU make does. +@item Audio hardware +@cindex audio hardware +You can use Festival without audio output hardware but it doesn't sound +very good (though admittedly you can hear less problems with it). A +number of audio systems are supported (directly inherited from the +audio support in the Edinburgh Speech Tools Library): NCD's NAS +(formerly called netaudio) a network transparent audio system (which can +be found at @url{ftp://ftp.x.org/contrib/audio/nas/}); +@file{/dev/audio} (at 8k ulaw and 8/16bit linear), found on Suns, Linux +machines and FreeBSD; and a method allowing arbitrary UNIX +commands. @xref{Audio output}. +@end table + +@cindex readline +@cindex editline +@cindex GNU readline +Earlier versions of Festival mistakenly offered a command line editor +interface to the GNU package readline, but due to conflicts with the GNU +Public Licence and Festival's licence this interface was removed in +version 1.3.1. Even Festival's new free licence would cause problems as +readline support would restrict Festival linking with non-free code. A +new command line interface based on editline was provided that offers +similar functionality. Editline remains a compilation option as it is +probably not yet as portable as we would like it to be. + +@cindex @file{texi2html} +In addition to the above, in order to process the documentation you will +need @file{TeX}, @file{dvips} (or similar), GNU's @file{makeinfo} (part +of the texinfo package) and @file{texi2html} which is available from +@url{http://wwwcn.cern.ch/dci/texi2html/}. + +@cindex documentation +However the document files are also available pre-processed into, +postscript, DVI, info and html as part of the distribution in +@file{festdoc-1.4.X.tar.gz}. 
+ +Ensure you have a fully installed and working version of your C++ +compiler. Most of the problems people have had in installing Festival +have been due to incomplete or bad compiler installation. It +might be worth checking if the following program works if you don't +know if anyone has used your C++ installation before. +@example +#include <iostream> +int main (int argc, char **argv) +@{ + std::cout << "Hello world\n"; +@} +@end example + +Unpack all the source files in a new directory. The directory +will then contain two subdirectories +@example +speech_tools/ +festival/ +@end example + +@node Configuration, Site initialization, Requirements , Installation +@section Configuration + +First ensure you have a compiled version of the Edinburgh +Speech Tools Library. See @file{speech_tools/INSTALL} for +instructions. + +@cindex configuration +The system now supports the standard GNU @file{configure} method +for set up. In most cases this will automatically configure festival +for your particular system. In most cases you need only +type +@example +gmake +@end example +and the system will configure itself and compile (note you +need to have compiled the Edinburgh Speech Tools +@file{speech_tools-1.2.3} first). + +@cindex @file{config/config} +In some cases hand configuration is required. All of the configuration +choices are kept in the file @file{config/config}. + +@cindex OTHER_DIRS +For the most part Festival configuration inherits the configuration from +your speech tools config file (@file{../speech_tools/config/config}). +Additional optional modules may be added by adding them to the end of +your config file e.g. +@example +ALSO_INCLUDE += clunits +@end example +Adding a new module here will treat it as a new directory in +@file{src/modules/} and compile it into the system in the +same way the @code{OTHER_DIRS} feature was used in +previous versions.
+ +@cindex NFS +@cindex automounter +If the compilation directory is being accessed by NFS or if you use an +automounter (e.g. amd) it is recommended to explicitly set the variable +@code{FESTIVAL_HOME} in @file{config/config}. The command @code{pwd} is +not reliable when a directory may have multiple names. + +There is a simple test suite with Festival but it requires the three +basic voices and their respective lexicons installed before it will work. +Thus you need to install +@example +festlex_CMU.tar.gz +festlex_OALD.tar.gz +festlex_POSLEX.tar.gz +festvox_don.tar.gz +festvox_kedlpc16k.tar.gz +festvox_rablpc16k.tar.gz +@end example +If these are installed you can test the installation with +@example +gmake test +@end example + +To simply make it run with a male US English voice it is +sufficient to install just +@example +festlex_CMU.tar.gz +festlex_POSLEX.tar.gz +festvox_kallpc16k.tar.gz +@end example + +Note that the single most common reason for problems in compilation and +linking found amongst the beta testers was a bad installation of GNU +C++. If you get many strange errors in G++ library header files or link +errors it is worth checking that your system has the compiler, header +files and runtime libraries properly installed. This may be checked by +compiling a simple program under C++ and also finding out if anyone at +your site has ever used the installation. Most of these installation +problems are caused by upgrading to a newer version of libg++ without +removing the older version so that a mixture of versions of the @file{.h} files +exists. + +Although we have tried very hard to ensure that Festival compiles with +no warnings this is not possible under some systems. + +@cindex SunOS +Under SunOS the system include files do not declare a number of +system provided functions. This is a bug in Sun's include files. This +causes warnings like "implicit definition of fprintf". These +are harmless.
+ +@cindex Linux +Under Linux a warning at link time about reducing the size of some +symbols is often produced. This is harmless. There are also +occasional warnings about some socket system function having an +incorrect argument type; these are also harmless. + +@cindex Visual C++ +The speech tools and festival compile under Windows95 or Windows NT +with Visual C++ v5.0 using the Microsoft @file{nmake} make program. We've +only done this with the Professional edition, but have no reason to +believe that it relies on anything not in the standard edition. + +In accordance with VC++ conventions, object files are created with extension +.obj, executables with extension .exe and libraries with extension +.lib. This may mean that both unix and Win32 versions can be built in +the same directory tree, but I wouldn't rely on it. + +To do this you require nmake Makefiles for the system. These can be +generated from the gnumake Makefiles, using the command +@example +gnumake VCMakefile +@end example +in the speech_tools and festival directories. I have only done this +under unix, it's possible it would work under the cygnus gnuwin32 +system. + +If @file{make.depend} files exist (i.e. if you have done @file{gnumake +depend} in unix) equivalent @file{vc_make.depend} files will be created, if not +the VCMakefiles will not contain dependency information for the @file{.cc} +files. The result will be that you can compile the system once, but +changes will not cause the correct things to be rebuilt. + +In order to compile from the DOS command line using Visual C++ you +need to have a collection of environment variables set. In Windows NT +there is an installation option for Visual C++ which sets these +globally. Under Windows95 or if you don't ask for them to be set +globally under NT you need to run +@example +vcvars32.bat +@end example +See the VC++ documentation for more details.
+ +Once you have the source trees with VCMakefiles somewhere visible from +Windows, you need to copy +@file{speech_tools\config\vc_config-dist} to +@file{speech_tools\config\vc_config} and edit it to suit your +local situation. Then do the same with +@file{festival\config\vc_config-dist}. + +The thing most likely to need changing is the definition of +@code{FESTIVAL_HOME} in @file{festival\config\vc_config_make_rules} +which needs to point to where you have put festival. + +Now you can compile. cd to the speech_tools directory and do +@example +nmake /nologo /fVCMakefile +@end example +@exdent and the library, the programs in main and the test programs should be +compiled. + +The tests can't be run automatically under Windows. A simple test to +check that things are probably OK is: +@example +main\na_play testsuite\data\ch_wave.wav +@end example +@exdent which reads and plays a waveform. + +Next go into the festival directory and do +@example +nmake /nologo /fVCMakefile +@end example +@exdent to build festival. When it's finished, and assuming you have the +voices and lexicons unpacked in the right place, festival should run +just as under unix. + +We should remind you that the NT/95 ports are still young and there may +be problems that we've not found yet. We only recommend the use of the +speech tools and Festival under Windows if you have significant +experience in C++ under those platforms. + +@cindex smaller system +@cindex minimal system +Most of the modules in @file{src/modules} are actually optional and the +system could be compiled without them. The basic set could be reduced +further if certain facilities are not desired. Particularly: +@file{donovan} which is only required if the donovan voice is used; +@file{rxp} if no XML parsing is required (e.g. Sable); and @file{parser} +if no stochastic parsing is required (this parser isn't used for any of +our currently released voices).
Actually even @file{UniSyn} and +@file{UniSyn_diphone} could be removed if some external waveform +synthesizer is being used (e.g. MBROLA) or some alternative one like +@file{OGIresLPC}. Removing unused modules will make the festival binary +smaller and (potentially) start up faster but don't expect too much. +You can delete these by changing the @code{BASE_DIRS} variable in +@file{src/modules/Makefile}. + +@node Site initialization, Checking an installation, Configuration, Installation +@section Site initialization + +@cindex run-time configuration +@cindex initialization +@cindex installation initialization +@cindex @file{init.scm} +@cindex @file{siteinit.scm} +Once compiled Festival may be further customized for particular sites. +At start up time Festival loads the file @file{init.scm} from its +library directory. This file further loads other necessary files such +as phoneset descriptions, duration parameters, intonation parameters, +definitions of voices etc. It will also load the files +@file{sitevars.scm} and @file{siteinit.scm} if they exist. +@file{sitevars.scm} is loaded after the basic Scheme library functions +are loaded but before any of the festival related functions are +loaded. This file is intended to set various path names before +various subsystems are loaded. Typically variables such +as @code{lexdir} (the directory where the lexicons are held), and +@code{voices_dir} (pointing to voice directories) should +be reset here if necessary. + +@cindex change libdir at run-time +@cindex run-time configuration +@cindex @code{load-path} +The default installation will try to find its lexicons and voices +automatically based on the value of @code{load-path} (this is derived +from @code{FESTIVAL_HOME} at compilation time or by using the @code{--libdir} +at run-time). If the voices and lexicons have been unpacked into +subdirectories of the library directory (the default) then no site +specific initialization of the above pathnames will be necessary. 
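+
+If the lexicons or voices live somewhere else, the two variables named
+above can be set in @file{sitevars.scm}. The following is only a
+sketch: both directory names are hypothetical and would need to be
+replaced with the real locations at your site.
+@lisp
+;; sitevars.scm (sketch): both paths below are made-up examples
+(set! lexdir "/usr/local/share/festival/dicts/")
+(set! voices_dir "/usr/local/share/festival/voices/")
+@end lisp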
+ +The second site specific file is @file{siteinit.scm}. Typical examples +of local initialization are as follows. The default audio output method +is NCD's NAS system if that is supported as that's what we use normally +in CSTR. If it is not supported, any hardware specific mode is the +default (e.g. sun16audio, freebsd16audio, linux16audio or mplayeraudio). +But that default is just a setting in @file{init.scm}. If, for example, +in your environment you wish the default audio output method to be +8k mulaw through @file{/dev/audio} you should add the following line to +your @file{siteinit.scm} file +@lisp +(Parameter.set 'Audio_Method 'sunaudio) +@end lisp +Note the use of @code{Parameter.set} rather than @code{Parameter.def}; +the second function will not reset the value if it is already set. +Remember that you may use the audio methods @code{sun16audio}, +@code{linux16audio} or @code{freebsd16audio} only if @code{NATIVE_AUDIO} +was selected in @file{speech_tools/config/config} and you are +on such machines. The Festival variable @code{*modules*} contains +a list of all supported functions/modules in a particular installation +including audio support. Check the value of that variable if things +aren't what you expect. + +If you are installing on a machine whose audio is not directly supported +by the speech tools library, an external command may be executed to play +a waveform. The following example is for an imaginary machine that can +play audio files through a program called @file{adplay} with arguments +for sample rate and file type. When playing waveforms, Festival, by +default, outputs an unheadered waveform in native byte order.
 In this +example you would set up the default audio playing mechanism in +@file{siteinit.scm} as follows +@lisp +(Parameter.set 'Audio_Method 'Audio_Command) +(Parameter.set 'Audio_Command "adplay -raw -r $SR $FILE") +@end lisp +@cindex output sample rate +@cindex output file type +@cindex audio command output +@cindex audio output rate +@cindex audio output filetype +For the @code{Audio_Command} method of playing waveforms Festival supports +two additional audio parameters. @code{Audio_Required_Rate} allows you +to use Festival's internal sample rate conversion function to any desired +rate. Note this may not be as good as playing the waveform at the +sample rate it is originally created in, but as some hardware devices +are restrictive in what sample rates they support, or have naive +resample functions this could be optimal. The second additional +audio parameter is @code{Audio_Required_Format} which can be +used to specify the desired output format of the file. The default +is unheadered raw, but this may be any of the values supported by +the speech tools (including nist, esps, snd, riff, aiff, audlab, raw +and, if you really want it, ascii). + +For example suppose you run Festival on a remote machine and are not +running any network audio system and want Festival to copy files back to +your local machine and simply cat them to @file{/dev/audio}. The +following would do that (assuming permissions for rsh are allowed). +@lisp +(Parameter.set 'Audio_Method 'Audio_Command) +;; Make output file ulaw 8k (format ulaw implies 8k) +(Parameter.set 'Audio_Required_Format 'ulaw) +(Parameter.set 'Audio_Command + "userhost=`echo $DISPLAY | sed 's/:.*$//'`; rcp $FILE $userhost:$FILE; \ + rsh $userhost \"cat $FILE >/dev/audio\" ; rsh $userhost \"rm $FILE\"") +@end lisp +Note there are limits on how complex a command you want to put in the +@code{Audio_Command} string directly. It can get very confusing with respect +to quoting.
 It is therefore recommended that once you get past a certain +complexity you consider writing a simple shell script and calling it from +the @code{Audio_Command} string. + +@cindex default voice +A second typical customization is setting the default speaker. Speakers +depend on many things but due to various licence (and resource) +restrictions you may only have some diphone/nphone databases available +in your installation. The function name that is the value of +@code{voice_default} is called immediately after @file{siteinit.scm} is +loaded offering the opportunity for you to change it. In +the standard distribution no change should be required. If you +download all the distributed voices @code{voice_rab_diphone} is +the default voice. You may change this for a site by adding +the following to @file{siteinit.scm} or per person by changing +your @file{.festivalrc}. For example if you wish to +change the default voice to the American one @code{voice_ked_diphone} +@lisp +(set! voice_default 'voice_ked_diphone) +@end lisp +Note the single quote, and note that unlike in early versions +@code{voice_default} is not a function you can call directly. + +@cindex @file{.festivalrc} +@cindex user initialization +A second level of customization is on a per user basis. After loading +@file{init.scm}, which includes @file{sitevars.scm} and +@file{siteinit.scm} for local installation, Festival loads the file +@file{.festivalrc} from the user's home directory (if it exists). This +file may contain arbitrary Festival commands. + +@node Checking an installation, , Site initialization, Installation +@section Checking an installation + +Once Festival is compiled and site initialization is set up you should test +to see if Festival can speak or not. + +Start the system +@example +$ bin/festival +Festival Speech Synthesis System 1.4.3:release Jan 2003 +Copyright (C) University of Edinburgh, 1996-2003. All rights reserved.
+For details type `(festival_warranty)' +festival> ^D +@end example +If errors occur at this stage they are most likely to do +with pathname problems. If any error messages are printed +about non-existent files check that those pathnames +point to where you intended them to be. Most of the (default) +pathnames are dependent on the basic library path. Ensure that +is correct. To find out what it has been set to, start the +system without loading the init files. +@example +$ bin/festival -q +Festival Speech Synthesis System 1.4.3:release Jan 2003 +Copyright (C) University of Edinburgh, 1996-2003. All rights reserved. +For details type `(festival_warranty)' +festival> libdir +"/projects/festival/lib/" +festival> ^D +@end example +This should show the pathname you set in your @file{config/config}. + +If the system starts with no errors try to synthesize something +@example +festival> (SayText "hello world") +@end example +Some files are only accessed at synthesis time so this may +show up other problem pathnames. If it talks, you're in business, +if it doesn't, here are some possible problems. + +@cindex audio problems +If you get the error message +@example +Can't access NAS server +@end example +You have selected NAS as the audio output but have no server running on +that machine or your @code{DISPLAY} or @code{AUDIOSERVER} environment +variable is not set properly for your output device. Either set these +properly or change the audio output device in @file{lib/siteinit.scm} as +described above. + +Ensure your audio device actually works the way you think it does. On +Suns, the audio output device can be switched into a number of different +output modes, speaker, jack, headphones. If this is set to the wrong +one you may not hear the output. Use one of Sun's tools to change this +(try @file{/usr/demo/SOUND/bin/soundtool}). Try to find an audio +file independent of Festival and get it to play on your audio. 
+Once you have done that ensure that the audio output method set in +Festival matches that. + +Once you have got it talking, test the audio spooling device. +@example +festival> (intro) +@end example +This plays a short introduction of two sentences, spooling the audio +output. + +Finally exit from Festival (by end of file or @code{(quit)}) and test +the script mode with +@example +$ examples/saytime +@end example + +A test suite is included with Festival but it makes certain assumptions +about which voices are installed. It assumes that +@code{voice_rab_diphone} (@file{festvox_rabxxxx.tar.gz}) is the default +voice and that @code{voice_ked_diphone} and @code{voice_don_diphone} +(@file{festvox_kedxxxx.tar.gz} and @file{festvox_don.tar.gz}) are +installed. Also local settings in your @file{festival/lib/siteinit.scm} +may affect these tests. However, after installation it may +be worth trying +@example +gnumake test +@end example +from the @file{festival/} directory. This will do various tests +including basic utterance tests and tokenization tests. It also checks +that voices are installed and that they don't interfere with each other. +These tests are primarily regression tests for the developers of +Festival, to ensure new enhancements don't mess up existing supported +features. They are not designed to test that an installation is successful, +though if they run correctly it is most probable the installation has +worked. + +@node Quick start, Scheme, Installation, Top +@chapter Quick start + +This section is for those who just want to know the absolute basics +to run the system. + +@cindex command mode +@cindex text-to-speech mode +@cindex tts mode +Festival works in two fundamental modes, @emph{command mode} and +@emph{text-to-speech mode} (tts-mode). In command mode, information (in +files or through standard input) is treated as commands and is +interpreted by a Scheme interpreter.
In tts-mode, information (in files +or through standard input) is treated as text to be rendered as speech. +The default mode is command mode, though this may change in later +versions. + +@menu +* Basic command line options:: +* Simple command driven session:: +* Getting some help:: +@end menu + +@node Basic command line options, Simple command driven session, , Quick start +@section Basic command line options + +@cindex command line options +Festival's basic calling method is as + +@lisp +festival [options] file1 file2 ... +@end lisp + +Options may be any of the following + +@table @code +@item -q +start Festival without loading @file{init.scm} or user's +@file{.festivalrc} +@item -b +@itemx --batch +@cindex batch mode +After processing any file arguments do not become interactive +@item -i +@itemx --interactive +@cindex interactive mode +After processing file arguments become interactive. This option overrides +any batch argument. +@item --tts +@cindex tts mode +Treat file arguments in text-to-speech mode, causing them to be +rendered as speech rather than interpreted as commands. When selected +in interactive mode the command line edit functions are not available +@item --command +@cindex command mode +Treat file arguments in command mode. This is the default. +@item --language LANG +@cindex language specification +Set the default language to @var{LANG}. Currently @var{LANG} may be +one of @code{english}, @code{spanish} or @code{welsh} (depending on +what voices are actually available in your installation). +@item --server +After loading any specified files go into server mode. This is +a mode where Festival waits for clients on a known port (the +value of @code{server_port}, default is 1314). Connected +clients may send commands (or text) to the server and expect +waveforms back. @xref{Server/client API}. 
 Note server mode +may be unsafe and allow unauthorised access to your +machine; be sure to read the security recommendations in +@ref{Server/client API}. +@item --script scriptfile +@cindex script files +@cindex Festival script files +Run scriptfile as a Festival script file. This is similar +to @code{--batch} but it encapsulates the command line arguments into +the Scheme variables @code{argv} and @code{argc}, so that Festival +scripts may process their command line arguments just like +any other program. It also does not load the basic initialisation +files as sometimes you may not want to do this. If you wish them, +you should copy the loading sequence from an example Festival +script like @file{festival/examples/saytext}. +@item --heap NUMBER +@cindex heap size +@cindex Scheme heap size +The Scheme heap (basic number of Lisp cells) is of a fixed size and +cannot be dynamically increased at run time (this would complicate +garbage collection). The default size is 210000 which seems to be more +than adequate for most work. In some of our training experiments where +very large list structures are required it is necessary to increase +this. Note there is a trade off between size of the heap and time it +takes to garbage collect so making this unnecessarily big is not a good +idea. If you don't understand the above explanation you almost +certainly don't need to use the option. +@end table +In command mode, if the file name starts with a left parenthesis, the +name itself is read and evaluated as a Lisp command. This is often +convenient when running in batch mode and a simple command is necessary +to start the whole thing off after loading in some other specific files. + +@node Simple command driven session, Getting some help, Basic command line options, Quick start +@section Sample command driven session + +Here is a short session using Festival's command interpreter.
+ +Start Festival with no arguments +@lisp +$ festival +Festival Speech Synthesis System 1.4.3:release Dec 2002 +Copyright (C) University of Edinburgh, 1996-2002. All rights reserved. +For details type `(festival_warranty)' +festival> +@end lisp + +Festival uses a command line editor based on editline for terminal +input so command line editing may be done with Emacs commands. Festival +also supports history as well as function, variable name, and file name +completion via the @key{TAB} key. + +Typing @code{help} will give you more information, that is @code{help} +without any parentheses. (It is actually a variable name whose value is a +string containing help.) + +@cindex Scheme +@cindex read-eval-print loop +Festival offers what is called a read-eval-print loop, because +it reads an s-expression (atom or list), evaluates it and prints +the result. As Festival includes the SIOD Scheme interpreter most +standard Scheme commands work +@lisp +festival> (car '(a d)) +a +festival> (+ 34 52) +86 +@end lisp +In addition to standard Scheme commands a number of commands specific to +speech synthesis are included. Although, as we will see, there are +simpler methods for getting Festival to speak, here are the basic +underlying explicit functions used in synthesizing an utterance. + +@cindex utterance +@cindex hello world +Utterances can consist of various types (@pxref{Utterance types}), +but the simplest form is plain text. We can create an utterance +and save it in a variable +@lisp +festival> (set! utt1 (Utterance Text "Hello world")) +# +festival> +@end lisp +The (hex) number in the return value may be different for your +installation. That is the print form for utterances. Their internal +structure can be very large so only a token form is printed. + +@cindex synthesizing an utterance +Although this creates an utterance it doesn't do anything else. +To get a waveform you must synthesize it.
+@lisp +festival> (utt.synth utt1) +# +festival> +@end lisp +@cindex playing an utterance +This calls various modules, including tokenizing, duration, intonation +etc. Which modules are called are defined with respect to the type +of the utterance, in this case @code{Text}. It is possible to +individually call the modules by hand but you just wanted it to talk, didn't +you? So +@lisp +festival> (utt.play utt1) +# +festival> +@end lisp +@exdent will send the synthesized waveform to your audio device. You should +hear "Hello world" from your machine. + +@cindex @code{SayText} +To make this all easier a small function doing these three steps exists. +@code{SayText} simply takes a string of text, synthesizes it and sends it +to the audio device. +@lisp +festival> (SayText "Good morning, welcome to Festival") +# +festival> +@end lisp +Of course as history and command line editing are supported @key{c-p} +or up-arrow will allow you to edit the above to whatever you wish. + +Festival may also synthesize from files rather than simply text. +@lisp +festival> (tts "myfile" nil) +nil +festival> +@end lisp +@cindex exiting Festival +@cindex @code{quit} +The end of file character @key{c-d} will exit from Festival and +return you to the shell, alternatively the command @code{quit} may +be called (don't forget the parentheses). + +@cindex TTS +@cindex text to speech +Rather than starting the command interpreter, Festival may synthesize +files specified on the command line +@lisp +unix$ festival --tts myfile +unix$ +@end lisp + +@cindex text to wave +@cindex offline TTS +Sometimes a simple waveform is required from text that is to be kept and +played at some later time. The simplest way to do this with festival is +by using the @file{text2wave} program. This is a festival script that +will take a file (or text from standard input) and produce a single +waveform.
+ +@cindex text2wave +An example use is +@example +text2wave myfile.txt -o myfile.wav +@end example +Options exist to specify the waveform file type, for example if +Sun audio format is required +@example +text2wave myfile.txt -otype snd -o myfile.wav +@end example +Use @file{-h} on @file{text2wave} to see all options. + +@node Getting some help, , Simple command driven session, Quick start +@section Getting some help + +@cindex help +If no audio is generated then you must check to see if audio is +properly initialized on your machine. @xref{Audio output}. + +In the command interpreter @key{m-h} (meta-h) will give you help +on the current symbol before the cursor. This will be a short +description of the function or variable, how to use it and what +its arguments are. A listing of all such help strings appears +at the end of this document. @key{m-s} will synthesize and say +the same information, but this extra function is really just for show. + +@cindex @code{manual} +The lisp function @code{manual} will send the appropriate command to an +already running Netscape browser process. If @code{nil} is given as an +argument the browser will be directed to the table of contents of the +manual. If a non-nil value is given it is assumed to be a section title +and that section is searched for and, if found, displayed. For example +@example +festival> (manual "Accessing an utterance") +@end example +Another related function is @code{manual-sym} which given a symbol will +check its documentation string for a cross reference to a manual +section and request Netscape to display it. This function is +bound to @key{m-m} and will display the appropriate section for +the given symbol. + +Note also that the @key{TAB} key can be used to find out the name +of commands available as can the function @code{Help} (remember the +parentheses).
+ +For more up to date information on Festival regularly check +the Festival Home Page at +@example +@url{http://www.cstr.ed.ac.uk/projects/festival.html} +@end example + +Further help is available by mailing questions to +@example +festival-help@@cstr.ed.ac.uk +@end example +Although we cannot guarantee the time required to answer you, we +will do our best to offer help. + +@cindex bug reports +Bug reports should be submitted to +@example +festival-bug@@cstr.ed.ac.uk +@end example + +If there is enough user traffic a general mailing list will be +created so all users may share comments and receive announcements. +In the meantime watch the Festival Home Page for news. + +@node Scheme, TTS, Quick start, Top +@chapter Scheme + +@cindex Scheme introduction +Many people seem daunted by the fact that Festival uses Scheme as its +scripting language and feel they can't use Festival because they don't +know Scheme. However most of those same people use Emacs every day which +also has (a much more complex) Lisp system underneath. The number of +Scheme commands you actually need to know in Festival is really very +small and you can easily just find out as you go along. Also people use +the Unix shell often but only know a small fraction of actual commands +available in the shell (or in fact that there even is a distinction +between shell builtin commands and user definable ones). So take it +easy, you'll learn the commands you need fairly quickly. + +@menu +* Scheme references:: Places to learn more about Scheme +* Scheme fundamentals:: Syntax and semantics +* Scheme Festival specifics:: +* Scheme I/O:: +@end menu + +@node Scheme references, Scheme fundamentals, , Scheme +@section Scheme references + +If you wish to learn about Scheme in more detail I recommend +the book @cite{abelson85}. + +The Emacs Lisp documentation is reasonable as it is comprehensive and +many of the underlying uses of Scheme in Festival were influenced +by Emacs.
 Emacs Lisp however is not Scheme so there are some +differences. + +@cindex Scheme references +Other Scheme tutorials and resources available on the Web are +@itemize @bullet +@item +The Revised Revised Revised Revised Scheme Report, the document +defining the language, is available from +@example +@url{http://tinuviel.cs.wcu.edu/res/ldp/r4rs-html/r4rs_toc.html} +@end example +@item +a Scheme tutorial from the net: +@itemize @bullet +@item @url{http://www.cs.uoregon.edu/classes/cis425/schemeTutorial.html} +@end itemize +@item the Scheme FAQ +@itemize @bullet +@item @url{http://www.landfield.com/faqs/scheme-faq/part1/} +@end itemize +@end itemize + +@node Scheme fundamentals, Scheme Festival specifics, Scheme references, Scheme +@section Scheme fundamentals + +But you want more now, don't you, not just be referred to some +other book. OK here goes. + +@emph{Syntax}: an expression is an @emph{atom} or a @emph{list}. A +list consists of a left paren, a number of expressions and right +paren. Atoms can be symbols, numbers, strings or other special +types like functions, hash tables, arrays, etc. + +@emph{Semantics}: All expressions can be evaluated. Lists are +evaluated as function calls. When evaluating a list all the +members of the list are evaluated first then the first item (a +function) is called with the remaining items in the list as arguments. +Atoms are evaluated depending on their type: symbols are +evaluated as variables returning their values. Numbers, strings, +functions, etc. evaluate to themselves. + +Comments are started by a semicolon and run until end of line. + +And that's it. There is nothing more to the language than that. But just +in case you can't follow the consequences of that, here are +some key examples. + +@lisp +festival> (+ 2 3) +5 +festival> (set! a 4) +4 +festival> (* 3 a) +12 +festival> (define (add a b) (+ a b)) +# +festival> (add 3 4) +7 +festival> (set!
 alist '(apples pears bananas)) +(apples pears bananas) +festival> (car alist) +apples +festival> (cdr alist) +(pears bananas) +festival> (set! blist (cons 'oranges alist)) +(oranges apples pears bananas) +festival> (append alist blist) +(apples pears bananas oranges apples pears bananas) +festival> (cons alist blist) +((apples pears bananas) oranges apples pears bananas) +festival> (length alist) +3 +festival> (length (append alist blist)) +7 +@end lisp + +@node Scheme Festival specifics, Scheme I/O, Scheme fundamentals, Scheme +@section Scheme Festival specifics + +There are a number of additions to SIOD that are Festival specific though +still part of the Lisp system rather than the synthesis functions per se. + +By convention if the first statement of a function is a string, +it is treated as a documentation string. The string will be +printed when help is requested for that function symbol. + +@cindex debugging Scheme errors +@cindex debugging scripts +@cindex backtrace +In interactive mode if the function @code{:backtrace} is called (within +parentheses) the previous stack trace is displayed. Calling +@code{:backtrace} with a numeric argument will display that particular +stack frame in full. Note that any command other than @code{:backtrace} +will reset the trace. You may optionally call +@lisp +(set_backtrace t) +@end lisp +which will cause a backtrace to be displayed whenever a Scheme error +occurs. This can be put in your @file{.festivalrc} if you wish. This +is especially useful when running Festival in non-interactive mode +(batch or script mode) so that more information is printed when an error +occurs. + +@cindex hooks +A @emph{hook} in Lisp terms is a position within some piece of code +where a user may specify their own customization. The notion is used +heavily in Emacs. In Festival there are a number of places where hooks are +used. A hook variable contains either a function or list of functions +that are to be applied at some point in the processing.
For example the +@code{after_synth_hooks} are applied after synthesis has been applied to +allow specific customization such as resampling or modification of the +gain of the synthesized waveform. The Scheme function +@code{apply_hooks} takes a hook variable and an object as arguments and +applies the function/list of functions in turn to the object. + +@cindex catching errors in Scheme +@cindex @code{unwind-protect} +@cindex errors in Scheme +When an error occurs in either Scheme or within the C++ part of Festival +by default the system jumps to the top level, resets itself and +continues. Note that errors are usually serious things, pointing to +bugs in parameters or code. Every effort has been made to ensure +that the processing of text never causes errors in Festival. +However when using Festival as a development system errors often +occur in code. + +Sometimes in writing Scheme code you know there is a potential for +an error but you wish to ignore that and continue on to the next +thing without exiting or stopping and returning to the top level. For +example you are processing a number of utterances from a database and +some files containing the descriptions have errors in them but you +want your processing to continue through every utterance that can +be processed rather than stopping 5 minutes after you have gone home after +setting a big batch job for overnight. + +@cindex @code{unwind-protect} +@cindex catching errors +Festival's Scheme provides the function @code{unwind-protect} which +allows the catching of errors and then continuing normally. For example +suppose you have the function @code{process_utt} which takes a filename +and does things which you know might cause an error. You can write the +following to ensure you continue processing even if an error +occurs. 
+@lisp +(unwind-protect + (process_utt filename) + (begin + (format t "Error found in processing %s\n" filename) + (format t "continuing\n"))) +@end lisp +The @code{unwind-protect} function takes two arguments. The first is +evaluated and if no error occurs the value returned from that expression +is returned. If an error does occur while evaluating the first +expression, the second expression is evaluated. @code{unwind-protect} +may be used recursively. Note that all files opened while evaluating +the first expression are closed if an error occurs. All global +variables outside the scope of the @code{unwind-protect} will be left as +they were set up until the error. Care should be taken in using this +function but its power is necessary to be able to write robust Scheme +code. + +@node Scheme I/O, , Scheme Festival specifics, Scheme +@section Scheme I/O + +@cindex file i/o in Scheme +@cindex i/o in Scheme +Different Schemes may have quite different implementations of +file i/o functions so in this section we will describe the +basic functions in Festival SIOD regarding i/o. + +Simple printing to the screen may be achieved with the function +@code{print} which prints the given s-expression to the screen. +The printed form is preceded by a new line. This is often useful +for debugging but isn't really powerful enough for much else. + +@cindex @code{fopen} +@cindex @code{fclose} +Files may be opened and closed and referred to by file descriptors +in a direct analogy to C's stdio library. The SIOD functions +@code{fopen} and @code{fclose} work in exactly the same +way as their equivalently named partners in C. + +@cindex @code{format} +@cindex formatted output +The @code{format} command follows the command of the same name in Emacs +and a number of other Lisps. C programmers can think of it as +@code{fprintf}. @code{format} takes a file descriptor, format string +and arguments to print. 
The file descriptor argument may be a file descriptor +as returned by the Scheme function @code{fopen}; it may also be @code{t}, +which means the output will be directed to standard out +(cf. @code{printf}). A third possibility is @code{nil} which will cause +the output to be printed to a string which is returned (cf. @code{sprintf}). + +The format string closely follows the format strings +in ANSI C, but it is not the same. Specifically the directives +currently supported are @code{%%}, @code{%d}, @code{%x}, +@code{%s}, @code{%f}, @code{%g} and @code{%c}. All modifiers +for these are also supported. In addition @code{%l} is provided +for printing of Scheme objects as objects. + +For example +@lisp +(format t "%03d %3.4f %s %l %l %l\n" 23 23 "abc" "abc" '(a b d) utt1) +@end lisp +will produce +@lisp +023 23.0000 abc "abc" (a b d) # +@end lisp +on standard output. + +@cindex pretty printing +When large lisp expressions are printed they are difficult to read +because of the parentheses. The function @code{pprintf} prints an +expression to a file descriptor (or @code{t} for standard out). It +prints so the s-expression is nicely lined up and indented. This +is often called pretty printing in Lisps. + +@cindex reading from files +@cindex loading data from files +For reading input from terminal or file, there is currently no +equivalent to @code{scanf}. Items may only be read as Scheme +expressions. The command +@lisp +(load FILENAME t) +@end lisp +@exdent +will load all s-expressions in @code{FILENAME} and return them, +unevaluated, as a list. Without the second argument the @code{load} +function will load and evaluate each s-expression in the file. + +To read individual s-expressions use @code{readfp}. For +example +@lisp +(let ((fd (fopen trainfile "r")) + (entry) + (count 0)) + (while (not (equal? (set! entry (readfp fd)) (eof-val))) + (if (string-equal (car entry) "home") + (set! 
count (+ 1 count)))) + (fclose fd)) +@end lisp + +@cindex @code{parse-number} +@cindex @code{atof} +@cindex string to number +@cindex convert string to number +To convert a symbol whose print name is a number to a number +use @code{parse-number}. This is the equivalent to @code{atof} +in C. + +Note that all i/o from Scheme input files is assumed to be +basically some form of Scheme data (though it can be just numbers or +tokens). For more elaborate analysis of incoming data it is +possible to use the text tokenization functions which offer +a fully programmable method of reading data. + +@node TTS, XML/SGML mark-up, Scheme, Top +@chapter TTS + +Festival supports text to speech for raw text files. If you +are not interested in using Festival in any other way except as +a black box for rendering text as speech, the following method +is probably what you want. +@example +festival --tts myfile +@end example +This will say the contents of @file{myfile}. Alternatively text +may be submitted on standard input +@example +echo hello world | festival --tts +cat myfile | festival --tts +@end example + +@cindex text modes +Festival supports the notion of @emph{text modes} where the text file +type may be identified, allowing Festival to process the file in an +appropriate way. Currently only two types are considered stable: +@code{STML} and @code{raw}, but other types such as @code{email}, +@code{HTML}, @code{Latex}, etc. are being developed and are discussed below. +This follows the idea of buffer modes in Emacs where a file's type can +be utilized to best display the text. Text mode may also be selected +based on a filename's extension. + +Within the command interpreter the function @code{tts} is used +to render files as speech; it takes a filename and the text mode +as arguments. 
+ +@menu +* Utterance chunking:: From text to utterances +* Text modes:: Mode specific text analysis +* Example text mode:: An example mode for reading email +@end menu + +@node Utterance chunking, Text modes, , TTS +@section Utterance chunking + +@cindex utterance chunking +@cindex @code{eou_tree} +Text to speech works by first tokenizing the file and chunking the +tokens into utterances. The definition of utterance breaks is +determined by the utterance tree in variable @code{eou_tree}. A default +version is given in @file{lib/tts.scm}. This uses a decision tree to +determine what signifies an utterance break. Obviously blank lines are +probably the most reliable, followed by certain punctuation. The +confusion of the use of periods for both sentence breaks and +abbreviations requires some more heuristics to best guess their +different use. The following tree is currently used which +works better than simply using punctuation. +@lisp +(defvar eou_tree +'((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines + ((1)) + ((punc in ("?" ":" "!")) + ((1)) + ((punc is ".") + ;; This is to distinguish abbreviations vs periods + ;; These are heuristics + ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)") + ((n.whitespace is " ") + ((0)) ;; if abbrev single space isn't enough for break + ((n.name matches "[A-Z].*") + ((1)) + ((0)))) + ((n.whitespace is " ") ;; if it doesn't look like an abbreviation + ((n.name matches "[A-Z].*") ;; single space and non-cap is no break + ((1)) + ((0))) + ((1)))) + ((0))))) +@end lisp +The token items this is applied to will always (except in the +end of file case) include one following token, so look ahead is +possible. The "n." and "p." and "p.p." prefixes allow access to the +surrounding token context. The features @code{name}, @code{whitespace} +and @code{punc} allow access to the contents of the token itself. 
At +present there is no way to access the lexicon from this tree, which +is unfortunate as that might be useful if certain abbreviations were identified +as such there. + +Note these are heuristics and written by hand not trained from data, +though problems have been fixed as they have been observed in data. The +above rules may make mistakes where abbreviations appear at end of +lines, and when improper spacing and capitalization is used. This is +probably worth changing, for modes where more casual text appears, such +as email messages and USENET news messages. A possible improvement +could be made by analysing a text to find out its basic threshold of +utterance break (i.e. if no sequences of full stop, two spaces and a +capitalized word appear and the text is of a reasonable length +then look for other criteria for utterance breaks). + +Ultimately what we are trying to do is to chunk the text into utterances +that can be synthesized quickly and start to play them quickly to +minimise the time someone has to wait for the first sound when starting +synthesis. Thus it would be better if this chunking were done on +@emph{prosodic phrases} rather than chunks more similar to linguistic +sentences. Prosodic phrases are bounded in size, while sentences are +not. + +@node Text modes, Example text mode, Utterance chunking, TTS +@section Text modes + +@cindex text modes +We do not believe that all texts are of the same type. Often information +about the general contents of a file will aid synthesis greatly. For +example in Latex files we do not want to hear "left brace, backslash e +m" before each emphasized word, nor do we want to necessarily hear +formatting commands. Festival offers a basic method for specifying +customization rules depending on the @emph{mode} of the text. By type +we are following the notion of modes in Emacs and eventually will allow +customization at a similar level. + +Modes are specified as the second argument to the function @code{tts}. 
+When using the Emacs interface to Festival the buffer mode is +automatically passed as the text mode. If the mode is not supported a +warning message is printed and the raw text mode is used. + +Our initial text mode implementation allows configuration both in C++ +and in Scheme. Obviously in C++ almost anything can be done but it is +not as easy to reconfigure without recompilation. Here +we will discuss those modes which can be fully configured at +run time. + +A text mode may contain the following +@table @emph +@item filter +A Unix shell program filter that processes the text file in some +appropriate way. For example for email it might remove uninteresting +headers and just output the subject, from line and the message body. +If not specified, an identity filter is used. +@item init_function +This (Scheme) function will be called before any processing +is done. It allows further set up of tokenization rules +and voices etc. +@item exit_function +This (Scheme) function will be called at the end of any processing +allowing resetting of tokenization rules etc. +@item analysis_mode +If analysis mode is @code{xml} the file is read through the built in XML +parser @code{rxp}. Alternatively if analysis mode is @code{xxml} the +filter should be an SGML normalising parser and the output is processed in +a way suitable for it. Any other value is ignored. +@end table +These mode specific parameters are specified in the a-list +held in @code{tts_text_modes}. + +When using Festival in Emacs the Emacs buffer mode is passed to +Festival as the text mode. + +Note that the above mechanism is not really designed to be re-entrant; +this should be addressed in later versions. + +@cindex @code{auto-text-mode-alist} +@cindex automatic selection of text mode +Following the use of auto-selection of mode in Emacs, Festival can +auto-select the text mode based on the filename given when no explicit +mode is given. 
The Lisp variable @code{auto-text-mode-alist} is a list +of dotted pairs of regular expression and mode name. For example +to specify that the @code{email} mode is to be used for files ending +in @file{.email} we would add to the current @code{auto-text-mode-alist} +as follows +@lisp +(set! auto-text-mode-alist + (cons (cons "\\.email$" 'email) + auto-text-mode-alist)) +@end lisp +If the function @code{tts} is called with a mode other than @code{nil} +that mode overrides any specified by the @code{auto-text-mode-alist}. +The mode @code{fundamental} is the explicit "null" mode; it is used +when no mode is specified in the function @code{tts}, and no match +is found in @code{auto-text-mode-alist} or the specified mode +is not found. + +By convention if a requested text mode is not found in +@code{tts_text_modes} the file @file{MODENAME-mode} will be +@code{required}. Therefore if you have the file +@file{MODENAME-mode.scm} in your library then it will be automatically +loaded on reference. Modes may be quite large and it is not necessary +to have Festival load them all at start up time. + +Because of the @code{auto-text-mode-alist} and the auto loading +of currently undefined text modes you can use Festival like +@example +festival --tts example.email +@end example +Festival will automatically synthesize @file{example.email} in text +mode @code{email}. + +@cindex personal text modes +If you add your own personal text modes you should do the following. +Suppose you've written an HTML mode. You have named it +@file{html-mode.scm} and put it in @file{/home/awb/lib/festival/}. In +your @file{.festivalrc} first identify your personal Festival library +directory by adding it to @code{lib-path}. +@example +(set! lib-path (cons "/home/awb/lib/festival/" lib-path)) +@end example +Then add the definition to the @code{auto-text-mode-alist} +that file names ending @file{.html} or @file{.htm} should +be read in HTML mode. +@example +(set! 
auto-text-mode-alist + (cons (cons "\\.html?$" 'html) + auto-text-mode-alist)) +@end example +Then you may synthesize an HTML file either from Scheme +@example +(tts "example.html" nil) +@end example +@exdent Or from the shell command line +@example +festival --tts example.html +@end example +Anyone familiar with modes in Emacs should recognise that the process of +adding a new text mode to Festival is very similar to adding a new +buffer mode to Emacs. + +@node Example text mode, , Text modes, TTS +@section Example text mode + +@cindex email mode +Here is a short example of a tts mode for reading email messages. It +is by no means complete but is a start at showing how you can customize +tts modes without writing new C++ code. + +The first task is to define a filter that will take a saved mail +message and remove extraneous headers and just leave the from +line, subject and body of the message. The filter program +is given a file name as its first argument and should output the +result on standard out. For our purposes we will do this as +a shell script. +@example +#!/bin/sh +# Email filter for Festival tts mode +# usage: email_filter mail_message >tidied_mail_message +grep "^From: " $1 +echo +grep "^Subject: " $1 +echo +# delete up to first blank line (i.e. the header) +sed '1,/^$/ d' $1 +@end example +Next we define the email init function, which will be called +when we start this mode. What we will do is save the current +token to words function and slot in our own new one. We can +then restore the previous one when we exit. +@lisp +(define (email_init_func) + "Called on starting email text mode." + (set! email_previous_t2w_func token_to_words) + (set! english_token_to_words email_token_to_words) + (set! token_to_words email_token_to_words)) +@end lisp +Note that @emph{both} @code{english_token_to_words} and +@code{token_to_words} should be set to ensure that our new +token to word function is still used when we change voices. 
+ +The corresponding end function puts the token to words function +back. +@lisp +(define (email_exit_func) + "Called on exit email text mode." + (set! english_token_to_words email_previous_t2w_func) + (set! token_to_words email_previous_t2w_func)) +@end lisp +Now we can define the email specific token to words function. In this +example we deal with two specific cases. First we deal with the common +form of email addresses so that the angle brackets are not pronounced. +The second is to recognise quoted text and immediately change +the speaker to the alternative speaker. +@lisp +(define (email_token_to_words token name) + "Email specific token to word rules." + (cond +@end lisp +This first condition identifies the token as a bracketed email address +and removes the brackets and splits the token into name +and IP address. Note that we recursively call the function +@code{email_previous_t2w_func} on the email name and IP address +so that they will be pronounced properly. Note that because that +function returns a @emph{list} of words we need to append them together. +@lisp + ((string-matches name "<.*@@.*>") + (append + (email_previous_t2w_func token + (string-after (string-before name "@@") "<")) + (cons + "at" + (email_previous_t2w_func token + (string-before (string-after name "@@") ">"))))) +@end lisp +Our next condition deals with identifying a greater than sign being used +as a quote marker. When we detect this we select the alternative +speaker, even though it may already be selected. We then return no +words so the quote marker is not spoken. The following condition finds +greater than signs which are the first token on a line. 
+@lisp + ((and (string-matches name ">") + (string-matches (item.feat token "whitespace") + "[ \t\n]*\n *")) + (voice_don_diphone) + nil ;; return nothing to say + ) +@end lisp +If it doesn't match any of these we can go ahead and use the builtin +token to words function. Actually, we call the function that was set +before we entered this mode to ensure any other specific rules +still remain. But before that we need to check if we've had a new line +which doesn't start with a greater than sign. In that case we +switch back to the primary speaker. +@lisp + (t ;; for all other cases + (if (string-matches (item.feat token "whitespace") + ".*\n[ \t\n]*") + (voice_rab_diphone)) + (email_previous_t2w_func token name)))) +@end lisp +@cindex declaring text modes +In addition to these we have to actually declare the text mode. +This we do by adding to any existing modes as follows. +@lisp +(set! tts_text_modes + (cons + (list + 'email ;; mode name + (list ;; email mode params + (list 'init_func email_init_func) + (list 'exit_func email_exit_func) + '(filter "email_filter"))) + tts_text_modes)) +@end lisp +This will now allow simple email messages to be dealt with in a mode +specific way. + +An example mail message is included in @file{examples/ex1.email}. To +hear the result of the above text mode start Festival, load +in the email mode descriptions, and call TTS on the example file. +@example +(tts ".../examples/ex1.email" 'email) +@end example + +The above falls well short of a real email mode but does illustrate +how one might go about building one. It should be reiterated +that text modes are new in Festival and their most effective form +has not been discovered yet. This will improve with time +and experience. 
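The shell filter given at the start of this section can be tried on its own before it is registered in @code{tts_text_modes}. The following is only a minimal sketch under stated assumptions: it wraps the same three steps in a shell function (the function form, the sample message and its addresses are illustrative and not part of the Festival distribution) and assumes @code{mktemp} is available.

```shell
#!/bin/sh
# Sketch: the filter steps from this section wrapped as a shell function.
# The function form and the sample message are illustrative assumptions.
email_filter() {
    grep "^From: " "$1"
    echo
    grep "^Subject: " "$1"
    echo
    # delete up to first blank line (i.e. the header)
    sed '1,/^$/ d' "$1"
}

# Build a small hypothetical saved mail message to run the filter on.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Received: from example.org by example.com
From: Alice <alice@example.org>
Subject: Lunch
X-Mailer: none

> are you free at noon?
Yes, see you then.
EOF

# Show what Festival would receive from the filter.
email_filter "$sample"
```

Only the from line, the subject and the body should survive. The quoted line beginning with a greater than sign is left in place, which is what the token to words function above relies on to switch speakers.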
+ +@node XML/SGML mark-up, Emacs interface, TTS, Top +@chapter XML/SGML mark-up + +@cindex STML +@cindex SGML +@cindex SSML +@cindex Sable +@cindex XML +@cindex Spoken Text Mark-up Language +The idea of a general, synthesizer-independent mark-up language +for labelling text has been under discussion for some time. Festival +has supported an SGML based markup language through multiple versions +most recently STML (@cite{sproat97}). This is based on the earlier SSML +(Speech Synthesis Markup Language) which was supported by previous +versions of Festival (@cite{taylor96}). With this version of Festival +we support @emph{Sable} a similar mark-up language devised by a +consortium from Bell Labs, Sun Microsystems, AT&T and Edinburgh, +@cite{sable98}. Unlike the previous versions which were SGML based, the +implementation of Sable in Festival is now XML based. To the user the +difference is negligible but using XML makes processing of files easier +and more standardized. Also Festival now includes an XML parser thus +reducing the dependencies in processing Sable text. + +Raw text has the problem that it cannot always easily be rendered as +speech in the way the author wishes. Sable offers a well-defined way of +marking up text so that the synthesizer may render it appropriately. + +@cindex CSS +@cindex Cascading style sheets +@cindex DSSSL +The definition of Sable is by no means settled and is still in +development. In this release Festival offers people working on Sable +and other XML (and SGML) based markup languages a chance to quickly +experiment with prototypes by providing a DTD (document type +definition) and the mapping of the elements in the DTD to Festival +functions. Although we have not yet (personally) investigated facilities +like cascading style sheets and generalized SGML specification languages +like DSSSL we believe the facilities offered by Festival allow rapid +prototyping of speech output markup languages. 
+ +Primarily we see Sable markup text as a language that will be generated by +other programs, e.g. text generation systems, dialog managers etc. +therefore a standard, easy to parse, format is required, even if +it seems overly verbose for human writers. + +For more information of Sable and access to the mailing list see +@example +@url{http://www.cstr.ed.ac.uk/projects/sable.html} +@end example + +@menu +* Sable example:: an example of Sable with descriptions +* Supported Sable tags:: Currently supported Sable tags +* Adding Sable tags:: Adding new Sable tags +* XML/SGML requirements:: Software environment requirements for use +* Using Sable:: Rendering Sable files as speech +@end menu + +@node Sable example, Supported Sable tags, , XML/SGML mark-up +@section Sable example + +Here is a simple example of Sable marked up text + +@example + + + + + +The boy saw the girl in the park with the telescope. +The boy saw the girl in the park with the telescope. + +Good morning My name is Stuart, which is spelled + +stuart +though some people pronounce it +stuart. My telephone number +is 2787. + +I used to work in Buccleuch Place, +but no one can pronounce that. + +By the way, my telephone number is actually + + +@end example +@cindex SABLE DTD +@cindex @file{Sable.v0_2.dtd} +After the initial definition of the SABLE tags, through the file +@file{Sable.v0_2.dtd}, which is distributed as part of Festival, the +body is given. There are tags for identifying the language and the +voice. Explicit boundary markers may be given in text. Also duration +and intonation control can be explicit specified as can new +pronunciations of words. The last sentence specifies some external +filenames to play at that point. + +@node Supported Sable tags, Adding Sable tags, Sable example, XML/SGML mark-up +@section Supported Sable tags + +@cindex Sable tags +There is not yet a definitive set of tags but hopefully such a list +will form over the next few months. 
As adding support for new tags is +often trivial the problem lies much more in defining what tags there +should be than in actually implementing them. The following +are based on version 0.2 of Sable as described in +@url{http://www.cstr.ed.ac.uk/projects/sable_spec2.html}, though +some aspects are not currently supported in this implementation. +Further updates will be announced through the Sable mailing list. + +@table @code +@item LANGUAGE +Allows the specification of the language through the @code{ID} +attribute. Valid values in Festival are @code{english}, +@code{en1}, @code{spanish}, @code{en}, and others depending +on your particular installation. +For example +@example + ... +@end example +If the language isn't supported by the particular installation of +Festival "Some text in .." is said instead and the section is +omitted. +@item SPEAKER +Select a voice. Accepts a parameter @code{NAME} which takes values +@code{male1}, @code{male2}, @code{female1}, etc. There +is currently no definition about what happens when a voice is selected +which the synthesizer doesn't support. An example is +@example + ... +@end example +@item AUDIO +This allows the specification of an external waveform that is to +be included. There are attributes for specifying volume and whether +the waveform is to be played in the background of the following +text or not. Festival as yet only supports insertion. +@example +My telephone number is +

+ Festival Documentation
+ V1.2.5 December 1997 +

+
diff --git a/doc/indexHeader.inc b/doc/indexHeader.inc new file mode 100644 index 0000000..88b599c --- /dev/null +++ b/doc/indexHeader.inc @@ -0,0 +1,20 @@ + + Festival Documentation Index + + + + + + + + +
+ +

+ Festival Documentation
+ V1.2.5 December 1997 +

+
+ diff --git a/doc/refcard.tex b/doc/refcard.tex new file mode 100644 index 0000000..f3393c1 --- /dev/null +++ b/doc/refcard.tex @@ -0,0 +1,321 @@ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% %% +%% Centre for Speech Technology Research %% +%% University of Edinburgh, UK %% +%% Copyright (c) 1996,1997 %% +%% All Rights Reserved. %% +%% %% +%% Permission is hereby granted, free of charge, to use and distribute %% +%% this software and its documentation without restriction, including %% +%% without limitation the rights to use, copy, modify, merge, publish, %% +%% distribute, sublicense, and/or sell copies of this work, and to %% +%% permit persons to whom this work is furnished to do so, subject to %% +%% the following conditions: %% +%% 1. The code must retain the above copyright notice, this list of %% +%% conditions and the following disclaimer. %% +%% 2. Any modifications must be clearly marked as such. %% +%% 3. Original authors' names are not deleted. %% +%% 4. The authors' names are not used to endorse or promote products %% +%% derived from this software without specific prior written %% +%% permission. %% +%% %% +%% THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK %% +%% DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING %% +%% ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT %% +%% SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE %% +%% FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES %% +%% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN %% +%% AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, %% +%% ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF %% +%% THIS SOFTWARE. 
%% +%% %% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% +%% A reference card for Festival with the standard commands +%% +\documentstyle{article} + +%\setlength{\textwidth}{9in} +\setlength{\textwidth}{380pt} +\setlength{\textheight}{7.0in} +\setlength{\topmargin}{-1.0in} +\setlength{\oddsidemargin}{-0.9in} + +%% Thanks to lso8219@cs.rit.edu (Loren S Osborn) for the sizes +% For A4 +% 3.6in +% For letter sized 242pt + +\def\excode#1{\mbox{\hspace{0.25in}{\small #1}}\\ } +\def\excodett#1{\mbox{\hspace{0.25in}{\small \tt #1}}\\ } +\def\explain#1{\mbox{\hspace{0.1in}{\it #1 }}\\} +\def\maintitle#1{{\bf #1} \\} +\def\weebreak{\hspace{1in}\\} + +\pagestyle{empty} + +\begin{document} + +\begin{tabular}{ccc} +\begin{minipage}{3.6in} +% page 1 col 1 +\begin{center} +{\bf The Festival Speech Synthesis System 1.4} +{\bf Reference Card} +\end{center} + +\maintitle{Festival on-line manual:} +{\small \tt http://www.cstr.ed.ac.uk/projects/festival/manual/} +\explain{If Netscape is running, in command interpreter} +\excodett{(manual nil)} +\hspace{1in}\\ +\maintitle{Making it talk:} +\excodett{festival --tts file.text} +\excodett{echo "hello" | festival --tts} +\hspace{1in}\\ +\maintitle{Command line interpreter} +\explain{If editline interface is supported, {\tt C-} denotes} +\explain{control key, {\tt M-} denotes meta key (diamond} +\mbox{\hspace{0.1in}{\it or maybe alt)}} +\begin{tabbing} +abc \= C-xxxx \= explain \kill + \> {\tt \small C-c} \> {\small \it stop and return to top-level}\\ + \> {\tt \small C-d} {\small \it or \tt (quit)} \\ + \> \> {\small \it Exit Festival}\\ + \> {\tt \small TAB} \> {\small \it symbol, function or file completion}\\ + \> {\tt \small C-p} {\small \it or up-arrow} \\ + \> \> {\small \it previous command}\\ + \> {\tt \small C-r} \> {\small \it backwards search command}\\ + \> {\tt \small M-h} \> {\small \it print help on current symbol}\\ + \> {\tt \small M-s} \> {\small \it say help on current symbol}\\ + \> 
{\tt \small M-m} \> {\small \it go to appropriate manual page}\\ + \> \> {\small \it (requires Netscape to be running)} +\end{tabbing} +\explain{Emacs keys may be used for editing command line} +\weebreak +\maintitle{Scheme commands: speech} +\explain{Say some string of text} +\excode{{\tt (SayText "}{\it text ...}{\tt ")}} +\explain{Say the contents of {\tt file.text}} +\excodett{(tts "file.text" nil)} +\maintitle{Voices} +\explain{Select voice through {\tt voice\_*} functions e.g.} +\excodett{(voice\_rab\_diphone)} +\excodett{(voice\_don\_diphone)} +\excodett{(voice\_ked\_diphone)} + +\end{minipage} & +\begin{minipage}{3.6in} +% page 1 col 2 +\maintitle{Scheme commands: general} +\explain{Setting a variable} +\excodett{(set! a 'fred) ; comment} +\excodett{(set! b '(b c d))} +\excodett{(set! sum (+ 2 3 4))} +\explain{Lists} +\excodett{festival> (set! x '(a b c))} +\excodett{(a b c)} +\excodett{festival> (set! y '(d e f))} +\excodett{(d e f)} +\excodett{festival> (append x y)} +\excodett{(a b c d e f)} +\excodett{festival> (car x)} +\excodett{a} +\excodett{festival> (cdr x)} +\excodett{(b c)} +\excodett{festival> (cons 'm x)} +\excodett{(m a b c)} +\explain{Functions} +\excodett{(define (plus a b) (+ a b))} +\excodett{festival> (plus 3 4)} +\excodett{7} +\excodett{festival> (plus 3 (plus 2 4))} +\excodett{9} +\explain{Printing} +\excodett{(pprint '(a b c))} +\excode{$\rightarrow$ {\tt (a b c)}} +\excode{{\tt (format t "Total \%2.3f} \% {\tt n" 3.12345)}} +\excode{$\rightarrow$ 3.123} +\explain{Others} +\excodett{(load "fred.scm")} +\excodett{(if (string-equal name "fred")} +\excode{\mbox{\hspace{0.1in}{\tt (pprint "is fred")}}} +\excode{\mbox{\hspace{0.1in}{\tt (pprint "is not fred"))}}} + +\end{minipage} & +\begin{minipage}{3.6in} +% page 1 col 3 +\maintitle{Text modes} +\explain{Second argument to {\tt tts} is {\it MODE}} +\explain{{\it MODE} may be {\tt nil}, {\tt email}, {\tt sable} e.g.} +\excodett{ (tts "file.sable" 'sable)} +\explain{If {\it MODE} is {\tt nil} use 
auto-mode} +\explain{To set all files ending in {\tt .ogi} to be} +\explain{synthesized in {\tt ogi} mode} +\excodett{(set! auto-text-mode-alist} +\excode{\mbox{\hspace{0.1in}{\tt (cons (cons "\\.ogi\$" 'ogi)}}} +\excode{\mbox{\hspace{0.2in}{\tt auto-text-mode-alist))}}} +\maintitle{Lexicon} +\explain{Adding new lexical entry} +\excodett{(lex.add\_entry "email"} +\excode{\mbox{\hspace{0.1in}{\tt ("email" n (((ii) 1) ((m ei l) 0))))}}} +\explain{Check similar words for pronunciation} +\excodett{(lex.lookup "mail")} +\explain{Add new personal entries in {\tt .festivalrc}} +\explain{first select the voice then call {\tt lex.add\_entry}} +\maintitle{Audio} +\explain{Audio should be setup at installation} +\explain{NAS (network audio server) multi-platform} +\excodett{(Parameter.set 'Audio\_Method 'netaudio)} +\explain{8bit ulaw {\tt /dev/audio} (Sun, FreeBSD, Linux)} +\excodett{(Parameter.set 'Audio\_Method 'sunaudio)} +\explain{Sun/FreeBSD/Linux 16 linear (compile-time options)} +\excodett{(Parameter.set 'Audio\_Method 'sun16audio)} +\excodett{(Parameter.set 'Audio\_Method 'freebsd16audio)} +\excodett{(Parameter.set 'Audio\_Method 'linux16audio)} +\explain{Arbitrary command, will execute given command} +\explain{on waveform. 
{\tt \$FILE} and {\tt \$SR} will be set} +\excodett{(Parameter.set 'Audio\_Method 'Audio\_Command)} +\excodett{(Parameter.set 'Audio\_Command} +\excode{\mbox{\hspace{0.3in}{\tt "adplay -raw -rate \$SR \$FILE")}}} +\explain{Default for command is unheadered, shorts (native)} +\explain{You can change format and sample rate with} +\excodett{(Parameter.set 'Audio\_Required\_Rate 8000)} +\excodett{(Parameter.set 'Audio\_Required\_Format 'riff)} +\explain{formats include: {\tt riff, aiff, nist, snd, esps}} + +\end{minipage} +\end{tabular} + +\begin{tabular}{ccc} +\begin{minipage}{3.6in} +% page 2 col 1 +\maintitle{Utterances} +\explain{Making utterances} +\excode{{\tt (Utterance }{\it TYPE} {\it DATA}{\tt)}} +\explain{{\it TYPE}s include: {\tt Text} {\tt Phones} {\tt Wave} e.g.} +\excodett{(Utterance Text "Hello world.")} +\excodett{(Utterance Phones (h @ l ou))} +\excodett{(Utterance Wave "sc001.wav")} +\explain{Synthesize (modules based on TYPE)} +\excode{{\tt (utt.synth }{\it UTT}{\tt)}} +\explain{Send waveform to audio device} +\excode{{\tt (utt.play }{\it UTT}{\tt)}} +\explain{An example} +\excodett{(set! 
utt1 (Utterance Text "Hello world."))} +\excodett{(utt.play (utt.synth utt1))} +\maintitle{Accessing items and features} +\explain{returns list of ITEMs} +\excode{{\tt (utt.relation.items }{\it UTT RELATIONNAME}{\tt )}} +\explain{Item feature access functions} +\excode{{\tt (item.feat }{\it ITEM FEATNAME}{\tt )}} +\excode{{\tt (item.name }{\it ITEM}{\tt )}} +\excode{{\tt (item.next }{\it ITEM}{\tt )}} +\excode{{\tt (item.prev }{\it ITEM}{\tt )}} +\excode{{\tt (item.set\_feat }{\it ITEM FEATNAME VAL}{\tt )}} +\excode{{\tt (item.relation }{\it ITEM RELATIONNAME}{\tt )}} +\maintitle{Item features} +\explain{Assume {\tt utt} is utterance, {\tt item} is item} +\explain{returns item's name} +\excodett{(item.feat item "name")} +\explain{returns item's next's name} +\excodett{(item.feat item "n.name")} +\explain{returns item's previous's name} +\excodett{(item.feat item "p.name")} +\explain{name of word related to Syllable item} +\excodett{(item.feat item "R:SylStructure.parent.name")} +\explain{name of next word of word related to syllable} +\excodett{(item.feat item "R:SylStructure.parent.R:Word.n.name")} +\explain{name of word related to next syllable} +\excodett{(item.feat item "n.R:SylStructure.parent.name")} +\maintitle{Multi-features} +\explain{Return all features of items in relation} +\excodett{(utt.features utt 'Syllable } +\excode{\mbox{\hspace{0.1in}{\tt '(duration stress n.name))}}} +\explain{See manual for builtin feature names} + +\end{minipage} & +\begin{minipage}{3.6in} +% page 2 col 2 + +\maintitle{Regex matching} +\begin{tabbing} +C-xxxx \= explain \kill +{\tt \small .} \> {\small \it matches any character}\\ +{\tt \small {\it X}*} \> {\small \it zero or more Xs}\\ +{\tt \small {\it X}+} \> {\small \it one or more Xs}\\ +{\tt \small {\it X}?} \> {\small \it zero or one Xs}\\ +{\tt \small [abc]} \> {\small \it range, matching a,b or c}\\ +{\tt \small [A-Z]} \> {\small \it range, matching all caps}\\ +{\tt \small \verb+[^A-Z]+} \> {\small \it range, matching 
all but caps}\\ +{\tt \small \verb+\\(XY\\)+} \\ + \> {\small \it group X and Y}\\ +{\tt \small \verb+X\\|Y+} \> {\small \it match X or Y} +\end{tabbing} +\explain{For example} +\begin{tabbing} +C-xxxx \= explain \kill +{\tt \small ".*a.*"} \> {\small \it matches all strings containing a}\\ +{\tt \small "a.*"} \> {\small \it all strings starting with a}\\ +{\tt \small "[A-Z].*"} \> {\small \it all strings starting with a capital}\\ +{\tt \small "[0-9]+"} \> {\small \it all strings of digits}\\ +{\tt \small "\verb.-?[0-9]+\\(\\..\verb.[0-9]+\\)?."} \\ + \> {\small \it any real number}\\ +{\tt \small "\verb.[^aeiouAEIOU]+."} \> \\ + \> {\small \it any string with no vowels}\\ +{\tt \small "\verb.\\(Saturday\\)\\|\\(Sunday\\)."} \\ + \> {\small \it Saturday or Sunday} +\end{tabbing} +\maintitle{String functions} +\explain{returns suffix of STR1 after STR2} +\excode{{\tt (string-after }{\it STR1 STR2}{\tt )}} +\explain{returns prefix of STR1 before STR2} +\excode{{\tt (string-before }{\it STR1 STR2}{\tt )}} +\explain{returns t if STR matches REGEX or nil} +\excode{{\tt (string-matches }{\it STR REGEX}{\tt )}} +\explain{returns t if STR1 equals STR2} +\excode{{\tt (string-equal }{\it STR1 STR2}{\tt )}} +\explain{returns non-nil if STR is in LIST} +\excode{{\tt (member\_string }{\it STR LIST}{\tt )}} + +\end{minipage} & +\begin{minipage}{3.6in} +% page 2 col 3 +\maintitle{Miscellaneous} +\explain{List all (potential) voices} +\excodett{(voice.list)} +\explain{Return description of voice NAME} +\excode{{\tt (voice.description }{\it NAME}{\tt )}} +\explain{Speak description of voice NAME} +\excode{{\tt (voice.describe }{\it NAME}{\tt )}} +\explain{List all defined lexicons} +\excode{{\tt (lex.list)}} +\explain{List all defined phonesets} +\excode{{\tt (PhoneSet.list)}} +\explain{Describe current PhoneSet} +\excode{{\tt (PhoneSet.description)}} +\maintitle{More information} +\explain{More information on Festival is available from} 
+\excodett{http://www.cstr.ed.ac.uk/projects/festival.html} +\explain{Or by mailing} +\excodett{festival-help@cstr.ed.ac.uk} +\hspace{1in}\\ +\hspace{1in}\\ +\hspace{1in}\\ +\hspace{1in}\\ +\hspace{1in}\\ +\hspace{1in}\\ +\hspace{1in}\\ +\hspace{1in}\\ +\maintitle{Copyright} +\explain{(C) University of Edinburgh 1996-1999.} +\explain{All rights reserved} +\hspace{1in}\\ +\explain{{\small Festival is free software and may be}} +\explain{{\small used commercially or otherwise without}} +\explain{{\small further permission.}} +\end{minipage} +\end{tabular} + +\end{document} + diff --git a/examples/Makefile b/examples/Makefile new file mode 100644 index 0000000..e4dd1a0 --- /dev/null +++ b/examples/Makefile @@ -0,0 +1,72 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +TOP=.. +DIRNAME=examples +BUILD_DIRS= +ALL_DIRS=$(BUILD_DIRS) songs + +EXTEXTS = intro.text spintro.text benchmark.text +EXAMPLES = webdemo.scm ex1.email ex1.ogi example.sable example2.sable \ + tobi.stml example.th example.apml +SCRIPTS = saytime.sh text2pos.sh latest.sh \ + scfg_parse_text.sh text2wave.sh make_utts.sh dumpfeats.sh \ + durmeanstd.sh powmeanstd.sh run-festival-script.sh +SHELL_SCRIPTS = benchmark festival_client.pl +SCMS = toksearch.scm th-mode.scm addr-mode.scm +SAMPLEC = festival_client.c festival_client.h +OTHERS = apml.dtd +FILES=Makefile $(EXAMPLES) $(EXTEXTS) $(SCRIPTS) $(SHELL_SCRIPTS) $(SCMS) $(SAMPLEC) $(OTHERS) speech_pm_1.0.tar + +ALL = $(SCRIPTS:.sh=) +LOCAL_CLEAN = $(SCRIPTS:.sh=) + +include $(TOP)/config/common_make_rules + +$(ALL) : % : %.sh + rm -f $@ + @echo "#!/bin/sh" >$@ + @echo "\"true\" ; exec "$(FESTIVAL_HOME)/bin/festival --script '$$0 $$*' >>$@ + cat $< >>$@ + chmod +x $@ + +festival_client: festival_client.o festival_client.h + $(LINK_COMMAND) -o festival_client festival_client.o $(LIBS) + +festival_client.o: festival_client.c festival_client.h + $(CC_COMMAND) -DSTANDALONE festival_client.c -o festival_client.o + +# Do this manually to make sure Festival.tar (perl module) is up to date. 
+ +speech_pm_1.0.tar: $(wildcard speech_pm_1.0/*) $(wildcard speech_pm_1.0/*/*) $(wildcard speech_pm_1.0/*/*/*) + -chmod +w speech_pm_1.0.tar + tar cvf speech_pm_1.0.tar `cat speech_pm_1.0/MANIFEST|sed -e 's/^/speech_pm_1.0\//'` diff --git a/examples/addr-mode.scm b/examples/addr-mode.scm new file mode 100644 index 0000000..a8cefdd --- /dev/null +++ b/examples/addr-mode.scm @@ -0,0 +1,361 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1998 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; An address mode for reading lists of names, addresses and +;;; telephone numbers. Takes quite an aggressive view of the data. +;;; +;;; This was used for the CSTR's entry in the evaluations at the +;;; ESCA workshop on Speech Synthesis in Janolen Caves, Blue Mountains, +;;; NSW, Australia. +;;; +;;; This can read things like +;;; Brown, Bill, 6023 Wiser Rd, Austin, TX 76313-2837, 817-229-7849 +;;; Green, Bob, 3076 Wabash Ct, Fort Worth, TX 76709-1368, (817)292-9015 +;;; Smith, Bobbie Q, 3337 St Laurence St, Fort Worth, TX 71611-5484, (817)839-3689 +;;; Jones, Billy, 5306 Dr Dana Lynn Dr, Fort Worth, TX 71637-2547, 817 845 6154 +;;; Henderson, Bryan J, 5808 Sycamore Creek Rd Apt R, Fort Worth, TX 76134-1906, (817)239-4634 +;;; Black, Alan W, 130 S 18th St #3, Pittsburgh, PA 15205, (412)268-8189 +;;; Bowman, K, 2610 W Bowie St, El Paso, TX 76019-1712, (817)268-7257 +;;; Sydney, Aaron A, 1521 NW Ballard Way, Seattle, WA 91807-4712, (206)783-8645 +;;; Anderson, A, 12012 Pinehurst Way NE, Seattle, NE 98125-5108, (212)404-9988 + +;; New lines without trailing continuation punctuation signal EOU +(defvar addr_eou_tree +'((n.whitespace matches ".*\n.*") ;; any new line + ((punc in ("," ":")) + ((0)) + ((1))))) + +(set! 
addr_phrase_cart_tree +' +((pbreak is "B") + ((B)) + ((pbreak is "BB") + ((BB)) + ((lisp_token_end_punc in ("?" "." ":" "'" "\"" "," ";")) + ((B)) + ((n.name is 0) ;; end of utterance + ((BB)) + ((NB))))))) + +(define (addr_init_func) + "Called on starting addr text mode." + (Parameter.set 'Phrase_Method 'cart_tree) + (set! phrase_cart_tree addr_phrase_cart_tree) + (set! int_lr_params + '((target_f0_mean 105) (target_f0_std 12) + (model_f0_mean 170) (model_f0_std 34))) + (Parameter.set 'Duration_Stretch 1.1) + (set! addr_previous_t2w_func english_token_to_words) + (set! english_token_to_words addr_token_to_words) + (set! token_to_words addr_token_to_words) + (set! addr_previous_eou_tree eou_tree) + (set! eou_tree addr_eou_tree)) + +(define (addr_exit_func) + "Called on exiting addr text mode." + (Parameter.set 'Duration_Stretch 1.0) + (set! token_to_words addr_previous_t2w_func) + (set! english_token_to_words addr_previous_t2w_func) + (set! eou_tree addr_previous_eou_tree)) + +(set! addr_regex_ZIPCODE2 "[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]") + +(set! addr_regex_USPHONE3 "[0-9][0-9][0-9])[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]") +(set! addr_regex_USPHONE1 "[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]") + +(set! addr_tiny_break (list '(name "") '(pbreak mB))) +(set! addr_small_break (list '(name "") '(pbreak B))) +(set! addr_large_break (list '(name "") '(pbreak BB))) + +(define (addr_within_name_part token) + "(addr_within_name_part token) +Heuristic guess if we are still in the name (i.e. pre-address) +this is designed to stop Mr W Smith becoming West."
+ (cond + ((addr_preceding_number token) + ;; any preceding token with a digit + nil) + (t + t))) + +(define (addr_preceding_number tok) + (cond + ((null tok) nil) + ((string-matches (item.name tok) ".*[0-9].*") + t) + (t (addr_preceding_number (item.prev tok))))) + +(define (addr_succeeding_non_number tok) + (cond + ((null tok) nil) + ((string-matches (item.name tok) ".*[A-Za-z].*") + t) + (t (addr_succeeding_non_number (item.next tok))))) + +(define (addr_token_to_words token name) + "(addr_token_to_words token name) +Address specific text reading mode. Lots of address specific abbreviations +and phrasing etc." + (set! utt_addr (item.get_utt token)) + (let ((type (item.feat token "token_type"))) + (cond + ((string-matches name "A") + (list (list '(name "a") '(pos nn)))) + ((addr_within_name_part token) + (builtin_english_token_to_words token name) + ) + ((string-matches name "\\([dD][Rr]\\|[Ss][tT]\\)") + (if (string-equal (item.feat token "token_pos") "street") + (if (string-matches name "[dD][rR]") + (list "drive") + (list "street")) + (if (string-matches name "[dD][rR]") ;; default on title side + (list "doctor") + (list "saint")))) + ((string-matches name addr_regex_ZIPCODE2) + ;; Zip code + (item.set_feat token "token_pos" "digits") + (append + (builtin_english_token_to_words token (string-before name "-")) + (list addr_small_break) + (builtin_english_token_to_words token (string-after name "-")) + (list addr_large_break))) + ((string-matches name addr_regex_USPHONE3) + (item.set_feat token "token_pos" "digits") + (append + (builtin_english_token_to_words token (string-before name ")")) + (list addr_small_break) + (builtin_english_token_to_words token + (string-after (string-before name "-") ")")) + (list addr_small_break) + (builtin_english_token_to_words token (string-after name "-")))) + ((string-matches name addr_regex_USPHONE1) + (item.set_feat token "token_pos" "digits") + (append + (builtin_english_token_to_words token (string-before name "-")) 
(list addr_small_break) + (builtin_english_token_to_words token + (string-before (string-after name "-") "-")) + (list addr_small_break) + (builtin_english_token_to_words token + (string-after (string-after name "-") "-")))) + ((string-equal name "NE") + (cond + ((string-matches (item.feat token "n.name") addr_regex_ZIPCODE2) + (list "Nebraska")) + ;; could check if there is a state following it + (t + (list "North" "East")))) + ((set! full (addr_undo_abbrev name addr_addr_abbrevs)) + (cdr full)) + ((string-matches name "#.*") + (cons + "number" + (builtin_english_token_to_words token (string-after name "#")))) + ((string-matches name "[0-9][0-9][0-9][0-9][0-9]+") + ;; long number + (item.set_feat token "token_pos" "digits") + (builtin_english_token_to_words token name)) + ((or (string-matches name "[0-9]0[0-9][0-9]") + (string-matches name "[0-9][0-9]0[0-9]")) + (item.set_feat token "token_pos" "digits") + (mapcar + (lambda (a) + (if (string-equal a "zero") + "oh" + a)) + (builtin_english_token_to_words token name))) + ((and + (addr_succeeding_non_number token) + (string-matches name "[0-9][0-9][0-9][0-9]")) + ;; four digit number + (let (block number) + (item.set_feat token "token_pos" "number") + (set! block + (builtin_english_token_to_words + token (substring name 0 2))) + (if (string-equal (nth 2 (symbolexplode name)) "0") + (item.set_feat token "token_pos" "digits") + (item.set_feat token "token_pos" "number")) + (set! number + (builtin_english_token_to_words + token (substring name 2 2))) + (append + block + (list addr_tiny_break) + number))) + ((and + (addr_succeeding_non_number token) + (string-matches name "[0-9][0-9][0-9]")) + ;; three digit number + (let (block number) + (item.set_feat token "token_pos" "number") + (set! block + (builtin_english_token_to_words + token (substring name 0 1))) + (if (string-equal (nth 1 (symbolexplode name)) "0") + (item.set_feat token "token_pos" "digits") + (item.set_feat token "token_pos" "number")) + (set! 
number + (builtin_english_token_to_words + token (substring name 1 2))) + (append + block number))) + ((string-matches name "[0-9]+") + (item.set_feat token "token_pos" "digits") + (builtin_english_token_to_words token name)) + (t ;; for all other cases + (addr_previous_t2w_func token name))))) + +(define (addr_undo_abbrev name abbrevs) +"(addr_undo_abbrev name abbrevs) +General abbreviation undoer. Looks for name in reverse assoc +list and returns the value." + (cond + ((null abbrevs) nil) + ((member_string name (car (car abbrevs))) + (car abbrevs)) + (t + (addr_undo_abbrev name (cdr abbrevs))))) + +(set! tts_text_modes + (cons + (list + 'addr ;; mode name + (list ;; addr mode params + (list 'init_func addr_init_func) + (list 'exit_func addr_exit_func))) + tts_text_modes)) + +(set! addr_us_states + '((( AL Ala ) Alabama ) + (( AK ) Alaska ) + (( AR ) Arkansas ) + (( AZ Ariz ) Arizona ) + (( CA Cal Calif ) California ) + (( CO Colo ) Colorado ) + (( CT Conn ) Connecticut ) + (( DC ) DC ) + (( DE Dela ) Delaware ) + (( FL Fla ) Florida ) + (( GA ) Georgia ) + (( HI ) Hawaii ) + (( IA Ind ) Indiana ) + (( ID ) Idaho ) + (( IL Ill ) Illinois ) + (( KS Kans ) Kansas ) + (( KY ) Kentucky ) + (( LA Lou Lous) Louisiana ) + (( MA Mass ) Massachusetts ) + (( MD ) Maryland ) + (( ME ) Maine ) + (( MI Mich ) Michigan ) + (( MN Minn ) Minnesota ) + (( MS Miss ) Mississippi ) + (( MT ) Montana ) + (( MO ) Missouri ) + (( NC ) North Carolina ) + (( ND ) North Dakota ) + (( NE Neb ) Nebraska ) + (( NH ) New Hampshire) + (( NV Nev ) Nevada ) + (( NY ) New York ) + (( OH ) Ohio ) + (( OK Okla ) Oklahoma ) + (( Or Ore ) Oregon ) + (( PA Penn ) Pennsylvania ) + (( RI ) Rhode Island ) + (( SC ) South Carolina ) + (( SD ) South Dakota ) + (( TN Tenn ) Tennessee ) + (( TX Tex ) Texas ) + (( UT ) Utah ) + (( VA Vir ) Virginia ) + (( VT ) Vermont ) + (( WA Wash ) Washington ) + (( WI Wisc ) Wisconsin ) + (( WV ) West Virginia ) + (( WY Wyom ) Wyoming ) + (( PR ) Puerto Rico ) + )) + 
+(set! addr_compass_points + '(((S So Sth) South) + ((N No Nor) North) + ((E) East) + ((W) West) + ((NE) North East) + ((NW) North West) + ((SE) South East) + ((SW) South West))) + +(set! addr_streets + '(((Hwy) Highway) + ((Rt Rte) Root) + ((Ct) Court) + ((Pl) Place) + ((Blvd Bld) Boulevard) + ((Ave) Avenue) + ((Rd) Road) + ((Apt App Appt) Apartment) + ((Cntr Ctr) Center) + ((Ter Terr Tr) Terrace) + ((Ln) Lane) + ((PO) pea oh) + )) + +(set! addr_uk_counties + '((( Hants ) Hampshire) + (( Soton ) Southampton ) + (( Berks ) Berkshire ) + (( Yorks ) Yorkshire ) + (( Leics ) Leicestershire ) + (( Shrops ) Shropshire ) + (( Cambs ) Cambridgeshire ) + (( Oxon ) Oxfordshire ) + (( Notts ) Nottinghamshire ) + (( Humbers ) Humberside ) + (( Glams ) Glamorganshire ) + (( Pembs ) Pembrookeshire ) + (( Lancs ) Lancashire ) + (( Berwicks ) Berwickshire ) + )) + +(set! addr_addr_abbrevs + (append + addr_us_states + addr_compass_points + addr_streets)) + +(provide 'addr-mode) diff --git a/examples/apml.dtd b/examples/apml.dtd new file mode 100644 index 0000000..7f6dd31 --- /dev/null +++ b/examples/apml.dtd @@ -0,0 +1,31 @@ + + + + + + + + + + + + + + + + + + + + + + diff --git a/examples/benchmark b/examples/benchmark new file mode 100755 index 0000000..7c5a465 --- /dev/null +++ b/examples/benchmark @@ -0,0 +1,86 @@ +#!/bin/sh + +default_libdir="/projects/festival/lib" + +while true + do + case "$1" in + -f ) festival="${2}" + shift 2 + ;; + -l ) libdir="$2" + shift 2 + ;; + * ) break;; + esac +done + +text=${1-"$HOME/projects/festival/examples/benchmark.text"} + +for i in . 
src/main ../src/main $HOME/projects/festival/src/main /cstr/bin + do + if [ -n "$festival" ] + then + break; + fi + if [ -x "$i/festival" ] + then + festival="$i/festival" + fi +done + +[ -n "$festival" ] || + { + echo "Can't find festival" + exit 1 + } + +if [ -z "$libdir" ] + then + case $festival in + *main/festival ) libdir=`dirname $festival`/../../lib;; + * ) libdir=$default_libdir;; + esac +fi + +echo Using $festival + +start_flag_file="/tmp/fest_start_$$" +end_flag_file="/tmp/fest_end_$$" +script="/tmp/fest_script_$$" + +echo -n > $flag_file; + +cat > $script <<__END__ + +(set! libdir "$libdir/") +(set! lexdir "$default_libdir/dicts/") +(set! voiced_dir "$default_libdir/voices/") + +(load (string-append libdir "init.scm")) +(if (probe_file (format nil "%s/.festivalrc" (getenv "HOME"))) + (load (format nil "%s/.festivalrc" (getenv "HOME")))) + + +(audio_mode 'async) +(set! tts_hooks (list utt.synth)) + +(puts "start...\n" nil) +(fclose (fopen "$start_flag_file" "w")) + +(tts_file "$text" (quote text)) + +(fclose (fopen "$end_flag_file" "w")) +(puts "...end\n" nil) +(audio_mode 'close) + +(quit) + +__END__ + +eval $festival --script $script + +perl -e 'print "running time = ", (stat($ARGV[1]))[8]-(stat($ARGV[0]))[8], " seconds\n";' $start_flag_file $end_flag_file + +/bin/rm -f $start_flag_file $end_flag_file $script + diff --git a/examples/benchmark.text b/examples/benchmark.text new file mode 100644 index 0000000..c6ee6c7 --- /dev/null +++ b/examples/benchmark.text @@ -0,0 +1,132 @@ + + + +FESTIVAL(1) User Commands FESTIVAL(1) + + + +NAME + festival - a text-to-speech system. + +SYNOPSIS + festival [options] [file0] [file1] ... + + + +DESCRIPTION + Festival is a general purpose text-to-speech system. As + well as simply rendering text as speech it can be used in an + interactive command mode for testing and developing various + aspects of speech synthesis technology. + + Festival has two major modes, command and tts (text-to- + speech). 
When in command mode input (from file or interac- + tively) is interpreted by the command interpreter. When in + tts mode input is rendered as speech. + + +OPTIONS + -h print help information + + -q Start without loading any initialization files. + + --libdir PATH + Specify alternate to default library directory (used + in initializing the variable load-path, and for + loading most initialisation files) + + -b or --batch + Run in batch mode. In batch mode no input is read + from standard input + + -i or --interactive + Run in interactive mode. In interactive mode input + (commands or text) is read from standard input. + When in command mode (the default) a readline based + Lisp read-eval-print command interpreter is + presented. + + --server + Run in server mode. Any file arguments are loaded + and interpreted, before going into server mode. In + server mode Festival waits for clients on port 1314 + (by default). Note server mode can give unauthor- + ised access to your machine, please read the section + in the manual entitled Server/client API before + using this mode. + + --tts Run in tts mode. All file arguments are treated + as text files to be said. Unless interactive mode + + + +SunOS 5.5.1 Last change: 4th Feb 1996 1 + + + + + + +FESTIVAL(1) User Commands FESTIVAL(1) + + + + is explicitly specified no input is read from stan- + dard input, unless no files are specified. + + --language LANG + Where LANG is one of english (default) spanish or + welsh. Select that language for basic operation, + command or tts. This may be changed during a session + with the command select_language. + + -v Display version number and exit. + + +BUGS + More than you can imagine. + + A manual with much detail (though not complete) is available + in info (or html) format. 
+ + +AUTHOR + Alan W Black and Paul Taylor + (C) Centre for Speech Technology Research + University of Edinburgh + 80 South Bridge + Edinburgh EH1 1HN + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +SunOS 5.5.1 Last change: 4th Feb 1996 2 + + + diff --git a/examples/dumpfeats.sh b/examples/dumpfeats.sh new file mode 100644 index 0000000..82398cd --- /dev/null +++ b/examples/dumpfeats.sh @@ -0,0 +1,198 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-*-mode:scheme-*- +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. ;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. 
;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. ;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: December 1997 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Dump features from a list of utterances +;;; + +;;; Because this is a --script type file it has to explicitly +;;; load the initfiles: init.scm and user's .festivalrc +(load (path-append libdir "init.scm")) + +(define (dumpfeats_help) + (format t "%s\n" + "Usage: dumpfeats [options] ... + Dump features from a set of utterances + Options + -relation + Relation from which the features are to be dumped + -output + If output parameter contains a %s it's treated as a skeleton + e.g. feats/%s.feats and multiple files will be created, one + for each utterance. If output doesn't contain %s the output + is treated as a single file and all features are dumped in it. + -feats + If argument starts with a \"(\" it is treated as a list of + features to dump, otherwise it is treated as a filename whose + contents contain a set of features (without parentheses). + -eval + A scheme file to be loaded before dumping. This may contain + dump specific features etc. If filename starts with a left + parenthesis it is evaluated as lisp. + -from_file + A file with a list of utterance file names (used when there + are a very large number of files). 
+") + (quit)) + +;;; Default options values +(defvar utt_files nil) ;; utterance files to dump from +(defvar desired_relation nil) +(defvar output "-") +(defvar desired_features nil) +(defvar extra-file nil) + +;;; Get options +(define (get_options) + (let ((files nil) + (o argv)) + (if (or (member_string "-h" argv) + (member_string "-help" argv) + (member_string "--help" argv) + (member_string "-?" argv)) + (dumpfeats_help)) + (while o + (begin + (cond + ((string-equal "-relation" (car o)) + (if (not (cdr o)) + (dumpfeats_error "no relation specified")) + (set! desired_relation (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-output" (car o)) + (if (not (cdr o)) + (dumpfeats_error "no output file/skeleton specified")) + (set! output (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-feats" (car o)) + (if (not (cdr o)) + (dumpfeats_error "no feats list/file specified")) + (if (string-matches (car (cdr o)) "^(.*") + (set! desired_features (read-from-string (car (cdr o)))) + (set! desired_features (load (car (cdr o)) t))) + (set! o (cdr o))) + ((string-equal "-from_file" (car o)) + (if (not (cdr o)) + (dumpfeats_error "no file of utts names file specified")) + (set! files + (append + (reverse (load (car (cdr o)) t)) files)) + (set! o (cdr o))) + ((string-equal "-eval" (car o)) + (if (not (cdr o)) + (dumpfeats_error "no file specified to load")) + (if (string-matches (car (cdr o)) "^(.*") + (eval (read-from-string (car (cdr o)))) + (load (car (cdr o)))) + (set! o (cdr o))) + (t + (set! files (cons (car o) files)))) + (set! o (cdr o)))) + (if files + (set! utt_files (reverse files))))) + +(define (dumpfeats_error message) + (format stderr "%s: %s\n" "dumpfeats" message) + (dumpfeats_help)) + +;;; No gc messages +(gc-status nil) + +(define (dump_all_features relname feats names outskeleton) +"(dump_all_features relname feats names outskeleton) +Dump all named features in RELNAME from utterances in NAMES +to a file or files specified by outskeleton."
+ (let (fd) + (if (not (string-matches outskeleton ".*%s.*")) + (set! fd (fopen outskeleton "w"))) + (mapcar + (lambda (uttfile) + (if (cdr names) ;; only output the utt name if there is more than one + (format stderr "%s\n" uttfile)) + ;; change fd to new file if in skeleton mode + (if (string-matches outskeleton ".*%s.*") + (set! fd (fopen (format nil outskeleton + (string-before + (basename uttfile) ".")) + "w"))) + (unwind-protect + (extract_feats + relname + feats + (utt.load nil uttfile) + fd) + nil) + (if (string-matches outskeleton ".*%s.*") + (fclose fd)) + t) + names) + (if (not (string-matches outskeleton ".*%s.*")) + (fclose fd)))) + +(define (extract_feats relname feats utt outfd) + "(extract_feats relname feats utt outfd) +Extract the features and write them to the file descriptor." + (mapcar + (lambda (si) + (mapcar + (lambda (f) + (set! fval (unwind-protect (item.feat si f) "0")) + (if (or (string-equal "" fval) + (string-equal " " fval)) + (format outfd "%l " fval) + (format outfd "%s " fval))) + feats) + (format outfd "\n") + t) + (utt.relation.items utt relname)) + t) + +(define (get_utt fname) + (utt.load nil fname)) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; The main work +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +(define (main) + (get_options) + + (dump_all_features + desired_relation + desired_features + utt_files + output) +) + +(main) diff --git a/examples/durmeanstd.sh b/examples/durmeanstd.sh new file mode 100644 index 0000000..76a8290 --- /dev/null +++ b/examples/durmeanstd.sh @@ -0,0 +1,212 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-*-mode:scheme-*- +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. 
;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. ;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. ;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: December 1997 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Find the means and standard deviations of the durations of all +;;; phones in the given utterances. 
+;;; + +;;; Because this is a --script type file it has to explicitly +;;; load the initfiles: init.scm and user's .festivalrc +(load (path-append libdir "init.scm")) + +(define (durmeanstd_help) + (format t "%s\n" + "durmeanstd [options] festival/utts/*.utts + Find means and standard deviation of phone durations in utterances + Options + -output + File to save output in + -relation + Relation for phones (default Segment) + -log + Take log of durations first + -from_file + file with list of utt names +") + (quit)) + +;;; Default options values +(defvar utt_files nil) +(defvar outfile "durs.meanstd") +(defvar log_domain nil) +(defvar durrelation 'Segment) + +;;; Get options +(define (get_options) + (let ((files nil) + (o argv)) + (if (or (member_string "-h" argv) + (member_string "-help" argv) + (member_string "--help" argv) + (member_string "-?" argv)) + (durmeanstd_help)) + (while o + (begin + (cond + ((string-equal "-output" (car o)) + (if (not (cdr o)) + (durmeanstd_error "no output file specified")) + (set! outfile (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-relation" (car o)) + (if (not (cdr o)) + (durmeanstd_error "no relation file specified")) + (set! durrelation (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-from_file" (car o)) + (if (not (cdr o)) + (durmeanstd_error "no file of utts names file specified")) + (set! files + (append + (load (car (cdr o)) t) files)) + (set! o (cdr o))) + ((string-equal "-log" (car o)) + (set! log_domain t)) + (t + (set! files (cons (car o) files)))) + (set! o (cdr o)))) + (if files + (set! utt_files (reverse files))))) + +(define (durmeanstd_error message) + (format stderr "%s: %s\n" "durmeanstd" message) + (durmeanstd_help)) + +;;; No gc messages +(gc-status nil) + +;;; A simple sufficient statistics class +(define (suffstats.new) + (list + 0 ;; n + 0 ;; sum + 0 ;; sumx + )) + +(define (suffstats.set_n x n) + (set-car! x n)) +(define (suffstats.set_sum x sum) + (set-car! 
(cdr x) sum)) +(define (suffstats.set_sumx x sumx) + (set-car! (cdr (cdr x)) sumx)) +(define (suffstats.n x) + (car x)) +(define (suffstats.sum x) + (car (cdr x))) +(define (suffstats.sumx x) + (car (cdr (cdr x)))) +(define (suffstats.reset x) + (suffstats.set_n x 0) + (suffstats.set_sum x 0) + (suffstats.set_sumx x 0)) +(define (suffstats.add x d) + (suffstats.set_n x (+ (suffstats.n x) 1)) + (suffstats.set_sum x (+ (suffstats.sum x) d)) + (suffstats.set_sumx x (+ (suffstats.sumx x) (* d d))) +) + +(define (suffstats.mean x) + (/ (suffstats.sum x) (suffstats.n x))) +(define (suffstats.variance x) + (cond + ((or (< (suffstats.n x) 2 ) + (equal? (* (suffstats.n x) (suffstats.sumx x)) + (* (suffstats.sum x) (suffstats.sum x)))) + ;; avoid 0 variance + (/ (suffstats.mean x) 10.0)) + (t + (/ (- (* (suffstats.n x) (suffstats.sumx x)) + (* (suffstats.sum x) (suffstats.sum x))) + (* (suffstats.n x) (- (suffstats.n x) 1)))))) +(define (suffstats.stddev x) + (sqrt (suffstats.variance x))) + +;;; Index for each phone +(defvar phonelist nil) ;; index of phone to suffstats +(define (get_phone_data phone) + (let ((a (car (cdr (assoc phone phonelist))))) + (if a + a + (begin ;; first time for this phone + (set! 
phonelist + (cons + (list phone (suffstats.new)) + phonelist)) + (car (cdr (assoc phone phonelist))))))) + +(define (duration i) + (if (item.prev i) + (- (item.feat i "end") (item.feat i "p.end")) + (item.feat i "end"))) + +(define (cummulate_seg_durs utt_name) + (let ((utt (utt.load nil utt_name))) + (mapcar + (lambda (s) + (suffstats.add + (get_phone_data (item.name s)) + (if log_domain + (log (duration s)) + (duration s)))) + (utt.relation.items utt durrelation)))) + +(define (output_dur_data data outfile) + (let ((fd (fopen outfile "w"))) + (mapcar + (lambda (d) + (format fd "(%s %f %f)\n" + (car d) + (suffstats.mean (car (cdr d))) + (suffstats.stddev (car (cdr d))))) + data) + (fclose fd))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; The main work +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +(define (main) + (get_options) + + (mapcar + (lambda (u) + (unwind-protect + (cummulate_seg_durs u) + nil)) + utt_files) + (output_dur_data phonelist outfile) +) + +(main) diff --git a/examples/ex1.email b/examples/ex1.email new file mode 100644 index 0000000..fea3dac --- /dev/null +++ b/examples/ex1.email @@ -0,0 +1,34 @@ +From VM Wed Nov 27 15:33:27 1996 +Content-Length: 482 +Return-Path: awb@cstr.ed.ac.uk +Received: from margo (margo.cstr.ed.ac.uk [192.41.114.73]) by liddell.cstr.ed.ac.uk (8.6.13/8.6.10) with ESMTP id PAA08491 for ; Wed, 27 Nov 1996 15:32:57 GMT +Received: (awb@localhost) by margo (SMI-8.6/8.6.9) id PAA00818; Wed, 27 Nov 1996 15:32:54 GMT +Message-Id: <199611271532.PAA00818@margo> +In-Reply-To: <199611271531.PAA00812@margo> +References: <199611271531.PAA00812@margo> +From: Alan W Black +To: Alan W Black +Subject: Example mail message +Date: Wed, 27 Nov 1996 15:32:54 GMT + + + Alan W. Black writes on 27 November 1996: + > + > + > I'm looking for a demo mail message for Festival, but can't seem to + > find any suitable. 
It should at least have some quoted text, and + > have some interesting tokens like a URL or such like. + > + > Alan + +Well I'm not sure exactly what you mean but awb@cogsci.ed.ac.uk +has an interesting home page at http://www.cstr.ed.ac.uk/~awb/ which +might be what you're looking for. + +Alan + + > PS. Will you attend the course? + +I hope so + +bye for now diff --git a/examples/ex1.ogi b/examples/ex1.ogi new file mode 100644 index 0000000..e924bff --- /dev/null +++ b/examples/ex1.ogi @@ -0,0 +1,17 @@ + +This is an example. + + This is a slow example. + + This is a very slow example. + a normal one and a fast talking example. + Maybe this one is too fast. + + + My name is Mike, . +My telephone number is + + Mike here. Chris here. This is Gordon. +I'm most terribly sorry to interrupt at this time, but my name +is Roger. A good day to you. + diff --git a/examples/example.apml b/examples/example.apml new file mode 100644 index 0000000..14b156b --- /dev/null +++ b/examples/example.apml @@ -0,0 +1,114 @@ + + + + + + + +Good morning + Mr Smith + + + + + + + + I'm sorry to tell you + + +that you have been diagnosed + as suffering + from a mild + form + of what we call angina + pectoris. + + + + + + + This is + + + a spasm + of the chest + , + + + resulting from + + + overexertion when the heart + is diseased + + + + + + + To solve + this problem + , + + + there are two + drugs I would like you to take. + + + + + + The first + one + + +is Aspirin, + + + + which is + +an analgesic. + + + + + that is, + +it relieves + the pain. + + + + + + I have prescribed it + + + + to cure your angina + + + + + + The only + + problem + + + +is that this drug can be associated with + some sideeffects + . 
+ + + \ No newline at end of file diff --git a/examples/example.sable b/examples/example.sable new file mode 100644 index 0000000..106a0f9 --- /dev/null +++ b/examples/example.sable @@ -0,0 +1,63 @@ + + + + + + +My telephone number is + + + diff --git a/examples/example.th b/examples/example.th new file mode 100644 index 0000000..3d0663a --- /dev/null +++ b/examples/example.th @@ -0,0 +1,4 @@ +*smile* Good morning. This is an example of Festival using a talking +head. This example uses a Festival text mode *frown* which can be +difficult to understand, *smile* but it does help me talk. + diff --git a/examples/example2.sable b/examples/example2.sable new file mode 100644 index 0000000..69185ca --- /dev/null +++ b/examples/example2.sable @@ -0,0 +1,38 @@ + + + + + + + +Homographs are words that are written the same but have different +pronunciations, such as lives and +lives. + + +You say either, while I +say either. + + + + + +We can say things fast. + + + +and slowly. + + + + + +And then at normal speed. + + + + + diff --git a/examples/festival_client.c b/examples/festival_client.c new file mode 100644 index 0000000..4004b94 --- /dev/null +++ b/examples/festival_client.c @@ -0,0 +1,451 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1999 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. 
 Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black (awb@cstr.ed.ac.uk) */ +/* Date : March 1999 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Client end of Festival server API in C designed specifically for */ +/* Galaxy Communicator use though might be of use for other things */ +/* */ +/* This is a standalone C client, no other Festival or Speech Tools */ +/* libraries need be linked with this. Thus it is very small. 
*/ +/* */ +/* Compile with (plus socket libraries if required) */ +/* cc -o festival_client -DSTANDALONE festival_client.c */ +/* */ +/* Run as */ +/* festival_client -text "hello there" -o hello.snd */ +/* */ +/* */ +/* This is provided as an example, it is quite limited in what it does */ +/* but is functional; compiling without -DSTANDALONE gives you a simple */ +/* API */ +/* */ +/*=======================================================================*/ +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <sys/types.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <arpa/inet.h> +#include <netdb.h> +#include "festival_client.h" + +/* For testing endianness */ +int fapi_endian_loc = 1; + +static char *socket_receive_file_to_buff(int fd,int *size); + +void delete_FT_Wave(FT_Wave *wave) +{ + if (wave != 0) + { + if (wave->samples != 0) + free(wave->samples); + free(wave); + } +} + +int save_FT_Wave_snd(FT_Wave *wave, const char *filename) +{ + FILE *fd; + struct { + unsigned int magic; /* magic number */ + unsigned int hdr_size; /* size of this header */ + int data_size; /* length of data (optional) */ + unsigned int encoding; /* data encoding format */ + unsigned int sample_rate; /* samples per second */ + unsigned int channels; /* number of interleaved channels */ + } header; + short sw_short; + int i; + + if ((filename == 0) || + (strcmp(filename,"stdout") == 0) || + (strcmp(filename,"-") == 0)) + fd = stdout; + else if ((fd = fopen(filename,"wb")) == NULL) + { + fprintf(stderr,"save_FT_Wave: can't open file \"%s\" for writing\n", + filename); + return -1; + } + + header.magic = (unsigned int)0x2e736e64; + header.hdr_size = sizeof(header); + header.data_size = 2 * wave->num_samples; + header.encoding = 3; /* short */ + header.sample_rate = wave->sample_rate; + header.channels = 1; + if (FAPI_LITTLE_ENDIAN) + { /* snd is always sparc/68000 byte order */ + header.magic = SWAPINT(header.magic); + 
header.hdr_size = SWAPINT(header.hdr_size); + header.data_size = SWAPINT(header.data_size); + header.encoding =
SWAPINT(header.encoding); + header.sample_rate = SWAPINT(header.sample_rate); + header.channels = SWAPINT(header.channels); + } + /* write header */ + if (fwrite(&header, sizeof(header), 1, fd) != 1) + return -1; + if (FAPI_BIG_ENDIAN) + fwrite(wave->samples,sizeof(short),wave->num_samples,fd); + else + { /* have to swap */ + for (i=0; i < wave->num_samples; i++) + { + sw_short = SWAPSHORT(wave->samples[i]); + fwrite(&sw_short,sizeof(short),1,fd); + } + } + + if (fd != stdout) + fclose(fd); + return 0; +} + +void delete_FT_Info(FT_Info *info) +{ + if (info != 0) + free(info); +} + +static FT_Info *festival_default_info() +{ + FT_Info *info; + info = (FT_Info *)malloc(1 * sizeof(FT_Info)); + + info->server_host = FESTIVAL_DEFAULT_SERVER_HOST; + info->server_port = FESTIVAL_DEFAULT_SERVER_PORT; + info->text_mode = FESTIVAL_DEFAULT_TEXT_MODE; + + info->server_fd = -1; + + return info; +} + +static int festival_socket_open(const char *host, int port) +{ + /* Return an FD to a remote server */ + struct sockaddr_in serv_addr; + struct hostent *serverhost; + int fd; + + fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + + if (fd < 0) + { + fprintf(stderr,"festival_client: can't get socket\n"); + return -1; + } + memset(&serv_addr, 0, sizeof(serv_addr)); + if ((serv_addr.sin_addr.s_addr = inet_addr(host)) == -1) + { + /* its a name rather than an ipnum */ + serverhost = gethostbyname(host); + if (serverhost == (struct hostent *)0) + { + fprintf(stderr,"festival_client: gethostbyname failed\n"); + return -1; + } + memmove(&serv_addr.sin_addr,serverhost->h_addr, serverhost->h_length); + } + serv_addr.sin_family = AF_INET; + serv_addr.sin_port = htons(port); + + if (connect(fd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) != 0) + { + fprintf(stderr,"festival_client: connect to server failed\n"); + return -1; + } + + return fd; +} + +static int nist_get_param_int(char *hdr, char *field, int def_val) +{ + char *p; + int val; + + if (((p=strstr(hdr,field)) != NULL) && + 
(strncmp(" -i ",p+strlen(field),4) == 0)) + { + sscanf(p+strlen(field)+4,"%d",&val); + return val; + } + else + return def_val; +} + +static int nist_require_swap(char *hdr) +{ + char *p; + char *field = "sample_byte_format"; + + if ((p=strstr(hdr,field)) != NULL) + { + if (((strncmp(" -s2 01",p+strlen(field),7) == 0) && FAPI_BIG_ENDIAN) || + ((strncmp(" -s2 10",p+strlen(field),7) == 0) + && FAPI_LITTLE_ENDIAN)) + return 1; + } + return 0; /* if unknown assume native byte order */ +} + +static char *client_accept_s_expr(int fd) +{ + /* Read s-expression from server, as a char * */ + char *expr; + int filesize; + + expr = socket_receive_file_to_buff(fd,&filesize); + expr[filesize] = '\0'; + return expr; +} + +static FT_Wave *client_accept_waveform(int fd) +{ + /* Read waveform from server */ + char *wavefile; + int filesize; + int num_samples, sample_rate, i; + FT_Wave *wave; + + wavefile = socket_receive_file_to_buff(fd,&filesize); + wave = 0; + + /* I know this is NIST file and its an error if it isn't */ + if (filesize >= 1024) + { + num_samples = nist_get_param_int(wavefile,"sample_count",0); + sample_rate = nist_get_param_int(wavefile,"sample_rate",16000); + if ((num_samples*sizeof(short))+1024 == filesize) + { + wave = (FT_Wave *)malloc(sizeof(FT_Wave)); + wave->num_samples = num_samples; + wave->sample_rate = sample_rate; + wave->samples = (short *)malloc(num_samples*sizeof(short)); + memmove(wave->samples,wavefile+1024,num_samples*sizeof(short)); + if (nist_require_swap(wavefile)) + for (i=0; i < num_samples; i++) + wave->samples[i] = SWAPSHORT(wave->samples[i]); + } + } + + if (wavefile != 0) /* just in case we've got an ancient free() */ + free(wavefile); + return wave; +} + +static char *socket_receive_file_to_buff(int fd,int *size) +{ + /* Receive file (probably a waveform file) from socket using */ + /* Festival key stuff technique, but long winded I know, sorry */ + /* but will receive any file without closeing the stream or */ + /* using OOB data */ + 
static char *file_stuff_key = "ft_StUfF_key"; /* must == Festival's key */ + char *buff; + int bufflen; + int n,k,i; + char c; + + bufflen = 1024; + buff = (char *)malloc(bufflen); + *size=0; + + for (k=0; file_stuff_key[k] != '\0';) + { + n = read(fd,&c,1); + if (n==0) break; /* hit stream eof before end of file */ + if ((*size)+k+1 >= bufflen) + { /* +1 so you can add a NULL if you want */ + bufflen += bufflen/4; + buff = (char *)realloc(buff,bufflen); + } + if (file_stuff_key[k] == c) + k++; + else if ((c == 'X') && (file_stuff_key[k+1] == '\0')) + { /* It looked like the key but wasn't */ + for (i=0; i < k; i++,(*size)++) + buff[*size] = file_stuff_key[i]; + k=0; + /* omit the stuffed 'X' */ + } + else + { + for (i=0; i < k; i++,(*size)++) + buff[*size] = file_stuff_key[i]; + k=0; + buff[*size] = c; + (*size)++; + } + + } + + return buff; +} + +/***********************************************************************/ +/* Public Functions to this API */ +/***********************************************************************/ + +FT_Info *festivalOpen(FT_Info *info) +{ + /* Open socket to server */ + + if (info == 0) + info = festival_default_info(); + + info->server_fd = + festival_socket_open(info->server_host, info->server_port); + if (info->server_fd == -1) + return NULL; + + return info; +} + +FT_Wave *festivalStringToWave(FT_Info *info,char *text) +{ + FT_Wave *wave; + FILE *fd; + char *p; + char ack[4]; + int n; + + if (info == 0) + return 0; + + if (info->server_fd == -1) + { + fprintf(stderr,"festival_client: server connection unopened\n"); + return 0; + } + fd = fdopen(dup(info->server_fd),"wb"); + + /* Copy text over to server, escaping any quotes */ + fprintf(fd,"(tts_textall \"\n"); + for (p=text; p && (*p != '\0'); p++) + { + if ((*p == '"') || (*p == '\\')) + putc('\\',fd); + putc(*p,fd); + } + fprintf(fd,"\" \"%s\")\n",info->text_mode); + fclose(fd); + + /* Read back info from server */ + /* This assumes only one waveform will come back, also LP 
is unlikely */ + wave = 0; + do { + for (n=0; n < 3; ) + n += read(info->server_fd,ack+n,3-n); + ack[3] = '\0'; + if (strcmp(ack,"WV\n") == 0) /* receive a waveform */ + wave = client_accept_waveform(info->server_fd); + else if (strcmp(ack,"LP\n") == 0) /* receive an s-expr */ + client_accept_s_expr(info->server_fd); + else if (strcmp(ack,"ER\n") == 0) /* server got an error */ + { + fprintf(stderr,"festival_client: server returned error\n"); + break; + } + } while (strcmp(ack,"OK\n") != 0); + + return wave; +} + + +int festivalClose(FT_Info *info) +{ + if (info == 0) + return 0; + + if (info->server_fd != -1) + close(info->server_fd); + + return 0; +} + +#ifdef STANDALONE +int main(int argc, char **argv) +{ + char *server=0; + int port=-1; + char *text=0; + char *output=0; + char *mode=0; + int i; + FT_Info *info; + FT_Wave *wave; + + for (i=1; i < argc; i++) + { + if (strcmp(argv[i],"-server") == 0) + server = argv[++i]; + else if (strcmp(argv[i],"-port") == 0) + port = atoi(argv[++i]); + else if (strcmp(argv[i],"-text") == 0) + text = argv[++i]; + else if (strcmp(argv[i],"-mode") == 0) + mode = argv[++i]; + else if (strcmp(argv[i],"-o") == 0) + output = argv[++i]; + } + if (i > argc) + { + fprintf(stderr,"missing argument\n"); + exit(1); + } + + info = festival_default_info(); + if (server != 0) + info->server_host = server; + if (port != -1) + info->server_port = port; + if (mode != 0) + info->text_mode = mode; + + info = festivalOpen(info); + wave = festivalStringToWave(info,text); + + if (wave != 0) + save_FT_Wave_snd(wave,output); + + festivalClose(info); + + return 0; +} +#endif diff --git a/examples/festival_client.h b/examples/festival_client.h new file mode 100644 index 0000000..6f94c3b --- /dev/null +++ b/examples/festival_client.h @@ -0,0 +1,90 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1999 */ +/* All Rights 
Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black (awb@cstr.ed.ac.uk) */ +/* Date : March 1999 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Client end of Festival server API (in C) designed specifically for */ +/* Galaxy Communicator use, though might be of use for other things */ +/* */ +/*=======================================================================*/ +#ifndef _FESTIVAL_CLIENT_H_ +#define _FESTIVAL_CLIENT_H_ + +#define FESTIVAL_DEFAULT_SERVER_HOST "localhost" +#define FESTIVAL_DEFAULT_SERVER_PORT 1314 +#define FESTIVAL_DEFAULT_TEXT_MODE "fundamental" + +typedef struct FT_Info +{ + int encoding; + char *server_host; + int server_port; + char *text_mode; + + int server_fd; +} FT_Info; + +typedef struct FT_Wave +{ + int num_samples; + int sample_rate; + short *samples; +} FT_Wave; + +void delete_FT_Wave(FT_Wave *wave); +void delete_FT_Info(FT_Info *info); + +#define SWAPSHORT(x) ((((unsigned)x) & 0xff) << 8 | \ + (((unsigned)x) & 0xff00) >> 8) +#define SWAPINT(x) ((((unsigned)x) & 0xff) << 24 | \ + (((unsigned)x) & 0xff00) << 8 | \ + (((unsigned)x) & 0xff0000) >> 8 | \ + (((unsigned)x) & 0xff000000) >> 24) + +/* Sun, HP, SGI Mips, M68000 */ +#define FAPI_BIG_ENDIAN (((char *)&fapi_endian_loc)[0] == 0) +/* Intel, Alpha, DEC Mips, Vax */ +#define FAPI_LITTLE_ENDIAN (((char *)&fapi_endian_loc)[0] != 0) + + +/*****************************************************************/ +/* Public functions to interface */ +/*****************************************************************/ + +/* If called with NULL will attempt to access using defaults */ +FT_Info *festivalOpen(FT_Info *info); +FT_Wave *festivalStringToWave(FT_Info *info,char *text); +int festivalClose(FT_Info *info); + +#endif diff --git a/examples/festival_client.pl b/examples/festival_client.pl new file mode 100644 index 0000000..a39e4d4 --- /dev/null +++ b/examples/festival_client.pl @@ -0,0 
+1,164 @@ +#!/usr/local/bin/perl + +# festival_client.pl - a perl socket client for festival +# +# Copyright (C) 1997 +# Kevin A. Lenzo (lenzo@cs.cmu.edu) 7/97 +# All rights reserved. +# +# The authors hereby grant permission to use, copy, modify, distribute, +# and license this software and its documentation for any purpose, provided +# that existing copyright notices are retained in all copies and that this +# notice is included verbatim in any distributions. No written agreement, +# license, or royalty fee is required for any of the authorized uses. +# Modifications to this software may be copyrighted by their authors +# and need not follow the licensing terms described here, provided that +# the new terms are clearly indicated on the first page of each file where +# they apply. +# +# IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY +# FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES +# ARISING OUT OF THE USE OF THIS SOFTWARE, ITS DOCUMENTATION, OR ANY +# DERIVATIVES THEREOF, EVEN IF THE AUTHORS HAVE BEEN ADVISED OF THE +# POSSIBILITY OF SUCH DAMAGE. +# +# THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES, +# INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. THIS SOFTWARE +# IS PROVIDED ON AN "AS IS" BASIS, AND THE AUTHORS AND DISTRIBUTORS HAVE +# NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR +# MODIFICATIONS. +# +######################################################################### +# +# A good deal of this was taken from the example in the +# perlipc man page. The rest is simply the Festival-specific +# stuff. 
+ +use IO::Socket; + +package Festival; + +my ($audio_command, $audio_file, $file_stuff_key); +my ($host, $port, $kidpid, $handle, $line); + +$wave_type = "nist"; # the type of the audio files +$audio_command = "na_play"; # your local audio play command + +# uncomment the next line if your audio command requires a file +# $audio_file = "/tmp/client_tmp$$.nist"; # temp file for waveforms + +$file_stuff_key = "ft_StUfF_key"; # defined in speech tools + +$host = shift || 'localhost'; +$port = shift || 1314; +if ($host eq "-h") { +print STDOUT " + Usage: + + $0 [<server> [<port>]] + + OR + + $0 -h + + perl client for the Festival text-to-speech server + + Lines that do not begin with a ( are treated as text + to be spoken. Those that do begin with a ( are assumed + to be Scheme commands to festival and are sent without + alteration. Use `quit' or `exit' to quit. + + Note that a server must be started before the client + will work. + + Use the `-h' (help) option for this message. + +"; + + exit(1); +} + +# create a tcp connection to the specified host and port + +$handle = IO::Socket::INET->new(Proto => "tcp", + PeerAddr => $host, + PeerPort => $port) + or die " + Can't connect to port $port on $host: $! + (Are you sure the server is running and accepting connections?) + +"; + +$handle->autoflush(1); # so output gets there right away +print STDERR "[Connected to $host:$port]\n"; + +# tell the server to send us back a 'file' of the right type +print $handle "(Parameter.set 'Wavefiletype '$wave_type)\n"; + +# split the program into two processes, identical twins +die "can't fork: $!"
 unless defined($kidpid = fork()); + +# the if{} block runs only in the parent process +if ($kidpid) { + # the parent handles the input so it can exit on quit + + while (defined ($line = <STDIN>)) { + last if ($line =~ /^(quit|exit)$/); + + if ($line =~ /^\(/) { + # just send it wholesale if it's a ( ) + print $handle $line; + } else { + # otherwise assume it's text to be spoken + chomp $line; + print $handle "(tts_textall \"$line\" 'file)\n"; + } + } + kill("TERM", $kidpid);# send SIGTERM to child +} +# the else{} block runs only in the child process +else { + # the child is forked off to get the results from the server + undef $line; + while (($line = $remains) || defined ($line = <$handle>)) { + undef $remains; + if ($line eq "WV\n") { # we have a waveform coming + undef $result; + if ($audio_file) { + open(AUDIO, ">$audio_file"); + } else { + open(AUDIO, "| $audio_command"); + } + while ($line = <$handle>) { + if ($line =~ s/$file_stuff_key(.*)$//s) { + $remains = $1; + print AUDIO $line; + last; + } + print AUDIO $line; + } + close AUDIO; + + if ($audio_file) { + # call the command if we weren't piping + system("$audio_command $audio_file"); + + # remove the evidence + unlink($audio_file); + } + } elsif ($line eq "LP\n") { + while ($line = <$handle>) { + if ($line =~ s/$file_stuff_key(.*)$//s) { + $remains = $1; + print STDOUT $line; + last; + } + print STDOUT $line; + } + } else { + # if we don't recognize it, echo it + print STDOUT $line; + } + } +} diff --git a/examples/intro.text b/examples/intro.text new file mode 100644 index 0000000..87c8dee --- /dev/null +++ b/examples/intro.text @@ -0,0 +1,4 @@ + +This is a short introduction to the Festival Speech Synthesis System. +Festival was developed by Alan Black and Paul Taylor, at the Centre +for Speech Technology Research, University of Edinburgh. 
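Both clients above find the end of a server reply by scanning for the key `ft_StUfF_key`. The C client also drops an `X` that arrives after the first eleven key characters, since that marks key-like data rather than a real terminator. The following is a minimal in-memory sketch of that decoding step, assuming the reply is already held in a buffer rather than read byte by byte from a socket. `unstuff_reply` and its parameters are hypothetical names chosen for illustration and are not part of the Festival client API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Must match the key the Festival server appends after each file. */
static const char *const FILE_STUFF_KEY = "ft_StUfF_key";

/*
 * Hypothetical helper (not in festival_client.c): decode one
 * key-terminated reply held in memory.
 *
 *   in, in_len  raw bytes as received from the server
 *   out_len     set to the number of decoded payload bytes
 *   consumed    set to the number of input bytes used, key included
 *
 * Returns a malloc'd, NUL-terminated payload, or NULL if out of memory.
 * The loop follows the same three cases as socket_receive_file_to_buff():
 * a mismatching byte is copied out and is not re-tested as a key start.
 */
static char *unstuff_reply(const char *in, size_t in_len,
                           size_t *out_len, size_t *consumed)
{
    const size_t key_len = strlen(FILE_STUFF_KEY);
    size_t size = 0;   /* decoded bytes written so far */
    size_t k = 0;      /* key characters matched so far */
    size_t pos = 0;    /* next input byte to read */
    size_t i;
    char *out;

    assert(in != NULL && out_len != NULL && consumed != NULL);

    /* The payload is never longer than the input; +1 for the NUL. */
    out = (char *)malloc(in_len + 1);
    if (out == NULL)
        return NULL;

    while (k < key_len && pos < in_len) {
        char c = in[pos++];

        if (c == FILE_STUFF_KEY[k]) {
            k++;                                /* still matching the key */
        } else if (c == 'X' && k + 1 == key_len) {
            /* Stuffed key: emit the matched prefix, omit the 'X'. */
            for (i = 0; i < k; i++)
                out[size++] = FILE_STUFF_KEY[i];
            k = 0;
        } else {
            /* Ordinary data: emit any matched prefix, then the byte. */
            for (i = 0; i < k; i++)
                out[size++] = FILE_STUFF_KEY[i];
            k = 0;
            out[size++] = c;
        }
    }

    out[size] = '\0';
    *out_len = size;
    *consumed = pos;
    return out;
}
```

Called on the bytes `helloft_StUfF_keyWV\n`, this returns the five-byte payload `hello` and reports 17 bytes consumed, leaving `WV\n` for the next read. Working on a buffer instead of a file descriptor keeps the sketch testable without a running server; it is not a replacement for the socket loop in the client.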
diff --git a/examples/latest.sh b/examples/latest.sh new file mode 100644 index 0000000..d52259d --- /dev/null +++ b/examples/latest.sh @@ -0,0 +1,115 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-*-mode:scheme-*- +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. ;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. ;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. 
;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: September 1996 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Gets a summary of the latest news from Pathfinder.com (Time +;;; magazine) and synthesizes it. My first web based speech application. +;;; +;;; This is far too dependent on Time's latest news pages format and +;;; not really very general, but it's a start. Also they seem to change +;;; both the format and the name of the pages regularly so this probably +;;; no longer works. +;;; +;;; Note the news is Copyright Reuters, and should not be used except +;;; for personal use. This program can be viewed simply as a web +;;; browser (for one particular page) and does not itself contain any +;;; information under Time or Reuters copyright. +;;; + +;;; Because this is a --script type file it has to explicitly +;;; load the initfiles: init.scm and user's .festivalrc +(load (path-append libdir "init.scm")) + +(audio_mode 'async) ;; play waves while continuing synthesis + +;;; Give short introduction, so something can happen while we're +;;; getting the news +(SayText "And here is the news.") +(SayText "News stories are courtesy of Time Warners' Path finder Magazine + and Reuters News Media.") + +(format t "Getting news from Pathfinder Magazine ... \n") +(fflush nil) +;;; First get the page + +(set! tmpfile (make_tmp_filename)) +(set! tmpfile2 (string-append tmpfile "_2")) + +(get_url "http://www.pathfinder.com/news/latest" tmpfile) + +(format t "done\n") + +;; This has to be powerful awk, not the original awk. GNU awk or nawk +;; are what I'm looking for, but they have such random names, and may or +;; may not be on your system. +(if (string-matches *ostype* ".*Linux.*") + (defvar GOOD_AWK "awk") + (defvar GOOD_AWK "nawk")) + +;; Should now use some HTML to SSML conversion but hack it just now +(system + (string-append + GOOD_AWK " '{ if ($1 == \"
\") + inlist = 1; + if (inlist == 1) + { + if ($1 == \"
\") # title + { + getline # skip href + getline + line = $0 + sub(/^.*/,\"\",line); + sub(/ *<.b>.*$/,\"\",line); + printf(\"%s, \",line); + } + else if ($1 == \"
\") # summary + { + getline + line = $0 + sub(/\(.. ... .... ..:.. ...\)/,\"\",line) # remove time stamp + printf(\"%s\\n\\n\",line); + } + else if ($1 == \"
\") + inlist = 0; + } + }' < " tmpfile " > " tmpfile2)) + +;; Say the news +(tts_file tmpfile2 nil) + +(system (string-append "rm -f " tmpfile " " tmpfile2)) +(audio_mode 'close) ;; close gracefully + + diff --git a/examples/make_utts.sh b/examples/make_utts.sh new file mode 100644 index 0000000..014576d --- /dev/null +++ b/examples/make_utts.sh @@ -0,0 +1,558 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-*-mode:scheme-*- +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. ;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. ;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. 
;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: November 1997 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Build a utterance from a number of stream label files including +;;; building the links between the stream items +;;; +;;; This used to be a shell script but that was just soooo slow +;;; and inflexible it was better to do it in Festival +;;; + +;;; Because this is a --script type file it has to explicitly +;;; load the initfiles: init.scm and user's .festivalrc +(if (not (symbol-bound? 'caar)) + (load (path-append libdir "init.scm"))) + +;;; Some parts are potentially editable +(defvar basic_relations '((Phrase segmental ()) + (Word segmental (Phrase)) + (Syllable segmental (Word) ) + (Segment segmental (Syllable)) + (IntEvent point (Syllable) ) + (Target point (Segment)) ;; virtually unused + ) + "The basic relations that exist and need to be combined into a single +utterance. Also their type, that is if they describe segments of +the utterance or points in the utterance.") + +(require 'tilt) + +(define (make_utts_help) + (format t "%s\n" + "make_utts [options] festival/relations/Segment/*.Segment + Build utterance forms for sets of stream_item labels + Options + -utt_dir + Directory where utterances will be saved (default + is festival/utts/) + -label_dir + The directory which contains subdirectories containing + label files for each relation, default is festival/relations/ + -style + What style of utterances, classic or unisyn + -tilt_events + IntEvent files are tilt event so uses special information + in the syllink feature to link to syllables. 
+ -eval + Load in scheme file with run specific code, if file name + starts with a left paren, the string itself is interpreted + -tokens + Overly non-general method to load in Tokens and markup + -pos + Do part of speech assignment + -phoneset + Specify the phoneset name; this is required for -tilt_events. +") + (quit)) + +;;; Default options values +(defvar seg_files nil) ;; files to build from +(defvar label_dir "festival/relations/") +(defvar style 'classic) +(defvar tilt_events nil) +(defvar with_tokens nil) +(defvar unisyn_build_with_silences t) +(defvar do_pos nil) +(defvar do_syn nil) +(defvar utt_dir "festival/utts/") + +;; may be redefined by user +(define (make_utts_user_function utt) utt) + +;;; Get options +(define (get_options) + (let ((files nil) + (o argv)) + (if (or (member_string "-h" argv) + (member_string "-help" argv) + (member_string "--help" argv) + (member_string "-?" argv)) + (make_utts_help)) + (while o + (begin + (cond + ((string-equal "-label_dir" (car o)) + (if (not (cdr o)) + (make_utts_error "no label_dir file specified")) + (set! label_dir (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-utt_dir" (car o)) + (if (not (cdr o)) + (make_utts_error "no utt_dir file specified")) + (set! utt_dir (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-phoneset" (car o)) + (if (not (cdr o)) + (make_utts_error "no phoneset specified")) + (load_library (string-append (car (cdr o)) "_phones.scm")) + (set! o (cdr o))) + ((string-equal "-eval" (car o)) + (if (not (cdr o)) + (make_utts_error "no file specified to load")) + (if (string-matches (car (cdr o)) "^(.*") + (eval (read-from-string (car (cdr o)))) + (load (car (cdr o)))) + (set! o (cdr o))) + ((string-equal "-tilt_events" (car o)) + (set! tilt_events t)) + ((string-equal "-style" (car o)) + (if (not (cdr o)) + (make_utts_error "no style specified")) + (set! style (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-tokens" (car o)) + (set! 
with_tokens t)) + ((string-equal "-pos" (car o)) + (set! do_pos t)) + (t + (set! files (cons (car o) files)))) + (set! o (cdr o)))) + (if files + (set! seg_files (reverse files))))) + +(define (make_utts_error message) + (format stderr "%s: %s\n" "make_utts" message) + (make_utts_help)) + +;;; No gc messages +(gc-status nil) + +(define (make_utt name relations) + "(make_utt name relations) +Build an utterance from the stream_item label files in +label_dir/RELATION/name.RELATION and return it. This also creates +relations between each base relation." + (let (utt) + (cond + ((equal? 'classic style) + (set! utt (make_utt_classic name relations))) + ((equal? 'unisyn style) + (set! utt (make_utt_unisyn name relations))) + (t + (err "make_utts: unknown style" style))) + + (utt.set_feat utt "fileid" name) + (if do_pos + (find_pos utt)) + (if do_syn + (find_syn utt)) + + utt) +) + +;; These should probably be somewhere else +(defvar met_insertion 7) +(defvar met_deletion 7) +(defvar met_substitution 7) + +(define (make_utt_unisyn name relation) + "(make_utt_unisyn name relation) +Build utterance, with xml, metrical tree and tilt relations." + (let ((utt (Utterance nil nil))) + + (set! 
utt (wsi_build utt name label_dir tilt_events)) + + (add_xml_relation + utt (string-append label_dir "xml/rhet/" name ".xrhet")) + (add_xml_relation + utt (string-append label_dir "xml/syn/" name ".xsyn")) + (add_xml_relation + utt (string-append label_dir "xml/anaph/" name ".xana")) + (add_xml_relation + utt (string-append label_dir "xml/info/" name ".xinf")) + + (syntax_to_metrical_words utt) + (extend_tree + utt (list 'MetricalTree 'MetricalWord 'WordStructure 'Syllable)) + (extend_tree + utt (list 'ProsodicTree 'MetricalTree 'SylStructure 'Segment)) + + (add_match_features utt) + utt) +) + +(define (wsi_build utt file pathname do_il) + + (add_trans_word utt (string-append pathname "wrd/" file ".wrd")) + (add_trans_segment utt (string-append pathname "lab/" file ".lab")) + (if do_il + (add_trans_intonation utt (string-append pathname "tilt/" file ".tilt")) + nil + ) + utt +) + +(define (make_utt_classic name relations) + "(make_utt_classic name relations) +Build utterance in classic style for name and named relations." + (let (utt) + (if with_tokens + (set! utt (utt.load + nil + (string-append label_dir "/" "Token" "/" name ".Token"))) + (set! utt (Utterance Text nil))) + (set! current_utt utt) + (mapcar + (lambda (s) + (utt.relation.load + utt (car s) (string-append label_dir "/" + (car s) "/" name "." (car s)))) + relations) + ;; Now link them + (make_syl_structure utt) + (make_target_links utt) + (make_phrase_structure utt) + (if tilt_events + (tilt_link_syls utt) + (intevent_link_syls utt)) + + (if with_tokens + (relate_tokens_to_words utt)) + (make_utts_user_function utt) + utt) +) + +(define (find_pos utt) + "(find_pos utt) +Assign part of speech using standard POS tagger. Also produce +standard reduced tagset in phr_pos. Irrelevant extra POS features +are removed. This assumes a POS tagger is set up at this point; +this can most easily be done by setting up a relevant voice." 
+ (POS utt) + (mapcar + (lambda (w) + (item.set_feat + w "phr_pos" + (map_pos (item.feat w "pos") english_pos_map_wp39_to_wp20)) + (item.remove_feature w "pos_index") + (item.remove_feature w "pos_index_score") + ) + (utt.relation.items utt 'Word)) + utt) + +(define (map_pos pos map) + (cond + ((null map) pos) + ((member_string pos (car (car map))) + (car (cdr (car map)))) + (t + (map_pos pos (cdr map))))) + +(define (make_target_links utt) + "(make_target_links utt) +Link targets to the segments they fall within. The Target relation contains +all segments, each with its actual Targets as daughters." + (let ((targets (utt.relation.items utt 'Target)) + (segs (utt.relation.items utt 'Segment)) + tt1) + (utt.relation.create utt 'TTarget) + (mapcar + (lambda (tt) + (set! tt1 (utt.relation.append utt 'TTarget)) + ;; convert the target values to the newer naming convention + (item.set_feat tt1 "pos" (item.feat tt "end")) + (item.set_feat tt1 "f0" (parse-number (item.feat tt "name"))) + (item.relation.remove tt 'Target)) + targets) + (set! targets (utt.relation.items utt 'TTarget)) + (set! TARGSEGFACTOR 0.010) + (while segs + (utt.relation.append utt 'Target (car segs)) + (while (and targets (< (item.feat (car targets) "pos") + (+ (item.feat (car segs) "end") + TARGSEGFACTOR))) + (item.relation.append_daughter (car segs) 'Target (car targets)) + (set! targets (cdr targets))) + (set! segs (cdr segs))) + (utt.relation.delete utt 'TTarget) +)) + + +(define (make_phrase_structure utt) + "(make_phrase_structure utt) +Add words into phrases." + (let ((phrases (utt.relation.items utt 'Phrase)) + (words (utt.relation.items utt 'Word))) + (set! WORDPHRASEFACTOR 0.200) + (while phrases + (while (and words (< (item.feat (car words) 'end) + (+ (item.feat (car phrases) 'end) + WORDPHRASEFACTOR))) + (item.relation.append_daughter (car phrases) 'Phrase (car words)) + (set! words (cdr words))) + (set! 
phrases (cdr phrases))))) + +(define (relate_tokens_to_words utt) +"(relate_tokens_to_words utt) +A specific function for aligning the token stream to word stream." + (convert_token_stream utt) + (let ((tokens (utt.relation.items utt 'Token)) + (words (utt.relation.items utt 'Word))) + (link_tokens_words tokens words) + utt) +) + +(define (convert_token_stream utt) + "(convert_token_stream utt) +Replace Token Stream with Token relation. -- won't be needed when things +are properly converted." + (utt.relation.create utt 'Token) + (mapcar + (lambda (tok) + (utt.relation.append utt 'Token tok)) + (utt.stream utt 'Token)) + (utt.stream.delete utt 'Token) + ) + +(define (link_tokens_words tokens words) + "(link_tokens_words tokens words) +Advance through the tokens and words aligning them as required." + (cond + ((null words) + t) + ((null tokens) + (error (format nil "Extra words: %l\n" (mapcar item.name words)))) + ((or (string-equal "1" + (item.feat (car tokens) "punct-elem")) + (member_string (item.name (car tokens)) + '("(" ")"))) + (link_tokens_words (cdr tokens) words)) + ((string-equal "SPEECH-OMITTED" + (item.feat (car tokens) "R:SOLEML.parent.TYPE")) + (link_tokens_words (cdr tokens) words)) + ((and (string-matches (item.name (car words)) ".*'.*") + (string-equal "APOSTROPHE" + (item.feat (car tokens) "R:Token.n.TYPE"))) + (item.relation.append_daughter (car tokens) 'Token (car words)) + (item.relation.append_daughter (car (cdr tokens)) 'Token (car words)) + (if (string-matches (item.name (car words)) ".*'") + (link_tokens_words (cdr (cdr tokens)) (cdr words)) + (begin + (item.relation.append_daughter + (car (cdr (cdr tokens))) 'Token (car words)) + (link_tokens_words (cdr (cdr (cdr tokens))) (cdr words))))) + ((string-equal (downcase (item.name (car tokens))) + (downcase (item.name (car words)))) + (item.relation.append_daughter (car tokens) 'Token (car words)) + (link_tokens_words (cdr tokens) (cdr words))) + ;; there going to be more here !!! 
+ (t + (error (format nil "Mismatch of tokens and words \n %l\n %l\n" + (mapcar item.name tokens) + (mapcar item.name words)))))) + +(define (do_utt name) + (let ((utt (make_utt name basic_relations))) + (utt.save utt (string-append utt_dir "/" name ".utt") 'est_ascii) + t)) + +(define (make_syl_structure utt) + "(make_syl_structure utt) +Make SylStructure relation linking Words, Syllables and Segments." + (let ((words (utt.relation.items utt 'Word)) + (syls (utt.relation.items utt 'Syllable)) + (segs (utt.relation.items utt 'Segment))) + (set! SYLWORDFACTOR 0.025) + (set! SEGSYLFACTOR 0.02) + (utt.relation.create utt 'SylStructure) + (while words + (utt.relation.append utt 'SylStructure (car words)) + (while (and syls (< (item.feat (car syls) 'end) + (+ (item.feat (car words) 'end) + SYLWORDFACTOR))) + (item.relation.append_daughter (car words) 'SylStructure (car syls)) + (while (and segs (< (item.feat (car segs) 'end) + (+ (item.feat (car syls) 'end) + SEGSYLFACTOR))) + (if (not (phone_is_silence (item.name (car segs)))) + (item.relation.append_daughter + (car syls) 'SylStructure (car segs))) + (set! segs (cdr segs))) + (set! syls (cdr syls))) + (set! words (cdr words))))) + +(define (tilt_link_syls utt) +"(tilt_link_syls utt) +Link syls to IntEvents, for Tilt. In this case the feature syllink +specifies the word.sylnum that the event should be linked to." + (let ((syls (utt.relation.items utt 'Syllable))) + (utt.relation.create utt 'Intonation) + (mapcar + (lambda (ie) + (let ((name (item.name ie)) + (syllink (item.feat ie "syllink")) + syl) + (cond + ((member_string name '("phrase_start" "phrase_end")) + ;; relate this IntEvent to silence segment +; (if (string-equal name "phrase_start") +; (set! syl (find_ie_phrase_syl utt ie 'syllable_start)) +; (set! 
syl (find_ie_phrase_syl utt ie 'syllable_end))) +; (utt.relation.append utt 'Intonation syl) +; (item.relation.append_daughter syl 'Intonation ie) + ) + ((and (string-equal (item.feat ie "int_event") "1") + (set! syl (find_related_syl utt syls syllink))) + (if (not (member 'Intonation (item.relations syl))) + (utt.relation.append utt 'Intonation syl)) + (item.relation.append_daughter syl 'Intonation ie) + (set_rel_peak_pos utt ie syl))))) + (utt.relation.items utt 'IntEvent)) ;; the IntEvents + )) + +(define (intevent_link_syls utt) +"(intevent_link_syls utt) +Non-tilt link of syllables to intevents through the Intonation relation." + (let ((syls (utt.relation.items utt 'Syllable))) + (utt.relation.create utt 'Intonation) + (mapcar + (lambda (ie) + (let ((syl (find_container_syl ie syls))) + (if (not (member 'Intonation (item.relations syl))) + (utt.relation.append utt 'Intonation syl)) + (item.relation.append_daughter syl 'Intonation ie))) + (utt.relation.items utt 'IntEvent)) ;; the IntEvents + )) + +(define (find_container_syl ie syls) + "(find_container_syl ie syls) +Find the syl that's closest to the time of this ie." + (let ((pos (item.feat ie 'end)) + (ss syls) + syl) + (while (and ss (not syl)) + (let ((ss_start (item.feat (car ss) 'syllable_start)) + (ss_end (item.feat (car ss) 'syllable_end))) + (if (and (> pos ss_start) + (< pos (+ ss_end 0.030))) + (set! syl (car ss))) + (set! ss (cdr ss)))) + (if (not syl) + (error "Failed to find related syllable for IntEvent at" pos)) + syl)) + +(define (find_ie_phrase_syl utt ie direction) +"(find_ie_phrase_syl utt ie direction) +Find the syllable that should be related to this IntEvent. +As at this stage no real relations can be relied on, this blindly +searches the Syllable stream for a segment at the right time +point." + (let ((syls (utt.relation.items utt 'Syllable)) + (pos (item.feat ie 'position)) + syl) + (while (and syls (not syl)) + (if (or (approx-equal? 
pos (item.feat (car syls) direction) 0.04) + (and (not (item.relation.next ie 'IntEvent)) + (not (cdr syls)))) + (set! syl (car syls))) + (set! syls (cdr syls))) + (if (not syl) + (error "Failed to find related syllable for phrase IntEvent at" pos)) + syl)) + +(define (set_rel_peak_pos utt ie syl) +"(set_rel_peak_pos utt ie syl) +Set the feature tilt:rel_pos to the distance of the peak from the start +of the vowel in syl." + (item.set_feat + ie + "tilt:rel_pos" + (- (- (item.feat ie 'end) + (* (- 1.0 (item.feat ie 'tilt:tilt)) + (item.feat ie 'tilt:dur) + 0.5)) + (syl_vowel_start syl)))) + +(define (find_related_syl utt syls syllink) +"(find_related_syl utt syls syllink) +Find the syllable named by syllink, which is of the form x[.y]. +x is the word number and y is the syllable number." + (unwind-protect + (let (wordlab sylnum word syls syl) + (if (string-matches syllink ".*\\..*") + (begin + (set! wordlab (string-before syllink ".")) + (set! sylnum (- (parse-number (string-after syllink ".")) 1))) + (begin + (set! wordlab syllink) + (set! sylnum 0))) + (set! word (find_word_labelled + utt (utt.relation.items utt 'Word) wordlab)) + (if (not word) + (error "Failed to find word labelled:" wordlab)) + (set! syls (item.relation.daughters word 'SylStructure)) + (set! syl (nth sylnum syls)) + (if syl + syl + (car (last syls)))) + (begin + (error "Failed to find syllable labelled:" syllink)))) + +(define (find_word_labelled utt words lab) +"(find_word_labelled utt words lab) +Find the word whose label is lab." 
+ (cond + ((null words) nil) + ((string-equal lab (item.feat (car words) "wordlab")) + (car words)) + (t + (find_word_labelled utt (cdr words) lab)))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; The main work +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +(define (main) + (get_options) + + (mapcar + (lambda (f) + (format t "%s\n" f) + (unwind-protect + (do_utt (path-basename f)) + (format stderr "utterance build or save failed\n"))) + seg_files)) + +(main) diff --git a/examples/powmeanstd.sh b/examples/powmeanstd.sh new file mode 100644 index 0000000..12ba13a --- /dev/null +++ b/examples/powmeanstd.sh @@ -0,0 +1,178 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-*-mode:scheme-*- +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. ;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. 
;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. ;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: December 1997 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Find the means and standard deviations of the power of all +;;; phones in the given utterances. +;;; + +;;; Because this is a --script type file it has to explicitly +;;; load the initfiles: init.scm and user's .festivalrc +(load (path-append libdir "init.scm")) + +(define (powmeanstd_help) + (format t "%s\n" + "powmeanstd [options] festival/utts/*.utts + Find means and standard deviation of phone power in utterances + Options + -output + File to save output in + -log + Take log of power first +") + (quit)) + +;;; Default options values +(defvar utt_files nil) +(defvar outfile "pow.meanstd") +(defvar log_domain nil) + +;;; Get options +(define (get_options) + (let ((files nil) + (o argv)) + (if (or (member_string "-h" argv) + (member_string "-help" argv) + (member_string "--help" argv) + (member_string "-?" argv)) + (powmeanstd_help)) + (while o + (begin + (cond + ((string-equal "-output" (car o)) + (if (not (cdr o)) + (powmeanstd_error "no output file specified")) + (set! outfile (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-log" (car o)) + (set! log_domain t)) + (t + (set! files (cons (car o) files)))) + (set! o (cdr o)))) + (if files + (set! 
utt_files (reverse files))))) + +(define (powmeanstd_error message) + (format stderr "%s: %s\n" "powmeanstd" message) + (powmeanstd_help)) + +;;; No gc messages +(gc-status nil) + +;;; A simple sufficient statistics class +(define (suffstats.new) + (list + 0 ;; n + 0 ;; sum + 0 ;; sumx + )) + +(define (suffstats.set_n x n) + (set-car! x n)) +(define (suffstats.set_sum x sum) + (set-car! (cdr x) sum)) +(define (suffstats.set_sumx x sumx) + (set-car! (cdr (cdr x)) sumx)) +(define (suffstats.n x) + (car x)) +(define (suffstats.sum x) + (car (cdr x))) +(define (suffstats.sumx x) + (car (cdr (cdr x)))) +(define (suffstats.reset x) + (suffstats.set_n x 0) + (suffstats.set_sum x 0) + (suffstats.set_sumx x 0)) +(define (suffstats.add x d) + (suffstats.set_n x (+ (suffstats.n x) 1)) + (suffstats.set_sum x (+ (suffstats.sum x) d)) + (suffstats.set_sumx x (+ (suffstats.sumx x) (* d d))) +) + +(define (suffstats.mean x) + (/ (suffstats.sum x) (suffstats.n x))) +(define (suffstats.variance x) + (/ (- (* (suffstats.n x) (suffstats.sumx x)) + (* (suffstats.sum x) (suffstats.sum x))) + (* (suffstats.n x) (- (suffstats.n x) 1)))) +(define (suffstats.stddev x) + (sqrt (suffstats.variance x))) + +;;; Index for each phone +(defvar phonelist nil) ;; index of phone to suffstats +(define (get_phone_data phone) + (let ((a (car (cdr (assoc phone phonelist))))) + (if a + a + (begin ;; first time for this phone + (set! 
phonelist + (cons + (list phone (suffstats.new)) + phonelist)) + (car (cdr (assoc phone phonelist))))))) + +(define (cummulate_seg_pow utt_name) + (let ((utt (utt.load nil utt_name))) + (mapcar + (lambda (s) + (suffstats.add + (get_phone_data (item.name s)) + (if log_domain + (log (item.feat s "power")) + (item.feat s "power")))) + (utt.relation.items utt 'Segment)))) + +(define (output_pow_data data outfile) + (let ((fd (fopen outfile "w"))) + (mapcar + (lambda (d) + (format fd "(%s %f %f)\n" + (car d) + (suffstats.mean (car (cdr d))) + (suffstats.stddev (car (cdr d))))) + data) + (fclose fd))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; The main work +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +(define (main) + (get_options) + + (mapcar cummulate_seg_pow utt_files) + (output_pow_data phonelist outfile) +) + +(main) diff --git a/examples/run-festival-script.sh b/examples/run-festival-script.sh new file mode 100755 index 0000000..b384c6e --- /dev/null +++ b/examples/run-festival-script.sh @@ -0,0 +1,47 @@ +#!/bin/sh +#####################################################-*-mode:shell-script-*- +## ## +## Carnegie Mellon University +## Copyright (c) 2005 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. 
The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## CARNEGIE MELLON UNIVERSITY AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL CARNEGIE MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## Author: Alan W Black ## +## Date: Nov 2008 ## +########################################################################### +## Run a festival script with an explicit libpath so an implementation ## +## can be moved from where it was compiled and the scripts can still ## +## be used ## +########################################################################### + +FESTIVAL=$1 +FESTLIBDIR=$2 +SCRIPT=$3 +shift +shift +shift +exec "$FESTIVAL" --libdir "$FESTLIBDIR" --script "$SCRIPT" "$@" diff --git a/examples/saytime.sh b/examples/saytime.sh new file mode 100644 index 0000000..3efc2cf --- /dev/null +++ b/examples/saytime.sh @@ -0,0 +1,158 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-*-mode:scheme-*- +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. 
;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. ;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. ;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: wasting time one August morning in 1996 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Here is a short example of a Festival program that speaks the +;;; current time. It uses UNIX date to get the time then builds +;;; a string with an expression of the current time. 
+;;; +;;; The string generated for synthesis is of the form +;;; The time is now +;;; + +;;; Because this is a --script type file it has to explicitly +;;; load the initfiles: init.scm and user's .festivalrc +(load (path-append libdir "init.scm")) + +(define (get-the-time) +"Returns a list of hour and minute and second, for later processing" + (let (date) + (system "date | awk '{print $4}' | tr : ' ' >/tmp/saytime.tmp") + (set! date (load "/tmp/saytime.tmp" t)) ;; loads the file unevaluated + (system "rm /tmp/saytime.tmp") + date) +) + +(define (round-up-time time) +"Rounds time up/down to nearest five minute interval" + (let ((hour (car time)) + (min (car (cdr time))) + (sec (car (cdr (cdr time))))) + (set! min (round-min (+ 2 min))) + (list hour min sec))) + +(define (round-min min) +"Returns minutes rounded down to nearest 5 minute interval" + (cond + ((< min 5) + 0) + (t + (+ 5 (round-min (- min 5)))))) + +(define (approx time) +"Returns a string stating the approximation of the time. + exactly -- within a minute either side + almost -- 1-2 minutes before + just after -- 1-2 minutes after + a little after -- 2-3 minutes after +" + (let ((rm (round-min (car (cdr time)))) + (min (car (cdr time)))) + (cond + ((or (< (- min rm) 1) + (> (- min rm) 3)) + "exactly ") + ((< (- min rm) 2) + "just after ") + ((< (- min rm) 3) + "a little after ") + (t + "almost ")))) + +(define (hour-string time) +"Return description of hour" + (let ((hour (car time))) + (if (> (car (cdr time)) 30) + (set! 
hour (+ 1 hour))) + (cond + ((or (eq hour 0) (eq hour 24)) + "midnight ") + ((> hour 12) + (string-append (- hour 12) ", ")) + (t + (string-append hour ", "))))) + +(define (minute-string time) +"Return description of minute" + (let ((min (car (cdr time)))) + (cond + ((or (eq min 0) (eq min 60)) " ") + ((eq min 5) "five past ") + ((eq min 10) "ten past ") + ((eq min 15) "quarter past ") + ((eq min 20) "twenty past ") + ((eq min 25) "twenty-five past ") + ((eq min 30) "half past ") + ((eq min 35) "twenty-five to ") + ((eq min 40) "twenty to ") + ((eq min 45) "quarter to ") + ((eq min 50) "ten to ") + ((eq min 55) "five to ") + (t + "something else ")))) + +(define (ampm-string time) +"Return morning/afternoon or evening string" + (let ((hour (car time))) + (cond + ((or (eq hour 0) (eq hour 12) (eq hour 24)) + " ") + ((< hour 12) + "in the morning. ") + ((< hour 18) + "in the afternoon. ") + (t + "in the evening. ")))) + +;;; +;;; Now with all the functions defined we can get the time +;;; +(set! actual-time (get-the-time)) +(set! round-time (round-up-time actual-time)) + +;;; Construct the time expression +(set! time-string + (string-append + "The time is now, " + (approx actual-time) + (minute-string round-time) + (hour-string round-time) + (ampm-string round-time))) + +(format t "%s\n" time-string) + +;;; Synthesize it +(SayText time-string) + diff --git a/examples/scfg_parse_text.sh b/examples/scfg_parse_text.sh new file mode 100644 index 0000000..028fa7f --- /dev/null +++ b/examples/scfg_parse_text.sh @@ -0,0 +1,147 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-*-mode:scheme-*- +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. 
;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. ;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. ;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: October 1997 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Parse arbitrary text using the given SCFG. +;;; +;;; Tokenizes given text file and runs the part of speech tagger on it +;;; Parses it with respect to a grammar trained from the UPenn WSJ +;;; tree bank (you may optionally specify a different grammar). +;;; +;;; This may be slow for long sentences as there is a |w|^3 factor +;;; involved in the parsing algorithm. 
+ +;;; Because this is a --script type file it has to explicitly +;;; load the initfiles: init.scm and user's .festivalrc +(load (path-append libdir "init.scm")) + +(require 'scfg) + +;;; Process command line arguments +(define (scfg_parse_text_help) + (format t "%s\n" + "scfg_parse_text [options] textfile + Parse arbitrary text + Options + -o ofile File to save parses (default is stdout). + -grammar ifile Alternative grammar, default uses standard grammar + from Festival distribution. + -full_parse Output full parse with probabilities rather than + simplified form (which is the default).") + (quit)) + +;;; No gc messages +(gc-status nil) + +;;; Default argument values +(defvar grammarfile (path-append libdir "scfg_wsj_wp20.gram")) +(defvar outfile "-") +(defvar outfd t) +(defvar parse_type 'brackets_only) +(defvar text_files '("-")) + +;;; Get options +(define (get_options) + + (let ((files nil) + (o argv)) + (if (or (member_string "-h" argv) + (member_string "-help" argv) + (member_string "--help" argv) + (member_string "-?" argv)) + (scfg_parse_text_help)) + (while o + (begin + (cond + ((string-equal "-o" (car o)) + (if (not (cdr o)) + (scfg_error "no output file specified")) + (set! outfile (car (cdr o))) + (set! outfd (fopen outfile "w")) + (set! o (cdr o))) + ((string-equal "-grammar" (car o)) + (if (not (cdr o)) + (scfg_error "no grammar file specified")) + (set! grammarfile (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-full_parse" (car o)) + (set! parse_type 'full_parse)) + (t + (set! files (cons (car o) files)))) + (set! o (cdr o)))) + (if files + (set! text_files (reverse files))))) + +(define (scfg_error message) + (format stderr "%s: %s\n" "scfg_parse_text" message) + (scfg_parse_text_help)) + +;;; Functions that do the work +(define (find-parse utt) +"Main function for processing TTS utterances. Tokenizes, predicts POS and +then parses." 
+ (Token utt) + (POS utt) + (Phrasify utt) ;; cause it maps the POS tags + (ProbParse utt) +) + +(define (output-parse utt) +"Output the parse tree for each utt" + (if (equal? parse_type 'brackets_only) + (pprintf (scfg_simplify_relation_tree + (utt.relation_tree utt 'Syntax)) outfd) + (pprintf (utt.relation_tree utt 'Syntax) outfd)) + (format outfd "\n") + utt) + +;;; +;;; Redefine what happens to utterances during text to speech +;;; +(set! tts_hooks (list find-parse output-parse)) + +(define (main) + (get_options) + + ;; Load the grammar + (set! scfg_grammar (load grammarfile t)) + + ;; Parse the files + (mapcar + (lambda (f) (tts_file f)) + text_files)) + +;;; Do the work +(main) diff --git a/examples/songs/Makefile b/examples/songs/Makefile new file mode 100644 index 0000000..c3623c8 --- /dev/null +++ b/examples/songs/Makefile @@ -0,0 +1,44 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +TOP=../.. +DIRNAME=examples/songs +BUILD_DIRS= +ALL_DIRS=$(BUILD_DIRS) + +SONGS = america1.xml america2.xml america3.xml america4.xml \ + daisy.xml doremi.xml lochlomond.xml \ + spice1.xml spice2.xml spice3.xml spice4.xml +FILES=Makefile $(SONGS) + +include $(TOP)/config/common_make_rules + diff --git a/examples/songs/america1.xml b/examples/songs/america1.xml new file mode 100644 index 0000000..e52e987 --- /dev/null +++ b/examples/songs/america1.xml @@ -0,0 +1,16 @@ + + + +oh +beautiful +for +spacious +skies +for +amber +waves +of +grain + diff --git a/examples/songs/america2.xml b/examples/songs/america2.xml new file mode 100644 index 0000000..5642df1 --- /dev/null +++ b/examples/songs/america2.xml @@ -0,0 +1,16 @@ + + + +oh +beautiful +for +spacious +skies +for +amber +waves +of +grain + diff --git a/examples/songs/america3.xml b/examples/songs/america3.xml new file mode 100644 index 0000000..67e036c --- /dev/null +++ b/examples/songs/america3.xml @@ -0,0 +1,16 @@ + + + +oh +beautiful +for +spacious +skies +for +amber +waves +of +grain + diff --git a/examples/songs/america4.xml b/examples/songs/america4.xml new file mode 100644 index 0000000..8f9e021 --- /dev/null +++ b/examples/songs/america4.xml @@ -0,0 +1,16 @@ + + + +oh +beautiful +for +spacious +skies +for +amber +waves +of +grain + 
diff --git a/examples/songs/daisy.xml b/examples/songs/daisy.xml new file mode 100644 index 0000000..48682e1 --- /dev/null +++ b/examples/songs/daisy.xml @@ -0,0 +1,49 @@ + + + +Daisy +Daisy +Give +me +your +answer +do + +I'm +half +crazy +all +for +the +love +of +you + +we +don't +need +a +stylish +marriage +I +can't +afford +a +carriage + +but +you'll +look +sweet +upon +the +seat +of +a +bicycle +built +for +two + diff --git a/examples/songs/doremi.xml b/examples/songs/doremi.xml new file mode 100644 index 0000000..7968a72 --- /dev/null +++ b/examples/songs/doremi.xml @@ -0,0 +1,14 @@ + + + +doe +ray +me +fah +sew +lah +tee +doe + diff --git a/examples/songs/lochlomond.xml b/examples/songs/lochlomond.xml new file mode 100644 index 0000000..b4068e6 --- /dev/null +++ b/examples/songs/lochlomond.xml @@ -0,0 +1,47 @@ + + + +Oh +you'll +take +the +high +road +and +I'll +take +the +low +road + +and +I'll +be +in +Scotland +before +ye + +but +me +and +my +true +love + +will +never +meet +again +on +the +bonnie +bonnie +banks +of +lock +lahman + + diff --git a/examples/songs/spice1.xml b/examples/songs/spice1.xml new file mode 100644 index 0000000..3f319af --- /dev/null +++ b/examples/songs/spice1.xml @@ -0,0 +1,29 @@ + + + +if +you +wanna +be +my +lover + +you +gotta +get +with +my +friends + +make +it +last +forever + +friendship +never +in +n + diff --git a/examples/songs/spice2.xml b/examples/songs/spice2.xml new file mode 100644 index 0000000..0faf9e2 --- /dev/null +++ b/examples/songs/spice2.xml @@ -0,0 +1,31 @@ + + + +if +you +wanna +be +my +lover + +you +gotta +get +with +my + +gotta +get +with +my +friends + +forever + +friendship +never +in +n + diff --git a/examples/songs/spice3.xml b/examples/songs/spice3.xml new file mode 100644 index 0000000..6ad3865 --- /dev/null +++ b/examples/songs/spice3.xml @@ -0,0 +1,29 @@ + + + +if +you +wanna +be +my +lover + +you +gotta +get +with +my +friends + +make +it +last +forever + +friendship +never +in +n + diff --git 
a/examples/songs/spice4.xml b/examples/songs/spice4.xml new file mode 100644 index 0000000..096e70e --- /dev/null +++ b/examples/songs/spice4.xml @@ -0,0 +1,29 @@ + + + +if +you +wanna +be +my +lover + +you +gotta +get +with +my +friends + +make +it +last +forever + +friendship +never +in +n + diff --git a/examples/speech_pm_1.0.tar b/examples/speech_pm_1.0.tar new file mode 100644 index 0000000..e073361 Binary files /dev/null and b/examples/speech_pm_1.0.tar differ diff --git a/examples/spintro.text b/examples/spintro.text new file mode 100644 index 0000000..7da306d --- /dev/null +++ b/examples/spintro.text @@ -0,0 +1,4 @@ + +Bienvenido a Festival, nuestro conversor de texto a voz. Festival +es un conversor multilenguaje de texto a voz, desarrollado en la +Universidad de Edimburgo. diff --git a/examples/text2pos.sh b/examples/text2pos.sh new file mode 100644 index 0000000..cef5732 --- /dev/null +++ b/examples/text2pos.sh @@ -0,0 +1,83 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-*-mode:scheme-*- +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. ;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. 
;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. ;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: August 1996 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Reads in text from stdin and outputs text/pos on stdout +;;; +;;; Designed to show how simple filters can be written in Festival +;;; +;;; First we define a function that processes an utterance enough +;;; to predict part of speech, namely, tokenize it, find the words +;;; and then run the POS tagger on it. +;;; Then we define a function to extract the word and pos tag itself +;;; +;;; We redefine the basic functions run on utterances during text to +;;; speech to be our two newly-defined functions and then simply +;;; run tts on standard input. +;;; + +;;; Because this is a --script type file it has to explicitly +;;; load the initfiles: init.scm and user's .festivalrc +(load (path-append libdir "init.scm")) + +(define (find-pos utt) +"Main function for processing TTS utterances. Predicts POS and +prints words with their POS" + (Token utt) + (POS utt) +) + +(define (output-pos utt) +"Output the word/pos for each word in utt" + (mapcar + (lambda (word) + (format t "%l/%l\n" + (item.feat word "name") + (item.feat word "pos"))) + (utt.relation.items utt 'Word))) + +;;; +;;; Redefine what happens to utterances during text to speech +;;; +(set! 
tts_hooks (list find-pos output-pos)) + +;;; Stop those GC messages +(gc-status nil) + +;;; Do the work +(tts_file "-") + + + diff --git a/examples/text2wave.sh b/examples/text2wave.sh new file mode 100755 index 0000000..7f91200 --- /dev/null +++ b/examples/text2wave.sh @@ -0,0 +1,176 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-*-mode:scheme-*- +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. ;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. ;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. 
;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: November 1997 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Text to a single waveform like festival_client but without +;;; starting the server +;;; + +;;; Because this is a --script type file it has to explicitly +;;; load the initfiles: init.scm and user's .festivalrc +(load (path-append libdir "init.scm")) + +;;; Process command line arguments +(define (text2wave_help) + (format t "%s\n" + "text2wave [options] textfile + Convert a textfile to a waveform + Options + -mode Explicit tts mode. + -o ofile File to save waveform (default is stdout). + -otype Output waveform type: ulaw, snd, aiff, riff, nist etc. + (default is riff) + -F Output frequency. + -scale Volume factor + -eval File or lisp s-expression to be evaluated before + synthesis. +") + (quit)) + +;;; No gc messages +(gc-status nil) + +;;; Default argument values +(defvar outfile "-") +(defvar output_type 'riff) +(defvar frequency nil) ;; default is no frequency modification +(defvar text_files '("-")) +(defvar mode nil) +(defvar volume "1.0") +(defvar wavefiles nil) + +;;; Get options +(define (get_options) + + (let ((files nil) + (o argv)) + (if (or (member_string "-h" argv) + (member_string "-help" argv) + (member_string "--help" argv) + (member_string "-?" argv)) + (text2wave_help)) + (while o + (begin + (cond + ((string-equal "-o" (car o)) + (if (not (cdr o)) + (text2wave_error "no output file specified")) + (set! outfile (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-otype" (car o)) + (if (not (cdr o)) + (text2wave_error "no output filetype specified")) + (set! output_type (car (cdr o))) + (set! o (cdr o))) + ((or (string-equal "-f" (car o)) ;; for compatibility and memory loss + (string-equal "-F" (car o))) + (if (not (cdr o)) + (text2wave_error "no frequency specified")) + (set! frequency (car (cdr o))) + (set! 
o (cdr o))) + ((string-equal "-scale" (car o)) + (if (not (cdr o)) + (text2wave_error "no scale specified")) + (set! volume (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-mode" (car o)) + (if (not (cdr o)) + (text2wave_error "no mode specified")) + (set! mode (car (cdr o))) + (set! o (cdr o))) + ((string-equal "-eval" (car o)) + (if (not (cdr o)) + (text2wave_error "no file specified to load")) + (if (string-matches (car (cdr o)) "^(.*") + (eval (read-from-string (car (cdr o)))) + (load (car (cdr o)))) + (set! o (cdr o))) + (t + (set! files (cons (car o) files)))) + (set! o (cdr o)))) + (if files + (set! text_files (reverse files))))) + +(define (text2wave_error message) + (format stderr "%s: %s\n" "text2wave" message) + (text2wave_help)) + +(define (save_record_wave utt) +"Saves the waveform and records it so it can be joined into +a single waveform at the end." + (let ((fn (make_tmp_filename))) + (utt.save.wave utt fn) + (set! wavefiles (cons fn wavefiles)) + utt)) + +(define (combine_waves) + "Join all the waves together into the desired output file +and delete the intermediate ones." + (let ((wholeutt (utt.synth (Utterance Text "")))) + (mapcar + (lambda (d) + (utt.import.wave wholeutt d t) + (delete-file d)) + (reverse wavefiles)) + (if frequency + (utt.wave.resample wholeutt (parse-number frequency))) + (if (not (equal? volume "1.0")) + (begin + (utt.wave.rescale wholeutt (parse-number volume)))) + (utt.save.wave wholeutt outfile output_type) + )) + +;;; +;;; Redefine what happens to utterances during text to speech +;;; +(set! 
tts_hooks (list utt.synth save_record_wave)) + +(define (main) + (get_options) + + ;; do the synthesis + (mapcar + (lambda (f) + (if mode + (tts_file f mode) + (tts_file f (tts_find_text_mode f auto-text-mode-alist)))) + text_files) + + ;; Now put the waveforms together again + (combine_waves) +) + +;;; Do the work +(main) diff --git a/examples/th-mode.scm b/examples/th-mode.scm new file mode 100644 index 0000000..a83127f --- /dev/null +++ b/examples/th-mode.scm @@ -0,0 +1,180 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; This contains an example of how to use Festival with +;;; a talking head. This is a combination of various answers given +;;; to different groups who have been using Festival with talking +;;; heads +;;; +;;; This version has not actually been used by any talking head +;;; but just serves as an example. +;;; +;;; The basic mode produces /tmp/th.ph (phone info) and /tmp/th.com +;;; (commands: smile frown) for each utterance in the file. These +;;; files are produced and the program makefaces called before +;;; waveform synthesis for each utterance. The play command then +;;; calls xanim with the generated animation and waveform. +;;; +;;; There are probably better ways to do this. Using Festival as a +;;; server to generate the phone and command files might +;;; be more reasonable. Note Festival now supports the returning +;;; of Lisp data to the client as well as waveform data. +;;; In that case you'd want to change th_output_info to use +;;; the send_client command and package the phone info into an +;;; s-expression. + +(defvar th-prepare-prog "makefaces" + " A program that takes phones and other data and produces the +animated face.") + +(define (utt.save.phonedata utt filename) +"(utt.save.phonedata UTT FILE) + Saves phone, duration, stress, F0 word pos." 
+ (let ((fd (fopen filename "w"))) + (mapcar + (lambda (seg) + (format fd "%s %2.4f %s %s" + (item.feat seg "name") + (item.feat seg "segment_duration") + (item.feat seg "R:SylStructure.parent.stress") + (item.feat seg "R:Target.daughter1.name")) + ;; output word name and part of speech if this is the last + ;; segment of the last syllable in the word (i.e. end of word) + (if (and (not (item.relation.next seg "SylStructure")) + (not (item.next + (item.relation.parent seg "SylStructure")))) + (format fd " %s %s" + (item.feat seg "R:SylStructure.parent.parent.name") + (item.feat seg "R:SylStructure.parent.parent.pos"))) + (format fd "\n")) + (utt.relation.items utt 'Segment)) + (fclose fd) + utt)) + +(define (utt.save.commands utt filename) +"(utt.save.commands UTT FILE) + Save commands with time stamps. Commands are those tokens which +start and end with an asterisk." + (let ((fd (fopen filename "w"))) + (format fd "#\n") + (mapcar + (lambda (tok_item) + (if (string-matches (item.name tok_item) "\\*.+\\*") + (format fd "%2.4f 100 %s\n" + (find_com_time utt tok_item) + (item.name tok_item)))) + (utt.relation.items utt 'Token)) + (fclose fd) + utt)) + +(define (find_com_time utt tok_item) +"Returns time of tok_item. Looks backward for first token that +is related to a word and returns the end time of that word." + (cond + ((item.daughtern tok_item) + (item.feat (item.daughtern tok_item) "word_end")) + ((not (item.prev tok_item)) ;; start of stream + 0.0) + (t + (find_com_time utt (item.prev tok_item))))) + +(define (th_output_info utt) + "(th_output_info utt) +This is called after linguistic analysis but before waveform synthesis. +It collects the phone and duration data and also any th commands +found in the utterance. The file names are then passed to some +external program which will process them for the talking head." + (set! 
th-current-file "/tmp/th") ;; this should have a process id in it + (utt.save.phonedata utt (string-append th-current-file ".ph")) + (utt.save.commands utt (string-append th-current-file ".com")) + ;; It would be good to background this process as long as you + ;; resync at play time + (system (format nil "%s %s %s" + th-prepare-prog + (string-append th-current-file ".ph") + (string-append th-current-file ".com"))) + utt) + +;;; +;;; Define a new text mode for talking heads +;;; + +(define (th_init_func) + "Called on starting talking head text mode." + (set! th_previous_t2w_func token_to_words) + (set! th_previous_after_analysis_hooks after_analysis_hooks) + (set! after_analysis_hooks (list th_output_info)) + (set! english_token_to_words th_token_to_words) + (set! token_to_words th_token_to_words) + + ;; We assume the prepare talking head program generates a movie + ;; that can be played by something, so we redefine the audio + ;; player to play the generated animation and waveform. + (set! th_previous_Parameter Parameter) + (audio_mode 'sync) ;; ensure new Audio command gets passed to new audiosp + (Parameter.set 'Audio_Required_Format 'riff) + (Parameter.set 'Audio_Command "xanim /tmp/th.anime $FILE") + (Parameter.set 'Audio_Method 'Audio_Command) + (audio_mode 'async) +) + +(define (th_exit_func) + "Called on exiting talking head text mode." + (set! token_to_words th_previous_t2w_func) + (set! english_token_to_words th_previous_t2w_func) + (set! after_analysis_hooks th_previous_after_analysis_hooks) + + (audio_mode 'sync) ;; so we can reset the audio + (set! Parameter th_previous_Parameter) +) + +(define (th_token_to_words token name) +"(th_token_to_words TOKEN NAME) +Talking head specific token to word rules." + (cond + ((string-matches name "\\*.*\\*") + ;; Symbols that start and end with an asterisk are treated as commands + ;; and not rendered as speech + nil) + (t + (th_previous_t2w_func token name)))) + +(set! 
tts_text_modes + (cons + (list + 'th ;; mode name + (list ;; ogimarkup mode params + (list 'init_func th_init_func) + (list 'exit_func th_exit_func))) + tts_text_modes)) + +(provide 'th-mode) diff --git a/examples/tobi.stml b/examples/tobi.stml new file mode 100644 index 0000000..17e2169 --- /dev/null +++ b/examples/tobi.stml @@ -0,0 +1,28 @@ + + + + + + + + + + + + + +I to go on a + . + + wanted to go on a + , +but wanted to go + + + + + + + diff --git a/examples/toksearch.scm b/examples/toksearch.scm new file mode 100644 index 0000000..9e17ec3 --- /dev/null +++ b/examples/toksearch.scm @@ -0,0 +1,109 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; A search for token occurrences in buckets of text +;;; +;;; This is only an example to aid you, this actually depends on +;;; the availability of databases we don't have permission to +;;; distribute. + +(set! text_dir "/home/awb/data/text/") + +;;; The databases themselves are identified by a file which names all +;;; the files in that database. e.g. This expects bin/gutenberg.files +;;; to exist which should contain something like +;;; gutenberg/etext90/bill11.txt +;;; gutenberg/etext90/const11.txt +;;; gutenberg/etext90/getty11.txt + +(set! db_names + '("gutenberg" ;; books from gutenberg 21906570 + "desktopshop" ;; books, documents etc 23090463 + "time" ;; Time Magazine 1990-1994 6770175 + "hutch" ;; Hutchinson Encyclopedia 1715268 + "dicts" ;; Dictionaries and Encyclopedias 4248109 + "stw-ref" ;; Standard Reference libraries 3330448 + "treebank" ;; WSJ articles from PENN treebank 1109895 + "email" ;; awb's email + )) + +;;; Identify the tokens you want extracted +;;; Tokens may be regular expressions +(set! desired_tokens + '(lead wound tear axes Jan bass Nice Begin Chi Colon + St Dr III IV V X VII II "[0-9]+")) + +;;; First pass: to get examples and context for labelling +(set! 
desired_feats + '(filepos + p.p.p.p.name p.p.p.name p.p.name p.name + name + n.name nn.name n.n.n.name n.n.n.n.name)) +;;; Second pass: to get desired features for tree building +;;; Typically this has to be specific for a particular homograph +;;; so you'll probably want to do multiple second passes one for each +;;; homograph type +;(set! desired_feats +; '(filepos +; lisp_tok_rex +; p.punc +; punc +; n.punc +; pp.cap p.cap n.cap nn.cap +; )) + +(define (tok_search_db dbname) +"Search through DB for named tokens and save found occurrences." + (let ((outfile (string-append text_dir "fullhgs/" dbname ".out"))) + (delete-file outfile) + (mapcar + (lambda (fname) ;; for each file in the database + (extract_tokens ;; call internal function to extract tokens + (string-append text_dir fname) ;; full pathname to extract from + (mapcar ;; list of tokens and features + (lambda (t) ;; to extract + (cons t desired_feats)) + desired_tokens) + outfile)) + (load (string-append text_dir "bin/" dbname ".files") t)) + t)) + +(define (tok_do_all) +"Search all dbs for desired tokens." + (mapcar + (lambda (db) + (print db) + (tok_search_db db)) + db_names) + t) + diff --git a/examples/webdemo.scm b/examples/webdemo.scm new file mode 100644 index 0000000..11bb58c --- /dev/null +++ b/examples/webdemo.scm @@ -0,0 +1,103 @@ +;;; +;;; Sentences presynthesized on demo web page +;;; + +(set! utt1 +(Utterance Text +" +This is a short introduction to the Festival Speech Synthesis System. +Festival was developed by Alan Black and Paul Taylor, at the Centre +for Speech Technology Research, University of Edinburgh. +")) + +(set! utt2 +(Utterance Text +" +Festival currently uses a diphone synthesizer, both +residual excited LPC and PSOLA methods are supported. +The upper levels, duration and intonation, are generated from +statistically trained models, built from databases of natural speech. 
+The architecture of the system is designed to be flexible, including +various tools, which allow new modules to be added easily. +")) + +(define (make_waves) +"Synthesize the two examples and save them in the desired formats" + (Synth utt1) + (Parameter.set 'Wavefiletype 'riff) + (utt.save.wave utt1 "intro.wav") + (Parameter.set 'Wavefiletype 'ulaw) + (utt.save.wave utt1 "intro.au") + + (Synth utt2) + (Parameter.set 'Wavefiletype 'riff) + (utt.save.wave utt2 "intro2.wav") + (Parameter.set 'Wavefiletype 'ulaw) + (utt.save.wave utt2 "intro2.au") +) + +(set! welsh1 +(Utterance Text +"Dwi'n gallu llefaru pob llinell heb atal, oherwydd does dim tafod gyda fi.")) + +(define (make_welsh) + (voice_welsh_hl) + (Synth welsh1) + (Parameter.set 'Wavefiletype 'riff) + (utt.save.wave welsh1 "welsh1.wav") + (Parameter.set 'Wavefiletype 'ulaw) + (utt.save.wave welsh1 "welsh1.au")) + +(set! spanish1 +(Utterance Text +"m'uchos 'a~nos despu'es, fr'ente al pelot'on de fusilami'ento, el +coron'el aureli'ano buend'ia hab'ia de record'ar de aqu'el d'ia +lej'ano, en que su p'adre lo llev'o a conoc'er el hi'elo.")) + +(define (make_spanish) + (voice_spanish_el) + (Synth spanish1) + (Parameter.set 'Wavefiletype 'riff) + (utt.save.wave spanish1 "spanish1.wav") + (Parameter.set 'Wavefiletype 'ulaw) + (utt.save.wave spanish1 "spanish1.au")) + + +(set! utt_pos (Utterance Text +"My cat who lives dangerously had nine lives. ")) + +(set! utt_Bdi (Utterance Text +"He wanted to go for a drive in.")) +(set! utt_Bditc (Utterance Text +"He wanted to go for a drive in the country.")) + +(define (make_others) + (Synth utt_pos) + (Synth utt_Bdi) + (Synth utt_Bditc) + (Parameter.set 'Wavefiletype 'riff) + (utt.save.wave utt_pos "cat.wav") + (utt.save.wave utt_Bdi "Bdi.wav") + (utt.save.wave utt_Bditc "Bditc.wav") + (Parameter.set 'Wavefiletype 'ulaw) + (utt.save.wave utt_pos "cat.au") + (utt.save.wave utt_Bdi "Bdi.au") + (utt.save.wave utt_Bditc "Bditc.au")) + +(set! 
utt_diph (Utterance Text +"This is a short introduction to the Festival Speech Synthesis System.")) +(set! utt_sucs (Utterance Text +"This is a short introduction to the Festival Speech Synthesis System.")) + +(define (make_diphsbs) + (Synth utt_diph) + (Parameter.set 'Wavefiletype 'riff) + (utt.save.wave utt_diph "diph1.wav") + (Parameter.set 'Wavefiletype 'ulaw) + (utt.save.wave utt_diph "diph1.au") + (voice_gsw_450) + (Synth utt_sucs) + (Parameter.set 'Wavefiletype 'riff) + (utt.save.wave utt_sucs "sbs1.wav") + (Parameter.set 'Wavefiletype 'ulaw) + (utt.save.wave utt_sucs "sbs1.au")) diff --git a/festival-2.1-release.tar.gz b/festival-2.1-release.tar.gz new file mode 100644 index 0000000..ddd0708 Binary files /dev/null and b/festival-2.1-release.tar.gz differ diff --git a/install-sh b/install-sh new file mode 100755 index 0000000..e9de238 --- /dev/null +++ b/install-sh @@ -0,0 +1,251 @@ +#!/bin/sh +# +# install - install a program, script, or datafile +# This comes from X11R5 (mit/util/scripts/install.sh). +# +# Copyright 1991 by the Massachusetts Institute of Technology +# +# Permission to use, copy, modify, distribute, and sell this software and its +# documentation for any purpose is hereby granted without fee, provided that +# the above copyright notice appear in all copies and that both that +# copyright notice and this permission notice appear in supporting +# documentation, and that the name of M.I.T. not be used in advertising or +# publicity pertaining to distribution of the software without specific, +# written prior permission. M.I.T. makes no representations about the +# suitability of this software for any purpose. It is provided "as is" +# without express or implied warranty. +# +# Calling this script install-sh is preferred over install.sh, to prevent +# `make' implicit rules from creating a file called install from it +# when there is no Makefile. +# +# This script is compatible with the BSD install script, but was written +# from scratch. 
It can only install one file at a time, a restriction +# shared with many OS's install programs. + + +# set DOITPROG to echo to test this script + +# Don't use :- since 4.3BSD and earlier shells don't like it. +doit="${DOITPROG-}" + + +# put in absolute paths if you don't have them in your path; or use env. vars. + +mvprog="${MVPROG-mv}" +cpprog="${CPPROG-cp}" +chmodprog="${CHMODPROG-chmod}" +chownprog="${CHOWNPROG-chown}" +chgrpprog="${CHGRPPROG-chgrp}" +stripprog="${STRIPPROG-strip}" +rmprog="${RMPROG-rm}" +mkdirprog="${MKDIRPROG-mkdir}" + +transformbasename="" +transform_arg="" +instcmd="$mvprog" +chmodcmd="$chmodprog 0755" +chowncmd="" +chgrpcmd="" +stripcmd="" +rmcmd="$rmprog -f" +mvcmd="$mvprog" +src="" +dst="" +dir_arg="" + +while [ x"$1" != x ]; do + case $1 in + -c) instcmd="$cpprog" + shift + continue;; + + -d) dir_arg=true + shift + continue;; + + -m) chmodcmd="$chmodprog $2" + shift + shift + continue;; + + -o) chowncmd="$chownprog $2" + shift + shift + continue;; + + -g) chgrpcmd="$chgrpprog $2" + shift + shift + continue;; + + -s) stripcmd="$stripprog" + shift + continue;; + + -t=*) transformarg=`echo $1 | sed 's/-t=//'` + shift + continue;; + + -b=*) transformbasename=`echo $1 | sed 's/-b=//'` + shift + continue;; + + *) if [ x"$src" = x ] + then + src=$1 + else + # this colon is to work around a 386BSD /bin/sh bug + : + dst=$1 + fi + shift + continue;; + esac +done + +if [ x"$src" = x ] +then + echo "install: no input file specified" + exit 1 +else + true +fi + +if [ x"$dir_arg" != x ]; then + dst=$src + src="" + + if [ -d $dst ]; then + instcmd=: + chmodcmd="" + else + instcmd=mkdir + fi +else + +# Waiting for this to be detected by the "$instcmd $src $dsttmp" command +# might cause directories to be created, which would be especially bad +# if $src (and thus $dsttmp) contains '*'. 
+ + if [ -f $src -o -d $src ] + then + true + else + echo "install: $src does not exist" + exit 1 + fi + + if [ x"$dst" = x ] + then + echo "install: no destination specified" + exit 1 + else + true + fi + +# If destination is a directory, append the input filename; if your system +# does not like double slashes in filenames, you may need to add some logic + + if [ -d $dst ] + then + dst="$dst"/`basename $src` + else + true + fi +fi + +## this sed command emulates the dirname command +dstdir=`echo $dst | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'` + +# Make sure that the destination directory exists. +# this part is taken from Noah Friedman's mkinstalldirs script + +# Skip lots of stat calls in the usual case. +if [ ! -d "$dstdir" ]; then +defaultIFS=' +' +IFS="${IFS-${defaultIFS}}" + +oIFS="${IFS}" +# Some sh's can't handle IFS=/ for some reason. +IFS='%' +set - `echo ${dstdir} | sed -e 's@/@%@g' -e 's@^%@/@'` +IFS="${oIFS}" + +pathcomp='' + +while [ $# -ne 0 ] ; do + pathcomp="${pathcomp}${1}" + shift + + if [ ! -d "${pathcomp}" ] ; + then + $mkdirprog "${pathcomp}" + else + true + fi + + pathcomp="${pathcomp}/" +done +fi + +if [ x"$dir_arg" != x ] +then + $doit $instcmd $dst && + + if [ x"$chowncmd" != x ]; then $doit $chowncmd $dst; else true ; fi && + if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dst; else true ; fi && + if [ x"$stripcmd" != x ]; then $doit $stripcmd $dst; else true ; fi && + if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dst; else true ; fi +else + +# If we're going to rename the final executable, determine the name now. + + if [ x"$transformarg" = x ] + then + dstfile=`basename $dst` + else + dstfile=`basename $dst $transformbasename | + sed $transformarg`$transformbasename + fi + +# don't allow the sed command to completely eliminate the filename + + if [ x"$dstfile" = x ] + then + dstfile=`basename $dst` + else + true + fi + +# Make a temp file name in the proper directory. 
+ + dsttmp=$dstdir/#inst.$$# + +# Move or copy the file name to the temp name + + $doit $instcmd $src $dsttmp && + + trap "rm -f ${dsttmp}" 0 && + +# and set any options; do chmod last to preserve setuid bits + +# If any of these fail, we abort the whole thing. If we want to +# ignore errors from any of these, just make sure not to ignore +# errors from the above "$doit $instcmd $src $dsttmp" command. + + if [ x"$chowncmd" != x ]; then $doit $chowncmd $dsttmp; else true;fi && + if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dsttmp; else true;fi && + if [ x"$stripcmd" != x ]; then $doit $stripcmd $dsttmp; else true;fi && + if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dsttmp; else true;fi && + +# Now rename the file to the real destination. + + $doit $rmcmd -f $dstdir/$dstfile && + $doit $mvcmd $dsttmp $dstdir/$dstfile + +fi && + + +exit 0 diff --git a/lib/Makefile b/lib/Makefile new file mode 100644 index 0000000..bd89321 --- /dev/null +++ b/lib/Makefile @@ -0,0 +1,107 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. 
The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +# # +# Makefile for lib directory # +# # +########################################################################### +TOP=.. +DIRNAME=lib +BUILD_DIRS=etc multisyn +ALL_DIRS=$(BUILD_DIRS) + +PHONESETS = mrpa_phones.scm mrpa_allophones.scm radio_phones.scm \ + holmes_phones.scm darpa_phones.scm phoneset.scm \ + cmusphinx2_phones.scm unilex_phones.scm +DURSTATS = mrpa_durs.scm klatt_durs.scm gswdurtreeZ.scm f2bdurtreeZ.scm +INTSTATS = tobi.scm f2bf0lr.scm tobi_rules.scm \ + tilt.scm apml.scm apml_f2bf0lr.scm apml_kaldurtreeZ.scm +DBS = +LTSRULES = engmorph.scm engmorphsyn.scm lts.scm lts_build.scm +BRKMODELS = sec.ts20.quad.ngrambin sec.B.hept.ngrambin +MODES = email-mode.scm ogimarkup-mode.scm sable-mode.scm soleml-mode.scm \ + singing-mode.scm +GENERAL = init.scm synthesis.scm module_description.scm \ + lexicons.scm \ + festival.scm intonation.scm duration.scm pos.scm phrase.scm \ + voices.scm tts.scm festdoc.scm languages.scm token.scm \ + mbrola.scm display.scm postlex.scm tokenpos.scm \ + festtest.scm cslush.scm cart_aux.scm pauses.scm \ + scfg.scm mettree.scm java.scm clunits.scm clunits_build.scm \ + siteinit.scm +HTS = hts.scm +OTHERS = 
Sable.v0_2.dtd sable-latin.ent festival.el scfg_wsj_wp20.gram \ + speech.properties Singing.v0_1.dtd + +SIOD = siod.scm web.scm cstr.scm fringe.scm + +FILES=Makefile VCLocalRules $(PHONESETS) $(DURSTATS) $(INTSTATS) $(DBS) \ + $(BRKMODELS) $(GENERAL) $(LTSRULES) $(OTHERS) $(MODES) $(HTS) + +LOCAL_CLEAN=$(SIOD) + +ALL=.copy_from_est .sub_directories + +include $(TOP)/config/common_make_rules + +.copy_from_est: $(SIOD) + @: + +$(SIOD) : % : $(EST)/lib/siod/% + @echo 'Copy $* from EST/lib/siod' + @$(RM) -f $* + @{ \ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo ' ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;' ;\ + echo ' ;;; DO NOT EDIT THIS FILE ON PAIN OF MORE PAIN.' ;\ + echo ' ;;; ' ;\ + echo ' ;;; The master copy of this file is in $(EST)/lib/siod/$*' ;\ + echo ' ;;; and is copied here at build time.' ;\ + echo ' ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + echo '' ;\ + } | cat - $(EST)/lib/siod/$* |sed -e '/mode: *scheme/s//mode: view/' > $* + @chmod a-w $* + diff --git a/lib/Sable.v0_2.dtd b/lib/Sable.v0_2.dtd new file mode 100644 index 0000000..63e7f23 --- /dev/null +++ b/lib/Sable.v0_2.dtd @@ -0,0 +1,137 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +%ISOlat1; + + + diff --git a/lib/Singing.v0_1.dtd b/lib/Singing.v0_1.dtd new file mode 100644 index 0000000..b0dd8a8 --- /dev/null +++ b/lib/Singing.v0_1.dtd @@ -0,0 +1,34 @@ + + + + + + + + + + + + + + + + + + + +%ISOlat1; + + + diff --git a/lib/VCLocalRules b/lib/VCLocalRules new file mode 100644 index 0000000..45bb832 --- /dev/null +++ b/lib/VCLocalRules @@ -0,0 +1,8 @@ +SIOD = siod.scm web.scm cstr.scm fringe.scm + +.copy_from_est: $(SIOD) + 
+$(SIOD) : + @echo 'Copy $@ from EST/lib/siod' + -del $@ + copy $(EST)\lib\siod\$@ $@ \ No newline at end of file diff --git a/lib/apml.scm b/lib/apml.scm new file mode 100644 index 0000000..613f207 --- /dev/null +++ b/lib/apml.scm @@ -0,0 +1,547 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 2002 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. 
;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Rob Clark +;;; Date: July 2002 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Sets up the current voice to synthesise from APML. +;; +;; + +(require 'apml_f2bf0lr) +(require 'apml_kaldurtreeZ) + +;; Default pitch settings (if unspecified in current voice.) + +(defvar apml_default_pitch_mean 170 ) +(defvar apml_default_pitch_standard_deviation 34 ) + +;; apml synthesis wrappers. + +(define (apml_client_synth apml) + "(apml_client_synth apml) +Synthesise apml and return waveform(s) to client." + (utt.send.wave.client (apml_synth apml))) + +(define (apml_synth apml) +"(apml_synth apml) +Synthesise an apml string." +(let ((tmpfile (make_tmp_filename)) + utt) + (string_to_file tmpfile apml) + (set! utt (apml_file_synth tmpfile)) + (delete-file tmpfile) + utt)) + +(define (apml_file_synth filename) + "(apml_file_synth filename) +Synthesise an apml file." + (let ((utt (Utterance Tokens nil))) + (utt.load utt filename) + (utt.synth utt))) + +(define (string_to_file file s) +"(string_to_file file string) + Write string to file." +(let ((fd)) + (set! fd (fopen file "wb")) + (format fd "%s" s) + (fclose fd))) + + +;;; +;;; Phrasing. +;;; + +;; phrasing CART. +; +; It has been decided that by default, only punctuation should affect +; phrasing (and subsequently pauses) +; +(set! apml_phrase_tree + ' + ((lisp_apml_punc in ("?" "." ":")) ; big punctuation + ((BB)) + ((lisp_apml_punc in ("'" "\"" "," ";")) ; else little punctuation + ((B)) + ((lisp_apml_last_word is 1) + ((BB)) ; need a BB at the end! 
+ ((NB)))))) ; else nothing + +;; feature functions for phrasing +(define (apml_punc word) + (item.feat (item.relation.parent word 'Token) 'punc)) + +(define (apml_last_word word) + (if (item.next word) + "0" "1")) + + +;;; +;;; Pauses +;;; + +;; feature functions for pauses +(define (apml_is_pause word) + (if (item.relation (item.relation.parent word 'Token) 'Pause) + t + nil)) + +(define (apml_pause word) + (if (item.relation word 'Pause) + (item.feat (item.relation.parent (item.relation.parent word 'Token) 'Pause) "sec") + 0)) + +(define (Apml_Pauses utt) + "(Apml_Pauses UTT) +Predict pause insertion for apml." + (let ((words (utt.relation.items utt 'Word)) lastword tpname) + (if words + (begin + (insert_initial_pause utt) ;; always have a start pause + (set! lastword (car (last words))) + (mapcar + (lambda (w) + (let ((pbreak (item.feat w "pbreak")) + (emph (item.feat w "R:Token.parent.EMPH"))) + (cond + ((apml_is_pause w) + (insert_pause utt w)) + ((or (string-equal "B" pbreak) + (string-equal "BB" pbreak)) + (insert_pause utt w)) + ((equal? w lastword) + (insert_pause utt w))))) + words) + ;; The embarrassing bit. Remove any words labelled as punc or fpunc + (mapcar + (lambda (w) + (let ((pos (item.feat w "pos"))) + (if (or (string-equal "punc" pos) + (string-equal "fpunc" pos)) + (let ((pbreak (item.feat w "pbreak")) + (wp (item.relation w 'Phrase))) + (if (and (string-matches pbreak "BB?") + (item.relation.prev w 'Word)) + (item.set_feat + (item.relation.prev w 'Word) "pbreak" pbreak)) + (item.relation.remove w 'Word) + ;; can't refer to w as we've just deleted it + (item.relation.remove wp 'Phrase))))) + words))) + utt)) + + + +;;; +;;; Intonation. +;;; + +;; Accent prediction (well, transfer really). +;; +;; We treat L+H* L-H% on a single syllable as a special case. + +(set! 
apml_accent_cart + ' + ((lisp_apml_accent is "Hstar") + ((H*)) + ((lisp_apml_accent is "Lstar") + ((L*)) + ((lisp_apml_LHLH is "LHLH") + ((L+H*L-H%)) + ((lisp_apml_accent is "LplusHstar") + ((L+H*)) + ((lisp_apml_accent is "LstarplusH") + ((L*+H)) + ((NONE)))))))) + +(set! apml_boundary_cart + ' + ((lisp_apml_boundary is "LL") + ((L-L%)) + ((lisp_apml_LHLH is "LHLH") + ((NONE)) ; this is dealt with by the accent feature + ((lisp_apml_boundary is "LH") + ((L-H%)) + ((lisp_apml_boundary is "HH") + ((H-H%)) + ((lisp_apml_boundary is "HL") + ((H-L%)) + ((NONE)))))))) + +;; feature functions. +(define (apml_accent syl) + (let ((token (item.relation.parent (item.relation.parent syl 'SylStructure) 'Token))) + (if (and (eq (item.feat syl 'stress) 1) + (item.relation.parent token 'Emphasis)) + (item.feat (item.relation.parent token 'Emphasis) 'x-pitchaccent) + 0))) + +(define (apml_boundary syl) + (let ((token (item.relation.parent (item.relation.parent syl 'SylStructure) 'Token))) + (if (and (> (item.feat syl 'syl_break) 0) + (item.relation.parent token 'Boundary)) + (item.feat (item.relation.parent token 'Boundary) 'type) + 0))) + +(define (apml_LHLH syl) + (let ((accent (apml_accent syl)) + (boundary (apml_boundary syl))) + (if (and (string-equal accent "LplusHstar") + (string-equal boundary "LH")) + "LHLH" + 0))) + + +(define (apml_seg_is_LHLH_vowel seg) + (if (and (string-equal (apml_LHLH (item.relation.parent seg 'SylStructure)) + "LHLH") + (string-equal (item.feat seg 'ph_vc) "+")) + "LHLH" + 0)) + + +;;;; feature functions: + +(define (apml_tgtype syl) + (let ((l (apml_boundl (item.relation.parent syl 'SylStructure))) + (r (apml_boundr (item.relation.parent syl 'SylStructure)))) + (if (eq (item.feat syl 'accented) 0) + 0 ; this is a quirk related to the way the models were trained + (cond + ((eq l 0) + 1) + ((eq r 1) + 3) + (t 2))))) + + +(define (apml_iecount syl) + (if (eq (item.feat syl 'accented) 0) + 0 ; this is a quirk related to the way the models were 
trained + (+ (item.feat syl 'asyl_in) 1))) + +;; support functions. +(define (apml_boundl word) +"(apml_boundl word) +Number of boundaries in this performative to the left of this word." + (let ((w (item.prev word)) + (c 0)) + (while (and w (apml_same_p w word)) + (if (item.relation.parent (item.relation.parent w 'Token) 'Boundary) + (set! c (+ c 1))) + (set! w (item.prev w))) + c)) + +(define (apml_boundr word) +"(apml_boundr word) +Number of boundaries in this performative to the right of this word." + (let ((w word) + (c 0)) + (while (and w (apml_same_p w word)) + (if (item.relation.parent (item.relation.parent w 'Token) 'Boundary) + (set! c (+ c 1))) + (set! w (item.next w))) + c)) + +(define (apml_same_p w1 w2) +"(apml_same_p w1 w2) + Are these two words in the same performative?" +(let ((p1 (item.relation.parent (item.relation.parent w1 'Token) 'SemStructure)) + (p2 (item.relation.parent (item.relation.parent w2 'Token) 'SemStructure))) + (if (and (item.parent p1) (item.parent p2)) ; not true if theme/rheme omitted. + (equal? (item.parent p1) (item.parent p2)) + (equal? p1 p2)))) + +;;; +;;; segment timings +;;; + +(define (apml_seg_times utt) + "(apml_seg_times utt) +Output the segment timings for an apml utterance." + (let ((segs (utt.relation.items utt 'Segment))) + (mapcar + (lambda (x) + (format t "%s %s\n" (item.name x) (item.feat x 'end))) + segs) + t)) + +;;; +;;; Additional functions for f0model. +;;; + + +(define (find_hstar_left syl) +"(find_hstar_left syl) +If the closest accent or boundary to the left is H* return how many syllables away it is. Returns 0 if nearest accent is not H*" +(let ((count 0)) + ;; if this syllable has a pitch event + (if (or (not (string-equal (item.feat syl 'tobi_accent) "NONE")) + (not (string-equal (item.feat syl 'tobi_endtone) "NONE"))) + 0) + (while (and syl + (string-equal (item.feat syl 'tobi_accent) "NONE") + (string-equal (item.feat syl 'tobi_endtone) "NONE")) + (set! count (+ count 1)) + (set! 
syl (item.prev syl))) + (cond + ;; run out of syllables before finding accent + ((null syl) + 0) + ((string-equal (item.feat syl 'tobi_accent) "H*") + count) + (t 0)))) + +(define (find_ll_right syl) +"(find_ll_right syl) +If the closest accent or boundary to the right is L-L% return how many syllables away it is. Returns 0 if nearest is not L-L%." +(let ((count 0)) + ;; if this syllable has a pitch event + (if (or (not (string-equal (item.feat syl 'tobi_accent) "NONE")) + (not (string-equal (item.feat syl 'tobi_endtone) "NONE"))) + 0) + (while (and syl + (string-equal (item.feat syl 'tobi_accent) "NONE") + (string-equal (item.feat syl 'tobi_endtone) "NONE")) + (set! count (+ count 1)) + (set! syl (item.next syl))) + (cond + ;; run out of syllables before finding boundary + ((null syl) + 0) + ((string-equal (item.feat syl 'tobi_endtone) "L-L%") + count) + (t 0)))) + +(define (l_spread syl) +"(l_spread syl) +Proportion of pitch lowering required due to L- spreading backwards." +(let ((l (find_hstar_left syl)) + (r (find_ll_right syl))) + (cond + ((or (eq l 0) + (eq r 0)) + 0) + (t + (/ r (- (+ l r) 1)))))) + + +;;; +;;; Debugging and other useful stuff. +;;; + + + +(define (apml_print_semstruct utt) +"(apml_print_semstruct utt) +Pretty print APML semantic structure." + (let ((i (utt.relation.first utt 'SemStructure))) + (while (not (null i)) + (apml_pss_item 0 i) + (apml_pss_daughters 1 (item.daughters i)) + (set! i (item.next i))))) + +(define (apml_pss_daughters depth list) + (mapcar + (lambda (x) + (apml_pss_item depth x) + (apml_pss_daughters (+ depth 1) (item.daughters x)) + ) + list)) + + +(define (apml_pss_item depth item) + (let ((c 0)) + (while (< c depth) + (format t " ") + (set! c (+ c 1))) + (format t "%s\n" (item.name item)))) + + +(define (apml_print_words utt) +"(apml_print_words utt) + Pretty print APML words with associated accents." 
+ (mapcar + (lambda (x) + (format t "%s (" (item.name x)) + (apml_pww_accent x) + (apml_pww_boundary x) + (apml_pww_pause x) + (format t ")\n")) + (utt.relation.items utt 'Word)) + t) + +(define (apml_pww_accent item) + (let ((p (item.relation.parent (item.relation.parent item 'Token) 'Emphasis))) + (if p (apml_ppw_list (item.features p))))) + +(define (apml_pww_boundary item) + (let ((p (item.relation.parent (item.relation.parent item 'Token) 'Boundary))) + (if p (apml_ppw_list (item.features p))))) + +(define (apml_pww_pause item) + (let ((p (item.relation.parent (item.relation.parent item 'Token) 'Pause))) + (if p (apml_ppw_list (item.features p))))) + +(define (apml_ppw_list l) + (mapcar + (lambda (x) + (format t " %s" x)) + (flatten l))) + + +(define (apml_print_sylstructure utt) +"(apml_print_sylstructure utt) +Pretty print APML syllable structure." + (mapcar + (lambda (x) + (format t "%s\n" (item.name x)) + (apml_psyl x)) + (utt.relation.items utt 'Word)) + t) + +(define (apml_psyl word) + (mapcar + (lambda (x) + (apml_psegs x) + (if (eq (item.feat x 'stress) 1) + (format t " (1)")) + (if (item.relation.daughter1 x 'Intonation) + (begin + (let ((ie (item.relation.daughter1 x 'Intonation))) + (format t " [") + (while ie + (format t "%s" (item.name ie)) + (set! ie (item.next ie)) + (if ie (format t " "))) + (format t "]")))) + (format t "\n")) + (item.daughters (item.relation word 'SylStructure)))) + +(define (apml_psegs syl) + (let ((segs (item.daughters syl))) + (format t " ") + (while segs + (format t "%s" (item.name (car segs))) + (if (cdr segs) + (format t ".")) + (set! segs (cdr segs))))) + + +(define (apml_get_lr_params) + (let ((m 0) + (s 0)) + (if (or (equal? (Parameter.get 'Int_Target_Method) Int_Targets_LR) + (equal? (Parameter.get 'Int_Target_Method) Int_Targets_5_LR)) + (begin + (set! m (car (cdr (car int_lr_params)))) + (set! s (car (cdr (car (cdr int_lr_params)))))) + (begin + (set! m apml_default_pitch_mean) + (set! 
s apml_default_pitch_standard_deviation))) + (list m s))) + + + + +(define (apml_initialise) + "(apml_initialise) +Set up the current voice for apml use." + (if (not (string-matches current-voice ".*multisyn.*")) ; nothing if multisyn + (cond + ((or (string-equal (Parameter.get 'Language) "americanenglish") + (string-equal (Parameter.get 'Language) "britishenglish")) + (begin + (format t "Initialising APML for English.\n") + ;; Phrasing. + (Parameter.set 'Phrase_Method 'cart_tree) + (set! phrase_cart_tree apml_phrase_tree) + ;; Pauses. + ;;(set! duration_cart_tree apml_kal_duration_cart_tree) + ;;(set! duration_ph_info apml_kal_durs) + ;;(Parameter.set 'Pause_Method Apml_Pauses) + ;; Lexicon. + ;;;; We now assume the lexicon you have already set is suitable, + ;;;; You probably want to ensure this is "apmlcmu" or "unilex" + ;;(if (not (member_string "apmlcmu" (lex.list))) + ;; (load (path-append lexdir "apmlcmu/apmlcmulex.scm"))) + ;;(lex.select "apmlcmu") + ;; Add other lex entries here: + ;;(lex.add.entry '("minerals" nil (((m ih n) 1) ((er) 0) ((ax l z) 0)))) + ;;(lex.add.entry '("fibre" nil (((f ay b) 1) ((er) 0)))) + ;;(lex.add.entry '("dont" v (((d ow n t) 1)))) + ;;(lex.add.entry '("pectoris" nil (((p eh k) 2) ((t ao r) 1) ((ih s) 0)))) + ;;(lex.add.entry '("sideeffects" nil (((s ay d) 1) ((ax f) 0) ((eh k t s) 2)))) + + ;; Intonation events. + (set! int_accent_cart_tree apml_accent_cart) + (set! int_tone_cart_tree apml_boundary_cart) + (Parameter.set 'Int_Method Intonation_Tree) + ;; Intonation f0 contour. + (set! f0_lr_start apml_f2b_f0_lr_start) + (set! f0_lr_left apml_f2b_f0_lr_left) + (set! f0_lr_mid apml_f2b_f0_lr_mid) + (set! f0_lr_right apml_f2b_f0_lr_right) + (set! f0_lr_end apml_f2b_f0_lr_end) + (set! 
int_lr_params + (list (list 'target_f0_mean (car (apml_get_lr_params))) + (list 'target_f0_std (car (cdr (apml_get_lr_params)))) + (list 'model_f0_mean 170) + (list 'model_f0_std 40))) + (Parameter.set 'Int_Target_Method Int_Targets_5_LR) + nil)) + ((string-equal (Parameter.get 'Language) "italian") + (begin + (format t "Initialising APML for Italian.\n") + ;; Phrasing. + (Parameter.set 'Phrase_Method 'cart_tree) + (set! phrase_cart_tree apml_phrase_tree) + ;; Intonation events. + (set! int_accent_cart_tree apml_accent_cart) + (set! int_tone_cart_tree apml_boundary_cart) + (Parameter.set 'Int_Method Intonation_Tree) + ;; Intonation f0 contour. + (set! f0_lr_start apml_f2b_f0_lr_start) + (set! f0_lr_mid apml_f2b_f0_lr_mid) + (set! f0_lr_end apml_f2b_f0_lr_end) + (set! int_lr_params + (list (list 'target_f0_mean (car (apml_get_lr_params))) + (list 'target_f0_std (car (cdr (apml_get_lr_params)))) + (list 'model_f0_mean 170) + (list 'model_f0_std 34))) + (Parameter.set 'Int_Target_Method Int_Targets_LR) + nil)) + (t nil)))) + +(provide 'apml) diff --git a/lib/apml_f2bf0lr.scm b/lib/apml_f2bf0lr.scm new file mode 100644 index 0000000..3d312a8 --- /dev/null +++ b/lib/apml_f2bf0lr.scm @@ -0,0 +1,530 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 2002 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. 
Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Rob Clark +;;; Date: July 2002 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; APML.f0 trees. +;; +;; + +(set! 
apml_f2b_f0_lr_start +'( +( Intercept 163.9871 ) +( pp.lisp_apml_tgtype -3.1750 (1) ) +( p.lisp_apml_tgtype 5.0332 (1) ) +( lisp_apml_tgtype 0.0000 (1) ) +( n.lisp_apml_tgtype 17.7799 (1) ) +( nn.lisp_apml_tgtype 13.6845 (1) ) +( pp.lisp_apml_tgtype 0.0000 (2) ) +( p.lisp_apml_tgtype 0.0000 (2) ) +( lisp_apml_tgtype 0.0000 (2) ) +( n.lisp_apml_tgtype 0.0000 (2) ) +( nn.lisp_apml_tgtype 0.0000 (2) ) +( pp.lisp_apml_tgtype 0.0000 (3) ) +( p.lisp_apml_tgtype 0.0000 (3) ) +( lisp_apml_tgtype -9.7245 (3) ) +( n.lisp_apml_tgtype 0.0000 (3) ) +( nn.lisp_apml_tgtype -2.4009 (3) ) +( pp.lisp_apml_iecount 0.0000 ) +( p.lisp_apml_iecount -0.4484 ) +( lisp_apml_iecount 0.0000 ) +( n.lisp_apml_iecount -2.0165 ) +( nn.lisp_apml_iecount 0.0000 ) +( pp.tobi_accent 0.0000 (H*) ) +( p.tobi_accent 11.1239 (H*) ) +( tobi_accent 21.5164 (H*) ) +( n.tobi_accent -2.5990 (H*) ) +( nn.tobi_accent -6.5307 (H*) ) +( pp.tobi_accent 0.0000 (L*) ) +( p.tobi_accent -10.0000 (L*) ) +( tobi_accent -5.0000 (L*) ) +( n.tobi_accent -10.6798 (L*) ) +( nn.tobi_accent -5.6561 (L*) ) +( pp.tobi_accent 5.3577 (L*+H) ) +( p.tobi_accent 60.0000 (L*+H) ) +( tobi_accent -5.0000 (L*+H) ) +( n.tobi_accent 0.0000 (L*+H) ) +( nn.tobi_accent 0.0000 (L*+H) ) +( pp.tobi_accent 0.0000 (L+H*) ) +( p.tobi_accent 11.1200 (L+H*) ) +( tobi_accent 21.5200 (L+H*) ) +( n.tobi_accent -2.6000 (L+H*) ) +( nn.tobi_accent -6.5300 (L+H*) ) +( pp.tobi_endtone 0.0000 (L-L%) ) +( p.tobi_endtone -0.6164 (L-L%) ) +( tobi_endtone -50 (L-L%) ) +( n.tobi_endtone -10.8729 (L-L%) ) +( nn.tobi_endtone -7.6522 (L-L%) ) +( pp.tobi_endtone 0.7583 (L-H%) ) +( p.tobi_endtone 0.0000 (L-H%) ) +( tobi_endtone -20.0000 (L-H%) ) +( n.tobi_endtone -11.8935 (L-H%) ) +( nn.tobi_endtone -7.2012 (L-H%) ) +( pp.tobi_endtone 0.0000 (H-L%) ) +( p.tobi_endtone 0.0000 (H-L%) ) +( tobi_endtone 4.0790 (H-L%) ) +( n.tobi_endtone -19.3463 (H-L%) ) +( nn.tobi_endtone -29.3615 (H-L%) ) +( pp.tobi_endtone 0.0000 (H-H%) ) +( p.tobi_endtone 0.0000 (H-H%) ) +( 
tobi_endtone 0.0000 (H-H%) ) +( n.tobi_endtone 0.0000 (H-H%) ) +( nn.tobi_endtone 0.0000 (H-H%) ) +( pp.tobi_endtone 0.0000 (L-) ) +( p.tobi_endtone -15.1702 (L-) ) +( tobi_endtone 0.0000 (L-) ) +( n.tobi_endtone -14.5562 (L-) ) +( nn.tobi_endtone 0.0000 (L-) ) +( pp.tobi_endtone -13.5046 (H-) ) +( p.tobi_endtone 0.0000 (H-) ) +( tobi_endtone 6.3377 (H-) ) +( n.tobi_endtone -6.8631 (H-) ) +( nn.tobi_endtone 0.0000 (H-) ) +( p.tobi_accent 60.0000 (L+H*L-H%) ) +( tobi_accent -60.0000 (L+H*L-H%) ) +( n.tobi_accent 0.0000 (L+H*L-H%) ) +( pp.syl_break 0.0000 ) +( p.syl_break 0.0000 ) +( syl_break 0.6417 ) +( n.syl_break 1.3532 ) +( nn.syl_break 1.0724 ) +( pp.stress 0.0000 ) +( p.stress -0.6193 ) +( stress 2.4121 ) +( n.stress 0.0000 ) +( nn.stress 2.5478 ) +( syl_in -1.4373 ) +( syl_out 0.4181 ) +( ssyl_in 0.0000 ) +( ssyl_out 0.6125 ) +( asyl_in 0.0000 ) +( asyl_out 0.9906 ) +( last_accent 0.0000 ) +( next_accent -0.3700 ) +( sub_phrases 0.0000 ) +( lisp_l_spread -60.0000 ) +)) + +(set! apml_f2b_f0_lr_left +'( +( Intercept 162.1173 ) +( pp.lisp_apml_tgtype -1.5875 (1) ) +( p.lisp_apml_tgtype 4.8101 (1) ) +( lisp_apml_tgtype 12.8265 (1) ) +( n.lisp_apml_tgtype 16.3027 (1) ) +( nn.lisp_apml_tgtype 13.3225 (1) ) +( pp.lisp_apml_tgtype 0.0000 (2) ) +( p.lisp_apml_tgtype 1.7434 (2) ) +( lisp_apml_tgtype 6.7783 (2) ) +( n.lisp_apml_tgtype 0.6679 (2) ) +( nn.lisp_apml_tgtype 0.0000 (2) ) +( pp.lisp_apml_tgtype 1.6494 (3) ) +( p.lisp_apml_tgtype 1.2861 (3) ) +( lisp_apml_tgtype -2.0724 (3) ) +( n.lisp_apml_tgtype 0.0000 (3) ) +( nn.lisp_apml_tgtype -1.2004 (3) ) +( pp.lisp_apml_iecount 0.0000 ) +( p.lisp_apml_iecount -0.5857 ) +( lisp_apml_iecount 0.0000 ) +( n.lisp_apml_iecount -2.3543 ) +( nn.lisp_apml_iecount 0.0000 ) +( pp.tobi_accent 0.0000 (H*) ) +( p.tobi_accent 8.5867 (H*) ) +( tobi_accent 21.2169 (H*) ) +( n.tobi_accent -1.2995 (H*) ) +( nn.tobi_accent -6.5056 (H*) ) +( pp.tobi_accent 0.0000 (L*) ) +( p.tobi_accent -7.5000 (L*) ) +( tobi_accent -25.0000 (L*) ) +( 
n.tobi_accent -8.3939 (L*) ) +( nn.tobi_accent -4.5688 (L*) ) +( pp.tobi_accent 2.6789 (L*+H) ) +( p.tobi_accent 45.0000 (L*+H) ) +( tobi_accent -17.5000 (L*+H) ) +( n.tobi_accent -1.3600 (L*+H) ) +( nn.tobi_accent 0.0000 (L*+H) ) +( pp.tobi_accent 0.0000 (L+H*) ) +( p.tobi_accent 8.5850 (L+H*) ) +( tobi_accent 21.2200 (L+H*) ) +( n.tobi_accent -1.3000 (L+H*) ) +( nn.tobi_accent -6.5050 (L+H*) ) +( pp.tobi_endtone 1.8117 (L-L%) ) +( p.tobi_endtone -0.1681 (L-L%) ) +( tobi_endtone -70 (L-L%) ) +( n.tobi_endtone -8.9334 (L-L%) ) +( nn.tobi_endtone -8.4034 (L-L%) ) +( pp.tobi_endtone 1.2099 (L-H%) ) +( p.tobi_endtone 1.1220 (L-H%) ) +( tobi_endtone -10.0000 (L-H%) ) +( n.tobi_endtone -5.9467 (L-H%) ) +( nn.tobi_endtone -6.9072 (L-H%) ) +( pp.tobi_endtone 0.0000 (H-L%) ) +( p.tobi_endtone 0.0000 (H-L%) ) +( tobi_endtone 2.0395 (H-L%) ) +( n.tobi_endtone -12.3940 (H-L%) ) +( nn.tobi_endtone -24.2593 (H-L%) ) +( pp.tobi_endtone 0.0000 (H-H%) ) +( p.tobi_endtone 0.0000 (H-H%) ) +( tobi_endtone 0.0000 (H-H%) ) +( n.tobi_endtone 0.0000 (H-H%) ) +( nn.tobi_endtone 16.1076 (H-H%) ) +( pp.tobi_endtone -1.8913 (L-) ) +( p.tobi_endtone -15.5650 (L-) ) +( tobi_endtone -18.3620 (L-) ) +( n.tobi_endtone -9.8322 (L-) ) +( nn.tobi_endtone -1.8182 (L-) ) +( pp.tobi_endtone -13.4429 (H-) ) +( p.tobi_endtone 0.0000 (H-) ) +( tobi_endtone 1.9053 (H-) ) +( n.tobi_endtone -3.4315 (H-) ) +( nn.tobi_endtone 0.0000 (H-) ) +( p.tobi_accent 0.0000 (L+H*L-H%) ) +( tobi_accent 10.0000 (L+H*L-H%) ) +( n.tobi_accent 0.0000 (L+H*L-H%) ) +( pp.syl_break 0.3501 ) +( p.syl_break -0.8121 ) +( syl_break 0.3209 ) +( n.syl_break 0.7486 ) +( nn.syl_break 0.8182 ) +( pp.stress -0.9778 ) +( p.stress -0.3096 ) +( stress 2.7752 ) +( n.stress 0.9976 ) +( nn.stress 2.7343 ) +( syl_in -1.9845 ) +( syl_out 0.7142 ) +( ssyl_in 1.0376 ) +( ssyl_out 0.3062 ) +( asyl_in 0.0000 ) +( asyl_out 0.4953 ) +( last_accent 0.0000 ) +( next_accent 0.1084 ) +( sub_phrases 0.0000 ) +( lisp_l_spread -60.0000 ) +)) + +(set! 
apml_f2b_f0_lr_mid +'( +( Intercept 160.2474 ) +( pp.lisp_apml_tgtype 0.0000 (1) ) +( p.lisp_apml_tgtype 4.5869 (1) ) +( lisp_apml_tgtype 25.6530 (1) ) +( n.lisp_apml_tgtype 14.8255 (1) ) +( nn.lisp_apml_tgtype 12.9605 (1) ) +( pp.lisp_apml_tgtype 0.0000 (2) ) +( p.lisp_apml_tgtype 3.4867 (2) ) +( lisp_apml_tgtype 13.5566 (2) ) +( n.lisp_apml_tgtype 1.3359 (2) ) +( nn.lisp_apml_tgtype 0.0000 (2) ) +( pp.lisp_apml_tgtype 3.2989 (3) ) +( p.lisp_apml_tgtype 2.5723 (3) ) +( lisp_apml_tgtype 5.5798 (3) ) +( n.lisp_apml_tgtype 0.0000 (3) ) +( nn.lisp_apml_tgtype 0.0000 (3) ) +( pp.lisp_apml_iecount 0.0000 ) +( p.lisp_apml_iecount -0.7231 ) +( lisp_apml_iecount 0.0000 ) +( n.lisp_apml_iecount -2.6922 ) +( nn.lisp_apml_iecount 0.0000 ) +( pp.tobi_accent 0.0000 (H*) ) +( p.tobi_accent 6.0496 (H*) ) +( tobi_accent 20.9174 (H*) ) +( n.tobi_accent 0.0000 (H*) ) +( nn.tobi_accent -6.4804 (H*) ) +( pp.tobi_accent 0.0000 (L*) ) +( p.tobi_accent -5.0000 (L*) ) +( tobi_accent -45.0000 (L*) ) +( n.tobi_accent -6.1079 (L*) ) +( nn.tobi_accent -3.4815 (L*) ) +( pp.tobi_accent 0.0000 (L*+H) ) +( p.tobi_accent 30.0000 (L*+H) ) +( tobi_accent -30.0000 (L*+H) ) +( n.tobi_accent -2.7200 (L*+H) ) +( nn.tobi_accent 0.0000 (L*+H) ) +( pp.tobi_accent 0.0000 (L+H*) ) +( p.tobi_accent 6.0500 (L+H*) ) +( tobi_accent 20.9200 (L+H*) ) +( n.tobi_accent 0.0000 (L+H*) ) +( nn.tobi_accent -6.4800 (L+H*) ) +( pp.tobi_endtone 3.6235 (L-L%) ) +( p.tobi_endtone 0.2801 (L-L%) ) +( tobi_endtone -80 (L-L%) ) +( n.tobi_endtone -6.9938 (L-L%) ) +( nn.tobi_endtone -9.1546 (L-L%) ) +( pp.tobi_endtone 1.6616 (L-H%) ) +( p.tobi_endtone 2.2441 (L-H%) ) +( tobi_endtone 0.0000 (L-H%) ) +( n.tobi_endtone 0.0000 (L-H%) ) +( nn.tobi_endtone -6.6132 (L-H%) ) +( pp.tobi_endtone 0.0000 (H-L%) ) +( p.tobi_endtone 0.0000 (H-L%) ) +( tobi_endtone 0.0000 (H-L%) ) +( n.tobi_endtone -5.4416 (H-L%) ) +( nn.tobi_endtone -19.1570 (H-L%) ) +( pp.tobi_endtone 0.0000 (H-H%) ) +( p.tobi_endtone 0.0000 (H-H%) ) +( tobi_endtone 0.0000 
(H-H%) ) +( n.tobi_endtone 0.0000 (H-H%) ) +( nn.tobi_endtone 32.2151 (H-H%) ) +( pp.tobi_endtone -3.7825 (L-) ) +( p.tobi_endtone -15.9598 (L-) ) +( tobi_endtone -36.7241 (L-) ) +( n.tobi_endtone -5.1082 (L-) ) +( nn.tobi_endtone -3.6363 (L-) ) +( pp.tobi_endtone -13.3813 (H-) ) +( p.tobi_endtone 0.0000 (H-) ) +( tobi_endtone -2.5270 (H-) ) +( n.tobi_endtone 0.0000 (H-) ) +( nn.tobi_endtone 0.0000 (H-) ) +( p.tobi_accent 0.0000 (L+H*L-H%) ) +( tobi_accent 40.0000 (L+H*L-H%) ) +( n.tobi_accent 0.0000 (L+H*L-H%) ) +( pp.syl_break 0.7003 ) +( p.syl_break -1.6241 ) +( syl_break 0.0000 ) +( n.syl_break 0.1439 ) +( nn.syl_break 0.5640 ) +( pp.stress -1.9556 ) +( p.stress 0.0000 ) +( stress 3.1383 ) +( n.stress 1.9952 ) +( nn.stress 2.9208 ) +( syl_in -2.5317 ) +( syl_out 1.0103 ) +( ssyl_in 2.0751 ) +( ssyl_out 0.0000 ) +( asyl_in 0.0000 ) +( asyl_out 0.0000 ) +( last_accent 0.0000 ) +( next_accent 0.5869 ) +( sub_phrases 0.0000 ) +( lisp_l_spread -60.0000 ) +)) + +(set! apml_f2b_f0_lr_right +'( +( Intercept 162.6687 ) +( pp.lisp_apml_tgtype -4.0459 (1) ) +( p.lisp_apml_tgtype 3.0601 (1) ) +( lisp_apml_tgtype 27.8166 (1) ) +( n.lisp_apml_tgtype 7.4127 (1) ) +( nn.lisp_apml_tgtype 11.3458 (1) ) +( pp.lisp_apml_tgtype -3.8091 (2) ) +( p.lisp_apml_tgtype 1.7434 (2) ) +( lisp_apml_tgtype 17.1672 (2) ) +( n.lisp_apml_tgtype 0.6679 (2) ) +( nn.lisp_apml_tgtype 0.0000 (2) ) +( pp.lisp_apml_tgtype 1.6494 (3) ) +( p.lisp_apml_tgtype 1.2861 (3) ) +( lisp_apml_tgtype 9.5674 (3) ) +( n.lisp_apml_tgtype -3.1085 (3) ) +( nn.lisp_apml_tgtype 0.0000 (3) ) +( pp.lisp_apml_iecount 0.0000 ) +( p.lisp_apml_iecount -0.7829 ) +( lisp_apml_iecount -0.5447 ) +( n.lisp_apml_iecount -1.3461 ) +( nn.lisp_apml_iecount -0.7178 ) +( pp.tobi_accent 0.7904 (H*) ) +( p.tobi_accent 3.0248 (H*) ) +( tobi_accent 14.1116 (H*) ) +( n.tobi_accent 0.0000 (H*) ) +( nn.tobi_accent -3.2402 (H*) ) +( pp.tobi_accent 0.0000 (L*) ) +( p.tobi_accent -2.5000 (L*) ) +( tobi_accent -32.5000 (L*) ) +( n.tobi_accent 
-3.0539 (L*) ) +( nn.tobi_accent -1.7408 (L*) ) +( pp.tobi_accent 0.0000 (L*+H) ) +( p.tobi_accent 17.5000 (L*+H) ) +( tobi_accent -9.0000 (L*+H) ) +( n.tobi_accent -2.8025 (L*+H) ) +( nn.tobi_accent -0.5455 (L*+H) ) +( pp.tobi_accent 0.7900 (L+H*) ) +( p.tobi_accent 3.0250 (L+H*) ) +( tobi_accent 14.1150 (L+H*) ) +( n.tobi_accent 0.0000 (L+H*) ) +( nn.tobi_accent -3.2400 (L+H*) ) +( pp.tobi_endtone 5.7534 (L-L%) ) +( p.tobi_endtone 0.1401 (L-L%) ) +( tobi_endtone -65 (L-L%) ) +( n.tobi_endtone -11.1795 (L-L%) ) +( nn.tobi_endtone -7.8158 (L-L%) ) +( pp.tobi_endtone 4.4276 (L-H%) ) +( p.tobi_endtone 1.1220 (L-H%) ) +( tobi_endtone 20.0000 (L-H%) ) +( n.tobi_endtone -6.8995 (L-H%) ) +( nn.tobi_endtone -6.1219 (L-H%) ) +( pp.tobi_endtone 2.4327 (H-L%) ) +( p.tobi_endtone 0.0000 (H-L%) ) +( tobi_endtone -7.5781 (H-L%) ) +( n.tobi_endtone -2.7208 (H-L%) ) +( nn.tobi_endtone -14.4838 (H-L%) ) +( pp.tobi_endtone 0.0000 (H-H%) ) +( p.tobi_endtone 0.0000 (H-H%) ) +( tobi_endtone 0.0000 (H-H%) ) +( n.tobi_endtone 0.0000 (H-H%) ) +( nn.tobi_endtone 16.1076 (H-H%) ) +( pp.tobi_endtone -1.8913 (L-) ) +( p.tobi_endtone -15.5651 (L-) ) +( tobi_endtone -40.2021 (L-) ) +( n.tobi_endtone -2.5541 (L-) ) +( nn.tobi_endtone -2.2224 (L-) ) +( pp.tobi_endtone -6.6906 (H-) ) +( p.tobi_endtone -3.5483 (H-) ) +( tobi_endtone -1.2635 (H-) ) +( n.tobi_endtone 0.0000 (H-) ) +( nn.tobi_endtone 0.0000 (H-) ) +( p.tobi_accent 0.0000 (L+H*L-H%) ) +( tobi_accent -40.0000 (L+H*L-H%) ) +( n.tobi_accent 0.0000 (L+H*L-H%) ) +( pp.syl_break 0.3501 ) +( p.syl_break -1.0003 ) +( syl_break -1.5536 ) +( n.syl_break 0.0720 ) +( nn.syl_break 0.5989 ) +( pp.stress -0.9778 ) +( p.stress -0.8046 ) +( stress 1.2124 ) +( n.stress 3.9715 ) +( nn.stress 2.3914 ) +( syl_in -2.3468 ) +( syl_out 0.9792 ) +( ssyl_in 2.0463 ) +( ssyl_out 0.0000 ) +( asyl_in -0.1460 ) +( asyl_out 0.0000 ) +( last_accent -1.0992 ) +( next_accent 0.2935 ) +( sub_phrases 0.0000 ) +( lisp_l_spread -60.0000 ) +)) + +(set! 
apml_f2b_f0_lr_end +'( +( Intercept 165.0901 ) +( pp.lisp_apml_tgtype -8.0918 (1) ) +( p.lisp_apml_tgtype 1.5332 (1) ) +( lisp_apml_tgtype 29.9802 (1) ) +( n.lisp_apml_tgtype 0.0000 (1) ) +( nn.lisp_apml_tgtype 9.7312 (1) ) +( pp.lisp_apml_tgtype -7.6181 (2) ) +( p.lisp_apml_tgtype 0.0000 (2) ) +( lisp_apml_tgtype 20.7778 (2) ) +( n.lisp_apml_tgtype 0.0000 (2) ) +( nn.lisp_apml_tgtype 0.0000 (2) ) +( pp.lisp_apml_tgtype 0.0000 (3) ) +( p.lisp_apml_tgtype 0.0000 (3) ) +( lisp_apml_tgtype 13.5550 (3) ) +( n.lisp_apml_tgtype -6.2170 (3) ) +( nn.lisp_apml_tgtype 0.0000 (3) ) +( pp.lisp_apml_iecount 0.0000 ) +( p.lisp_apml_iecount -0.8428 ) +( lisp_apml_iecount -1.0894 ) +( n.lisp_apml_iecount 0.0000 ) +( nn.lisp_apml_iecount -1.4355 ) +( pp.tobi_accent 1.5807 (H*) ) +( p.tobi_accent 0.0000 (H*) ) +( tobi_accent 7.3057 (H*) ) +( n.tobi_accent 0.0000 (H*) ) +( nn.tobi_accent 0.0000 (H*) ) +( pp.tobi_accent 0.0000 (L*) ) +( p.tobi_accent 0.0000 (L*) ) +( tobi_accent -20.0000 (L*) ) +( n.tobi_accent 0.0000 (L*) ) +( nn.tobi_accent 0.0000 (L*) ) +( pp.tobi_accent 0.0000 (L*+H) ) +( p.tobi_accent 5.0000 (L*+H) ) +( tobi_accent 12.0000 (L*+H) ) +( n.tobi_accent -2.8850 (L*+H) ) +( nn.tobi_accent -1.0910 (L*+H) ) +( pp.tobi_accent 1.5800 (L+H*) ) +( p.tobi_accent 0.0000 (L+H*) ) +( tobi_accent 7.3100 (L+H*) ) +( n.tobi_accent 0.0000 (L+H*) ) +( nn.tobi_accent 0.0000 (L+H*) ) +( pp.tobi_endtone 7.8833 (L-L%) ) +( p.tobi_endtone 0.0000 (L-L%) ) +( tobi_endtone -80 (L-L%) ) +( n.tobi_endtone -35 (L-L%) ) +( nn.tobi_endtone -6.4769 (L-L%) ) +( pp.tobi_endtone 7.1936 (L-H%) ) +( p.tobi_endtone 0.0000 (L-H%) ) +( tobi_endtone 40.0000 (L-H%) ) +( n.tobi_endtone -13.7990 (L-H%) ) +( nn.tobi_endtone -5.6305 (L-H%) ) +( pp.tobi_endtone 4.8654 (H-L%) ) +( p.tobi_endtone 0.0000 (H-L%) ) +( tobi_endtone -15.1561 (H-L%) ) +( n.tobi_endtone 0.0000 (H-L%) ) +( nn.tobi_endtone -9.8107 (H-L%) ) +( pp.tobi_endtone 0.0000 (H-H%) ) +( p.tobi_endtone 0.0000 (H-H%) ) +( tobi_endtone 0.0000 (H-H%) ) 
+( n.tobi_endtone 0.0000 (H-H%) ) +( nn.tobi_endtone 0.0000 (H-H%) ) +( pp.tobi_endtone 0.0000 (L-) ) +( p.tobi_endtone -15.1705 (L-) ) +( tobi_endtone -43.6801 (L-) ) +( n.tobi_endtone 0.0000 (L-) ) +( nn.tobi_endtone -0.8085 (L-) ) +( pp.tobi_endtone 0.0000 (H-) ) +( p.tobi_endtone -7.0967 (H-) ) +( tobi_endtone 0.0000 (H-) ) +( n.tobi_endtone 0.0000 (H-) ) +( nn.tobi_endtone 0.0000 (H-) ) +( p.tobi_accent 0.0000 (L+H*L-H%) ) +( tobi_accent 60.0000 (L+H*L-H%) ) +( n.tobi_accent -60.0000 (L+H*L-H%) ) +( pp.syl_break 0.0000 ) +( p.syl_break -0.3765 ) +( syl_break -3.1072 ) +( n.syl_break 0.0000 ) +( nn.syl_break 0.6338 ) +( pp.stress 0.0000 ) +( p.stress -1.6093 ) +( stress -0.7136 ) +( n.stress 5.9479 ) +( nn.stress 1.8619 ) +( syl_in -2.1619 ) +( syl_out 0.9481 ) +( ssyl_in 2.0175 ) +( ssyl_out 0.0000 ) +( asyl_in -0.2919 ) +( asyl_out 0.0000 ) +( last_accent -2.1984 ) +( next_accent 0.0000 ) +( sub_phrases 0.0000 ) +( lisp_l_spread -60.0000 ) +)) + diff --git a/lib/apml_kaldurtreeZ.scm b/lib/apml_kaldurtreeZ.scm new file mode 100644 index 0000000..5a3d44e --- /dev/null +++ b/lib/apml_kaldurtreeZ.scm @@ -0,0 +1,996 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. 
Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; A tree to predict zscore durations, built from f2b +;;; doesn't use actual phonemes so it can have better generalizations +;;; +;;; Basically copied from ked +;;; + +(set! 
apml_kal_durs +'( + (uh 0.067 0.025) + (hh 0.061 0.028) + (ao 0.138 0.046) + (hv 0.053 0.020) + (v 0.051 0.019) + (ih 0.058 0.023) + (el 0.111 0.043) + (ey 0.132 0.042) + (em 0.080 0.033) + (jh 0.094 0.024) + (w 0.054 0.023) + (uw 0.107 0.044) + (ae 0.120 0.036) + (en 0.117 0.056) + (k 0.089 0.034) + (y 0.048 0.025) + (axr 0.147 0.035) +; (l 0.056 0.026) + (l 0.066 0.026) + (ng 0.064 0.024) + (zh 0.071 0.030) + (z 0.079 0.034) + (brth 0.246 0.046) + (m 0.069 0.028) + (iy 0.097 0.041) + (n 0.059 0.025) + (ah 0.087 0.031) + (er 0.086 0.010) + (b 0.069 0.024) + (pau 0.200 0.1) + (aw 0.166 0.053) + (p 0.088 0.030) + (ch 0.115 0.025) + (ow 0.134 0.039) + (dh 0.031 0.016) + (nx 0.049 0.100) + (d 0.048 0.021) + (ax 0.046 0.024) + (h# 0.060 0.083) + (r 0.053 0.031) + (eh 0.095 0.036) + (ay 0.137 0.047) + (oy 0.183 0.050) + (f 0.095 0.033) + (sh 0.108 0.031) + (s 0.102 0.037) + (g 0.064 0.021) + (dx 0.031 0.016) + (th 0.093 0.050) + (aa 0.094 0.037) + (t 0.070 0.020) +) +) + +(set! apml_kal_duration_cart_tree +' +((name is pau) + ((emph_sil is +) + ((0.0 -0.5)) + ((p.R:SylStructure.parent.parent.lisp_apml_pause = 0.2) + ((0.0 0.0)) + ((p.R:SylStructure.parent.parent.lisp_apml_pause = 0.4) + ((0.0 2.0)) + ((p.R:SylStructure.parent.parent.lisp_apml_pause = 0.6) + ((0.0 4.0)) + ((p.R:SylStructure.parent.parent.lisp_apml_pause = 0.8) + ((0.0 6.0)) + ((p.R:SylStructure.parent.parent.lisp_apml_pause = 1.0) + ((0.0 8.0)) + ((p.R:SylStructure.parent.parent.lisp_apml_pause = 1.5) + ((0.0 13.0)) + ((p.R:SylStructure.parent.parent.lisp_apml_pause = 2.0) + ((0.0 18.0)) + ((p.R:SylStructure.parent.parent.lisp_apml_pause = 2.5) + ((0.0 23.0)) + ((p.R:SylStructure.parent.parent.lisp_apml_pause = 3.0) + ((0.0 28.0)) + ((p.R:SylStructure.parent.parent.pbreak is BB) + ((0.0 2.0)) + ((0.0 0.0))))))))))))) + ((R:SylStructure.parent.accented is 0) + ((n.ph_ctype is 0) + ((p.ph_vlng is 0) + ((R:SylStructure.parent.syl_codasize < 1.5) + ((p.ph_ctype is n) + ((ph_ctype is f) + ((0.559208 
-0.783163)) + ((1.05215 -0.222704))) + ((ph_ctype is s) + ((R:SylStructure.parent.syl_break is 2) + ((0.589948 0.764459)) + ((R:SylStructure.parent.asyl_in < 0.7) + ((1.06385 0.567944)) + ((0.691943 0.0530272)))) + ((ph_vlng is l) + ((pp.ph_vfront is 1) + ((1.06991 0.766486)) + ((R:SylStructure.parent.syl_break is 1) + ((0.69665 0.279248)) + ((0.670353 0.0567774)))) + ((p.ph_ctype is s) + ((seg_onsetcoda is coda) + ((0.828638 -0.038356)) + ((ph_ctype is f) + ((0.7631 -0.545853)) + ((0.49329 -0.765994)))) + ((R:SylStructure.parent.parent.gpos is det) + ((R:SylStructure.parent.last_accent < 0.3) + ((R:SylStructure.parent.sub_phrases < 1) + ((0.811686 0.160195)) + ((0.799015 0.713958))) + ((0.731599 -0.215472))) + ((ph_ctype is r) + ((0.673487 0.092772)) + ((R:SylStructure.parent.asyl_in < 1) + ((0.745273 0.00132813)) + ((0.75457 -0.334898))))))))) + ((pos_in_syl < 0.5) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.902446 -0.041618)) + ((R:SylStructure.parent.sub_phrases < 2.3) + ((0.900629 0.262952)) + ((1.18474 0.594794)))) + ((seg_onset_stop is 0) + ((R:SylStructure.parent.position_type is mid) + ((0.512323 -0.760444)) + ((R:SylStructure.parent.syl_out < 6.8) + ((pp.ph_vlng is a) + ((0.640575 -0.450449)) + ((ph_ctype is f) + ((R:SylStructure.parent.sub_phrases < 1.3) + ((0.862876 -0.296956)) + ((R:SylStructure.parent.syl_out < 2.4) + ((0.803215 0.0422868)) + ((0.877856 -0.154465)))) + ((R:SylStructure.parent.syl_out < 3.6) + ((R:SylStructure.parent.syl_out < 1.2) + ((0.567081 -0.264199)) + ((0.598043 -0.541738))) + ((0.676843 -0.166623))))) + ((0.691678 -0.57173)))) + ((R:SylStructure.parent.parent.gpos is cc) + ((1.15995 0.313289)) + ((pp.ph_vfront is 1) + ((0.555993 0.0695819)) + ((R:SylStructure.parent.asyl_in < 1.2) + ((R:SylStructure.parent.sub_phrases < 2.7) + ((0.721635 -0.367088)) + ((0.71919 -0.194887))) + ((0.547052 -0.0637491))))))) + ((ph_ctype is s) + 
((R:SylStructure.parent.syl_break is 0) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((0.650007 -0.333421)) + ((0.846301 -0.165383))) + ((0.527756 -0.516332))) + ((R:SylStructure.parent.syl_break is 0) + ((p.ph_ctype is s) + ((0.504414 -0.779112)) + ((0.812498 -0.337611))) + ((pos_in_syl < 1.4) + ((0.513041 -0.745807)) + ((p.ph_ctype is s) + ((0.350582 -1.04907)) + ((0.362 -0.914974)))))))) + ((R:SylStructure.parent.syl_break is 0) + ((ph_ctype is n) + ((R:SylStructure.parent.position_type is initial) + ((pos_in_syl < 1.2) + ((0.580485 0.172658)) + ((0.630973 -0.101423))) + ((0.577937 -0.360092))) + ((R:SylStructure.parent.syl_out < 2.9) + ((R:SylStructure.parent.syl_out < 1.1) + ((R:SylStructure.parent.position_type is initial) + ((0.896092 0.764189)) + ((R:SylStructure.parent.sub_phrases < 3.6) + ((ph_ctype is s) + ((0.877362 0.555132)) + ((0.604511 0.369882))) + ((0.799982 0.666966)))) + ((seg_onsetcoda is coda) + ((p.ph_vlng is a) + ((R:SylStructure.parent.last_accent < 0.4) + ((0.800736 0.240634)) + ((0.720606 0.486176))) + ((1.18173 0.573811))) + ((0.607147 0.194468)))) + ((ph_ctype is r) + ((0.88377 0.499383)) + ((R:SylStructure.parent.last_accent < 0.5) + ((R:SylStructure.parent.position_type is initial) + ((R:SylStructure.parent.parent.word_numsyls < 2.4) + ((0.62798 0.0737318)) + ((0.787334 0.331014))) + ((ph_ctype is s) + ((0.808368 0.0929299)) + ((0.527948 -0.0443271)))) + ((seg_coda_fric is 0) + ((p.ph_vlng is a) + ((0.679745 0.517681)) + ((R:SylStructure.parent.sub_phrases < 1.1) + ((0.759979 0.128316)) + ((0.775233 0.361383)))) + ((R:SylStructure.parent.last_accent < 1.3) + ((0.696255 0.054136)) + ((0.632425 0.246742)))))))) + ((pos_in_syl < 0.3) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((0.847602 0.621547)) + ((ph_ctype is s) + ((0.880645 0.501679)) + ((R:SylStructure.parent.sub_phrases < 3.3) + ((R:SylStructure.parent.sub_phrases < 0.3) + ((0.901014 -0.042049)) + ((0.657493 0.183226))) + ((0.680126 0.284799))))) + 
((ph_ctype is s) + ((p.ph_vlng is s) + ((0.670033 -0.820934)) + ((0.863306 -0.348735))) + ((ph_ctype is n) + ((R:SylStructure.parent.asyl_in < 1.2) + ((0.656966 -0.40092)) + ((0.530966 -0.639366))) + ((seg_coda_fric is 0) + ((1.04153 0.364857)) + ((pos_in_syl < 1.2) + ((R:SylStructure.parent.syl_out < 3.4) + ((0.81503 -0.00768613)) + ((0.602665 -0.197753))) + ((0.601844 -0.394632))))))))) + ((n.ph_ctype is f) + ((pos_in_syl < 1.5) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((pos_in_syl < 0.1) + ((1.63863 0.938841)) + ((R:SylStructure.parent.position_type is initial) + ((0.897722 -0.0796637)) + ((nn.ph_vheight is 0) + ((0.781081 0.480026)) + ((0.779711 0.127175))))) + ((ph_ctype is r) + ((p.ph_ctype is s) + ((0.581329 -0.708767)) + ((0.564366 -0.236212))) + ((ph_vlng is a) + ((p.ph_ctype is r) + ((0.70992 -0.273389)) + ((R:SylStructure.parent.parent.gpos is in) + ((0.764696 0.0581338)) + ((nn.ph_vheight is 0) + ((0.977737 0.721904)) + ((R:SylStructure.parent.sub_phrases < 2.2) + ((pp.ph_vfront is 0) + ((0.586708 0.0161206)) + ((0.619949 0.227372))) + ((0.707285 0.445569)))))) + ((ph_ctype is n) + ((R:SylStructure.parent.syl_break is 1) + ((nn.ph_vfront is 2) + ((0.430295 -0.120097)) + ((0.741371 0.219042))) + ((0.587492 0.321245))) + ((p.ph_ctype is n) + ((0.871586 0.134075)) + ((p.ph_ctype is r) + ((0.490751 -0.466418)) + ((R:SylStructure.parent.syl_codasize < 1.3) + ((R:SylStructure.parent.sub_phrases < 2.2) + ((p.ph_ctype is s) + ((0.407452 -0.425925)) + ((0.644771 -0.542809))) + ((0.688772 -0.201899))) + ((ph_vheight is 1) + ((nn.ph_vheight is 0) + ((0.692018 0.209018)) + ((0.751345 -0.178136))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.3) + ((R:SylStructure.parent.asyl_in < 1.5) + ((0.599633 -0.235593)) + ((0.60042 0.126118))) + ((p.ph_vlng is a) + ((0.7148 -0.174812)) + ((R:SylStructure.parent.parent.gpos is content) + ((0.761296 -0.231509)) + ((0.813081 -0.536405))))))))))))) + ((ph_ctype is n) + ((0.898844 0.163343)) + 
((p.ph_vlng is s) + ((seg_coda_fric is 0) + ((0.752921 -0.45528)) + ((0.890079 -0.0998025))) + ((ph_ctype is f) + ((0.729376 -0.930547)) + ((ph_ctype is s) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 0) + ((0.745052 -0.634119)) + ((0.521502 -0.760176))) + ((R:SylStructure.parent.syl_break is 1) + ((0.766575 -0.121355)) + ((0.795616 -0.557509)))))))) + ((p.ph_vlng is 0) + ((p.ph_ctype is r) + ((ph_vlng is 0) + ((0.733659 -0.402734)) + ((R:SylStructure.parent.sub_phrases < 1.5) + ((ph_vlng is s) + ((0.326176 -0.988478)) + ((n.ph_ctype is s) + ((0.276471 -0.802536)) + ((0.438283 -0.900628)))) + ((nn.ph_vheight is 0) + ((ph_vheight is 2) + ((0.521 -0.768992)) + ((0.615436 -0.574918))) + ((ph_vheight is 1) + ((0.387376 -0.756359)) + ((pos_in_syl < 0.3) + ((0.417235 -0.808937)) + ((0.384043 -0.93315))))))) + ((ph_vlng is a) + ((ph_ctype is 0) + ((n.ph_ctype is s) + ((p.ph_ctype is f) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.415908 -0.428493)) + ((pos_in_syl < 0.1) + ((0.790441 0.0211071)) + ((0.452465 -0.254485)))) + ((p.ph_ctype is s) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.582447 -0.389966)) + ((0.757648 0.185781))) + ((R:SylStructure.parent.sub_phrases < 1.4) + ((0.628965 0.422551)) + ((0.713613 0.145576))))) + ((seg_onset_stop is 0) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 0) + ((pp.ph_vfront is 1) + ((0.412363 -0.62319)) + ((R:SylStructure.parent.syl_out < 3.6) + ((0.729259 -0.317324)) + ((0.441633 -0.591051)))) + ((R:SylStructure.parent.syl_break is 1) + ((R:SylStructure.parent.sub_phrases < 2.7) + ((0.457728 -0.405607)) + ((0.532411 -0.313148))) + ((R:SylStructure.parent.last_accent < 0.3) + ((1.14175 0.159416)) + ((0.616396 -0.254651))))) + ((R:SylStructure.parent.position_type is initial) + ((0.264181 -0.799896)) + ((0.439801 -0.551309))))) + ((R:SylStructure.parent.position_type is final) + ((0.552027 -0.707084)) + ((0.585661 -0.901874)))) + ((ph_ctype is s) + ((pos_in_syl < 1.2) + 
((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((pp.ph_vfront is 1) + ((0.607449 0.196466)) + ((0.599662 0.00382414))) + ((0.64109 -0.12859))) + ((pp.ph_vfront is 1) + ((0.720484 -0.219339)) + ((0.688707 -0.516734)))) + ((ph_vlng is s) + ((n.ph_ctype is s) + ((R:SylStructure.parent.parent.gpos is content) + ((R:SylStructure.parent.position_type is single) + ((0.659206 0.159445)) + ((R:SylStructure.parent.parent.word_numsyls < 3.5) + ((R:SylStructure.parent.sub_phrases < 2) + ((0.447186 -0.419103)) + ((0.631822 -0.0928561))) + ((0.451623 -0.576116)))) + ((ph_vheight is 3) + ((0.578626 -0.64583)) + ((0.56636 -0.4665)))) + ((R:SylStructure.parent.parent.gpos is in) + ((0.771516 -0.217292)) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((0.688571 -0.304382)) + ((R:SylStructure.parent.parent.gpos is content) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((n.ph_ctype is n) + ((0.556085 -0.572203)) + ((0.820173 -0.240338))) + ((R:SylStructure.parent.parent.word_numsyls < 2.2) + ((0.595398 -0.588171)) + ((0.524737 -0.95797)))) + ((R:SylStructure.parent.sub_phrases < 3.9) + ((0.371492 -0.959427)) + ((0.440479 -0.845747))))))) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 0) + ((p.ph_ctype is f) + ((0.524088 -0.482247)) + ((nn.ph_vheight is 1) + ((0.587666 -0.632362)) + ((ph_vlng is l) + ((R:SylStructure.parent.position_type is final) + ((0.513286 -0.713117)) + ((0.604613 -0.924308))) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((0.577997 -0.891342)) + ((0.659804 -1.15252)))))) + ((pp.ph_vlng is s) + ((ph_ctype is f) + ((0.813383 -0.599624)) + ((0.984027 -0.0771909))) + ((p.ph_ctype is f) + ((R:SylStructure.parent.parent.gpos is in) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((0.313572 -1.03242)) + ((0.525854 -0.542799))) + ((R:SylStructure.parent.syl_out < 2.8) + ((0.613007 -0.423979)) + ((0.570258 -0.766379)))) + ((R:SylStructure.parent.syl_break is 1) + ((R:SylStructure.parent.parent.gpos is to) + ((0.364585 
-0.792895)) + ((ph_vlng is l) + ((0.69143 -0.276816)) + ((0.65673 -0.523721)))) + ((R:SylStructure.parent.syl_out < 3.6) + ((R:SylStructure.parent.position_type is initial) + ((0.682096 -0.488102)) + ((0.406364 -0.731758))) + ((0.584694 -0.822229))))))))))) + ((n.ph_ctype is r) + ((R:SylStructure.parent.position_type is initial) + ((p.ph_vlng is a) + ((0.797058 1.02334)) + ((ph_ctype is s) + ((1.0548 0.536277)) + ((0.817253 0.138201)))) + ((R:SylStructure.parent.sub_phrases < 1.1) + ((R:SylStructure.parent.syl_out < 3.3) + ((0.884574 -0.23471)) + ((0.772063 -0.525292))) + ((nn.ph_vfront is 1) + ((1.25254 0.417485)) + ((0.955557 -0.0781996))))) + ((pp.ph_vfront is 0) + ((ph_ctype is f) + ((n.ph_ctype is s) + ((R:SylStructure.parent.parent.gpos is content) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 0) + ((0.583506 -0.56941)) + ((0.525949 -0.289362))) + ((0.749316 -0.0921038))) + ((p.ph_vlng is s) + ((0.734234 0.139463)) + ((0.680119 -0.0708717)))) + ((ph_vlng is s) + ((ph_vheight is 1) + ((0.908712 -0.618971)) + ((0.55344 -0.840495))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 1.2) + ((pos_in_syl < 1.2) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((0.838715 0.00913392)) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((ph_vheight is 2) + ((0.555513 -0.512523)) + ((R:SylStructure.parent.position_type is initial) + ((0.758711 0.121704)) + ((0.737555 -0.25637)))) + ((R:SylStructure.parent.syl_out < 3.1) + ((n.ph_ctype is s) + ((0.611756 -0.474522)) + ((1.05437 -0.247206))) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((R:SylStructure.parent.position_type is final) + ((0.567761 -0.597866)) + ((0.785599 -0.407765))) + ((0.575598 -0.741256)))))) + ((ph_ctype is s) + ((n.ph_ctype is s) + ((0.661069 -1.08426)) + ((0.783184 -0.39789))) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((R:SylStructure.parent.sub_phrases < 2.6) + ((0.511323 -0.666011)) + ((0.691878 -0.499492))) + ((ph_ctype is r) + ((0.482131 -0.253186)) + 
((0.852955 -0.372832)))))) + ((0.854447 -0.0936489))))) + ((R:SylStructure.parent.position_type is final) + ((0.685939 -0.249982)) + ((R:SylStructure.parent.syl_out < 3.2) + ((0.989843 0.18086)) + ((0.686805 -0.0402908))))))))) + ((R:SylStructure.parent.syl_out < 2.4) + ((R:SylStructure.parent.syl_out < 0.2) + ((seg_onsetcoda is coda) + ((ph_ctype is s) + ((R:SylStructure.parent.syl_break is 4) + ((pp.ph_vlng is 0) + ((0.959737 1.63203)) + ((1.20714 0.994933))) + ((n.ph_ctype is 0) + ((R:SylStructure.parent.syl_break is 2) + ((0.864809 0.214457)) + ((0.874278 0.730381))) + ((pp.ph_vfront is 0) + ((seg_coda_fric is 0) + ((1.20844 -0.336221)) + ((1.01357 0.468302))) + ((0.658106 -0.799121))))) + ((n.ph_ctype is f) + ((ph_ctype is f) + ((1.26332 0.0300613)) + ((ph_vlng is d) + ((1.02719 1.1649)) + ((ph_ctype is 0) + ((R:SylStructure.parent.asyl_in < 1.2) + ((1.14048 2.2668)) + ((ph_vheight is 1) + ((1.15528 1.50375)) + ((1.42406 2.07927)))) + ((R:SylStructure.parent.sub_phrases < 1.1) + ((0.955892 1.10243)) + ((R:SylStructure.parent.syl_break is 2) + ((1.32682 1.8432)) + ((1.27582 1.59853))))))) + ((n.ph_ctype is 0) + ((ph_ctype is n) + ((R:SylStructure.parent.syl_break is 2) + ((1.45399 1.12927)) + ((1.05543 0.442376))) + ((R:SylStructure.parent.syl_break is 4) + ((R:SylStructure.parent.position_type is final) + ((ph_ctype is f) + ((1.46434 1.76508)) + ((0.978055 0.7486))) + ((1.2395 2.30826))) + ((ph_ctype is 0) + ((0.935325 1.69917)) + ((nn.ph_vfront is 1) + ((1.20456 1.31128)) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((nn.ph_vheight is 0) + ((1.16907 0.212421)) + ((0.952091 0.653094))) + ((p.ph_ctype is 0) + ((1.05502 1.25802)) + ((0.818731 0.777568)))))))) + ((ph_ctype is f) + ((p.ph_ctype is 0) + ((1.03918 0.163941)) + ((0.737545 -0.167063))) + ((R:SylStructure.parent.position_type is final) + ((n.ph_ctype is n) + ((R:SylStructure.parent.last_accent < 0.5) + ((R:SylStructure.parent.sub_phrases < 2.8) + ((0.826207 -0.000859005)) + ((0.871119 
0.273433))) + ((R:SylStructure.parent.parent.word_numsyls < 2.4) + ((1.17405 1.05694)) + ((0.858394 0.244916)))) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((p.ph_ctype is 0) + ((1.14092 1.21187)) + ((R:SylStructure.parent.syl_break is 2) + ((1.02653 0.59865)) + ((0.94248 1.1634)))) + ((seg_coda_fric is 0) + ((1.07441 0.292935)) + ((1.15736 0.92574))))) + ((ph_vlng is s) + ((R:SylStructure.parent.syl_break is 2) + ((1.34638 1.23484)) + ((0.951514 2.02008))) + ((ph_ctype is 0) + ((p.ph_ctype is r) + ((0.806106 0.697089)) + ((R:SylStructure.parent.syl_break is 2) + ((1.10891 0.992197)) + ((1.04657 1.51093)))) + ((1.18165 0.520952))))))))) + ((p.ph_vlng is 0) + ((pos_in_syl < 0.7) + ((R:SylStructure.parent.position_type is final) + ((ph_ctype is r) + ((0.966357 0.185827)) + ((ph_ctype is s) + ((0.647163 0.0332298)) + ((0.692972 -0.534917)))) + ((ph_ctype is s) + ((0.881521 0.575107)) + ((p.ph_ctype is f) + ((0.8223 -0.111275)) + ((R:SylStructure.parent.last_accent < 0.3) + ((0.969188 0.09447)) + ((0.894438 0.381947)))))) + ((p.ph_ctype is f) + ((0.479748 -0.490108)) + ((0.813125 -0.201268)))) + ((ph_ctype is s) + ((0.908566 1.20397)) + ((R:SylStructure.parent.last_accent < 1.2) + ((0.88078 0.636568)) + ((0.978087 1.07763)))))) + ((pos_in_syl < 1.3) + ((R:SylStructure.parent.syl_break is 0) + ((pos_in_syl < 0.1) + ((R:SylStructure.parent.position_type is initial) + ((p.ph_ctype is n) + ((0.801651 -0.0163359)) + ((ph_ctype is s) + ((n.ph_ctype is r) + ((0.893307 1.07253)) + ((p.ph_vlng is 0) + ((0.92651 0.525806)) + ((0.652444 0.952792)))) + ((p.ph_vlng is 0) + ((seg_onsetcoda is coda) + ((0.820151 0.469117)) + ((p.ph_ctype is f) + ((0.747972 -0.0716448)) + ((ph_ctype is f) + ((0.770882 0.457137)) + ((0.840905 0.102492))))) + ((R:SylStructure.parent.syl_out < 1.1) + ((0.667824 0.697337)) + ((0.737967 0.375114)))))) + ((ph_vheight is 1) + ((0.624353 0.410671)) + ((R:SylStructure.parent.asyl_in < 0.8) + ((0.647905 -0.331055)) + ((p.ph_ctype is s) + ((0.629039 
-0.240616)) + ((0.749277 -0.0191273)))))) + ((ph_vheight is 3) + ((p.ph_ctype is s) + ((0.626922 0.556537)) + ((0.789357 0.153892))) + ((seg_onsetcoda is coda) + ((n.ph_ctype is 0) + ((R:SylStructure.parent.parent.word_numsyls < 3.4) + ((0.744714 0.123242)) + ((0.742039 0.295753))) + ((seg_coda_fric is 0) + ((R:SylStructure.parent.parent.word_numsyls < 2.4) + ((ph_vheight is 1) + ((0.549715 -0.341018)) + ((0.573641 -0.00893114))) + ((nn.ph_vfront is 2) + ((0.67099 -0.744625)) + ((0.664438 -0.302803)))) + ((p.ph_vlng is 0) + ((0.630028 0.113815)) + ((0.632794 -0.128733))))) + ((ph_ctype is r) + ((0.367169 -0.854509)) + ((0.94334 -0.216179)))))) + ((n.ph_ctype is f) + ((ph_vlng is 0) + ((1.3089 0.46195)) + ((R:SylStructure.parent.syl_codasize < 1.3) + ((1.07673 0.657169)) + ((pp.ph_vlng is 0) + ((0.972319 1.08222)) + ((1.00038 1.46257))))) + ((p.ph_vlng is l) + ((1.03617 0.785204)) + ((p.ph_vlng is a) + ((R:SylStructure.parent.position_type is final) + ((1.00681 0.321168)) + ((0.928115 0.950834))) + ((ph_vlng is 0) + ((pos_in_syl < 0.1) + ((R:SylStructure.parent.position_type is final) + ((0.863682 -0.167374)) + ((nn.ph_vheight is 0) + ((p.ph_ctype is f) + ((0.773591 -0.00374425)) + ((R:SylStructure.parent.syl_out < 1.1) + ((0.951802 0.228448)) + ((1.02282 0.504252)))) + ((1.09721 0.736476)))) + ((R:SylStructure.parent.position_type is final) + ((1.04302 0.0590974)) + ((0.589208 -0.431535)))) + ((n.ph_ctype is 0) + ((1.27879 1.00642)) + ((ph_vlng is s) + ((R:SylStructure.parent.asyl_in < 1.4) + ((0.935787 0.481652)) + ((0.9887 0.749861))) + ((R:SylStructure.parent.syl_out < 1.1) + ((R:SylStructure.parent.position_type is final) + ((0.921307 0.0696307)) + ((0.83675 0.552212))) + ((0.810076 -0.0479225)))))))))) + ((ph_ctype is s) + ((n.ph_ctype is s) + ((0.706959 -1.0609)) + ((p.ph_ctype is n) + ((0.850614 -0.59933)) + ((n.ph_ctype is r) + ((0.665947 0.00698725)) + ((n.ph_ctype is 0) + ((R:SylStructure.parent.position_type is initial) + ((0.762889 -0.0649044)) + 
((0.723956 -0.248899))) + ((R:SylStructure.parent.sub_phrases < 1.4) + ((0.632957 -0.601987)) + ((0.889114 -0.302401))))))) + ((ph_ctype is f) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((R:SylStructure.parent.syl_out < 1.1) + ((0.865267 0.164636)) + ((0.581827 -0.0989051))) + ((nn.ph_vfront is 2) + ((0.684459 -0.316836)) + ((0.778854 -0.0961191)))) + ((R:SylStructure.parent.syl_out < 1.1) + ((p.ph_ctype is s) + ((0.837964 -0.429437)) + ((0.875304 -0.0652743))) + ((0.611071 -0.635089)))) + ((p.ph_ctype is r) + ((R:SylStructure.parent.syl_out < 1.1) + ((0.762012 0.0139361)) + ((0.567983 -0.454845))) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((ph_ctype is l) + ((1.18845 0.809091)) + ((R:SylStructure.parent.position_type is initial) + ((ph_ctype is n) + ((0.773548 -0.277092)) + ((1.01586 0.281001))) + ((p.ph_ctype is 0) + ((1.06831 0.699145)) + ((0.924189 0.241873))))) + ((R:SylStructure.parent.syl_break is 0) + ((ph_ctype is n) + ((0.592321 -0.470784)) + ((0.778688 -0.072112))) + ((n.ph_ctype is s) + ((1.08848 0.0733489)) + ((1.25674 0.608371)))))))))) + ((pos_in_syl < 0.7) + ((p.ph_vlng is 0) + ((R:SylStructure.parent.position_type is mid) + ((ph_ctype is 0) + ((ph_vheight is 2) + ((0.456225 -0.293282)) + ((0.561529 -0.0816115))) + ((0.6537 -0.504024))) + ((ph_ctype is s) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((1.31586 0.98395)) + ((R:SylStructure.parent.position_type is single) + ((0.816869 0.634789)) + ((R:SylStructure.parent.syl_out < 4.4) + ((1.05578 0.479029)) + ((R:SylStructure.parent.asyl_in < 0.4) + ((1.11813 0.143214)) + ((0.87178 0.406834)))))) + ((n.ph_ctype is n) + ((R:SylStructure.parent.last_accent < 0.6) + ((0.838154 -0.415599)) + ((0.924024 0.110288))) + ((seg_onsetcoda is coda) + ((nn.ph_vfront is 2) + ((0.670096 0.0314187)) + ((n.ph_ctype is f) + ((1.00363 0.693893)) + ((R:SylStructure.parent.syl_out < 6) + ((0.772363 0.215675)) + ((0.920313 0.574068))))) + 
((R:SylStructure.parent.position_type is final) + ((0.673837 -0.458142)) + ((R:SylStructure.parent.sub_phrases < 2.8) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((0.894817 0.304628)) + ((ph_ctype is n) + ((0.787302 -0.23094)) + ((R:SylStructure.parent.asyl_in < 1.2) + ((ph_ctype is f) + ((R:SylStructure.parent.last_accent < 0.5) + ((1.12278 0.326954)) + ((0.802236 -0.100616))) + ((0.791255 -0.0919132))) + ((0.95233 0.219053))))) + ((R:SylStructure.parent.position_type is initial) + ((ph_ctype is f) + ((1.0616 0.216118)) + ((0.703216 -0.00834086))) + ((ph_ctype is f) + ((1.22277 0.761763)) + ((0.904811 0.332721)))))))))) + ((ph_vheight is 0) + ((p.ph_vlng is s) + ((0.873379 0.217178)) + ((n.ph_ctype is r) + ((0.723915 1.29451)) + ((n.ph_ctype is 0) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((R:SylStructure.parent.sub_phrases < 4) + ((seg_coda_fric is 0) + ((p.ph_vlng is l) + ((0.849154 0.945261)) + ((0.633261 0.687498))) + ((0.728546 0.403076))) + ((0.850962 1.00255))) + ((0.957999 1.09113))) + ((0.85771 0.209045))))) + ((ph_vheight is 2) + ((0.803401 -0.0544067)) + ((0.681353 0.256045))))) + ((n.ph_ctype is f) + ((ph_ctype is s) + ((p.ph_vlng is 0) + ((0.479307 -0.9673)) + ((0.700477 -0.351397))) + ((ph_ctype is f) + ((0.73467 -0.6233)) + ((R:SylStructure.parent.syl_break is 0) + ((p.ph_ctype is s) + ((0.56282 0.266234)) + ((p.ph_ctype is r) + ((0.446203 -0.302281)) + ((R:SylStructure.parent.sub_phrases < 2.7) + ((ph_ctype is 0) + ((0.572016 -0.0102436)) + ((0.497358 -0.274514))) + ((0.545477 0.0482177))))) + ((ph_vlng is s) + ((0.805269 0.888495)) + ((ph_ctype is n) + ((0.869854 0.653018)) + ((R:SylStructure.parent.sub_phrases < 2.2) + ((0.735031 0.0612886)) + ((0.771859 0.346637)))))))) + ((R:SylStructure.parent.syl_codasize < 1.4) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.3) + ((R:SylStructure.parent.position_type is initial) + ((0.743458 0.0411808)) + ((1.13068 0.613305))) + ((pos_in_syl < 1.2) + 
((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((1.11481 0.175467)) + ((0.937893 -0.276407))) + ((0.74264 -0.550878)))) + ((pos_in_syl < 3.4) + ((seg_onsetcoda is coda) + ((ph_ctype is r) + ((n.ph_ctype is s) + ((0.714319 -0.240328)) + ((p.ph_ctype is 0) + ((0.976987 0.330352)) + ((1.1781 -0.0816682)))) + ((ph_ctype is l) + ((n.ph_ctype is 0) + ((1.39137 0.383533)) + ((0.725585 -0.324515))) + ((ph_vheight is 3) + ((ph_vlng is d) + ((0.802626 -0.62487)) + ((n.ph_ctype is r) + ((0.661091 -0.513869)) + ((R:SylStructure.parent.position_type is initial) + ((R:SylStructure.parent.parent.word_numsyls < 2.4) + ((0.482285 0.207874)) + ((0.401601 -0.0204711))) + ((0.733755 0.397372))))) + ((n.ph_ctype is r) + ((p.ph_ctype is 0) + ((pos_in_syl < 1.2) + ((0.666325 0.271734)) + ((nn.ph_vheight is 0) + ((0.642401 -0.261466)) + ((0.783684 -0.00956571)))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.692225 -0.381895)) + ((0.741921 -0.0898767)))) + ((nn.ph_vfront is 2) + ((ph_ctype is s) + ((0.697527 -1.12626)) + ((n.ph_ctype is s) + ((ph_vlng is 0) + ((R:SylStructure.parent.sub_phrases < 2.4) + ((0.498719 -0.906926)) + ((0.635342 -0.625651))) + ((0.45886 -0.385089))) + ((0.848596 -0.359702)))) + ((p.ph_vlng is a) + ((p.ph_ctype is 0) + ((0.947278 0.216904)) + ((0.637933 -0.394349))) + ((p.ph_ctype is r) + ((R:SylStructure.parent.syl_break is 0) + ((0.529903 -0.860573)) + ((0.581378 -0.510488))) + ((ph_vlng is 0) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((seg_onset_stop is 0) + ((R:SylStructure.parent.syl_break is 0) + ((p.ph_vlng is d) + ((0.768363 0.0108428)) + ((ph_ctype is s) + ((0.835756 -0.035054)) + ((ph_ctype is f) + ((p.ph_vlng is s) + ((0.602016 -0.179727)) + ((0.640126 -0.297341))) + ((0.674628 -0.542602))))) + ((ph_ctype is s) + ((0.662261 -0.60496)) + ((0.662088 -0.432058)))) + ((R:SylStructure.parent.syl_out < 4.4) + ((0.582448 -0.389079)) + ((ph_ctype is s) + ((0.60413 -0.73564)) + ((0.567153 -0.605444))))) + 
((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((0.761115 -0.827377)) + ((ph_ctype is n) + ((0.855183 -0.275338)) + ((R:SylStructure.parent.syl_break is 0) + ((0.788288 -0.802801)) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((0.686134 -0.371234)) + ((0.840184 -0.772883))))))) + ((pos_in_syl < 1.2) + ((R:SylStructure.parent.syl_break is 0) + ((n.ph_ctype is n) + ((0.423592 -0.655006)) + ((R:SylStructure.parent.syl_out < 4.4) + ((0.595269 -0.303751)) + ((0.478433 -0.456882)))) + ((0.688133 -0.133182))) + ((seg_onset_stop is 0) + ((1.27464 0.114442)) + ((0.406837 -0.167545)))))))))))) + ((ph_ctype is r) + ((0.462874 -0.87695)) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.645442 -0.640572)) + ((0.673717 -0.321322))))) + ((0.61008 -0.925472)))))))) +;; RMSE 0.8085 Correlation is 0.5899 Mean (abs) Error 0.6024 (0.5393) + + +)) + +(provide 'apml_kaldurtreeZ) diff --git a/lib/cart_aux.scm b/lib/cart_aux.scm new file mode 100644 index 0000000..e927ffd --- /dev/null +++ b/lib/cart_aux.scm @@ -0,0 +1,183 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. 
The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Some functions for manipulating decision trees +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define (cart_prune_tree_thresh tree threshold default) +"(cart_prune_tree_thresh TREE THRESHOLD DEFAULT) +Prune the classification tree TREE so that all tail nodes with +a prediction probability less than THRESHOLD are changed to return +DEFAULT instead. This may be used when different mistakes actually have +different penalties, hence some control over the defaults is +needed." + (cond + ((cdr tree) ;; a question + (list + (car tree) + (cart_prune_tree_thresh (car (cdr tree)) threshold default) + (cart_prune_tree_thresh (car (cdr (cdr tree))) threshold default))) + ((< (cart_class_probability (car tree)) threshold) + (list (list (list threshold default) default))) + (t ;; leave asis + tree))) + +(define (cart_class_probability class) + "(cart_class_probability CLASS) +Returns the probability of the best class in the cart leaf node CLASS. +If CLASS simply has a value and no probabilities, the probability +is assumed to be 1.0." + (let ((val 0.0)) + (set! 
val (assoc (car (last class)) class)) + (if val + (car (cdr val)) + 1.0))) + +(define (cart_class_prune_merge tree) + "(cart_class_prune_merge tree) +Prune all sub trees which are pure. That is they all predict the +same class. This can happen when some other pruning technique +has modified a sub-tree, now making it pure." + (let ((pure (cart_tree_pure tree))) + (cond + (pure pure) + ((cdr tree);; a question + (list + (car tree) + (cart_class_prune_merge (car (cdr tree))) + (cart_class_prune_merge (car (cdr (cdr tree)))))) + (t;; a leaf leave asis + tree)))) + +(define (cart_tree_pure tree) + "(cart_tree_pure tree) +Returns a probability density function if all nodes in this tree +predict the same class, and nil otherwise." + (cond + ((cdr tree) + (let ((left (cart_tree_pure (car (cdr tree)))) + (right (cart_tree_pure (car (cdr (cdr tree)))))) + (cond + ((not left) nil) + ((not right) nil) + ((equal? (car (last left)) (car (last right))) + left) + (t + nil)))) + (t ;; it's a leaf, so of course it's pure + tree))) + +(define (cart_simplify_tree tree map) + "(cart_simplify_tree TREE MAP) +Simplify a CART tree by reducing probability density functions to +simple single classifications (no probabilities). This removes valuable +information from the tree but makes it smaller, easier to read by humans +and faster to read by machines. Also the classes may be mapped by the assoc +list MAP. The bright ones amongst you will note this could be +better and merge 'is' operators into 'in' operators in some situations +especially if you are ignoring actual probability distributions." 
+ (cond + ((cdr tree) + (list + (car tree) + (cart_simplify_tree (car (cdr tree)) map) + (cart_simplify_tree (car (cdr (cdr tree))) map))) + (t + (let ((class (car (last (car tree))))) + (if (assoc class map) + (list (cdr (assoc class map))) + (list (last (car tree)))))))) + +(define (cart_simplify_tree2 tree) + "(cart_simplify_tree2 TREE) +Simplify a CART tree by reducing probability density functions to +only non-zero probabilities." + (cond + ((cdr tree) + (list + (car tree) + (cart_simplify_tree2 (car (cdr tree))) + (cart_simplify_tree2 (car (cdr (cdr tree)))))) + (t + (list + (cart_remove_zero_probs (car tree)))))) + +(define (cart_remove_zero_probs pdf) + "(cart_remove_zero_probs pdf) +Removes zero probability classes in pdf, last in list +is best in class (as from cart leaf node)." + (cond + ((null (cdr pdf)) pdf) + ((equal? 0 (car (cdr (car pdf)))) + (cart_remove_zero_probs (cdr pdf))) + (t + (cons + (car pdf) + (cart_remove_zero_probs (cdr pdf)))))) + +(define (cart_interpret_debug i tree) + "(cart_interpret_debug i tree) +In comparing output between different implementations (flite vs festival) +This prints out the details as it interprets the tree." + (cond + ((cdr tree) ;; question + (format t "%s %s %s\n" (car (car tree)) (upcase (cadr (car tree))) + (car (cddr (car tree)))) + (set! 
a (item.feat i (car (car tree)))) + (format t "%s\n" a) + (cond + ((string-equal "is" (cadr (car tree))) + (if (string-equal a (car (cddr (car tree)))) + (begin + (format t " YES\n") + (cart_interpret_debug i (car (cdr tree)))) + (begin + (format t " NO\n") + (cart_interpret_debug i (car (cddr tree)))))) + ((string-equal "<" (cadr (car tree))) + (if (< (parse-number a) (parse-number (car (cddr (car tree))))) + (begin + (format t " YES\n") + (cart_interpret_debug i (car (cdr tree)))) + (begin + (format t " NO\n") + (cart_interpret_debug i (car (cddr tree)))))) + (t + (format t "unknown q type %l\n" (car tree))))) + (t ;; leaf + (car tree) + ))) + +(provide 'cart_aux) + diff --git a/lib/clunits.scm b/lib/clunits.scm new file mode 100644 index 0000000..9ad181a --- /dev/null +++ b/lib/clunits.scm @@ -0,0 +1,287 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Carnegie Mellon University and ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1998-2001 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH, CARNEGIE MELLON UNIVERSITY AND THE ;; +;;; CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO ;; +;;; THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY ;; +;;; AND FITNESS, IN NO EVENT SHALL THE UNIVERSITY OF EDINBURGH, CARNEGIE ;; +;;; MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, ;; +;;; INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ;; +;;; RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION ;; +;;; OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF ;; +;;; OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Cluster Unit selection support (Black and Taylor Eurospeech '97) +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Run-time support, selection and synthesis and some debugging functions +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(require_module 'clunits) + +(defvar cluster_synth_pre_hooks nil) +(defvar cluster_synth_post_hooks nil) + +(defvar clunits_time time) ;; some old voices might use this + +(defSynthType Cluster + (apply_hooks cluster_synth_pre_hooks utt) + (Clunits_Select utt) + (Clunits_Get_Units utt) + (Clunits_Join_Units utt) + (apply_hooks cluster_synth_post_hooks utt) + utt +) + +(define (Clunits_Join_Units utt) + "(Clunits_Join_Units utt) +Join the preselected and gotten units into a waveform." 
+ (let ((join_method (get_param 'join_method clunits_params 'simple))) + ;; Choice of function to put them together + (cond + ((string-equal join_method 'windowed) + (Clunits_Windowed_Wave utt) + (clunits::fix_segs_durs utt)) + ((string-equal join_method 'smoothedjoin) + (Clunits_SmoothedJoin_Wave utt) + (clunits::fix_segs_durs utt)) + ((string-equal join_method 'none) + t) + ((string-equal join_method 'modified_lpc) + (defvar UniSyn_module_hooks nil) + (Param.def "unisyn.window_name" "hanning") + (Param.def "unisyn.window_factor" 1.0) + (Parameter.def 'us_sigpr 'lpc) + (mapcar + (lambda (u s) + (item.set_feat s "source_end" (item.feat u "end"))) + (utt.relation.items utt 'Unit) + (utt.relation.items utt 'Segment)) + (us_unit_concat utt) + (if (not (member 'f0 (utt.relationnames utt))) + (targets_to_f0 utt)) + (if (utt.relation.last utt 'Segment) + (set! pm_end (+ (item.feat (utt.relation.last utt 'Segment) "end") + 0.02)) + (set! pm_end 0.02)) + (us_f0_to_pitchmarks utt 'f0 'TargetCoef pm_end) + (us_mapping utt 'segment_single) + (us_generate_wave utt (Parameter.get 'us_sigpr) + 'analysis_period)) + ((string-equal join_method 'smoothed_lpc) +; (format t "smoothed_lpc\n") + (defvar UniSyn_module_hooks nil) + (Param.def "unisyn.window_name" "hanning") + (Param.def "unisyn.window_factor" 1.0) + (Parameter.def 'us_sigpr 'lpc) + (mapcar + (lambda (u s) + (item.set_feat s "source_end" (item.feat u "end")) + (item.set_feat s "unit_duration" + (- (item.feat u "seg_end") (item.feat u "seg_start"))) + ) + (utt.relation.items utt 'Unit) + (utt.relation.items utt 'Segment)) + (us_unit_concat utt) + (mapcar + (lambda (u s) + (item.set_feat s "num_frames" (item.feat u "num_frames"))) + (utt.relation.items utt 'Unit) + (utt.relation.items utt 'Segment)) + (if (not (member 'f0 (utt.relationnames utt))) + (targets_to_f0 utt)) + (if (utt.relation.last utt 'Segment) + (set! pm_end (+ (item.feat (utt.relation.last utt 'Segment) "end") + 0.02)) + (set! 
pm_end 0.02)) + (us_f0_to_pitchmarks utt 'f0 'TargetCoef pm_end) + (cl_mapping utt clunits_params) + (us_generate_wave utt (Parameter.get 'us_sigpr) + 'analysis_period)) + (t + (Clunits_Simple_Wave utt))) + utt + ) +) + +(define (clunits::units_selected utt filename) + "(clunits::units_selected utt filename) +Output the selected unit file indexes for each unit in the given utterance. +Results are saved in the given file name, or stdout if filename is \"-\"." + (let ((fd (if (string-equal filename "-") + t + (fopen filename "w"))) + (end 0) + (sample_rate + (cadr (assoc 'sample_rate (wave.info (utt.wave utt)))))) + (format fd "#\n") + (mapcar + (lambda (s) + (let ((dur (/ (- (item.feat s "samp_end") + (item.feat s "samp_start")) + sample_rate)) + (start (/ (item.feat s "samp_start") sample_rate))) + (set! end (+ end dur)) + (format fd "%f 125 %s ; %s %10s %f %f %f\n" + end + (string-before (item.name s) "_") + (item.name s) + (item.feat s "fileid") + (item.feat s "unit_start") + (item.feat s "unit_middle") + (item.feat s "unit_end")) + )) + (utt.relation.items utt 'Unit)) + (if (not (string-equal filename "-")) + (fclose fd)) + t)) + +(define (clunits::units_segs utt filename) + "(clunits::units_segs utt filename) +Saves the unit selections (alone) for display." + (let ((fd (if (string-equal filename "-") + t + (fopen filename "w"))) + (end 0) + (sample_rate + (cadr (assoc 'sample_rate (wave.info (utt.wave utt)))))) + (format fd "#\n") + (mapcar + (lambda (s) + (let ((dur (/ (- (item.feat s "samp_end") + (item.feat s "samp_start")) + sample_rate)) + (start (/ (item.feat s "samp_start") sample_rate))) + (set! end (+ end dur)) + (format fd "%f 125 %s \n" + end + (string-before (item.name s) "_") +; (item.name s) + ) + )) + (utt.relation.items utt 'Unit)) + (if (not (string-equal filename "-")) + (fclose fd)) + t)) + +(define (clunits::fix_segs_durs utt) + "(clunits::fix_segs_durs utt) +Takes the actual unit times and places them back on the segs." 
(let ((end 0) + (sample_rate + (cadr (assoc 'sample_rate (wave.info (utt.wave utt)))))) + (mapcar + (lambda (u s) + (let ((dur (/ (- (item.feat u "samp_end") + (item.feat u "samp_start")) + sample_rate)) + (seg_start (/ (- (item.feat u "samp_seg_start") + (item.feat u "samp_start")) + sample_rate))) + (if (item.prev s) + (item.set_feat (item.prev s) "end" + (+ (item.feat s "p.end") seg_start))) + (set! end (+ end dur)) + (item.set_feat s "end" end))) + (utt.relation.items utt 'Unit) + (utt.relation.items utt 'Segment) + ) + utt)) + +(define (clunits::display utt) + "(clunits::display utt) +Display utterance with emulabel. Note this saves files in +scratch/wav/ and scratch/lab/." + (let ((id "cl01")) + (utt.save.wave utt (format nil "scratch/wav/%s.wav" id)) + (utt.save.segs utt (format nil "scratch/lab/%s.lab" id)) + (system "cd scratch; emulabel ../etc/emu_lab cl01 &") + t)) + +; (define (clunits::debug_resynth_units utt) +; "(clunits::debug_resynth_units utt) +; Check each of the units in utt against the related label +; files and re-synth with any given new boundaries. Note this +; will only work if the segment still overlaps with its original and +; also note that with a rebuild of the clunits db a completely different +; set of units may be selected for this utterance." +; (let () +; (mapcar +; (lambda (unit) +; (clunits::check_unit_boundaries unit)) +; (utt.relation.items utt 'Unit)) +; ;; This can't be done like this ... +; (Clunits_Get_Units utt) ;; get unit signal/track stuff +; (Clunits_Join_Units utt) ;; make a complete waveform +; (apply_hooks cluster_synth_post_hooks utt) +; utt) +; ) + +(define (clunits::join_parameters utt) + "(clunits::join_parameters utt) +Join selected parameters (rather than the signal), used in F0 and +Articulatory selection." + (let ((params nil) + (num_channels 0) + (num_frames 0 )) + + (mapcar + (lambda (unit) + (set! num_frames + (+ num_frames + (track.num_frames (item.feat unit "coefs")))) + (set! 
num_channels (track.num_channels (item.feat unit "coefs"))) + (format t "counting %d %d\n" num_frames num_channels) + ) + (utt.relation.items utt 'Unit)) + + (set! params (track.resize nil 0 num_channels)) + + (mapcar + (lambda (unit) + (set! frames 0) + (format t "inserting \n") + (format t "%l %l %l %l %l\n" + params (track.num_frames params) + (item.feat unit "coefs") 0 + (track.num_frames (item.feat unit "coefs"))) + (track.insert + params (track.num_frames params) + (item.feat unit "coefs") 0 + (track.num_frames (item.feat unit "coefs"))) + ) + (utt.relation.items utt 'Unit)) + + (utt.relation.create utt "AllCoefs") + (set! coefs_item (utt.relation.append utt "AllCoefs")) + (item.set_feat coefs_item "name" "AllCoefs") + (item.set_feat coefs_item "AllCoefs" params) + + utt +)) + + +(provide 'clunits) diff --git a/lib/clunits_build.scm b/lib/clunits_build.scm new file mode 100644 index 0000000..1dd438a --- /dev/null +++ b/lib/clunits_build.scm @@ -0,0 +1,465 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Carnegie Mellon University and ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1998-2005 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. 
The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH, CARNEGIE MELLON UNIVERSITY AND THE ;; +;;; CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO ;; +;;; THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY ;; +;;; AND FITNESS, IN NO EVENT SHALL THE UNIVERSITY OF EDINBURGH, CARNEGIE ;; +;;; MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, ;; +;;; INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ;; +;;; RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION ;; +;;; OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF ;; +;;; OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Cluster Unit selection support (Black and Taylor Eurospeech '97) +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; clunits build support +;;; +;;; There are five stages to this +;;; Load in all utterances +;;; Load in their coefficients +;;; Collect together the units of the same type +;;; build distance tables from them +;;; dump features for them +;;; + +(require_module 'clunits) ;; C++ modules support +(require 'clunits) ;; run time scheme support + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define (do_all) + (let () + + (format t "Loading utterances and sorting types\n") + (set! utterances (acost:db_utts_load clunits_params)) + (set! 
unittypes (acost:find_same_types utterances clunits_params)) + (acost:name_units unittypes) + + (format t "Dumping features for clustering\n") + (acost:dump_features unittypes utterances clunits_params) + + (format t "Loading coefficients\n") + (acost:utts_load_coeffs utterances) + ;; If you are short of diskspace try this + (acost:disttabs_and_clusters unittypes clunits_params) + + ;; or if you have lots of diskspace try +; (format t "Building distance tables\n") +; (acost:build_disttabs unittypes clunits_params) + +; ;; Build the cluster trees (requires disttabs and features) +; (format t "Building cluster trees\n") +; (acost:find_clusters (mapcar car unittypes) clunits_params) + + ;; Tidy up and put things together + (acost:collect_trees (mapcar car unittypes) clunits_params) + + (format t "Saving unit catalogue\n") + (acost:save_catalogue utterances clunits_params) + + ) +) + +(define (do_init) + (set! utterances (acost:db_utts_load clunits_params)) + (set! unittypes (acost:find_same_types utterances clunits_params)) + (acost:name_units unittypes) + t) + +(define (acost:disttabs_and_clusters unittypes clunits_params) + "(acost:disttabs_and_clusters unittypes clunits_params) +Because distance tables use so much diskspace, build each table individually +and then its cluster, removing the table before moving on to the +next." + (mapcar + (lambda (uu) + (acost:build_disttabs (list uu) clunits_params) + (acost:find_clusters (list (car uu)) clunits_params) + (delete-file + (format nil "%s/%s/%s%s" + (get_param 'db_dir clunits_params "./") + (get_param 'disttabs_dir clunits_params "disttabs/") + (car uu) + (get_param 'disttabs_ext clunits_params ".disttab"))) + ) + unittypes) + t) + +(define (acost:db_utts_load params) + "(acost:db_utts_load params) +Load in all utterances identified in database." + (let ((files (car (cdr (assoc 'files params))))) + (set! acost:all_utts + (mapcar + (lambda (fname) + (set! 
utt_seg (Utterance Text fname)) + (utt.load utt_seg + (string-append + (get_param 'db_dir params "./") + (get_param 'utts_dir params "festival/utts/") + fname + (get_param 'utts_ext params ".utt"))) + utt_seg) + files)))) + +(define (acost:utts_load_coeffs utterances) + "(acost:utts_load_coeffs utterances) +Load the acoustic coefficients for each utterance." + (mapcar + (lambda (utt) (acost:utt.load_coeffs utt clunits_params)) + utterances) + t) + +(define (acost:find_same_types utterances params) + "(acost:find_same_types utterances params) +Find all the stream items of the same type and collect them into +lists of that type." + (let ((clunit_name_feat (get_param 'clunit_name_feat params "name")) + (clunit_relation (get_param 'clunit_relation params "Segment"))) + (set! acost:unittypes nil) + (mapcar + (lambda (u) + (mapcar + (lambda (s) + (let ((cname (item.feat s clunit_name_feat))) + (if (not (string-equal "ignore" cname)) + (begin + (item.set_feat s "clunit_name" (item.feat s clunit_name_feat)) + (let ((p (assoc (item.feat s "clunit_name") acost:unittypes))) + (if p + (set-cdr! p (cons s (cdr p))) + (set! acost:unittypes + (cons + (list (item.feat s "clunit_name") s) + acost:unittypes)))))))) + (utt.relation.items u clunit_relation))) + utterances) + (acost:prune_unittypes acost:unittypes params))) + +(define (acost:prune_unittypes unittypes params) + "(acost:prune_unittypes unittypes params) +If unit types are complex (contain an _) then remove all unittype sets +that do not have more than unittype_prune_threshold members (typically 3)." + (if (string-matches (car (car unittypes)) ".*_.*") + (let ((ut nil) (pt (get_param 'unittype_prune_threshold params 0))) + (while unittypes + (if (or (eq? pt 0) + (> (length (cdr (car unittypes))) pt)) + (set! ut (cons (car unittypes) ut))) + (set! unittypes (cdr unittypes))) + (reverse ut)) + unittypes)) + +(define (acost:name_units unittypes) + "(acost:name_units unittypes) +Names each unit with a unique id and numbers the occurrences of each type." 
(let ((idnum 0) (tynum 0)) + (mapcar + (lambda (s) + (set! tynum 0) + (mapcar + (lambda (si) + (item.set_feat si "unitid" idnum) + (set! idnum (+ 1 idnum)) + (item.set_feat si "occurid" tynum) + (set! tynum (+ 1 tynum))) + (cdr s)) + (format t "units \"%s\" %d\n" (car s) tynum)) + unittypes) + (format t "total units %d\n" idnum) + idnum)) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Generating feature files + +(define (acost:dump_features unittypes utterances params) + "(acost:dump_features unittypes utterances params) +Do multiple passes over the utterances for each unittype and +dump the desired features. This would be easier if utterances +weren't required for feature functions." + (mapcar + (lambda (utype) + (acost:dump_features_utype + (car utype) + (cdr utype) + utterances + params)) + unittypes) + t) + +(define (acost:dump_features_utype utype uitems utterances params) + "(acost:dump_features_utype utype uitems utterances params) +Dump features for all items of type utype." + (let ((fd (fopen + (string-append + (get_param 'db_dir params "./") + (get_param 'feats_dir params "festival/feats/") + utype + (get_param 'feats_ext params ".feats")) + "w")) + (feats (car (cdr (assoc 'feats params))))) + (format t "Dumping features for %s\n" utype) + (mapcar + (lambda (s) + (mapcar + (lambda (f) + (set! fval (unwind-protect (item.feat s f) "0")) + (if (or (string-equal "" fval) + (string-equal " " fval)) + (format fd "%l " fval) + (format fd "%s " fval))) + feats) + (format fd "\n")) + uitems) + (fclose fd))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Tree building functions + +(defvar wagon-balance-size 0) + +(define (acost:find_clusters unittypes clunits_params) +"Use wagon to find the best clusters." + (mapcar + (lambda (unittype) + (build_tree unittype clunits_params)) + unittypes) + t) + +(define (build_tree unittype clunits_params) +"Build tree with Wagon for this unittype." 
+ (let ((command + (format nil "%s -desc %s -data '%s' -balance %s -distmatrix '%s' -stop %s -output '%s' %s" + (get_param 'wagon_progname clunits_params "wagon") + (if (probe_file + (string-append + (get_param 'db_dir clunits_params "./") + (get_param 'wagon_field_desc clunits_params "wagon") + "." unittype)) + ;; So there can be unittype specific desc files + (string-append + (get_param 'db_dir clunits_params "./") + (get_param 'wagon_field_desc clunits_params "wagon") + "." unittype) + (string-append + (get_param 'db_dir clunits_params "./") + (get_param 'wagon_field_desc clunits_params "wagon"))) + (string-append + (get_param 'db_dir clunits_params "./") + (get_param 'feats_dir clunits_params "festival/feats/") + unittype + (get_param 'feats_ext clunits_params ".feats")) + (get_param 'wagon_balance_size clunits_params 0) + (string-append + (get_param 'db_dir clunits_params "./") + (get_param 'disttabs_dir clunits_params "festival/disttabs/") + unittype + (get_param 'disttabs_ext clunits_params ".disttab")) + (get_param 'wagon_cluster_size clunits_params 10) + (string-append + (get_param 'db_dir clunits_params "./") + (get_param 'trees_dir clunits_params "festival/trees/") + unittype + (get_param 'trees_ext clunits_params ".tree")) + (get_param 'wagon_other_params clunits_params "") + ))) + (format t "%s\n" command) + (system command))) + +(define (acost:collect_trees unittypes params) +"Collect the trees into one file as an assoc list" + (let ((fd (fopen + (string-append + (get_param 'db_dir params "./") + (get_param 'trees_dir params "festival/trees/") + (get_param 'index_name params "all.") + (get_param 'trees_ext params ".tree")) + "wb")) + (tree_pref + (string-append + (get_param 'db_dir params "./") + (get_param 'trees_dir params "festival/trees/"))) + (cluster_prune_limit (get_param 'cluster_prune_limit params 0)) + (cluster_merge (get_param 'cluster_merge params 0))) + (format fd ";; Autogenerated list of selection trees\n") + (mapcar + (lambda (fp) + 
(format fd ";; %l %l\n" (car fp) (car (cdr fp)))) + params) + (format fd "(set! clunits_selection_trees '(\n") + (mapcar + (lambda (unit) + (set! tree (car (load (string-append tree_pref unit ".tree") t))) + (if (> cluster_prune_limit 0) + (set! tree (cluster_tree_prune tree cluster_prune_limit))) + (if (> cluster_merge 0) + (set! tree (tree_merge_leafs tree cluster_merge))) + (if (boundp 'temp_tree_convert) + (set! tree (temp_tree_convert))) + (pprintf (list unit tree) fd)) + unittypes) + (format fd "))\n") + (fclose fd))) + +(define (cluster_tree_prune_in_line prune_limit) +"(cluster_tree_prune_in_line prune_limit) +Prune number of units in each cluster in each tree *by* prune_limit, +if negative, or *to* prune_limit, if positive." + (set! sucs_select_trees + (mapcar + (lambda (t) + (cluster_tree_prune t prune_limit)) + sucs_select_trees))) + +(define (tree_merge_leafs tree depth) + "(tree_merge_leafs tree depth) +Merge the leaves of the tree at the given depth. This allows the trees +to be pruned and the single leaves then joined together into larger +clusters (so the Viterbi part has something to do)." + (let ((num_leafs (tree_num_leafs tree))) + (cond + ((< num_leafs 2) tree) ;; already at the foot + ((< num_leafs depth) + (tree_collect_leafs tree)) + (t + (list + (car tree) + (tree_merge_leafs (car (cdr tree)) depth) + (tree_merge_leafs (car (cdr (cdr tree))) depth)))))) + +(define (tree_num_leafs tree) + "(tree_num_leafs tree) +Number of leaves of the given tree." + (cond + ((cdr tree) + (+ + (tree_num_leafs (car (cdr tree))) + (tree_num_leafs (car (cdr (cdr tree)))))) + (t + 1))) + +(define (tree_collect_leafs tree) + "(tree_collect_leafs tree) +Combine all units in the leaves."
+ (cond + ((cdr tree) + (let ((a (tree_collect_leafs (car (cdr tree)))) + (b (tree_collect_leafs (car (cdr (cdr tree)))))) + (list + (list + (append + (caar a) + (caar b)) + 10.0)))) + (t + tree))) + +(define (cluster_tree_prune tree prune_limit) +"(cluster_tree_prune TREE PRUNE_LIMIT) +Reduce the number of elements in the (CART) tree leaves to PRUNE_LIMIT +(or by -PRUNE_LIMIT, if PRUNE_LIMIT is negative), removing the ones +furthest from the cluster centre. Maybe later this should +have guards on minimum number of units that must remain in the tree and +a per unit type limit." + (cond + ((cdr tree) ;; a question + (list + (car tree) + (cluster_tree_prune (car (cdr tree)) prune_limit) + (cluster_tree_prune (car (cdr (cdr tree))) prune_limit))) + (t ;; tree leaf + (list + (list + (remove_n_worst + (car (car tree)) + (if (< prune_limit 0) + (* -1 prune_limit) + (- (length (car (car tree))) prune_limit))) + (car (cdr (car tree)))))))) + +(define (remove_n_worst lll togo) +"(remove_n_worst lll togo) +Remove togo worst items from lll." + (cond + ((< togo 0) + lll) + ((equal? 0 togo) + lll) + (t + (remove_n_worst + (remove (worst_unit (cdr lll) (car lll)) lll) + (- togo 1))))) + +(define (worst_unit lll worst_so_far) +"(worst_unit lll worst_so_far) +Return the unit with the worst score in the list." + (cond + ((null lll) + worst_so_far) + ((< (car (cdr worst_so_far)) (car (cdr (car lll)))) + (worst_unit (cdr lll) (car lll))) + (t + (worst_unit (cdr lll) worst_so_far)))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Save the unit catalogue for use in the run-time index + +(define (acost:save_catalogue utterances clunits_params) + "(acost:save_catalogue utterances clunits_params) +Save the catalogue with named units with times."
+ (let ((fd (fopen + (string-append + (get_param 'db_dir clunits_params "./") + (get_param 'catalogue_dir clunits_params "trees/") + (get_param 'index_name clunits_params "catalogue.") + ".catalogue") + "wb")) + (num_units 0) + ) + (format fd "EST_File index\n") + (format fd "DataType ascii\n") + (format fd "NumEntries %d\n" + (apply + + (mapcar (lambda (u) + (length (utt.relation.items u 'Segment))) utterances))) + (format fd "IndexName %s\n" (get_param 'index_name clunits_params "cluser")) + (format fd "EST_Header_End\n") + (mapcar + (lambda (u) + (mapcar + (lambda (s) + (format fd "%s_%s %s %f %f %f\n" + (item.feat s "clunit_name") + (item.feat s 'occurid) + (utt.feat u 'fileid) + (item.feat s 'segment_start) + (item.feat s 'segment_mid) + (item.feat s 'segment_end))) + (utt.relation.items u 'Segment))) + utterances) + (fclose fd))) + +(provide 'clunits_build.scm) diff --git a/lib/cmusphinx2_phones.scm b/lib/cmusphinx2_phones.scm new file mode 100644 index 0000000..49c6597 --- /dev/null +++ b/lib/cmusphinx2_phones.scm @@ -0,0 +1,119 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;;; +;;; Carnegie Mellon University ;;; +;;; and Alan W Black and Kevin Lenzo ;;; +;;; Copyright (c) 1998-2000 ;;; +;;; All Rights Reserved. ;;; +;;; ;;; +;;; Permission is hereby granted, free of charge, to use and distribute ;;; +;;; this software and its documentation without restriction, including ;;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;;; +;;; permit persons to whom this work is furnished to do so, subject to ;;; +;;; the following conditions: ;;; +;;; 1. The code must retain the above copyright notice, this list of ;;; +;;; conditions and the following disclaimer. ;;; +;;; 2. Any modifications must be clearly marked as such. ;;; +;;; 3. Original authors' names are not deleted. ;;; +;;; 4. 
The authors' names are not used to endorse or promote products ;;; +;;; derived from this software without specific prior written ;;; +;;; permission. ;;; +;;; ;;; +;;; CARNEGIE MELLON UNIVERSITY AND THE CONTRIBUTORS TO THIS WORK ;;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;;; +;;; SHALL CARNEGIE MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE ;;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;;; +;;; THIS SOFTWARE. ;;; +;;; ;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; A definition of the cmusphinx2 phone set used in the BU RADIO FM +;;; corpus, some people call this the darpa set. This one +;;; has the closures removed +;;; + +(defPhoneSet + cmusphinx2 + ;;; Phone Features + (;; vowel or consonant + (vc + -) + ;; vowel length: short long diphthong schwa + (vlng s l d a 0) + ;; vowel height: high mid low + (vheight 1 2 3 0) + ;; vowel frontness: front mid back + (vfront 1 2 3 0) + ;; lip rounding + (vrnd + - 0) + ;; consonant type: stop fricative affricate nasal lateral approximant + (ctype s f a n l r 0) + ;; place of articulation: labial alveolar palatal labio-dental + ;; dental velar glottal + (cplace l a p b d v g 0) + ;; consonant voicing + (cvox + - 0) + ) + ;; Phone set members + ( + + ;; Note these features were set by awb so they are wrong !!! 
+ +; phone vc vl vh vf vr ct cp cv + (AA + l 3 3 - 0 0 0) ;; father + (AE + s 3 1 - 0 0 0) ;; fat + (AH + s 2 2 - 0 0 0) ;; but + (AO + l 3 3 + 0 0 0) ;; lawn + (AW + d 3 2 - 0 0 0) ;; how + (AX + a 2 2 - 0 0 0) ;; about + (AXR + a 2 2 - r a +) + (AY + d 3 2 - 0 0 0) ;; hide + (B - 0 0 0 0 s l +) + (CH - 0 0 0 0 a p -) + (D - 0 0 0 0 s a +) + (DH - 0 0 0 0 f d +) + (DX - 0 0 0 0 s a +) + (EH + s 2 1 - 0 0 0) ;; get + (ER + a 2 2 - r 0 0) + (EY + d 2 1 - 0 0 0) ;; gate + (F - 0 0 0 0 f b -) + (G - 0 0 0 0 s v +) + (HH - 0 0 0 0 f g -) + (IH + s 1 1 - 0 0 0) ;; bit + (IY + l 1 1 - 0 0 0) ;; beet + (JH - 0 0 0 0 a p +) + (K - 0 0 0 0 s v -) + (L - 0 0 0 0 l a +) + (M - 0 0 0 0 n l +) + (N - 0 0 0 0 n a +) + (NG - 0 0 0 0 n v +) + (OW + d 2 3 + 0 0 0) ;; lone + (OY + d 2 3 + 0 0 0) ;; toy + (P - 0 0 0 0 s l -) + (R - 0 0 0 0 r a +) + (S - 0 0 0 0 f a -) + (SH - 0 0 0 0 f p -) + (T - 0 0 0 0 s a -) + (TH - 0 0 0 0 f d -) + (UH + s 1 3 + 0 0 0) ;; full + (UW + l 1 3 + 0 0 0) ;; fool + (V - 0 0 0 0 f b +) + (W - 0 0 0 0 r l +) + (Y - 0 0 0 0 r p +) + (Z - 0 0 0 0 f a +) + (ZH - 0 0 0 0 f p +) + (SIL - 0 0 0 0 0 0 -) ; added + ) +) + +(PhoneSet.silences '(SIL)) + +(provide 'cmusphinx2_phones) + + + + diff --git a/lib/cslush.scm b/lib/cslush.scm new file mode 100644 index 0000000..6864917 --- /dev/null +++ b/lib/cslush.scm @@ -0,0 +1,79 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Functions specific to using Festival in cslush part of the OGI toolkit +;;; The OGI toolkit is a complete dialog building system with speech +;;; recognition and synthesis (Festival) it is available for free for +;;; research purposes from +;;; http://www.cse.ogi.edu/CSLU/toolkit/toolkit.html +;;; +;;; Note this cslush interface requires you to compile festival +;;; with tcl (7.6) +;;; +;;; The functions replace the C++ level functions Jacques H. 
de Villiers +;;; from CSLU wrote for the previous version +;;; + +(if (not (member 'tcl *modules*)) + (error "cslush: can't load cslush, TCL not supported in this installation of Festival.")) + +(define (cslush.getwave utt) +"(cslush.getwave UTT) +Extract wave memory info, pass this to wave import in CSLUsh." + (format nil "%s %s %s" + (utt.wave.info utt 'data_addr) + (utt.wave.info utt 'num_samples) + (utt.wave.info utt 'sample_rate))) + +(define (cslush.getphone utt) +"(cslush.getphone UTT) +Return segment names as a single string of phones, for passing to +TCL." + (let ((phones "")) + (mapcar + (lambda (s) + (if (string-equal phones "") + (set! phones (format nil "%s" (utt.streamitem.feat utt s 'name))) + (set! phones (format nil "%s %s" + phones (utt.streamitem.feat utt s 'name))))) + (utt.stream utt 'Segment)) + phones)) + +(define (cslush TCLCOMMAND) +"(cslush TCLCOMMAND) +Pass TCLCOMMAND to the TCL interpreter and return what TCL returns as a +string." + (tcl_eval TCLCOMMAND)) + + +(provide 'cslush) diff --git a/lib/darpa_phones.scm b/lib/darpa_phones.scm new file mode 100644 index 0000000..184c8bf --- /dev/null +++ b/lib/darpa_phones.scm @@ -0,0 +1,115 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1999 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. 
;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: April 1999 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; (yet another) darpa definition +;;; + +(require 'phoneset) + +(set! 
darpa_fs (cadr +(defPhoneSet + darpa + (Features + (vowel (syllabic + -) + (length long short diphthong schwa) + (height high mid low) + (front front mid back) + (round + -)) + (consonant + (syllabic + -) + (manner stop affricate fricative approximant nasal) + (place alveolar dental labial palatal velar) + (voicing + -)) + (silence + (syllabic -))) + (Phones + ;; type syl length height front round + (aa vowel + long low back -) + (ae vowel + short low front -) + (ah vowel + short mid mid -) + (ao vowel + long low front +) + (aw vowel + diphthong low mid -) + (ax vowel + schwa mid mid -) + (axr vowel + schwa mid mid -) + (ay vowel + diphthong low mid -) + (eh vowel + short mid front -) + (ey vowel + diphthong mid front -) + (ih vowel + short high front -) + (iy vowel + long high front -) + (ow vowel + diphthong mid back +) + (oy vowel + diphthong mid back +) + (uh vowel + short high back +) + (uw vowel + long high back +) + ;; type syl manner place voicing + (b consonant - stop labial +) + (ch consonant - affricate alveolar -) + (d consonant - stop alveolar +) + (dh consonant - fricative dental +) + (dx consonant - stop alveolar +) + (el consonant + approximant alveolar +) + (em consonant + nasal labial +) + (en consonant + stop alveolar +) + (er consonant + approximant alveolar +) + (f consonant - fricative labial -) + (g consonant - stop velar +) + (hh consonant - fricative velar -) + (jh consonant - affricate alveolar +) + (k consonant - stop velar -) + (l consonant - approximant alveolar +) + (m consonant - nasal labial +) + (n consonant - nasal alveolar +) + (nx consonant - nasal alveolar +) + (ng consonant - nasal velar +) + (p consonant - stop labial -) + (r consonant - approximant alveolar +) + (s consonant - fricative alveolar -) + (sh consonant - fricative palatal -) + (t consonant - stop alveolar -) + (th consonant - fricative dental -) + (v consonant - fricative labial +) + (w consonant - approximant velar +) + (y consonant - approximant palatal +) + (z 
consonant - fricative alveolar +) + (zh consonant - fricative palatal +) + (pau silence -) +; (sil silence -) + )))) + +(provide 'darpa_phones) + + + + diff --git a/lib/display.scm b/lib/display.scm new file mode 100644 index 0000000..b190c05 --- /dev/null +++ b/lib/display.scm @@ -0,0 +1,69 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. 
;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: December 1996 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; An xwaves display function for utterances +;;; +;;; Requires Xwaves to be running, saves labels etc and sends +;;; messages to Xwaves to display the utterance. +;;; +;;; This can be a model for other display functions. +;;; + +(define (display utt) +"(display utt) +Display an utterance's waveform, F0 and segment labels in Xwaves. +Xwaves must be running on the current machine, with a labeller for +this to work." + (let ((tmpname (make_tmp_filename))) + (utt.save.wave utt (string-append tmpname ".wav")) + (utt.save.segs utt (string-append tmpname ".lab")) + (utt.save.f0 utt (string-append tmpname ".f0")) + (system (format nil "send_xwaves make file %s name %s height 150" + (string-append tmpname ".f0") tmpname)) + (system (format nil "send_xwaves make name %s file %s height 200" + tmpname (string-append tmpname ".wav"))) + (system (format nil "send_xwaves send make file %s name %s color 125" + (string-append tmpname ".lab") tmpname)) + (system (format nil "send_xwaves send activate name %s fields 1" + tmpname)) + (system (format nil "send_xwaves %s align file %s" + tmpname (string-append tmpname ".wav")))) + ) + +(provide 'display) + + + + diff --git a/lib/duration.scm b/lib/duration.scm new file mode 100644 index 0000000..7e074d7 --- /dev/null +++ b/lib/duration.scm @@ -0,0 +1,196 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Basic Duration module which will call appropriate duration +;;; (C++) modules based on set parameter +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;;; These modules should predict intonation events/labels +;;; based on information in the phrase and word streams + +(define (Duration utt) +"(Duration utt) +Predict segmental durations using Duration_Method defined in Parameters. 
+Four methods are currently available: averages, Klatt rules, CART tree +based, and fixed duration." + (let ((rval (apply_method 'Duration_Method utt))) + (cond + (rval rval) ;; new style + ;; 1.1.1 voices still use other names + ((eq 'Averages (Parameter.get 'Duration_Method)) + (Duration_Averages utt)) + ((eq 'Klatt (Parameter.get 'Duration_Method)) + (Duration_Klatt utt)) + ((eq 'Tree_ZScores (Parameter.get 'Duration_Method)) + (Duration_Tree_ZScores utt)) + ((eq 'Tree (Parameter.get 'Duration_Method)) + (Duration_Tree utt)) + (t + (Duration_Default utt))))) + +(define (Duration_LogZScores utt) +"(Duration_LogZScores utt) +Predict durations for segments using the CART trees in duration_logzscore_tree +and duration_logzscore_tree_silence, which produce a zscore of the log +duration. The variable duration_logzscore_ph_info contains (log) means +and std for each phone in the set." + (let ((silence (car (car (cdr (assoc 'silences (PhoneSet.description)))))) + ldurinfo) + (mapcar + (lambda (s) + (if (string-equal silence (item.name s)) + (set! ldurinfo + (wagon s duration_logzscore_tree_silence)) + (set! ldurinfo + (wagon s duration_logzscore_tree))) + (set! dur (exp (duration_unzscore + (item.name s) + (car (last ldurinfo)) + duration_logzscore_ph_info))) + (set! dur (* dur (duration_find_stretch s))) + (item.set_feat + s "end" (+ dur (item.feat s "start_segment")))) + (utt.relation.items utt 'Segment)) + utt)) + +(define (duration_unzscore phname zscore table) +"(duration_unzscore phname zscore table) +Look up phname in table and convert zscore back to the absolute domain." + (let ((phinfo (assoc phname table)) + mean std) + (if phinfo + (begin + (set! mean (car (cdr phinfo))) + (set! std (car (cdr (cdr phinfo))))) + (begin + (format t "Duration: unzscore no info for %s\n" phname) + (set! mean 0.100) + (set! std 0.25))) + (+ mean (* zscore std)))) + +(define (duration_find_stretch seg) +"(duration_find_stretch seg) +Find any relevant duration stretch."
+ (let ((global (Parameter.get 'Duration_Stretch)) + (local (item.feat + seg "R:SylStructure.parent.parent.R:Token.parent.dur_stretch"))) + (if (or (not global) + (equal? global 0.0)) + (set! global 1.0)) + (if (string-equal local 0.0) + (set! local 1.0)) + (* global local))) + +;; These provide lisp level functions, some of which have +;; been converted in C++ (in festival/src/modules/base/ff.cc) +(define (onset_has_ctype seg type) + ;; "1" if onset contains ctype + (let ((syl (item.relation.parent seg 'SylStructure))) + (if (not syl) + "0" ;; a silence + (let ((segs (item.relation.daughters syl 'SylStructure)) + (v "0")) + (while (and segs + (not (string-equal + "+" + (item.feat (car segs) "ph_vc")))) + (if (string-equal + type + (item.feat (car segs) "ph_ctype")) + (set! v "1")) + (set! segs (cdr segs))) + v)))) + +(define (coda_has_ctype seg type) + ;; "1" if coda contains ctype + (let ((syl (item.relation.parent seg 'SylStructure))) + (if (not syl) + "0" ;; a silence + (let ((segs (reverse (item.relation.daughters + syl 'SylStructure))) + (v "0")) + (while (and segs + (not (string-equal + "+" + (item.feat (car segs) "ph_vc")))) + (if (string-equal + type + (item.feat (car segs) "ph_ctype")) + (set! v "1")) + (set! 
segs (cdr segs))) + v)))) + +(define (onset_stop seg) + (onset_has_ctype seg "s")) +(define (onset_fric seg) + (onset_has_ctype seg "f")) +(define (onset_nasal seg) + (onset_has_ctype seg "n")) +(define (onset_glide seg) + (let ((l (onset_has_ctype seg "l"))) + (if (string-equal l "0") + (onset_has_ctype seg "r") + "1"))) +(define (coda_stop seg) + (coda_has_ctype seg "s")) +(define (coda_fric seg) + (coda_has_ctype seg "f")) +(define (coda_nasal seg) + (coda_has_ctype seg "n")) +(define (coda_glide seg) + (let ((l (coda_has_ctype seg "l"))) + (if (string-equal l "0") + (coda_has_ctype seg "r") + "1"))) + +(define (Unisyn_Duration utt) + "(Unisyn_Duration utt) +Predicts Segment durations in some specified way but holds the +result in a way necessary for other Unisyn code." + (let ((end 0)) + (mapcar + (lambda (s) + (item.get_utt s) + (let ((dur (wagon_predict s duration_cart_tree))) + (set! dur (* (Parameter.get 'Duration_Stretch) dur)) + (set! end (+ dur end)) + (item.set_feat s "target_dur" dur) + (item.set_function s "start" "unisyn_start") + (item.set_feat s "end" end) + (item.set_feat s "dur" dur) + )) + (utt.relation.items utt 'Segment)) + utt)) + +(provide 'duration) diff --git a/lib/email-mode.scm b/lib/email-mode.scm new file mode 100644 index 0000000..4f8450f --- /dev/null +++ b/lib/email-mode.scm @@ -0,0 +1,89 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. 
The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; An example tts text mode for reading email messages, this includes +;;; support for extracting the interesting headers from the message +;;; and for dealing with quoted text. It's all very primitive and +;;; will easily be confused but it's here just as an example +;;; + +(define (email_init_func) + "(email_init_func) +Called on starting email text mode." + (voice_rab_diphone) + (set! email_previous_t2w_func token_to_words) + (set! english_token_to_words email_token_to_words) + (set! token_to_words english_token_to_words) + (set! email_in_quote nil)) + +(define (email_exit_func) + "(email_exit_func) +Called on exiting email text mode." + (set! english_token_to_words email_previous_t2w_func) + (set! token_to_words english_token_to_words)) + +(define (email_token_to_words token name) + "(email_token_to_words token name) +Email-specific token to word rules."
+ (cond + ((string-matches name "<.*@.*>") + (append + (email_previous_t2w_func token + (string-after (string-before name "@") "<")) + (cons + "at" + (email_previous_t2w_func token + (string-before (string-after name "@") ">"))))) + ((and (string-matches name ">") + (string-matches (item.feat token "whitespace") + "[ \t\n]*\n *")) + (voice_cmu_us_awb_cg) + nil ;; return nothing to say + ) + (t ;; for all other cases + (if (string-matches (item.feat token "whitespace") + ".*\n[ \n]*") + (voice_rab_diphone)) + (email_previous_t2w_func token name)))) + +(set! tts_text_modes + (cons + (list + 'email ;; mode name + (list ;; email mode params + (list 'init_func email_init_func) + (list 'exit_func email_exit_func) + '(filter "email_filter"))) + tts_text_modes)) + +(provide 'email-mode) diff --git a/lib/engmorph.scm b/lib/engmorph.scm new file mode 100644 index 0000000..46b7c42 --- /dev/null +++ b/lib/engmorph.scm @@ -0,0 +1,151 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: December 1997 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; THIS IS EXPERIMENTAL AND DOES *NOT* WORK +;;; +;;; Koskenniemi-style context rewrite rules for English Morphographemics +;;; Basically splits words into their (potential) morphemes. +;;; +;;; Based (roughly) on the rules in "Computational Morphology" +;;; Ritchie et al. MIT Press 1992. +;;; +;;; This is not a Scheme file and can't be loaded and evaluated +;;; It is designed for use with the wfst tools in the speech tools +;;; e.g. 
wfst_build -type kk -o engmorph.wfst -detmin engmorph.scm +;;; + +(KKrules + engmorph + (Alphabets + ;; Input Alphabet + (a b c d e f g h i j k l m n o p q r s t u v w x y z #) + ;; Output Alphabet + (a b c d e f g h i j k l m n o p q r s t u v w x y z + #) + ) + (Sets + (LET a b c d e f g h i j k l m n o p q r s t u v w x y z) + ) + (Rules + ;; The basic rules + ( a => nil --- nil) + ( b => nil --- nil) + ( c => nil --- nil) + ( d => nil --- nil) + ( e => nil --- nil) + ( f => nil --- nil) + ( g => nil --- nil) + ( h => nil --- nil) + ( i => nil --- nil) + ( j => nil --- nil) + ( k => nil --- nil) + ( l => nil --- nil) + ( m => nil --- nil) + ( n => nil --- nil) + ( o => nil --- nil) + ( p => nil --- nil) + ( q => nil --- nil) + ( r => nil --- nil) + ( s => nil --- nil) + ( t => nil --- nil) + ( u => nil --- nil) + ( v => nil --- nil) + ( w => nil --- nil) + ( x => nil --- nil) + ( y => nil --- nil) + ( z => nil --- nil) + ( # => nil --- nil) +; ( _epsilon_/+ => (or LET _epsilon_/e ) --- (LET)) + ( _epsilon_/+ => (or LET _epsilon_/e) --- nil) + + ;; The rules that do interesting things + + ;; Epenthesis + ;; churches -> church+s + ;; boxes -> box+s + (e/+ <=> (or (s h) (or s x z) (i/y) (c h)) + --- + (s)) + ;; Gemination + (b/+ <=> ( (or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) b ) + --- + ((or a e i o u))) + (d/+ <=> ((or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) d ) + --- + ((or a e i o u))) + (f/+ <=> ((or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) f ) + --- + ((or a e i o u))) + (g/+ <=> ((or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) g ) + --- + ((or a e i o u))) + (m/+ <=> ((or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) m ) + --- + ((or a e i o u))) + (p/+ <=> ((or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) p ) + --- + ((or a e i o u))) + (s/+ <=> ((or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) s ) + --- + ((or a e i o u))) + (t/+ <=> ((or b c d f g h j k l m n p q 
r s t v w z) (or a e i o u y) t ) + --- + ((or a e i o u))) + (z/+ <=> ((or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) z ) + --- + ((or a e i o u))) + (n/+ <=> ((or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) n ) + --- + ((or a e i o u))) + (l/+ <=> ((or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) l ) + --- + ((or a e i o u))) + (r/+ <=> ((or b c d f g h j k l m n p q r s t v w z) (or a e i o u y) r ) + --- + ((or a e i o u))) + ;; tries->try+s + ( i/y <=> ((or b c d f g h j k l m n p q r s t v w x z)) + --- + ((or ( e/+ s ) + ( _epsilon_/+ (or a d e f h i l m n o p s w y))))) + ;; Elision + ;; moved -> move+ed + (_epsilon_/e <=> + ((or a e i o u ) (or b c d f g j k l m n p q r s t v x z)) + --- + ( _epsilon_/+ (or a e i o u ))) + + ) +) diff --git a/lib/engmorphsyn.scm b/lib/engmorphsyn.scm new file mode 100644 index 0000000..d6e237f --- /dev/null +++ b/lib/engmorphsyn.scm @@ -0,0 +1,170 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: December 1997 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; THIS IS EXPERIMENTAL AND DOES *NOT* WORK +;;; +;;; +;;; An English morpho-syntax finite-state grammar +;;; This is used for morphological decomposition of unknown words +;;; specifically (only) words that are not found in the lexicon. +;;; The idea is that when an unknown word is found an attempt is made +;;; to see if it contains any well known morphological inflections or +;;; derivations, if so a better use of LTS can be made on the root, if +;;; none are found this +;;; +;;; +;;; Based on "Analysis of Unknown Words through Morphological +;;; Decomposition", Black, van de Plassche, Williams, European ACL 91. +;;; with the anyword matcher from a question by Lauri Karttunen after +;;; the talk. +;;; +;;; The suffixes and finite-state morph-syntax grammar are based +;;; (very roughly) on the rules in "Computational Morphology" +;;; Ritchie et al. MIT Press 1992.
+;;; +;;; Can be compiled with +;;; wfst_build -type rg -o engmorphsyn.wfst -detmin engmorphsyn.scm +;;; +;;; The result can be combined with the morphographemic rules +;;; with +;;; wfst_build -type compose engmorph.wfst engmorphsyn.wfst -detmin -o engstemmer.wfst +;;; +;;; echo "# b o x e/+ s #" | wfst_run -wfst engstemmer.wfst -recog +;;; state 0 #/# -> 1 +;;; state 1 b/b -> 3 +;;; state 3 o/o -> 17 +;;; state 17 x/x -> 14 +;;; state 14 e/+ -> 36 +;;; state 36 s/s -> 34 +;;; state 34 #/# -> 16 +;;; OK. +;;; echo "# b o x e s #" | wfst_run -wfst engstemmer.wfst -recog +;;; state 0 #/# -> 1 +;;; state 1 b/b -> 3 +;;; state 3 o/o -> 17 +;;; state 17 x/x -> 14 +;;; state 14 e/e -> 22 +;;; state 22 s/s -> -1 + +(RegularGrammar + engsuffixmorphosyntax + ;; Sets + ( + (V a e i o u y) + (C b c d f g h j k l m n p q r s t v w x y z) + ) + ;; Rules + + ( + ;; A word *must* have a suffix to be recognized + (Word -> # Syls Suffix ) + (Word -> # Syls End ) + + ;; This matches any string of characters that contains at least one vowel + (Syls -> Syl Syls ) + (Syls -> Syl ) + (Syl -> Cs V Cs ) + (Cs -> C Cs ) + (Cs -> ) + + (Suffix -> VerbSuffix ) + (Suffix -> NounSuffix ) + (Suffix -> AdjSuffix ) + (VerbSuffix -> VerbFinal End ) + (VerbSuffix -> VerbtoNoun NounSuffix ) + (VerbSuffix -> VerbtoNoun End ) + (VerbSuffix -> VerbtoAdj AdjSuffix ) + (VerbSuffix -> VerbtoAdj End ) + (NounSuffix -> NounFinal End ) + (NounSuffix -> NountoNoun NounSuffix ) + (NounSuffix -> NountoNoun End ) + (NounSuffix -> NountoAdj AdjSuffix ) + (NounSuffix -> NountoAdj End ) + (NounSuffix -> NountoVerb VerbSuffix ) + (NounSuffix -> NountoVerb End ) + (AdjSuffix -> AdjFinal End ) + (AdjSuffix -> AdjtoAdj AdjSuffix) + (AdjSuffix -> AdjtoAdj End) + (AdjSuffix -> AdjtoAdv End) ;; isn't any Adv to anything + + (End -> # ) ;; word boundary symbol *always* present + + (VerbFinal -> + e d) + (VerbFinal -> + i n g) + (VerbFinal -> + s) + + (VerbtoNoun -> + e r) + (VerbtoNoun -> + e s s) + (VerbtoNoun -> + a t i 
o n) + (VerbtoNoun -> + i n g) + (VerbtoNoun -> + m e n t) + + (VerbtoAdj -> + a b l e) + + (NounFinal -> + s) + + (NountoNoun -> + i s m) + (NountoNoun -> + i s t) + (NountoNoun -> + s h i p) + + (NountoAdj -> + l i k e) + (NountoAdj -> + l e s s) + (NountoAdj -> + i s h) + (NountoAdj -> + o u s) + + (NountoVerb -> + i f y) + (NountoVerb -> + i s e) + (NountoVerb -> + i z e) + + (AdjFinal -> + e r) + (AdjFinal -> + e s t) + + (AdjtoAdj -> + i s h) + (AdjtoAdv -> + l y) + (AdjtoNoun -> + n e s s) + (AdjtoVerb -> + i s e) + (AdjtoVerb -> + i z e) + +) +) + + + + + + + + diff --git a/lib/etc/Makefile b/lib/etc/Makefile new file mode 100644 index 0000000..17fb4a0 --- /dev/null +++ b/lib/etc/Makefile @@ -0,0 +1,45 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## Directory for various scripts (machine independent functions) ## +## Sub-directories of this will contain machine-dependent binaries ## +## ## +########################################################################### +TOP=../.. +DIRNAME=lib/etc + +FILTERS=email_filter +FILES=Makefile $(FILTERS) + +include $(TOP)/config/common_make_rules + diff --git a/lib/etc/email_filter b/lib/etc/email_filter new file mode 100755 index 0000000..a2d9250 --- /dev/null +++ b/lib/etc/email_filter @@ -0,0 +1,47 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. 
Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## Email filter for tts text mode ## +## usage: email_filter email_message >filtered_message ## +## ## +## Extracts the From and Subject lines from the head and the body ## +## of the message. I suppose it could also do signature extraction ## +## ## +########################################################################### +grep "^From: " $1 +echo +grep "^Subject: " $1 +echo +# delete up to first blank line (i.e. the header) +sed '1,/^$/ d' $1 diff --git a/lib/f2bdurtreeZ.scm b/lib/f2bdurtreeZ.scm new file mode 100644 index 0000000..407943a --- /dev/null +++ b/lib/f2bdurtreeZ.scm @@ -0,0 +1,869 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996 ;; +;;; All Rights Reserved.
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; First attempt at a tree to learn durations. Although +;;; it was trained from F2B and the radio phone set should +;;; work for others that are declared with the same phone +;;; features +;;; + +;; in ancient items (not on independent data) +;; RMSE 0.821086 Correlation is 0.573693 Mean (abs) Error 0.612327 (0.547034) + +;; on independent test data +;; RMSE 0.8054 Correlation is 0.5327 Mean (abs) Error 0.6073 (0.5290) + +(set!
f2b_duration_cart_tree +' +((name is #) + ((emph_sil is +) + ((0.0 -0.5)) + ((R:Segment.p.R:SylStructure.parent.parent.pbreak is BB) + ((0.0 2.0)) + ((0.0 0.0)))) +((R:SylStructure.parent.accented is 0) + ((R:Segment.p.ph_ctype is 0) + ((R:Segment.n.ph_cplace is 0) + ((ph_ctype is n) + ((R:SylStructure.parent.position_type is initial) + ((ph_cplace is a) + ((0.675606 -0.068741)) + ((0.674321 0.204279))) + ((ph_cplace is l) + ((0.688993 -0.124997)) + ((R:SylStructure.parent.syl_out < 10) + ((0.610881 -0.394451)) + ((0.664504 -0.603196))))) + ((ph_ctype is r) + ((lisp_onset_glide is 0) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 0) + ((0.949991 0.619256)) + ((1.05066 0.979668))) + ((0.858728 0.457972))) + ((R:SylStructure.parent.position_type is single) + ((syl_initial is 0) + ((ph_ctype is s) + ((0.692981 -0.788933)) + ((0.834878 -0.116988))) + ((R:SylStructure.parent.syl_out < 9.4) + ((0.777932 0.357818)) + ((0.852909 0.115478)))) + ((R:Segment.n.ph_vrnd is +) + ((ph_ctype is s) + ((0.81305 0.87399)) + ((0.65978 0.418928))) + ((R:SylStructure.parent.position_type is final) + ((R:SylStructure.parent.parent.word_numsyls < 2.3) + ((0.71613 -0.2888)) + ((0.642029 0.0624649))) + ((R:Segment.nn.ph_cplace is a) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 1) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((R:SylStructure.parent.position_type is initial) + ((0.854092 0.384456)) + ((0.769274 0.10705))) + ((lisp_coda_stop is 0) + ((0.571763 0.0755348)) + ((0.632928 -0.11117)))) + ((lisp_coda_stop is 0) + ((R:SylStructure.parent.syl_out < 8.6) + ((0.555092 0.30006)) + ((0.552673 -0.0263481))) + ((0.903186 0.519185)))) + ((R:Segment.nn.ph_cplace is p) + ((0.563915 0.204967)) + ((R:Segment.nn.ph_cvox is -) + ((ph_ctype is s) + ((0.67653 0.227681)) + ((0.550623 0.435079))) + ((R:SylStructure.parent.position_type is initial) + ((0.93428 0.732003)) + ((0.84114 0.423214))))))))))) + ((R:Segment.n.ph_ctype is s) + ((ph_ctype is s) + ((0.693376 -1.02719)) + 
((R:Segment.n.ph_cplace is v) + ((ph_ctype is r) + ((0.539799 -0.344524)) + ((0.858576 0.154275))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 1.2) + ((lisp_onset_glide is 0) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 1) + ((ph_ctype is n) + ((R:Segment.nn.ph_cplace is a) + ((0.64604 -0.643797)) + ((0.739746 -0.450649))) + ((ph_ctype is f) + ((0.657043 -0.462107)) + ((0.798438 -0.19569)))) + ((R:SylStructure.parent.syl_out < 8.4) + ((lisp_coda_stop is 0) + ((0.766789 -0.0484781)) + ((0.717203 -0.322113))) + ((R:SylStructure.parent.position_type is single) + ((0.508168 -0.412874)) + ((0.703458 -0.291121))))) + ((0.574827 -0.65022))) + ((0.801765 -0.120813))))) + ((ph_ctype is n) + ((R:Segment.n.ph_ctype is f) + ((R:Segment.n.ph_cplace is b) + ((0.797652 0.623764)) + ((R:Segment.n.ph_cplace is a) + ((R:Segment.n.seg_onsetcoda is coda) + ((0.675567 0.288251)) + ((0.854197 0.626272))) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((0.660394 -0.225466)) + ((0.65275 0.0487195))))) + ((R:Segment.n.ph_ctype is n) + ((0.685613 -0.512227)) + ((0.736366 -0.104066)))) + ((R:Segment.n.ph_ctype is r) + ((R:SylStructure.parent.position_type is initial) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.1) + ((0.98185 0.152471)) + ((0.851907 0.788208))) + ((ph_ctype is f) + ((0.76106 0.406474)) + ((R:Segment.n.ph_cplace is a) + ((1.01348 -0.0422549)) + ((0.786777 -0.714839))))) + ((ph_cplace is b) + ((R:SylStructure.parent.syl_out < 10.4) + ((0.799025 0.0992277)) + ((0.851068 -0.115896))) + ((R:Segment.n.ph_cplace is p) + ((0.669855 -0.655488)) + ((ph_ctype is r) + ((R:Segment.n.ph_cplace is a) + ((1.00772 0.130892)) + ((0.635981 -0.35826))) + ((R:Segment.n.ph_ctype is l) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 1) + ((0.746089 -0.286007)) + ((0.89158 0.154432))) + ((R:Segment.n.ph_cplace is b) + ((1.04971 -0.0449782)) + ((R:SylStructure.parent.syl_out < 9.8) + ((R:Segment.n.ph_ctype is f) + ((R:Segment.n.seg_onsetcoda is coda) + 
((1.4144 0.143658)) + ((0.781116 -0.281483))) + ((ph_vlng is 0) + ((0.755959 -0.33462)) + ((0.81024 -0.615287)))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.3) + ((0.7426 -0.24342)) + ((R:Segment.n.ph_ctype is f) + ((R:Segment.n.ph_cplace is a) + ((R:SylStructure.parent.position_type is single) + ((0.578639 -0.322097)) + ((0.55826 -0.663238))) + ((0.616575 -0.713688))) + ((0.759572 -0.314116)))))))))))))) + ((R:Segment.n.ph_ctype is f) + ((ph_ctype is 0) + ((R:Segment.p.ph_ctype is r) + ((R:SylStructure.parent.parent.word_numsyls < 2.2) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((0.733193 -0.180968)) + ((0.563111 -0.467934))) + ((0.426244 -0.758137))) + ((ph_vlng is a) + ((R:Segment.n.ph_cplace is b) + ((R:Segment.nn.ph_cvox is +) + ((0.680234 0.059855)) + ((R:SylStructure.parent.position_type is single) + ((0.980851 0.443893)) + ((0.715307 0.112865)))) + ((R:Segment.p.ph_cplace is a) + ((0.851224 0.695863)) + ((R:Segment.nn.ph_cvox is -) + ((0.75892 0.195772)) + ((0.630633 0.478738))))) + ((R:Segment.n.seg_onsetcoda is coda) + ((R:Segment.n.ph_cplace is b) + ((R:Segment.nn.ph_cplace is 0) + ((0.815979 -0.477579)) + ((0.851491 -0.168622))) + ((R:SylStructure.parent.position_type is single) + ((R:Segment.nn.ph_cvox is +) + ((1.14265 0.717697)) + ((0.814726 0.291482))) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 0) + ((0.512322 -0.0749096)) + ((0.488216 0.112774))))) + ((R:SylStructure.parent.position_type is final) + ((0.693071 -0.200708)) + ((R:Segment.p.ph_cvox is +) + ((0.489147 -0.378728)) + ((0.695396 -0.525028))))))) + ((ph_vlng is s) + ((0.464234 -0.162706)) + ((R:Segment.p.ph_cvox is +) + ((R:SylStructure.parent.parent.word_numsyls < 2.2) + ((0.566845 -0.616918)) + ((0.92747 -0.26777))) + ((0.632833 -0.858295))))) + ((R:Segment.n.ph_vrnd is 0) + ((R:Segment.p.ph_ctype is r) + ((ph_vlng is 0) + ((0.845308 -0.23426)) + ((R:SylStructure.parent.syl_out < 4.8) + ((R:Segment.n.ph_ctype is n) + ((0.484602 -0.850587)) + 
((0.535398 -0.586652))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.3) + ((ph_vlng is a) + ((0.368898 -0.799533)) + ((lisp_coda_stop is 0) + ((0.387923 -1.11431)) + ((0.407377 -0.859849)))) + ((R:Segment.n.ph_cplace is a) + ((ph_vlng is a) + ((0.382367 -0.787669)) + ((0.522121 -0.687376))) + ((0.361185 -0.853639)))))) + ((ph_vlng is a) + ((ph_ctype is 0) + ((R:Segment.n.ph_ctype is s) + ((R:Segment.p.ph_cvox is +) + ((R:Segment.p.ph_cplace is d) + ((0.502849 -0.232866)) + ((R:SylStructure.parent.position_type is initial) + ((0.641714 -0.0545426)) + ((R:SylStructure.parent.parent.word_numsyls < 2.6) + ((0.613913 0.373746)) + ((R:Segment.n.ph_cplace is v) + ((0.581158 0.310101)) + ((0.628758 -0.068165)))))) + ((R:SylStructure.parent.position_type is mid) + ((0.459281 -0.553794)) + ((0.728208 -0.138806)))) + ((R:Segment.p.ph_cplace is v) + ((0.32179 -0.728364)) + ((R:Segment.p.ph_cplace is l) + ((0.562971 -0.550272)) + ((R:SylStructure.parent.position_type is initial) + ((0.937298 -0.0246324)) + ((R:Segment.p.ph_cvox is +) + ((R:Segment.n.ph_ctype is n) + ((R:Segment.n.ph_cplace is a) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 0) + ((0.434029 -0.404793)) + ((1.05548 -0.103717))) + ((0.408372 -0.556145))) + ((0.712335 -0.118776))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.3) + ((0.379593 -0.658075)) + ((0.549207 -0.494876)))))))) + ((R:SylStructure.parent.position_type is final) + ((0.597124 -0.649729)) + ((0.628822 -1.03743)))) + ((ph_ctype is s) + ((R:Segment.n.ph_ctype is r) + ((R:SylStructure.parent.syl_out < 8.4) + ((0.760328 0.31651)) + ((0.738363 -0.0177161))) + ((R:Segment.n.ph_ctype is l) + ((0.649328 -0.108791)) + ((0.594945 -0.712753)))) + ((ph_vlng is s) + ((R:Segment.n.ph_ctype is s) + ((R:Segment.n.ph_cplace is v) + ((R:Segment.nn.ph_cplace is a) + ((0.583211 0.0724331)) + ((0.434605 -0.229857))) + ((R:Segment.p.ph_cplace is a) + ((R:SylStructure.parent.position_type is single) + ((0.785502 -0.00061573)) + ((0.544995 
-0.432984))) + ((R:Segment.nn.ph_cplace is 0) + ((0.507071 -0.715041)) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 0) + ((0.506404 -0.573733)) + ((0.62466 -0.3356)))))) + ((R:Segment.p.ph_cplace is l) + ((0.571756 -0.819693)) + ((lisp_coda_stop is 0) + ((R:SylStructure.parent.position_type is initial) + ((0.906891 -0.352911)) + ((R:Segment.n.ph_ctype is r) + ((0.620335 -0.445714)) + ((R:SylStructure.parent.parent.word_numsyls < 2.5) + ((R:Segment.p.ph_cvox is +) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 0) + ((0.484057 -0.781483)) + ((0.653917 -0.615429))) + ((0.754814 -0.531845))) + ((0.493988 -0.881596))))) + ((0.792979 -0.32648))))) + ((R:Segment.p.ph_cvox is +) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.3) + ((lisp_coda_stop is 0) + ((0.913526 -0.195111)) + ((0.56564 -0.64867))) + ((R:SylStructure.parent.position_type is single) + ((R:Segment.n.ph_cplace is a) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((0.790882 -0.488954)) + ((0.780221 -0.185138))) + ((0.487794 -0.691338))) + ((R:Segment.p.ph_ctype is n) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((0.595729 -0.771698)) + ((0.57908 -1.06592))) + ((R:Segment.pp.ph_vfront is 0) + ((0.591417 -0.784735)) + ((0.486298 -0.436971)))))) + ((ph_vlng is 0) + ((0.629869 -0.960652)) + ((R:Segment.n.ph_ctype is r) + ((R:Segment.nn.ph_cplace is 0) + ((0.591783 -0.671576)) + ((R:Segment.nn.ph_cvox is +) + ((0.365135 -0.822844)) + ((0.428573 -0.988434)))) + ((lisp_coda_stop is 0) + ((R:Segment.p.ph_cplace is a) + ((R:Segment.n.ph_cplace is a) + ((0.428189 -0.730057)) + ((0.337443 -0.861764))) + ((0.57354 -0.494602))) + ((0.497606 -0.414451)))))))))) + ((ph_vlng is l) + ((R:Segment.pp.ph_vfront is 1) + ((0.937199 0.833877)) + ((R:SylStructure.parent.syl_out < 12.7) + ((0.729202 0.344121)) + ((0.71086 0.101855)))) + ((syl_initial is 0) + ((R:Segment.p.ph_ctype is r) + ((R:Segment.nn.ph_cplace is a) + ((0.844815 0.175273)) + ((0.662523 -0.297527))) + ((ph_vlng is 0) 
+ ((R:Segment.p.ph_ctype is s) + ((R:SylStructure.parent.syl_out < 14.6) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 0) + ((0.665332 -0.610529)) + ((0.42276 -0.848942))) + ((0.427946 -0.980726))) + ((R:SylStructure.parent.position_type is single) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 1) + ((0.523367 -0.825038)) + ((0.635654 -0.535303))) + ((R:SylStructure.parent.position_type is final) + ((0.515996 -0.707614)) + ((ph_cplace is a) + ((lisp_coda_stop is 0) + ((0.689738 0.0446601)) + ((0.698347 -0.268593))) + ((R:Segment.nn.ph_cplace is a) + ((0.706504 -0.659172)) + ((0.775589 -0.201769))))))) + ((0.79472 -0.0539192)))) + ((ph_ctype is s) + ((R:SylStructure.parent.position_type is single) + ((R:Segment.p.ph_ctype is f) + ((0.641302 0.532411)) + ((R:Segment.n.ph_vrnd is +) + ((0.800655 0.325651)) + ((0.894711 0.0487864)))) + ((R:SylStructure.parent.position_type is initial) + ((R:Segment.nn.ph_cplace is a) + ((0.618082 -0.0190591)) + ((0.733637 0.156329))) + ((ph_cplace is a) + ((R:SylStructure.parent.parent.word_numsyls < 2.3) + ((0.372869 -0.0827845)) + ((0.494988 0.0882778))) + ((0.593526 -0.335404))))) + ((R:Segment.p.ph_cvox is +) + ((R:Segment.p.ph_ctype is n) + ((R:SylStructure.parent.syl_out < 5.4) + ((1.0207 -0.152517)) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((0.711277 -0.513467)) + ((0.509207 -0.726794)))) + ((ph_cplace is g) + ((0.545188 -0.568352)) + ((R:Segment.p.ph_cplace is a) + ((ph_ctype is n) + ((0.61149 -0.325094)) + ((R:SylStructure.parent.position_type is single) + ((R:Segment.p.ph_ctype is r) + ((0.525282 0.395446)) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 1) + ((0.85778 0.0760293)) + ((0.704055 0.290369)))) + ((R:Segment.pp.ph_vfront is 0) + ((0.590093 0.136983)) + ((0.734563 -0.0570759))))) + ((R:Segment.pp.ph_vfront is 2) + ((0.519485 -0.477174)) + ((0.707546 -0.13584)))))) + ((R:SylStructure.parent.position_type is single) + ((R:Segment.p.ph_ctype is f) + ((0.797877 0.00462775)) + 
((R:Segment.pp.ph_vfront is 1) + ((0.852184 -0.259914)) + ((0.65313 -0.492506)))) + ((R:SylStructure.parent.position_type is initial) + ((0.662516 -0.45585)) + ((lisp_onset_glide is 0) + ((0.652534 -0.652428)) + ((0.482818 -0.885728)))))))))))) + ((syl_initial is 0) + ((ph_cplace is 0) + ((R:SylStructure.parent.position_type is single) + ((R:Segment.n.ph_ctype is f) + ((R:Segment.p.ph_cplace is a) + ((R:Segment.n.ph_cplace is a) + ((R:Segment.pp.ph_vfront is 0) + ((1.06157 1.30945)) + ((1.12041 1.85843))) + ((1.05622 0.921414))) + ((R:Segment.nn.ph_cvox is -) + ((1.03073 0.916168)) + ((1.06857 0.452851)))) + ((R:Segment.p.ph_ctype is r) + ((R:Segment.n.ph_cplace is v) + ((1.22144 0.672433)) + ((R:Segment.p.ph_cplace is l) + ((0.859749 -0.315152)) + ((R:Segment.nn.ph_cvox is -) + ((0.89862 0.131037)) + ((0.760033 -0.121252))))) + ((R:SylStructure.parent.syl_out < 8.8) + ((R:SylStructure.parent.syl_out < 0.8) + ((1.06821 1.63716)) + ((R:Segment.n.ph_cplace is a) + ((R:Segment.p.ph_cvox is +) + ((1.04477 0.581686)) + ((R:Segment.nn.ph_cvox is +) + ((0.769059 0.301576)) + ((0.953428 0.0764058)))) + ((R:Segment.p.ph_cplace is a) + ((1.01367 0.507761)) + ((1.2827 0.945031))))) + ((R:Segment.n.ph_cplace is l) + ((0.618397 -0.0873608)) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 0) + ((R:Segment.p.ph_cvox is +) + ((0.817182 0.477262)) + ((0.792181 -0.0592145))) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((R:SylStructure.parent.syl_out < 16) + ((0.995411 0.497843)) + ((0.784087 0.152266))) + ((1.11816 0.716352)))))))) + ((R:Segment.n.ph_ctype is f) + ((R:SylStructure.parent.position_type is final) + ((1.35724 1.06028)) + ((R:Segment.p.ph_ctype is r) + ((R:SylStructure.parent.syl_out < 8.6) + ((0.511716 -0.0833005)) + ((0.492142 -0.30212))) + ((R:Segment.n.ph_cplace is b) + ((0.53059 0.00266551)) + ((R:SylStructure.parent.parent.word_numsyls < 2.3) + ((ph_vlng is l) + ((0.433396 0.821463)) + ((0.66915 0.415614))) + ((0.501369 0.154721)))))) + 
((R:SylStructure.parent.position_type is final) + ((R:Segment.n.ph_ctype is s) + ((1.03896 0.524706)) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((1.15147 0.428386)) + ((R:Segment.p.ph_cplace is a) + ((0.919929 0.0314637)) + ((0.716168 -0.366629))))) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 4) + ((0.816778 0.408786)) + ((lisp_onset_glide is 0) + ((R:Segment.p.ph_ctype is n) + ((R:Segment.n.ph_ctype is s) + ((0.532911 -0.153851)) + ((0.633518 -0.762353))) + ((R:Segment.p.ph_cvox is -) + ((R:Segment.p.ph_cplace is g) + ((0.618376 -0.593197)) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 1) + ((R:Segment.pp.ph_vfront is 0) + ((R:Segment.n.ph_ctype is n) + ((0.554085 -0.058903)) + ((R:Segment.p.ph_cplace is a) + ((0.59842 -0.174458)) + ((0.585539 -0.349335)))) + ((0.500857 -0.416613))) + ((R:SylStructure.parent.syl_out < 7) + ((0.616683 -0.00213272)) + ((0.631444 -0.141773))))) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 0) + ((0.5198 -0.151901)) + ((ph_vlng is s) + ((0.677428 0.203522)) + ((0.780789 0.375429)))))) + ((R:Segment.nn.ph_cplace is a) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((0.594604 -0.27832)) + ((0.736114 -0.422756))) + ((R:Segment.p.ph_cplace is a) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((0.512186 -0.732785)) + ((0.550759 -0.506471))) + ((0.47297 -0.791841))))))))) + ((R:Segment.p.ph_ctype is 0) + ((R:SylStructure.parent.position_type is final) + ((lisp_coda_stop is 0) + ((ph_ctype is f) + ((R:Segment.nn.ph_cplace is 0) + ((1.00978 0.366105)) + ((0.80682 -0.0827529))) + ((R:Segment.n.ph_cplace is a) + ((R:Segment.nn.ph_cvox is -) + ((1.07097 1.77503)) + ((1.14864 1.14754))) + ((R:Segment.n.ph_vrnd is -) + ((0.883474 0.286471)) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((1.22264 0.884142)) + ((1.03401 0.658192)))))) + ((ph_cplace is a) + ((R:SylStructure.parent.syl_out < 6.4) + ((R:SylStructure.parent.syl_out < 0.6) + ((1.07956 0.602849)) + ((1.12301 0.0555897))) + 
((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((0.898888 -0.17527)) + ((0.940932 0.274301)))) + ((1.10093 -0.68098)))) + ((R:Segment.n.ph_ctype is s) + ((ph_cplace is v) + ((0.639932 -1.33353)) + ((R:SylStructure.parent.position_type is single) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 0) + ((lisp_coda_stop is 0) + ((0.822882 -0.131692)) + ((0.971957 -0.385365))) + ((R:Segment.nn.ph_cvox is -) + ((1.06611 0.183678)) + ((lisp_coda_stop is 0) + ((0.967183 0.0925019)) + ((0.876026 -0.230108))))) + ((ph_ctype is f) + ((R:SylStructure.parent.syl_out < 13) + ((0.589198 -0.655594)) + ((0.476651 -0.926625))) + ((R:SylStructure.parent.syl_out < 5) + ((0.682936 -0.227662)) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((R:Segment.nn.ph_cplace is a) + ((0.447309 -0.700998)) + ((0.626113 -0.468853))) + ((0.657893 -0.383607))))))) + ((ph_ctype is r) + ((R:Segment.nn.ph_cvox is -) + ((1.15158 1.15233)) + ((R:Segment.n.ph_vrnd is -) + ((1.05554 0.533749)) + ((0.955478 0.0841894)))) + ((ph_ctype is l) + ((R:Segment.n.ph_ctype is 0) + ((R:Segment.nn.ph_cplace is a) + ((0.766431 0.28943)) + ((1.48633 1.09574))) + ((R:SylStructure.parent.position_type is single) + ((1.01777 0.474653)) + ((0.545859 -0.402743)))) + ((R:SylStructure.parent.syl_out < 4.8) + ((R:Segment.n.ph_vc is +) + ((ph_ctype is n) + ((0.776645 -0.433859)) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 0) + ((0.776179 0.23435)) + ((R:SylStructure.parent.parent.word_numsyls < 2.2) + ((0.744272 -0.0859672)) + ((0.782605 0.115647)))) + ((0.626541 -0.167615)))) + ((R:Segment.n.seg_onsetcoda is coda) + ((1.28499 0.864144)) + ((ph_cplace is a) + ((0.926103 0.0435837)) + ((0.839172 -0.189514))))) + ((R:Segment.n.ph_ctype is n) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.1) + ((0.973489 -0.203415)) + ((0.777589 -0.849733))) + ((ph_ctype is n) + ((R:SylStructure.parent.position_type is initial) + ((R:Segment.n.ph_vc is +) + 
((0.743482 -0.53384)) + ((0.619309 -0.0987861))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((1.15555 0.0786295)) + ((1.06689 0.681662)))) + ((R:Segment.n.ph_ctype is r) + ((R:SylStructure.parent.syl_out < 8.9) + ((0.752079 -0.237421)) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((0.664182 -0.041521)) + ((0.772712 0.103499)))) + ((R:Segment.n.seg_onsetcoda is coda) + ((R:SylStructure.parent.position_type is mid) + ((R:SylStructure.parent.parent.word_numsyls < 3.3) + ((0.715944 -0.275113)) + ((0.675729 0.202848))) + ((R:Segment.n.ph_vrnd is -) + ((R:SylStructure.parent.syl_out < 8.3) + ((ph_ctype is s) + ((0.82747 -0.116723)) + ((0.689586 -0.303909))) + ((R:SylStructure.parent.syl_out < 17.7) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 0) + ((0.659686 -0.621268)) + ((ph_cplace is a) + ((0.861741 -0.285324)) + ((0.507102 -0.444082)))) + ((0.850664 -0.269084)))) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 0) + ((0.878643 -0.255833)) + ((0.98882 0.115252))))) + ((ph_cplace is a) + ((R:SylStructure.parent.syl_out < 13) + ((0.850625 -0.289333)) + ((0.788154 -0.44844))) + ((0.70482 -0.630276)))))))))))) + ((R:Segment.p.ph_ctype is l) + ((R:SylStructure.parent.position_type is single) + ((0.873748 -0.21639)) + ((lisp_coda_stop is 0) + ((0.71002 0.428132)) + ((0.703501 0.015833)))) + ((ph_vlng is 0) + ((R:Segment.p.ph_ctype is r) + ((R:SylStructure.parent.position_type is initial) + ((0.907151 -0.494409)) + ((ph_ctype is s) + ((0.782539 -0.398555)) + ((R:Segment.p.ph_cplace is 0) + ((0.767435 -0.298857)) + ((0.767046 0.151217))))) + ((ph_cplace is a) + ((R:Segment.n.ph_ctype is r) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((0.689367 0.0195991)) + ((0.64446 -0.256648))) + ((R:Segment.n.ph_vc is +) + ((ph_ctype is s) + ((R:Segment.nn.ph_cvox is +) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((0.59482 -0.214443)) + ((0.745691 0.0292177))) + ((0.523103 -0.391245))) + ((R:Segment.p.ph_cvox is +) + 
((R:Segment.p.ph_cplace is a) + ((0.524304 -0.428306)) + ((0.605117 -0.165604))) + ((R:Segment.p.ph_ctype is f) + ((0.491251 -0.455353)) + ((lisp_coda_stop is 0) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 1) + ((0.175021 -1.02136)) + ((0.264113 -0.976809))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.3) + ((0.704803 -0.716976)) + ((0.300317 -0.924727))))))) + ((ph_ctype is f) + ((R:SylStructure.parent.syl_out < 13) + ((R:Segment.n.ph_ctype is s) + ((0.731994 -0.711044)) + ((0.768008 -0.415076))) + ((0.691821 -0.803284))) + ((R:Segment.nn.ph_cplace is 0) + ((R:Segment.n.ph_cplace is a) + ((0.569567 -0.993506)) + ((0.689849 -0.761696))) + ((0.386818 -1.14744)))))) + ((R:Segment.p.seg_onsetcoda is coda) + ((R:Segment.p.ph_cplace is a) + ((0.746337 -0.866206)) + ((0.532751 -1.22185))) + ((ph_cplace is l) + ((0.74942 -0.820648)) + ((0.685988 -0.298146)))))) + ((0.812766 0.17291)))))) + ((R:SylStructure.parent.position_type is mid) + ((ph_ctype is r) + ((0.577775 -0.54714)) + ((R:Segment.n.ph_ctype is f) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 0) + ((0.370448 0.00076407)) + ((0.460385 0.20631))) + ((R:Segment.p.ph_cvox is -) + ((ph_vlng is 0) + ((0.615959 -0.57434)) + ((0.50852 -0.197814))) + ((R:Segment.n.ph_ctype is 0) + ((1.34281 0.477163)) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((0.59975 -0.1342)) + ((0.640294 -0.32653))))))) + ((R:Segment.n.ph_ctype is f) + ((R:SylStructure.parent.position_type is initial) + ((0.758739 0.311943)) + ((R:Segment.n.seg_onsetcoda is coda) + ((R:Segment.p.ph_ctype is f) + ((1.28746 1.99771)) + ((R:Segment.pp.ph_vfront is 1) + ((1.42474 1.76925)) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 1) + ((0.979414 1.37583)) + ((1.00321 1.06671))))) + ((1.15222 0.852004)))) + ((R:Segment.p.ph_ctype is 0) + ((R:Segment.n.ph_ctype is s) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((0.664807 -0.0880262)) + ((0.573589 0.217234))) + ((ph_ctype is s) + ((ph_cplace is l) + 
((0.800348 0.66579)) + ((ph_cplace is a) + ((0.859133 1.46854)) + ((R:SylStructure.parent.position_type is single) + ((0.692229 1.23671)) + ((0.552426 0.923928))))) + ((R:SylStructure.parent.syl_out < 9.2) + ((R:SylStructure.parent.position_type is single) + ((R:SylStructure.parent.syl_out < 3.6) + ((1.01673 1.26824)) + ((0.848274 0.92375))) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 1) + ((R:Segment.nn.ph_cplace is a) + ((0.788163 0.818855)) + ((0.822028 1.01227))) + ((0.8365 0.483313)))) + ((lisp_coda_stop is 0) + ((R:Segment.nn.ph_cvox is +) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.807795 0.670829)) + ((0.773774 0.435486))) + ((0.849529 0.103561))) + ((0.858848 0.763836)))))) + ((R:Segment.n.ph_vrnd is -) + ((ph_vlng is 0) + ((R:SylStructure.parent.position_type is final) + ((ph_cplace is a) + ((R:Segment.nn.ph_cvox is -) + ((0.691915 -0.42124)) + ((R:Segment.p.ph_cplace is a) + ((0.773696 0.354001)) + ((0.65495 -0.14321)))) + ((0.610433 -0.479739))) + ((R:Segment.p.ph_ctype is r) + ((R:SylStructure.parent.R:Syllable.n.syl_break is 0) + ((0.560921 0.384674)) + ((0.895267 0.746476))) + ((R:Segment.p.ph_ctype is l) + ((0.704694 0.568012)) + ((R:Segment.p.ph_cplace is b) + ((1.34739 0.539049)) + ((R:Segment.p.ph_ctype is s) + ((R:SylStructure.parent.syl_out < 12.9) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((0.807285 0.151429)) + ((0.988033 0.383763))) + ((0.878655 0.102291))) + ((ph_ctype is n) + ((0.759582 -0.315096)) + ((R:SylStructure.parent.syl_out < 8.8) + ((R:Segment.pp.ph_vfront is 0) + ((0.846546 0.000647117)) + ((R:Segment.pp.ph_vfront is 1) + ((0.586216 0.150701)) + ((0.793898 0.379041)))) + ((lisp_coda_stop is 0) + ((ph_ctype is f) + ((0.74736 -0.31103)) + ((0.715751 -0.00576581))) + ((0.914486 0.17528)))))))))) + ((1.24204 0.908819))) + ((ph_ctype is s) + ((ph_cplace is a) + ((0.864408 1.35528)) + ((R:Segment.n.seg_onsetcoda is coda) + ((0.85602 0.344576)) + ((0.869622 0.659223)))) + ((R:Segment.nn.ph_cvox 
is -) + ((R:Segment.n.ph_ctype is s) + ((R:Segment.nn.ph_cplace is 0) + ((0.942964 1.27475)) + ((0.978218 0.650268))) + ((R:SylStructure.parent.syl_out < 3.9) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((1.32463 1.05026)) + ((0.896966 0.417727))) + ((R:Segment.p.ph_cplace is a) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 0) + ((0.776698 0.195369)) + ((0.969518 0.432394))) + ((0.799096 -0.0203318))))) + ((ph_cplace is a) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((0.680861 -0.315846)) + ((R:SylStructure.parent.R:Syllable.nn.syl_break is 1) + ((0.954393 0.0965487)) + ((0.884928 0.372884)))) + ((lisp_coda_stop is 0) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((R:SylStructure.parent.position_type is final) + ((1.03696 0.565834)) + ((0.906661 0.277961))) + ((R:SylStructure.parent.position_type is final) + ((0.778429 -0.0967381)) + ((0.863993 0.314023)))) + ((R:Segment.p.ph_cplace is a) + ((R:SylStructure.parent.R:Syllable.p.stress is 0) + ((0.898898 0.571009)) + ((0.830278 0.787486))) + ((1.1101 0.333888))))))))))))) +;; RMSE 0.7726 Correlation is 0.5943 Mean (abs) Error 0.5752 (0.5160) + +)) + +(provide 'f2bdurtreeZ) diff --git a/lib/f2bf0lr.scm b/lib/f2bf0lr.scm new file mode 100644 index 0000000..6a06671 --- /dev/null +++ b/lib/f2bf0lr.scm @@ -0,0 +1,314 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. 
The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; First attempt at a linear regression model to predict F0 values. +;;; This is an attempt to reimplement the work in Black and +;;; Hunt ICSLP96, though this model probably isn't as good. +;;; + +;;;start +;;; R2 = 0.251, F(74, 12711) = 57.5, Prob>F = 0.000 +;;; RMSE = 27.877 +;;;mid +;;; R2 = 0.332, F(74, 12711) = 85.6, Prob>F = 0.000 +;;; RMSE = 28.293 +;;;end +;;; R2 = 0.292, F(74, 12711) = 70.8, Prob>F = 0.000 +;;; RMSE = 27.139 + +(define (emph_syl syl) + (if (string-equal (item.feat syl "tobi_accent") "NONE") + 0.0 + (if (string-equal (item.feat + syl "R:SylStructure.parent.R:Token.parent.EMPH") "1") + 2.0 + 0.0))) + +(set! f2b_f0_lr_start +'( +( Intercept 160.584956 ) +( R:SylStructure.parent.R:Token.parent.EMPH 10.0 ) +( pp.tobi_accent 10.081770 (H*) ) +( pp.tobi_accent 3.358613 (!H*) ) +( pp.tobi_accent 4.144342 (*? X*? 
H*!H* * L+H* L+!H*) ) +( pp.tobi_accent -1.111794 (L*) ) +( pp.tobi_accent 19.646313 (L*+H L*+!H) ) +( p.tobi_accent 32.081029 (H*) ) +( p.tobi_accent 18.090033 (!H*) ) +( p.tobi_accent 23.255280 (*? X*? H*!H* * L+H* L+!H*) ) +( p.tobi_accent -9.623577 (L*) ) +( p.tobi_accent 26.517095 (L*+H L*+!H) ) +( tobi_accent 5.221081 (H*) ) +( tobi_accent 10.159194 (!H*) ) +( tobi_accent 3.645511 (*? X*? H*!H* * L+H* L+!H*) ) +( tobi_accent -5.720030 (L*) ) +( tobi_accent -6.355773 (L*+H L*+!H) ) +( n.tobi_accent -5.691933 (H*) ) +( n.tobi_accent 8.265606 (!H*) ) +( n.tobi_accent 0.861427 (*? X*? H*!H* * L+H* L+!H*) ) +( n.tobi_accent 1.270504 (L*) ) +( n.tobi_accent 3.499418 (L*+H L*+!H) ) +( nn.tobi_accent -3.785701 (H*) ) +( nn.tobi_accent 7.013446 (!H*) ) +( nn.tobi_accent 2.637494 (*? X*? H*!H* * L+H* L+!H*) ) +( nn.tobi_accent -0.392176 (L*) ) +( nn.tobi_accent -2.957502 (L*+H L*+!H) ) +( pp.tobi_endtone -3.531153 (L-L%) ) +( pp.tobi_endtone 0.131156 (L-) ) +( pp.tobi_endtone 2.729199 (H-L% !H-L% -X?) ) +( pp.tobi_endtone 8.258756 (L-H%) ) +( pp.tobi_endtone 5.836487 (H-) ) +( pp.tobi_endtone 11.213440 (!H- H-H%) ) +( R:Syllable.p.tobi_endtone -28.081359 (L-L%) ) +( R:Syllable.p.tobi_endtone -20.553145 (L-) ) +( R:Syllable.p.tobi_endtone -5.442577 (H-L% !H-L% -X?) ) +( R:Syllable.p.tobi_endtone -6.585836 (L-H%) ) +( R:Syllable.p.tobi_endtone 8.537044 (H-) ) +( R:Syllable.p.tobi_endtone 4.243342 (!H- H-H%) ) +( tobi_endtone -9.333926 (L-L%) ) +( tobi_endtone -0.346711 (L-) ) +( tobi_endtone -0.507352 (H-L% !H-L% -X?) ) +( tobi_endtone -0.937483 (L-H%) ) +( tobi_endtone 9.472265 (H-) ) +( tobi_endtone 14.256898 (!H- H-H%) ) +( n.tobi_endtone -13.084253 (L-L%) ) +( n.tobi_endtone -1.060688 (L-) ) +( n.tobi_endtone -7.947205 (H-L% !H-L% -X?) ) +( n.tobi_endtone -5.471592 (L-H%) ) +( n.tobi_endtone -0.095669 (H-) ) +( n.tobi_endtone 4.933708 (!H- H-H%) ) +( nn.tobi_endtone -14.993470 (L-L%) ) +( nn.tobi_endtone -3.784284 (L-) ) +( nn.tobi_endtone -15.505132 (H-L% !H-L% -X?) 
) +( nn.tobi_endtone -11.352400 (L-H%) ) +( nn.tobi_endtone -5.551627 (H-) ) +( nn.tobi_endtone -0.661581 (!H- H-H%) ) +( pp.old_syl_break -3.367677 ) +( p.old_syl_break 0.641755 ) +( old_syl_break -0.659002 ) +( n.old_syl_break 1.217358 ) +( nn.old_syl_break 2.974502 ) +( pp.stress 1.588098 ) +( p.stress 3.693430 ) +( stress 2.009843 ) +( n.stress 1.645560 ) +( nn.stress 1.926870 ) +( syl_in 1.048362 ) +( syl_out 0.315553 ) +( ssyl_in -2.096079 ) +( ssyl_out 0.303531 ) +( asyl_in -4.257915 ) +( asyl_out -2.422424 ) +( last_accent -0.397647 ) +( next_accent -0.418613 ) +( sub_phrases -5.472055 ) +)) + +(set! f2b_f0_lr_mid +'( +( Intercept 169.183377 ) +( R:SylStructure.parent.R:Token.parent.EMPH 10.0 ) +( pp.tobi_accent 4.923247 (H*) ) +( pp.tobi_accent 0.955474 (!H*) ) +( pp.tobi_accent 1.193597 (*? X*? H*!H* * L+H* L+!H*) ) +( pp.tobi_accent 1.501383 (L*) ) +( pp.tobi_accent 7.992120 (L*+H L*+!H) ) +( p.tobi_accent 16.603350 (H*) ) +( p.tobi_accent 11.665814 (!H*) ) +( p.tobi_accent 13.063298 (*? X*? H*!H* * L+H* L+!H*) ) +( p.tobi_accent -2.288798 (L*) ) +( p.tobi_accent 29.168430 (L*+H L*+!H) ) +( tobi_accent 34.517868 (H*) ) +( tobi_accent 22.349656 (!H*) ) +( tobi_accent 23.551548 (*? X*? H*!H* * L+H* L+!H*) ) +( tobi_accent -14.117284 (L*) ) +( tobi_accent -5.978760 (L*+H L*+!H) ) +( n.tobi_accent -1.914945 (H*) ) +( n.tobi_accent 5.249441 (!H*) ) +( n.tobi_accent -1.929947 (*? X*? H*!H* * L+H* L+!H*) ) +( n.tobi_accent -3.287877 (L*) ) +( n.tobi_accent -4.980375 (L*+H L*+!H) ) +( nn.tobi_accent -6.147251 (H*) ) +( nn.tobi_accent 8.408949 (!H*) ) +( nn.tobi_accent 3.193500 (*? X*? H*!H* * L+H* L+!H*) ) +( nn.tobi_accent 1.323099 (L*) ) +( nn.tobi_accent 9.148058 (L*+H L*+!H) ) +( pp.tobi_endtone 4.255273 (L-L%) ) +( pp.tobi_endtone -1.033377 (L-) ) +( pp.tobi_endtone 11.992045 (H-L% !H-L% -X?) 
) +( pp.tobi_endtone 6.989573 (L-H%) ) +( pp.tobi_endtone 2.598854 (H-) ) +( pp.tobi_endtone 12.178307 (!H- H-H%) ) +( R:Syllable.p.tobi_endtone -4.397973 (L-L%) ) +( R:Syllable.p.tobi_endtone -6.157077 (L-) ) +( R:Syllable.p.tobi_endtone 5.530608 (H-L% !H-L% -X?) ) +( R:Syllable.p.tobi_endtone 6.938086 (L-H%) ) +( R:Syllable.p.tobi_endtone 6.162763 (H-) ) +( R:Syllable.p.tobi_endtone 8.035727 (!H- H-H%) ) +( tobi_endtone -19.357902 (L-L%) ) +( tobi_endtone -13.877759 (L-) ) +( tobi_endtone -6.176061 (H-L% !H-L% -X?) ) +( tobi_endtone -7.328882 (L-H%) ) +( tobi_endtone 12.694193 (H-) ) +( tobi_endtone 30.923398 (!H- H-H%) ) +( n.tobi_endtone -17.727785 (L-L%) ) +( n.tobi_endtone -2.539592 (L-) ) +( n.tobi_endtone -8.126830 (H-L% !H-L% -X?) ) +( n.tobi_endtone -8.701685 (L-H%) ) +( n.tobi_endtone -1.006439 (H-) ) +( n.tobi_endtone 6.834498 (!H- H-H%) ) +( nn.tobi_endtone -15.407530 (L-L%) ) +( nn.tobi_endtone -2.974196 (L-) ) +( nn.tobi_endtone -12.287673 (H-L% !H-L% -X?) ) +( nn.tobi_endtone -7.621437 (L-H%) ) +( nn.tobi_endtone -0.458837 (H-) ) +( nn.tobi_endtone 3.170632 (!H- H-H%) ) +( pp.old_syl_break -4.196950 ) +( p.old_syl_break -5.176929 ) +( old_syl_break 0.047922 ) +( n.old_syl_break 2.153968 ) +( nn.old_syl_break 2.577074 ) +( pp.stress -2.368192 ) +( p.stress 1.080493 ) +( stress 1.135556 ) +( n.stress 2.447219 ) +( nn.stress 1.318122 ) +( syl_in 0.291663 ) +( syl_out -0.411814 ) +( ssyl_in -1.643456 ) +( ssyl_out 0.580589 ) +( asyl_in -5.649243 ) +( asyl_out 0.489823 ) +( last_accent 0.216634 ) +( next_accent 0.244134 ) +( sub_phrases -5.758156 ) +)) + + +(set! f2b_f0_lr_end +'( +( Intercept 169.570381 ) +( R:SylStructure.parent.R:Token.parent.EMPH 10.0 ) +( pp.tobi_accent 3.594771 (H*) ) +( pp.tobi_accent 0.432519 (!H*) ) +( pp.tobi_accent 0.235664 (*? X*? H*!H* * L+H* L+!H*) ) +( pp.tobi_accent 1.513892 (L*) ) +( pp.tobi_accent 2.474823 (L*+H L*+!H) ) +( p.tobi_accent 11.214208 (H*) ) +( p.tobi_accent 9.619350 (!H*) ) +( p.tobi_accent 9.084690 (*? 
X*? H*!H* * L+H* L+!H*) ) +( p.tobi_accent 0.519202 (L*) ) +( p.tobi_accent 26.593112 (L*+H L*+!H) ) +( tobi_accent 25.217589 (H*) ) +( tobi_accent 13.759851 (!H*) ) +( tobi_accent 17.635192 (*? X*? H*!H* * L+H* L+!H*) ) +( tobi_accent -12.149974 (L*) ) +( tobi_accent 13.345913 (L*+H L*+!H) ) +( n.tobi_accent 4.944848 (H*) ) +( n.tobi_accent 7.398383 (!H*) ) +( n.tobi_accent 1.683011 (*? X*? H*!H* * L+H* L+!H*) ) +( n.tobi_accent -6.516900 (L*) ) +( n.tobi_accent -6.768201 (L*+H L*+!H) ) +( nn.tobi_accent -4.335797 (H*) ) +( nn.tobi_accent 5.656462 (!H*) ) +( nn.tobi_accent 0.263288 (*? X*? H*!H* * L+H* L+!H*) ) +( nn.tobi_accent 1.022002 (L*) ) +( nn.tobi_accent 6.702368 (L*+H L*+!H) ) +( pp.tobi_endtone 10.274958 (L-L%) ) +( pp.tobi_endtone 3.129947 (L-) ) +( pp.tobi_endtone 15.476240 (H-L% !H-L% -X?) ) +( pp.tobi_endtone 10.446935 (L-H%) ) +( pp.tobi_endtone 6.104384 (H-) ) +( pp.tobi_endtone 14.182688 (!H- H-H%) ) +( R:Syllable.p.tobi_endtone 1.767454 (L-L%) ) +( R:Syllable.p.tobi_endtone -1.040077 (L-) ) +( R:Syllable.p.tobi_endtone 18.438093 (H-L% !H-L% -X?) ) +( R:Syllable.p.tobi_endtone 8.750018 (L-H%) ) +( R:Syllable.p.tobi_endtone 5.000340 (H-) ) +( R:Syllable.p.tobi_endtone 10.913437 (!H- H-H%) ) +( tobi_endtone -12.637935 (L-L%) ) +( tobi_endtone -13.597961 (L-) ) +( tobi_endtone -6.501965 (H-L% !H-L% -X?) ) +( tobi_endtone 8.747483 (L-H%) ) +( tobi_endtone 15.165833 (H-) ) +( tobi_endtone 50.190326 (!H- H-H%) ) +( n.tobi_endtone -16.965781 (L-L%) ) +( n.tobi_endtone -5.222475 (L-) ) +( n.tobi_endtone -7.358555 (H-L% !H-L% -X?) ) +( n.tobi_endtone -7.833168 (L-H%) ) +( n.tobi_endtone 4.701087 (H-) ) +( n.tobi_endtone 10.349902 (!H- H-H%) ) +( nn.tobi_endtone -15.369483 (L-L%) ) +( nn.tobi_endtone -2.207161 (L-) ) +( nn.tobi_endtone -9.363835 (H-L% !H-L% -X?) 
) +( nn.tobi_endtone -7.052374 (L-H%) ) +( nn.tobi_endtone 2.207854 (H-) ) +( nn.tobi_endtone 5.271546 (!H- H-H%) ) +( pp.old_syl_break -4.745862 ) +( p.old_syl_break -5.685178 ) +( old_syl_break -2.633291 ) +( n.old_syl_break 1.678340 ) +( nn.old_syl_break 2.274729 ) +( pp.stress -2.747198 ) +( p.stress 0.306724 ) +( stress -0.565613 ) +( n.stress 2.838327 ) +( nn.stress 1.285244 ) +( syl_in 0.169955 ) +( syl_out -1.045661 ) +( ssyl_in -1.487774 ) +( ssyl_out 0.752405 ) +( asyl_in -5.081677 ) +( asyl_out 3.016218 ) +( last_accent 0.312900 ) +( next_accent 0.837992 ) +( sub_phrases -5.397805 ) + +)) + +;; groups +;; tobi_accent_1 25.217589 (H*) ) +;; tobi_accent_2 13.759851 (!H*) ) +;; tobi_accent_3 17.635192 (*? X*? H*!H* * L+H* L+!H*) ) +;; tobi_accent_4 -12.149974 (L*) ) +;; tobi_accent_5 13.345913 (L*+H L*+!H) ) + +;; tobi_endtone_1 10.274958 (L-L%) ) +;; tobi_endtone_2 3.129947 (L-) ) +;; tobi_endtone_3 15.476240 (H-L% !H-L% -X?) ) +;; tobi_endtone_4 10.446935 (L-H%) ) +;; tobi_endtone_5 6.104384 (H-) ) +;; tobi_endtone_6 14.182688 (!H- H-H%) ) + +(provide 'f2bf0lr) + diff --git a/lib/festdoc.scm b/lib/festdoc.scm new file mode 100644 index 0000000..13bc5dd --- /dev/null +++ b/lib/festdoc.scm @@ -0,0 +1,178 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. 
;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: August 1996 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Save documentation strings as texinfo files +;;; +;;; Finds all functions with documentation, and all variables with +;;; documentation, sorts and dumps the information in doc/festfunc.texi +;;; and doc/festvars.texi +;;; +;;; The makefile in the doc directory runs the compiled festival binary and +;;; causes these files to be created from the currently defined functions +;;; and variables +;;; +;;; Also provides function to extract manual section for documentation +;;; string and send a url to Netscape to display it +;;; + +(define (make-doc) +"(make-doc) +Find function and variable document strings and save them in texinfo +format to respective files."
+ (format t "Making function, feature and variable lists\n") + + ;; Need to ensure all library files are actually loaded if they contain + ;; functions/variables which have to be put in the manual + (require 'display) + (require 'mbrola) + (require 'tilt) + + (make-a-doc "festfunc.texi" 'function) + (make-a-doc "festfeat.texi" 'features) + (make-a-doc "festvars.texi" 'vars)) + +(define (make-a-doc outfile doclist) +"(make-a-doc FILENAME DOCLIST) +Make a texinfo document in FILENAME as a texinfo table, items are +from DOCLIST. DOCLIST names which doclist to use, it may be +one of 'function, 'features or 'vars." + (let ((outfp (fopen outfile "wb"))) + (format outfp "@table @code\n") + ;; Yes I am so lazy I'm not willing to write a sort function in Scheme + (sort-and-dump-docstrings doclist outfp) + (format outfp "@end table\n") + (fclose outfp))) + +;;; +;;; Documentation string may refer to a section in the manual +;;; If it does then we can automatically go to that section in the +;;; menu using Netscape. +;;; + +(defvar manual-browser "netscape" +"manual-browser +The Unix program name of your Netscape Navigator browser. +[see Getting some help]") + +(defvar manual-url + (format nil "http://www.cstr.ed.ac.uk/projects/festival/manual-%s.%s.%s/" + (car festival_version_number) + (car (cdr festival_version_number)) + (car (cdr (cdr festival_version_number)))) +"manual-url +The default URL for the Festival Manual in html format. You may +reset this to a file://.../... type URL on your local machine. +[see Getting some help]") + +;;; Paul got this idea from VM, the email system for emacs and +;;; I found out how to do this from their code, thanks Kyle + +(define (send-url-to-netscape url) +"(send-url-to-netscape URL) +Send given URL to netscape for display. This is primarily used to +display parts of the manual referenced in documentation strings."
+ (system + (string-append + manual-browser + " -remote \"openURL( " + url + " )\" "))) + +(define (lastline string) +"(lastline STRING) +Returns the part of the string which is between the last newline and the +end of string." + (let ((ns (string-after string "\n"))) + (if (string-equal ns "") + string + (lastline ns)))) + +(define (manual-sym symbol) +"(manual-sym SYMBOL) +Display the section in the manual that SYMBOL's docstring has +identified as the most relevant. The section is named on the +last line of a documentation string with no newlines within it +prefixed by \"[see \" with a \"]\" just immediately before the end +of the documentation string. The manual section name is translated to +the section in the HTML version of the manual and a URL is +sent to Netscape for display. [see Getting some help]" +(let ((section (string-before (string-after + (lastline (eval (list 'doc symbol))) + "[see ") + "]"))) + (cond + ((string-equal section "") + (eval (list 'doc symbol))) ;; nothing there + (t + (manual section))))) + +(define (manual section) +"(manual SECTION) +Display SECTION in the manual. SECTION is a string identifying +a manual section (it could be an initial substring). If SECTION +is nil or unspecified then the Manual table of contents is displayed. +This uses netscape to display the manual page so you must have that +(use variable manual-browser to identify it) and the variable +manual-url pointing to a copy of the manual. [see Getting some help]" +(let ((tmpfile (make_tmp_filename)) + (manual-section)) + (cond + ((string-matches section "\"") + (string-append "Invalid section reference containing quote: " + section "\n")) + ((not section) + (send-url-to-netscape (string-append manual-url "festival_toc.html"))) + (t ;; find section in manual + (get_url (string-append manual-url "festival_toc.html") tmpfile) + (system + (string-append + "grep -i \"^
  • .*$//' > \"" + tmpfile ".out\"")) + (set! manual-section (load (string-append tmpfile ".out") t)) + (cond + ((not manual-section) + (string-append "No section called: " section)) + (t + (send-url-to-netscape (string-append manual-url (car manual-section))) + (delete-file tmpfile) + (delete-file (string-append tmpfile ".out")) + "Sent manual reference url to netscape.")))))) + +(provide 'festdoc) + + + + diff --git a/lib/festival.el b/lib/festival.el new file mode 100644 index 0000000..c1899f6 --- /dev/null +++ b/lib/festival.el @@ -0,0 +1,282 @@ +;;; +;;; File: festival.el +;;; Emacs Lisp +;;; +;;; Alan W Black CSTR (awb@cstr.ed.ac.uk) June 1996 +;;; +;;; Provide an emacs mode for interfacing to the festival speech +;;; synthesizer system +;;; +;;; I've looked at many examples from the emacs Lisp directory +;;; copying relevant bits from here and there, so this can only +;;; reasonably inherit the GNU licence (GPL) +;;; +;;; Setup: +;;; In your .emacs add the following 2 lines to get a Say menu: +;;; +;;; (autoload 'say-minor-mode "festival" "Menu for using Festival." t) +;;; (say-minor-mode t) +;;; (setq auto-mode-alist +;;; (append '(("\\.festivalrc$" . 
scheme-mode)) auto-mode-alist)) +;;; +;;; The following gives you pretty colors in emacs-19 if you are into +;;; such things +;;; ;;; Some colors for scheme mode +;;; (hilit-set-mode-patterns +;;; '(scheme-mode) +;;; '( +;;; (";.*" nil comment) +;;; (hilit-string-find ?\\ string) +;;; ("^\\s *(def\\s +" "\\()\\|nil\\)" defun) +;;; ("^\\s *(defvar\\s +\\S +" nil decl) +;;; ("^\\s *(set\\s +\\S +" nil decl) +;;; ("^\\s *(defconst\\s +\\S +" nil define) +;;; ("^\\s *(\\(provide\\|require\\).*$" nil include) +;;; ("(\\(let\\*?\\|cond\\|if\\|or\\|and\\|map\\(car\\|concat\\)\\|prog[n1*]?\\|while\\|lambda\\|function\\|Parameter\\|set\\([qf]\\|car\\|cdr\\)?\\|nconc\\|eval-when-compile\\|condition-case\\|unwind-protect\\|catch\\|throw\\|error\\)[ \t\n]" 1 keyword))) +;;; +;;; +;;;-------------------------------------------------------------------- +;;; Copyright (C) Alan W Black 1996 +;;; This code is distributed in the hope that it will be useful, +;;; but WITHOUT ANY WARRANTY. No author or distributor accepts +;;; responsibility to anyone for the consequences of using this code +;;; or for whether it serves any particular purpose or works at all, +;;; unless explicitly stated in a written agreement. +;;; +;;; Everyone is granted permission to copy, modify and redistribute +;;; this code, but only under the conditions described in the GNU +;;; Emacs General Public License. A copy of this license is +;;; distrubuted with GNU Emacs so you can know your rights and +;;; responsibilities. It should be in a file named COPYING. Among +;;; other things, the copyright notice and this notice must be +;;; preserved on all copies. 
+;;;-------------------------------------------------------------------- +;;; + +(defvar festival-program-name "festival") + +(defvar festival-process nil) + +(defvar festival-tmp-file + (format "/tmp/festival-emacs-tmp-%s" (user-real-login-name)) + "Filename to save input for Festival.") + +(defun festival-fast () + (interactive) + (festival-send-command '(Parameter.set 'Duration.Stretch 0.8))) +(defun festival-slow () + (interactive) + (festival-send-command '(Parameter.set 'Duration.Stretch 1.2))) +(defun festival-ndur () + (interactive) + (festival-send-command '(Parameter.set 'Duration.Stretch 1.0))) +(defun festival-intro () + (interactive) + (festival-send-command '(intro))) + +(defun festival-gsw () + (interactive) + (festival-send-command '(voice_gsw_diphone))) +(defun festival-rab () + (interactive) + (festival-send-command '(voice_rab_diphone))) +(defun festival-ked () + (interactive) + (festival-send-command '(voice_ked_diphone))) +(defun festival-kal () + (interactive) + (festival-send-command '(voice_kal_diphone))) +(defun festival-don () + (interactive) + (festival-send-command '(voice_don_diphone))) +(defun festival-welsh () + (interactive) + (festival-send-command '(voice_welsh_hl))) +(defun festival-spanish () + (interactive) + (festival-send-command '(voice_spanish_el))) + +(defun festival-say-string (string) + "Send string to festival and have it said" + (interactive "sSay: ") + (festival-start-process) + (process-send-string festival-process + (concat "(SayText " (format "%S" string) ") +"))) + +(defun festival-send-command (cmd) + "Send command to festival" + (interactive "px") + (festival-start-process) + (process-send-string festival-process (format "%S +" cmd))) + +(defun festival-process-status () + (interactive) + (if festival-process + (message (format "Festival process status: %s" + (process-status festival-process))) + (message (format "Festival process status: NONE")))) + +(defun festival-start-process () + "Check status of process
and start it if necessary" + (interactive ) + (let ((process-connection-type t)) + (if (and festival-process + (eq (process-status festival-process) 'run)) + 't + ;;(festival-kill-festival t) + (message "Starting new synthesizer process...") + (sit-for 0) + (setq festival-process + (start-process "festival" (get-buffer-create "*festival*") + festival-program-name))) + )) + +(defun festival-kill-process () + "Kill festival sub-process" + (interactive) + (if festival-process + (kill-process festival-process)) + (setq festival-process nil) + (message "Festival process killed")) + +(defun festival-send-string (string) + "Send given string to festival process." + (interactive) + (festival-start-process) + (process-send-string festival-process string)) + +(defun festival-say-region (reg-start reg-end) + "Send given region to festival for saying. This saves the region +as a file in /tmp and then tells festival to say that file. The +major mode is *not* passed as text mode name to Festival." + (interactive "r") + (write-region reg-start reg-end festival-tmp-file) + (festival-send-command (list 'tts festival-tmp-file nil))) + +(defun festival-say-buffer () + "Send the whole buffer to festival for saying. This saves the buffer +as a file in /tmp and then tells festival to say that file. The +major-mode is passed as a text mode to Festival."
+ (interactive) + (write-region (point-min) (point-max) festival-tmp-file) + ;; Because there may by sgml-like sub-files mentioned + ;; ensure festival tracks the buffer's default-directory + (festival-send-command (list 'cd (expand-file-name default-directory))) + (if (equal "-mode" (substring (format "%S" major-mode) -5 nil)) + (if (equal "sgml" (substring (format "%S" major-mode) 0 -5)) + (festival-send-command + (list 'tts festival-tmp-file "sable")) + (festival-send-command + (list 'tts festival-tmp-file + (substring (format "%S" major-mode) 0 -5)))) + (festival-send-command (list 'tts festival-tmp-file nil)))) + +;; +;; say-minor-mode provides a menu offering various speech synthesis commands +;; +(defvar say-minor-mode nil) + +(defun say-minor-mode (arg) + "Toggle say minor mode. +With arg, turn say-minor-mode on iff arg is positive." + (interactive "P") + (setq say-minor-mode + (if (if (null arg) (not say-minor-mode) + (> (prefix-numeric-value arg) 0)) + t)) + (force-mode-line-update)) + +(setq say-params-menu (make-sparse-keymap "Pitch/Duration")) +(fset 'say-params-menu (symbol-value 'say-params-menu)) +(define-key say-params-menu [say-fast] '("Fast" . festival-fast)) +(define-key say-params-menu [say-slow] '("Slow" . festival-slow)) +(define-key say-params-menu [say-ndur] '("Normal Dur" . festival-ndur)) + +(setq say-lang-menu (make-sparse-keymap "Select language")) +(fset 'say-lang-menu (symbol-value 'say-lang-menu)) +(define-key say-lang-menu [say-lang-spain1] '("Spanish el" . festival-spanish)) +(define-key say-lang-menu [say-lang-welsh1] '("Welsh hl" . festival-welsh)) +(define-key say-lang-menu [say-lang-eng5] '("English gsw" . festival-gsw)) +(define-key say-lang-menu [say-lang-eng4] '("English don" . festival-don)) +(define-key say-lang-menu [say-lang-eng3] '("English rab" . festival-rab)) +(define-key say-lang-menu [say-lang-eng2] '("English ked" . festival-ked)) +(define-key say-lang-menu [say-lang-eng1] '("English kal" . 
festival-kal)) +;(define-key say-params-menu [say-set-dur-stretch] +; '("Set Duration Stretch" . festival-set-dur-stretch)) +;(define-key say-params-menu [say-high] '("High" . festival-high)) +;(define-key say-params-menu [say-low] '("Low" . festival-low)) +;(define-key say-params-menu [say-npit] '("Normal Pitch" . festival-npit)) +;(define-key say-params-menu [say-set-pitch-stretch] +; '("Set Pitch Stretch" . festival-set-pitch-stretch)) + +(setq say-minor-mode-map (make-sparse-keymap)) +(setq say-menu (make-sparse-keymap "SAY")) +(define-key say-minor-mode-map [menu-bar SAY] (cons "Say" say-menu)) +(define-key say-minor-mode-map [menu-bar SAY festival-intro] '("Festival Intro" . festival-intro)) +(define-key say-minor-mode-map [menu-bar SAY festival-process-status] '("Festival status" . festival-process-status)) +(define-key say-minor-mode-map [menu-bar SAY festival-kill-process] '("Kill Festival" . festival-kill-process)) +(define-key say-minor-mode-map [menu-bar SAY festival-start-process] '("(Re)start Festival" . festival-start-process)) +;;(define-key say-menu [separator-process] '("--")) +;;(define-key say-menu [params] '("Pitch/Durations" . say-params-menu)) +(define-key say-menu [separator-buffers] '("--")) +(define-key say-menu [festival-send-command] '("Festival eval command" . festival-send-command)) +(define-key say-menu [say-lang-menu] '("Select language" . say-lang-menu)) +(define-key say-menu [festival-say-buffer] '("Say buffer" . festival-say-buffer)) +(define-key say-menu [festival-say-region] '("Say region" . 
festival-say-region)) + + +(setq minor-mode-map-alist + (cons + (cons 'say-minor-mode say-minor-mode-map) + minor-mode-map-alist)) + +(or (assq 'say-minor-mode minor-mode-alist) + (setq minor-mode-alist + (cons '(say-minor-mode "") minor-mode-alist))) + +;;; +;;; A FESTIVAL inferior mode (copied from prolog.el) +;;; +(defvar inferior-festival-mode-map nil) + +(defun inferior-festival-mode () + "Major mode for interacting with an inferior FESTIVAL process. + +The following commands are available: +\\{inferior-festival-mode-map} + +Entry to this mode calls the value of `festival-mode-hook' with no arguments, +if that value is non-nil. Likewise with the value of `comint-mode-hook'. +`festival-mode-hook' is called after `comint-mode-hook'. + +You can send text to the inferior FESTIVAL from other buffers +using the commands `send-region', `send-string'. + +Return at end of buffer sends line as input. +Return not at end copies rest of line to end and sends it. +\\[comint-kill-input] and \\[backward-kill-word] are kill commands, imitating normal Unix input editing. +\\[comint-interrupt-subjob] interrupts the shell or its current subjob if any. +\\[comint-stop-subjob] stops. \\[comint-quit-subjob] sends quit signal." + (interactive) + (require 'comint) + (comint-mode) + (setq major-mode 'inferior-festival-mode + mode-name "Inferior FESTIVAL" + comint-prompt-regexp "^festival> ") + (if inferior-festival-mode-map nil + (setq inferior-festival-mode-map (copy-keymap comint-mode-map)) + (festival-mode-commands inferior-festival-mode-map)) + (use-local-map inferior-festival-mode-map) + (run-hooks 'festival-mode-hook)) + +;;;###autoload +(defun run-festival () + "Run an inferior FESTIVAL process, input and output via buffer *festival*."
+ (interactive) + (require 'comint) + (switch-to-buffer (make-comint "festival" festival-program-name)) + (inferior-festival-mode)) + +(provide 'festival) diff --git a/lib/festival.scm b/lib/festival.scm new file mode 100644 index 0000000..ead1498 --- /dev/null +++ b/lib/festival.scm @@ -0,0 +1,633 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. 
;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; General Festival Scheme specific functions +;;; Including definitions of various standard variables. + +;; will be set automatically on start-up +(defvar festival_version "unknown" + "festival_version + A string containing the current version number of the system.") + +;; will be set automatically on start-up +(defvar festival_version_number '(x x x) + "festival_version_number + A list of major, minor and subminor version numbers of the current + system. e.g. (1 0 12).") + +(define (apply_method method utt) +"(apply_method METHOD UTT) +Apply the appropriate function to utt defined in parameter." + (let ((method_val (Parameter.get method))) + (cond + ((null method_val) + nil) ;; should be an error, but I'll let you off at present + ((and (symbol? method_val) (symbol-bound? method_val)) + (apply (symbol-value method_val) (list utt))) + ((member (typeof method_val) '(subr closure)) + (apply method_val (list utt))) + (t ;; again is probably an error + nil)))) + +(define (require_module l) + "(require_module l) +Check that certain compile-time modules are included in this installation. +l may be a single atom or list of atoms. Each item in l must appear in +*modules* otherwise an error is throw." + (if (consp l) + (mapcar require_module l) + (if (not (member_string l *modules*)) + (error (format nil "module %s required, but not compiled in this installation\n" l)))) + t) + +;;; Feature Function Functions +(define (utt.features utt relname func_list) +"(utt.features UTT RELATIONNAME FUNCLIST) + Get vectors of feature values for each item in RELATIONNAME in UTT. + [see Features]" + (mapcar + (lambda (s) + (mapcar (lambda (f) (item.feat s f)) func_list)) + (utt.relation.items utt relname))) + +(define (utt.type utt) +"(utt.type UTT) + Returns the type of UTT." 
+ (intern (utt.feat utt 'type))) + +(define (utt.save.segs utt filename) +"(utt.save.segs UTT FILE) + Save segments of UTT in a FILE in xlabel format." + (let ((fd (fopen filename "w"))) + (format fd "#\n") + (mapcar + (lambda (info) + (format fd "%2.4f 100 %s\n" (car info) (car (cdr info)))) + (utt.features utt 'Segment '(segment_end name))) + (fclose fd) + utt)) + +(define (utt.save.words utt filename) +"(utt.save.words UTT FILE) + Save words of UTT in a FILE in xlabel format." + (let ((fd (fopen filename "w"))) + (format fd "#\n") + (mapcar + (lambda (info) + (format fd "%2.4f 100 %s\n" (car info) (car (cdr info)))) + (utt.features utt 'Word '(word_end name))) + (fclose fd) + utt)) + +(define (utt.resynth labfile f0file) +"(utt.resynth LABFILE F0FILE) +Resynthesize an utterance from a label file and F0 file (in any format +supported by the Speech Tools Library). This loads, synthesizes and +plays the utterance." + (let (u f0 f0_item) + (set! u (Utterance SegF0)) ; need some u to start with + (utt.relation.load u 'Segment labfile) + (utt.relation.create u 'f0) + (set! f0 (track.load f0file)) + (set! f0_item (utt.relation.append u 'f0)) + (item.set_feat f0_item "name" "f0") + (item.set_feat f0_item "f0" f0) + + ;; emulabel may have flipped pau to H# + (mapcar + (lambda (s) + (cond + ((string-matches (item.name s) "[hH]#") + (item.set_feat s "name" "pau")) + ((string-matches (item.name s) "#.*") + (item.set_feat s "name" (string-after (item.name s) "#"))))) + (utt.relation.items u 'Segment)) + + (Wave_Synth u) + (utt.play u) + u)) + +(define (utt.relation.present utt relation) +"(utt.relation.present UTT RELATIONNAME) +Returns t if UTT contains a relation called RELATIONNAME, nil otherwise." + (if (member_string relation (utt.relationnames utt)) + t + nil)) + +(define (utt.relation.leafs utt relation) +"(utt.relation.leafs UTT RELATIONNAME) +Returns a list of all the leafs in this relation."
+ (let ((leafs nil)) + (mapcar + (lambda (i) + (if (not (item.down (item.relation i relation))) + (set! leafs (cons i leafs)))) + (utt.relation.items utt relation)) + (reverse leafs))) + +(define (utt.relation.first utt relation) +"(utt.relation.first UTT RELATIONNAME) +Returns the first item in this relation." + (utt.relation utt relation)) + +(define (utt.relation.last utt relation) +"(utt.relation.last UTT RELATIONNAME) +Returns the last item in this relation." + (let ((i (utt.relation.first utt relation))) + (while (item.next i) + (set! i (item.next i))) + i)) + +(define (item.feat.present item feat) + "(item.feat.present item feat) +nil if feat doesn't exist in this item, non-nil otherwise." + (and item (assoc_string feat (item.features item)))) + +(define (item.relation.append_daughter parent relname daughter) +"(item.relation.append_daughter parent relname daughter) +Add daughter to parent as a new daughter in relname." + (item.append_daughter (item.relation parent relname) daughter)) + +(define (item.relation.insert si relname newsi direction) +"(item.relation.insert si relname newsi direction) +Insert newsi in relation relname with respect to direction. If +direction is omitted, after is assumed; valid directions are after, +before, above and below. Note you should use +item.relation.append_daughter for tree adjoining. newsi may be +an item itself or a LISP description of one." + (item.insert + (item.relation si relname) + newsi + direction)) + +(define (item.relation.daughters parent relname) + "(item.relation.daughters parent relname) +Return a list of all daughters of parent by relname." + (let ((d1 (item.daughter1 (item.relation parent relname))) + (daughters)) + (while d1 + (set! daughters (cons d1 daughters)) + (set! d1 (item.next d1))) + (reverse daughters))) + +(define (item.daughters p) + "(item.daughters parent) +Return a list of all daughters of parent."
+ (item.relation.daughters p (item.relation.name p))) + +(define (item.relation.parent si relname) + "(item.relation.parent item relname) +Return the parent of this item in this relation." + (item.parent (item.relation si relname))) + +(define (item.relation.daughter1 si relname) + "(item.relation.daughter1 item relname) +Return the first daughter of this item in this relation." + (item.daughter1 (item.relation si relname))) + +(define (item.relation.daughter2 si relname) + "(item.relation.daughter2 item relname) +Return the second daughter of this item in this relation." + (item.daughter2 (item.relation si relname))) + +(define (item.relation.daughtern si relname) + "(item.relation.daughtern item relname) +Return the final daughter of this item in this relation." + (item.daughtern (item.relation si relname))) + +(define (item.relation.next si relname) + "(item.relation.next item relname) +Return the next item in this relation." + (item.next (item.relation si relname))) + +(define (item.relation.prev si relname) + "(item.relation.prev item relname) +Return the previous item in this relation." + (item.prev (item.relation si relname))) + +(define (item.relation.first si relname) + "(item.relation.first item relname) +Return the most previous item from this item in this relation." + (let ((n (item.relation si relname))) + (while (item.prev n) + (set! n (item.prev n))) + n)) + +(define (item.leafs si) + "(item.relation.leafs item relname) +Return a list of the leafs of this item in this relation." + (let ((ls nil) + (pl (item.first_leaf si)) + (ll (item.next_leaf (item.last_leaf si)))) + (while (and pl (not (equal? pl ll))) + (set! ls (cons pl ls)) + (set! pl (item.next_leaf pl))) + (reverse ls))) + +(define (item.relation.leafs si relname) + "(item.relation.leafs item relname) +Return a list of the leafs of this item in this relation." + (item.leafs (item.relation si relname))) + +(define (item.root s) + "(item.root s) +Follow parent link until s has no parent." 
+ (cond + ((item.parent s) + (item.root (item.parent s))) + (t s))) + +(define (item.parent_to s relname) + "(item.parent_to s relname) +Find the first ancestor of s in its current relation that is also in +relname. s is treated as an ancestor of itself so if s is in relname +it is returned. The returned value will be in relation relname +or nil if there isn't one." + (cond + ((null s) s) + ((member_string relname (item.relations s)) + (item.relation s relname)) + (t (item.parent_to (item.parent s) relname)))) + +(define (item.daughter1_to s relname) + "(item.daughter1_to s relname) +Follow daughter1 links of s in its current relation until an item +is found that is also in relname; if s is in relname it is returned. +The return item is returned in relation relname, or nil if there is +nothing in relname." + (cond + ((null s) s) + ((member_string relname (item.relations s)) (item.relation s relname)) + (t (item.daughter1_to (item.daughter1 s) relname)))) + +(define (item.daughtern_to s relname) + "(item.daughtern_to s relname) +Follow daughtern links of s in its current relation until an item +is found that is also in relname; if s is in relname it is returned. +The return item is returned in relation relname, or nil if there is +nothing in relname." + (cond + ((null s) s) + ((member_string relname (item.relations s)) (item.relation s relname)) + (t (item.daughtern_to (item.daughtern s) relname)))) + +(define (item.name s) +"(item.name ITEM) + Returns the name of ITEM. [see Accessing an utterance]" + (item.feat s "name")) + +(define (utt.wave utt) + "(utt.wave UTT) +Get waveform from wave (R:Wave.first.wave)." + (item.feat (utt.relation.first utt "Wave") "wave")) + +(define (utt.wave.rescale . args) + "(utt.wave.rescale UTT FACTOR NORMALIZE) +Modify the gain of the waveform in UTT by FACTOR. If NORMALIZE is +specified and non-nil the waveform is maximized first."
+ (wave.rescale (utt.wave (nth 0 args)) (nth 1 args) (nth 2 args)) + (nth 0 args)) + +(define (utt.wave.resample utt rate) + "(utt.wave.resample UTT RATE)\ +Resample waveform in UTT to RATE (if it is already at that rate it remains +unchanged)." + (wave.resample (utt.wave utt) rate) + utt) + +(define (utt.import.wave . args) + "(utt.import.wave UTT FILENAME APPEND) +Load waveform in FILENAME into UTT in R:Wave.first.wave. If APPEND +is specified and non-nil append this to the current waveform." + (let ((utt (nth 0 args)) + (filename (nth 1 args)) + (append (nth 2 args))) + (if (and append (member 'Wave (utt.relationnames utt))) + (wave.append (utt.wave utt) (wave.load filename)) + (begin + (utt.relation.create utt 'Wave) + (item.set_feat + (utt.relation.append utt 'Wave) + "wave" + (wave.load filename)))) + utt)) + +(define (utt.save.wave . args) + "(utt.save.wave UTT FILENAME FILETYPE) +Save waveform in UTT in FILENAME with FILETYPE (if specified) or +using global parameter Wavefiletype." + (wave.save + (utt.wave (nth 0 args)) + (nth 1 args) + (nth 2 args)) + (nth 0 args)) + +(define (utt.play utt) + "(utt.play UTT) +Play waveform in utt by current audio method." + (wave.play (utt.wave utt)) + utt) + +(define (utt.save.track utt filename relation feature) + "(utt.save.track utt filename relation feature) +DEPRICATED use trace.save instead." + (format stderr "utt.save.track: DEPRICATED use track.save instead\n") + (track.save + (item.feat + (utt.relation.first utt relation) + feature) + filename) + utt) + +(define (utt.import.track utt filename relation fname) + "(utt.import.track UTT FILENAME RELATION FEATURE_NAME) +Load track in FILENAME into UTT in R:RELATION.first.FEATURE_NAME. +Deletes RELATION if it already exists. (you maybe want to use track.load +directly rather than this legacy function." 
+ (utt.relation.create utt relation) + (item.set_feat + (utt.relation.append utt relation) + fname + (track.load filename)) + utt) + +(define (wagon_predict item tree) +"(wagon_predict ITEM TREE) +Predict with given ITEM and CART tree and return the prediction +(the last item) rather than whole probability distribution." + (car (last (wagon item tree)))) + +(define (phone_is_silence phone) + (member_string + phone + (car (cdr (car (PhoneSet.description '(silences))))))) + +(define (phone_feature phone feat) +"(phone_feature phone feat) +Return the feature for given phone in current phone set, or 0 +if it doesn't exist." + (let ((ph (intern phone))) + (let ((fnames (cadr (assoc 'features (PhoneSet.description)))) + (fvals (cdr (assoc ph (cadr (assoc 'phones (PhoneSet.description))))))) + (while (and fnames (not (string-equal feat (car (car fnames))))) + (set! fvals (cdr fvals)) + (set! fnames (cdr fnames))) + (if fnames + (car fvals) + 0)))) + +(defvar server_max_clients 10 + "server_max_clients +In server mode, the maximum number of clients supported at any one +time. When more that this number of clients attach simulaneous +the last ones are denied access. Default value is 10. +[see Server/client API]") + +(defvar server_port 1314 + "server_port +In server mode the inet port number the server will wait for connects +on. The default value is 1314. [see Server/client API]") + +(defvar server_log_file t + "server_log_file +If set to t server log information is printed to standard output +of the server process. If set to nil no output is given. If set +to anything else the value is used as the name of file to which +server log information is appended. Note this value is checked at +server start time, there is no way a client may change this. +[see Server/client API]") + +(defvar server_passwd nil + "server_passwd +If non-nil clients must send this passwd to the server followed by +a newline before they can get a connection. 
It would be normal +to set this for the particular server task. +[see Server/client API]") + +(defvar server_access_list '(localhost) + "server_access_list +If non-nil this is the exhaustive list of machines and domains +from which clients may access the server. This is a list of REGEXs +that client host must match. Remember to add the backslashes before +the dots. [see Server/client API]") + +(defvar server_deny_list nil + "server_deny_list +If non-nil this is a list of machines which are to be denied access +to the server absolutely, irrespective of any other control features. +The list is a list of REGEXs that are used to matched the client hostname. +This list is checked first, then server_access_list, then passwd. +[see Server/client API]") + +(define (def_feature_docstring fname fdoc) +"(def_feature_docstring FEATURENAME FEATUREDOC) +As some feature are used directly of stream items with no +accompanying feature function, the features are just values on the feature +list. This function also those features to have an accompanying +documentation string." + (let ((fff (assoc fname ff_docstrings))) + (cond + (fff ;; replace what's already there + (set-cdr! fff fdoc)) + (t + (set! ff_docstrings (cons (cons fname fdoc) ff_docstrings)))) + t)) + +(define (linear_regression item model) + "(linear_regression ITEM MODEL) +Use linear regression MODEL on ITEM. MODEL consists of a list +of features, weights and optional map list. E.g. ((Intercept 100) +(tobi_accent 10 (H* !H*)))." + (let ((intercept (if (equal? 'Intercept (car (car model))) + (car (cdr (car model))) 0)) + (mm (if (equal? 
'Intercept (car (car model))) + (cdr model) model))) + (apply + + (cons intercept + (mapcar + (lambda (f) + (let ((ff (item.feat item (car f)))) + (if (car (cdr (cdr f))) + (if (member_string ff (car (cdr (cdr f)))) + (car (cdr f)) + 0) + (* (parse-number ff) (car (cdr f)))))) + mm))))) + +(defvar help + "The Festival Speech Synthesizer System: Help + +Getting Help + (doc ') displays help on + (manual nil) displays manual in local netscape + C-c return to top level + C-d or (quit) Exit Festival +(If compiled with editline) + M-h displays help on current symbol + M-s speaks help on current symbol + M-m displays relevant manula page in local netscape + TAB Command, symbol and filename completion + C-p or up-arrow Previous command + C-b or left-arrow Move back one character + C-f or right-arrow + Move forward one character + Normal Emacs commands work for editing command line + +Doing stuff + (SayText TEXT) Synthesize text, text should be surrounded by + double quotes + (tts FILENAME nil) Say contexts of file, FILENAME should be + surrounded by double quotes + (voice_rab_diphone) Select voice (Britsh Male) + (voice_ked_diphone) Select voice (American Male) +") + +(define (festival_warranty) +"(festival_warranty) + Display Festival's copyright and warranty. [see Copying]" + (format t + (string-append + " The Festival Speech Synthesis System: " + festival_version +" + Centre for Speech Technology Research + University of Edinburgh, UK + Copyright (c) 1996-2010 + All Rights Reserved. + + Permission is hereby granted, free of charge, to use and distribute + this software and its documentation without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of this work, and to + permit persons to whom this work is furnished to do so, subject to + the following conditions: + 1. The code must retain the above copyright notice, this list of + conditions and the following disclaimer. + 2. 
Any modifications must be clearly marked as such. + 3. Original authors' names are not deleted. + 4. The authors' names are not used to endorse or promote products + derived from this software without specific prior written + permission. + + THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK + DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING + ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT + SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE + FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN + AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, + ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF + THIS SOFTWARE. +"))) + +(define (intro) +"(intro) + Synthesize an introduction to the Festival Speech Synthesis System." + (tts (path-append libdir "../examples/intro.text") nil)) + +(define (intro-spanish) +"(intro-spanish) + Synthesize an introduction to the Festival Speech Synthesis System + in spanish. Spanish voice must already be selected for this." + (tts (path-append libdir "../examples/spintro.text") nil)) + +(define (na_play FILENAME) +"(play_wave FILENAME) +Play given wavefile" + (utt.play (utt.synth (eval (list 'Utterance 'Wave FILENAME))))) + +;;; Some autoload commands +(autoload manual-sym "festdoc" "Show appropriate manual section for symbol.") +(autoload manual "festdoc" "Show manual section.") + +(autoload display "display" "Graphically display utterance.") + +(autoload festtest "festtest" "Run tests of Festival.") + +(defvar diphone_module_hooks nil + "diphone_module_hooks + A function or list of functions that will be applied to the utterance + at the start of the diphone module. It can be used to map segment + names to those that will be used by the diphone database itself. + Typical use specifies _ and $ for consonant clusters and syllable + boundaries, mapping to dark ll's etc. 
Reduction and tap type + phenomena should probabaly be done by post lexical rules though the + distinction is not a clear one.") + +(def_feature_docstring + 'Segment.diphone_phone_name + "Segment.diphone_phone_name + This is produced by the diphone module to contain the desired phone + name for the desired diphone. This adds things like _ if part of + a consonant or $ to denote syllable boundaries. These are generated + on a per voice basis by function(s) specified by diphone_module_hooks. + Identification of dark ll's etc. may also be included. Note this is not + necessarily the name of the diphone selected as if it is not found + some of these characters will be removed and fall back values will be + used.") + +(def_feature_docstring + 'Syllable.stress + "Syllable.stress + The lexical stress of the syllable as specified from the lexicon entry + corresponding to the word related to this syllable.") + +;;; +;;; I tried some tests on the resulting speed both runtime and loadtime +;;; but compiled files don't seem to make any significant difference +;;; +(define (compile_library) + "(compile_library) +Compile all the scheme files in the library directory." + (mapcar + (lambda (file) + (format t "compile ... 
%s\n" file) + (compile-file (string-before file ".scm"))) + (list + "synthesis.scm" "siod.scm" "init.scm" "lexicons.scm" + "festival.scm" "gsw_diphone.scm" "intonation.scm" "duration.scm" + "pos.scm" "phrase.scm" "don_diphone.scm" "rab_diphone.scm" + "voices.scm" "tts.scm" "festdoc.scm" "languages.scm" "token.scm" + "mbrola.scm" "display.scm" "postlex.scm" "tokenpos.scm" + "festtest.scm" "cslush.scm" "ducs_cluster.scm" "sucs.scm" + "web.scm" "cart_aux.scm" + "lts_nrl.scm" "lts_nrl_us.scm" "email-mode.scm" + "mrpa_phones.scm" "radio_phones.scm" "holmes_phones.scm" + "mrpa_durs.scm" "klatt_durs.scm" "gswdurtreeZ.scm" + "tobi.scm" "f2bf0lr.scm")) + t) + +;;; For mlsa resynthesizer +(defvar mlsa_alpha_param 0.42) +(defvar mlsa_beta_param 0.0) + +(provide 'festival) diff --git a/lib/festtest.scm b/lib/festtest.scm new file mode 100644 index 0000000..345c3cc --- /dev/null +++ b/lib/festtest.scm @@ -0,0 +1,72 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Some basic functions used in tests for Festival +;;; + +(define (test_words text) +"(test_words TEXT) +prints TEXT, Synthesizes TEXT and outputs the words in it." + (format t "Word test: %s\n " text) + (set! utt1 (utt.synth (eval (list 'Utterance 'Text text)))) + (mapcar + (lambda (word) (format t "%s " (car word))) + (utt.features utt1 'Word '(name))) + (format t "\n") + t) + +(define (test_segments text) +"(test_segments TEXT) +prints TEXT, Synthesizes TEXT and outputs the segments in it." + (format t "Segment test: %s\n " text) + (set! utt1 (utt.synth (eval (list 'Utterance 'Text text)))) + (mapcar + (lambda (word) (format t "%s " (car word))) + (utt.features utt1 'Segment '(name))) + (format t "\n") +) + +(define (test_phrases text) +"(test_phrases TEXT) +prints TEXT, Synthesizes TEXT and outputs the words and phrase breaks." + (format t "Phrase test: %s \n " text) + (set! 
utt1 (utt.synth (eval (list 'Utterance 'Text text)))) + (mapcar + (lambda (phrase) + (mapcar (lambda (w) (format t "%s " (car (car w)))) (cdr phrase)) + (format t "%s\n " (car (car phrase)))) + (utt.relation_tree utt1 'Phrase)) + (format t "\n") + t) + +(provide 'festtest) diff --git a/lib/gswdurtreeZ.scm b/lib/gswdurtreeZ.scm new file mode 100644 index 0000000..4968192 --- /dev/null +++ b/lib/gswdurtreeZ.scm @@ -0,0 +1,947 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; A tree to predict zscore durations built from gsw 450 (timit) +;;; doesn't use actual phonemes so it can have better generalizations +;;; + +;; pre Sue's changes to mrpa_phones (on training data) +;; RMSE 0.79102 Correlation is 0.610184 Mean (abs) Error 0.605081 (0.509517) +;; Post with balance +;; train test split --stop 19 --balance 16 +;; RMSE 0.841861 Correlation is 0.526064 Mean (abs) Error 0.646614 (0.539288) +;; on training data +;; RMSE 0.784032 Correlation is 0.619165 Mean (abs) Error 0.602819 (0.501332) +;; +;; Oct 29th 1997 +;; stepwise (but it's overtrained) +;; RMSE 0.8322 Correlation is 0.5286 Mean (abs) Error 0.6375 (0.5350) +;; +;; May 11th 1998 +;; new architecture, full new train on f2b on test data +;; in zscore domain +;; RMSE 0.8076 Correlation is 0.5307 Mean (abs) Error 0.6113 (0.5278) +;; in absolute domain +;; RMSE 0.0276 Correlation 0.7468 Mean (abs) error 0.0203 (0.0187) +;; +;; May 18th 1998 +;; various corrections f2bdur.bbz.H0.S50.tree no names zscore +;; in zscore domain +;; RMSE 0.8049 Correlation is 0.6003 Mean (abs) Error 0.6008 (0.5357) +;; in absolute domain +;; RMSE 0.0268 Correlation 0.7766 Mean (abs) error 0.0196 (0.0183) + +(set! 
gsw_duration_cart_tree +' +((name is #) + ((emph_sil is +) + ((0.0 -0.5)) + ((p.R:SylStructure.parent.parent.pbreak is BB) + ((0.0 2.0)) + ((0.0 0.0)))) + +((R:SylStructure.parent.accented is 0) + ((n.ph_ctype is 0) + ((p.ph_vlng is 0) + ((R:SylStructure.parent.syl_codasize < 1.5) + ((p.ph_ctype is n) + ((ph_ctype is f) + ((0.559208 -0.783163)) + ((1.05215 -0.222704))) + ((ph_ctype is s) + ((R:SylStructure.parent.syl_break is 2) + ((0.589948 0.764459)) + ((R:SylStructure.parent.asyl_in < 0.7) + ((1.06385 0.567944)) + ((0.691943 0.0530272)))) + ((ph_vlng is l) + ((pp.ph_vfront is 1) + ((1.06991 0.766486)) + ((R:SylStructure.parent.syl_break is 1) + ((0.69665 0.279248)) + ((0.670353 0.0567774)))) + ((p.ph_ctype is s) + ((seg_onsetcoda is coda) + ((0.828638 -0.038356)) + ((ph_ctype is f) + ((0.7631 -0.545853)) + ((0.49329 -0.765994)))) + ((R:SylStructure.parent.parent.gpos is det) + ((R:SylStructure.parent.last_accent < 0.3) + ((R:SylStructure.parent.sub_phrases < 1) + ((0.811686 0.160195)) + ((0.799015 0.713958))) + ((0.731599 -0.215472))) + ((ph_ctype is r) + ((0.673487 0.092772)) + ((R:SylStructure.parent.asyl_in < 1) + ((0.745273 0.00132813)) + ((0.75457 -0.334898))))))))) + ((pos_in_syl < 0.5) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.902446 -0.041618)) + ((R:SylStructure.parent.sub_phrases < 2.3) + ((0.900629 0.262952)) + ((1.18474 0.594794)))) + ((seg_onset_stop is 0) + ((R:SylStructure.parent.position_type is mid) + ((0.512323 -0.760444)) + ((R:SylStructure.parent.syl_out < 6.8) + ((pp.ph_vlng is a) + ((0.640575 -0.450449)) + ((ph_ctype is f) + ((R:SylStructure.parent.sub_phrases < 1.3) + ((0.862876 -0.296956)) + ((R:SylStructure.parent.syl_out < 2.4) + ((0.803215 0.0422868)) + ((0.877856 -0.154465)))) + ((R:SylStructure.parent.syl_out < 3.6) + ((R:SylStructure.parent.syl_out < 1.2) + ((0.567081 -0.264199)) + ((0.598043 -0.541738))) + ((0.676843 -0.166623))))) + ((0.691678 
-0.57173)))) + ((R:SylStructure.parent.parent.gpos is cc) + ((1.15995 0.313289)) + ((pp.ph_vfront is 1) + ((0.555993 0.0695819)) + ((R:SylStructure.parent.asyl_in < 1.2) + ((R:SylStructure.parent.sub_phrases < 2.7) + ((0.721635 -0.367088)) + ((0.71919 -0.194887))) + ((0.547052 -0.0637491))))))) + ((ph_ctype is s) + ((R:SylStructure.parent.syl_break is 0) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((0.650007 -0.333421)) + ((0.846301 -0.165383))) + ((0.527756 -0.516332))) + ((R:SylStructure.parent.syl_break is 0) + ((p.ph_ctype is s) + ((0.504414 -0.779112)) + ((0.812498 -0.337611))) + ((pos_in_syl < 1.4) + ((0.513041 -0.745807)) + ((p.ph_ctype is s) + ((0.350582 -1.04907)) + ((0.362 -0.914974)))))))) + ((R:SylStructure.parent.syl_break is 0) + ((ph_ctype is n) + ((R:SylStructure.parent.position_type is initial) + ((pos_in_syl < 1.2) + ((0.580485 0.172658)) + ((0.630973 -0.101423))) + ((0.577937 -0.360092))) + ((R:SylStructure.parent.syl_out < 2.9) + ((R:SylStructure.parent.syl_out < 1.1) + ((R:SylStructure.parent.position_type is initial) + ((0.896092 0.764189)) + ((R:SylStructure.parent.sub_phrases < 3.6) + ((ph_ctype is s) + ((0.877362 0.555132)) + ((0.604511 0.369882))) + ((0.799982 0.666966)))) + ((seg_onsetcoda is coda) + ((p.ph_vlng is a) + ((R:SylStructure.parent.last_accent < 0.4) + ((0.800736 0.240634)) + ((0.720606 0.486176))) + ((1.18173 0.573811))) + ((0.607147 0.194468)))) + ((ph_ctype is r) + ((0.88377 0.499383)) + ((R:SylStructure.parent.last_accent < 0.5) + ((R:SylStructure.parent.position_type is initial) + ((R:SylStructure.parent.parent.word_numsyls < 2.4) + ((0.62798 0.0737318)) + ((0.787334 0.331014))) + ((ph_ctype is s) + ((0.808368 0.0929299)) + ((0.527948 -0.0443271)))) + ((seg_coda_fric is 0) + ((p.ph_vlng is a) + ((0.679745 0.517681)) + ((R:SylStructure.parent.sub_phrases < 1.1) + ((0.759979 0.128316)) + ((0.775233 0.361383)))) + ((R:SylStructure.parent.last_accent < 1.3) + ((0.696255 0.054136)) + ((0.632425 0.246742)))))))) + 
((pos_in_syl < 0.3) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((0.847602 0.621547)) + ((ph_ctype is s) + ((0.880645 0.501679)) + ((R:SylStructure.parent.sub_phrases < 3.3) + ((R:SylStructure.parent.sub_phrases < 0.3) + ((0.901014 -0.042049)) + ((0.657493 0.183226))) + ((0.680126 0.284799))))) + ((ph_ctype is s) + ((p.ph_vlng is s) + ((0.670033 -0.820934)) + ((0.863306 -0.348735))) + ((ph_ctype is n) + ((R:SylStructure.parent.asyl_in < 1.2) + ((0.656966 -0.40092)) + ((0.530966 -0.639366))) + ((seg_coda_fric is 0) + ((1.04153 0.364857)) + ((pos_in_syl < 1.2) + ((R:SylStructure.parent.syl_out < 3.4) + ((0.81503 -0.00768613)) + ((0.602665 -0.197753))) + ((0.601844 -0.394632))))))))) + ((n.ph_ctype is f) + ((pos_in_syl < 1.5) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((pos_in_syl < 0.1) + ((1.63863 0.938841)) + ((R:SylStructure.parent.position_type is initial) + ((0.897722 -0.0796637)) + ((nn.ph_vheight is 0) + ((0.781081 0.480026)) + ((0.779711 0.127175))))) + ((ph_ctype is r) + ((p.ph_ctype is s) + ((0.581329 -0.708767)) + ((0.564366 -0.236212))) + ((ph_vlng is a) + ((p.ph_ctype is r) + ((0.70992 -0.273389)) + ((R:SylStructure.parent.parent.gpos is in) + ((0.764696 0.0581338)) + ((nn.ph_vheight is 0) + ((0.977737 0.721904)) + ((R:SylStructure.parent.sub_phrases < 2.2) + ((pp.ph_vfront is 0) + ((0.586708 0.0161206)) + ((0.619949 0.227372))) + ((0.707285 0.445569)))))) + ((ph_ctype is n) + ((R:SylStructure.parent.syl_break is 1) + ((nn.ph_vfront is 2) + ((0.430295 -0.120097)) + ((0.741371 0.219042))) + ((0.587492 0.321245))) + ((p.ph_ctype is n) + ((0.871586 0.134075)) + ((p.ph_ctype is r) + ((0.490751 -0.466418)) + ((R:SylStructure.parent.syl_codasize < 1.3) + ((R:SylStructure.parent.sub_phrases < 2.2) + ((p.ph_ctype is s) + ((0.407452 -0.425925)) + ((0.644771 -0.542809))) + ((0.688772 -0.201899))) + ((ph_vheight is 1) + ((nn.ph_vheight is 0) + ((0.692018 0.209018)) + ((0.751345 -0.178136))) + 
((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.3) + ((R:SylStructure.parent.asyl_in < 1.5) + ((0.599633 -0.235593)) + ((0.60042 0.126118))) + ((p.ph_vlng is a) + ((0.7148 -0.174812)) + ((R:SylStructure.parent.parent.gpos is content) + ((0.761296 -0.231509)) + ((0.813081 -0.536405))))))))))))) + ((ph_ctype is n) + ((0.898844 0.163343)) + ((p.ph_vlng is s) + ((seg_coda_fric is 0) + ((0.752921 -0.45528)) + ((0.890079 -0.0998025))) + ((ph_ctype is f) + ((0.729376 -0.930547)) + ((ph_ctype is s) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 0) + ((0.745052 -0.634119)) + ((0.521502 -0.760176))) + ((R:SylStructure.parent.syl_break is 1) + ((0.766575 -0.121355)) + ((0.795616 -0.557509)))))))) + ((p.ph_vlng is 0) + ((p.ph_ctype is r) + ((ph_vlng is 0) + ((0.733659 -0.402734)) + ((R:SylStructure.parent.sub_phrases < 1.5) + ((ph_vlng is s) + ((0.326176 -0.988478)) + ((n.ph_ctype is s) + ((0.276471 -0.802536)) + ((0.438283 -0.900628)))) + ((nn.ph_vheight is 0) + ((ph_vheight is 2) + ((0.521 -0.768992)) + ((0.615436 -0.574918))) + ((ph_vheight is 1) + ((0.387376 -0.756359)) + ((pos_in_syl < 0.3) + ((0.417235 -0.808937)) + ((0.384043 -0.93315))))))) + ((ph_vlng is a) + ((ph_ctype is 0) + ((n.ph_ctype is s) + ((p.ph_ctype is f) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.415908 -0.428493)) + ((pos_in_syl < 0.1) + ((0.790441 0.0211071)) + ((0.452465 -0.254485)))) + ((p.ph_ctype is s) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.582447 -0.389966)) + ((0.757648 0.185781))) + ((R:SylStructure.parent.sub_phrases < 1.4) + ((0.628965 0.422551)) + ((0.713613 0.145576))))) + ((seg_onset_stop is 0) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 0) + ((pp.ph_vfront is 1) + ((0.412363 -0.62319)) + ((R:SylStructure.parent.syl_out < 3.6) + ((0.729259 -0.317324)) + ((0.441633 -0.591051)))) + ((R:SylStructure.parent.syl_break is 1) + ((R:SylStructure.parent.sub_phrases < 2.7) + ((0.457728 -0.405607)) + ((0.532411 -0.313148))) + 
((R:SylStructure.parent.last_accent < 0.3) + ((1.14175 0.159416)) + ((0.616396 -0.254651))))) + ((R:SylStructure.parent.position_type is initial) + ((0.264181 -0.799896)) + ((0.439801 -0.551309))))) + ((R:SylStructure.parent.position_type is final) + ((0.552027 -0.707084)) + ((0.585661 -0.901874)))) + ((ph_ctype is s) + ((pos_in_syl < 1.2) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((pp.ph_vfront is 1) + ((0.607449 0.196466)) + ((0.599662 0.00382414))) + ((0.64109 -0.12859))) + ((pp.ph_vfront is 1) + ((0.720484 -0.219339)) + ((0.688707 -0.516734)))) + ((ph_vlng is s) + ((n.ph_ctype is s) + ((R:SylStructure.parent.parent.gpos is content) + ((R:SylStructure.parent.position_type is single) + ((0.659206 0.159445)) + ((R:SylStructure.parent.parent.word_numsyls < 3.5) + ((R:SylStructure.parent.sub_phrases < 2) + ((0.447186 -0.419103)) + ((0.631822 -0.0928561))) + ((0.451623 -0.576116)))) + ((ph_vheight is 3) + ((0.578626 -0.64583)) + ((0.56636 -0.4665)))) + ((R:SylStructure.parent.parent.gpos is in) + ((0.771516 -0.217292)) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((0.688571 -0.304382)) + ((R:SylStructure.parent.parent.gpos is content) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((n.ph_ctype is n) + ((0.556085 -0.572203)) + ((0.820173 -0.240338))) + ((R:SylStructure.parent.parent.word_numsyls < 2.2) + ((0.595398 -0.588171)) + ((0.524737 -0.95797)))) + ((R:SylStructure.parent.sub_phrases < 3.9) + ((0.371492 -0.959427)) + ((0.440479 -0.845747))))))) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 0) + ((p.ph_ctype is f) + ((0.524088 -0.482247)) + ((nn.ph_vheight is 1) + ((0.587666 -0.632362)) + ((ph_vlng is l) + ((R:SylStructure.parent.position_type is final) + ((0.513286 -0.713117)) + ((0.604613 -0.924308))) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((0.577997 -0.891342)) + ((0.659804 -1.15252)))))) + ((pp.ph_vlng is s) + ((ph_ctype is f) + ((0.813383 -0.599624)) + ((0.984027 -0.0771909))) + ((p.ph_ctype is f) + 
((R:SylStructure.parent.parent.gpos is in) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((0.313572 -1.03242)) + ((0.525854 -0.542799))) + ((R:SylStructure.parent.syl_out < 2.8) + ((0.613007 -0.423979)) + ((0.570258 -0.766379)))) + ((R:SylStructure.parent.syl_break is 1) + ((R:SylStructure.parent.parent.gpos is to) + ((0.364585 -0.792895)) + ((ph_vlng is l) + ((0.69143 -0.276816)) + ((0.65673 -0.523721)))) + ((R:SylStructure.parent.syl_out < 3.6) + ((R:SylStructure.parent.position_type is initial) + ((0.682096 -0.488102)) + ((0.406364 -0.731758))) + ((0.584694 -0.822229))))))))))) + ((n.ph_ctype is r) + ((R:SylStructure.parent.position_type is initial) + ((p.ph_vlng is a) + ((0.797058 1.02334)) + ((ph_ctype is s) + ((1.0548 0.536277)) + ((0.817253 0.138201)))) + ((R:SylStructure.parent.sub_phrases < 1.1) + ((R:SylStructure.parent.syl_out < 3.3) + ((0.884574 -0.23471)) + ((0.772063 -0.525292))) + ((nn.ph_vfront is 1) + ((1.25254 0.417485)) + ((0.955557 -0.0781996))))) + ((pp.ph_vfront is 0) + ((ph_ctype is f) + ((n.ph_ctype is s) + ((R:SylStructure.parent.parent.gpos is content) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 0) + ((0.583506 -0.56941)) + ((0.525949 -0.289362))) + ((0.749316 -0.0921038))) + ((p.ph_vlng is s) + ((0.734234 0.139463)) + ((0.680119 -0.0708717)))) + ((ph_vlng is s) + ((ph_vheight is 1) + ((0.908712 -0.618971)) + ((0.55344 -0.840495))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 1.2) + ((pos_in_syl < 1.2) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((0.838715 0.00913392)) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((ph_vheight is 2) + ((0.555513 -0.512523)) + ((R:SylStructure.parent.position_type is initial) + ((0.758711 0.121704)) + ((0.737555 -0.25637)))) + ((R:SylStructure.parent.syl_out < 3.1) + ((n.ph_ctype is s) + ((0.611756 -0.474522)) + ((1.05437 -0.247206))) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((R:SylStructure.parent.position_type is final) + ((0.567761 -0.597866)) + 
((0.785599 -0.407765))) + ((0.575598 -0.741256)))))) + ((ph_ctype is s) + ((n.ph_ctype is s) + ((0.661069 -1.08426)) + ((0.783184 -0.39789))) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((R:SylStructure.parent.sub_phrases < 2.6) + ((0.511323 -0.666011)) + ((0.691878 -0.499492))) + ((ph_ctype is r) + ((0.482131 -0.253186)) + ((0.852955 -0.372832)))))) + ((0.854447 -0.0936489))))) + ((R:SylStructure.parent.position_type is final) + ((0.685939 -0.249982)) + ((R:SylStructure.parent.syl_out < 3.2) + ((0.989843 0.18086)) + ((0.686805 -0.0402908))))))))) + ((R:SylStructure.parent.syl_out < 2.4) + ((R:SylStructure.parent.syl_out < 0.2) + ((seg_onsetcoda is coda) + ((ph_ctype is s) + ((R:SylStructure.parent.syl_break is 4) + ((pp.ph_vlng is 0) + ((0.959737 1.63203)) + ((1.20714 0.994933))) + ((n.ph_ctype is 0) + ((R:SylStructure.parent.syl_break is 2) + ((0.864809 0.214457)) + ((0.874278 0.730381))) + ((pp.ph_vfront is 0) + ((seg_coda_fric is 0) + ((1.20844 -0.336221)) + ((1.01357 0.468302))) + ((0.658106 -0.799121))))) + ((n.ph_ctype is f) + ((ph_ctype is f) + ((1.26332 0.0300613)) + ((ph_vlng is d) + ((1.02719 1.1649)) + ((ph_ctype is 0) + ((R:SylStructure.parent.asyl_in < 1.2) + ((1.14048 2.2668)) + ((ph_vheight is 1) + ((1.15528 1.50375)) + ((1.42406 2.07927)))) + ((R:SylStructure.parent.sub_phrases < 1.1) + ((0.955892 1.10243)) + ((R:SylStructure.parent.syl_break is 2) + ((1.32682 1.8432)) + ((1.27582 1.59853))))))) + ((n.ph_ctype is 0) + ((ph_ctype is n) + ((R:SylStructure.parent.syl_break is 2) + ((1.45399 1.12927)) + ((1.05543 0.442376))) + ((R:SylStructure.parent.syl_break is 4) + ((R:SylStructure.parent.position_type is final) + ((ph_ctype is f) + ((1.46434 1.76508)) + ((0.978055 0.7486))) + ((1.2395 2.30826))) + ((ph_ctype is 0) + ((0.935325 1.69917)) + ((nn.ph_vfront is 1) + ((1.20456 1.31128)) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((nn.ph_vheight is 0) + ((1.16907 0.212421)) + ((0.952091 0.653094))) + ((p.ph_ctype is 0) + 
((1.05502 1.25802)) + ((0.818731 0.777568)))))))) + ((ph_ctype is f) + ((p.ph_ctype is 0) + ((1.03918 0.163941)) + ((0.737545 -0.167063))) + ((R:SylStructure.parent.position_type is final) + ((n.ph_ctype is n) + ((R:SylStructure.parent.last_accent < 0.5) + ((R:SylStructure.parent.sub_phrases < 2.8) + ((0.826207 -0.000859005)) + ((0.871119 0.273433))) + ((R:SylStructure.parent.parent.word_numsyls < 2.4) + ((1.17405 1.05694)) + ((0.858394 0.244916)))) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((p.ph_ctype is 0) + ((1.14092 1.21187)) + ((R:SylStructure.parent.syl_break is 2) + ((1.02653 0.59865)) + ((0.94248 1.1634)))) + ((seg_coda_fric is 0) + ((1.07441 0.292935)) + ((1.15736 0.92574))))) + ((ph_vlng is s) + ((R:SylStructure.parent.syl_break is 2) + ((1.34638 1.23484)) + ((0.951514 2.02008))) + ((ph_ctype is 0) + ((p.ph_ctype is r) + ((0.806106 0.697089)) + ((R:SylStructure.parent.syl_break is 2) + ((1.10891 0.992197)) + ((1.04657 1.51093)))) + ((1.18165 0.520952))))))))) + ((p.ph_vlng is 0) + ((pos_in_syl < 0.7) + ((R:SylStructure.parent.position_type is final) + ((ph_ctype is r) + ((0.966357 0.185827)) + ((ph_ctype is s) + ((0.647163 0.0332298)) + ((0.692972 -0.534917)))) + ((ph_ctype is s) + ((0.881521 0.575107)) + ((p.ph_ctype is f) + ((0.8223 -0.111275)) + ((R:SylStructure.parent.last_accent < 0.3) + ((0.969188 0.09447)) + ((0.894438 0.381947)))))) + ((p.ph_ctype is f) + ((0.479748 -0.490108)) + ((0.813125 -0.201268)))) + ((ph_ctype is s) + ((0.908566 1.20397)) + ((R:SylStructure.parent.last_accent < 1.2) + ((0.88078 0.636568)) + ((0.978087 1.07763)))))) + ((pos_in_syl < 1.3) + ((R:SylStructure.parent.syl_break is 0) + ((pos_in_syl < 0.1) + ((R:SylStructure.parent.position_type is initial) + ((p.ph_ctype is n) + ((0.801651 -0.0163359)) + ((ph_ctype is s) + ((n.ph_ctype is r) + ((0.893307 1.07253)) + ((p.ph_vlng is 0) + ((0.92651 0.525806)) + ((0.652444 0.952792)))) + ((p.ph_vlng is 0) + ((seg_onsetcoda is coda) + ((0.820151 0.469117)) + ((p.ph_ctype is f) 
+ ((0.747972 -0.0716448)) + ((ph_ctype is f) + ((0.770882 0.457137)) + ((0.840905 0.102492))))) + ((R:SylStructure.parent.syl_out < 1.1) + ((0.667824 0.697337)) + ((0.737967 0.375114)))))) + ((ph_vheight is 1) + ((0.624353 0.410671)) + ((R:SylStructure.parent.asyl_in < 0.8) + ((0.647905 -0.331055)) + ((p.ph_ctype is s) + ((0.629039 -0.240616)) + ((0.749277 -0.0191273)))))) + ((ph_vheight is 3) + ((p.ph_ctype is s) + ((0.626922 0.556537)) + ((0.789357 0.153892))) + ((seg_onsetcoda is coda) + ((n.ph_ctype is 0) + ((R:SylStructure.parent.parent.word_numsyls < 3.4) + ((0.744714 0.123242)) + ((0.742039 0.295753))) + ((seg_coda_fric is 0) + ((R:SylStructure.parent.parent.word_numsyls < 2.4) + ((ph_vheight is 1) + ((0.549715 -0.341018)) + ((0.573641 -0.00893114))) + ((nn.ph_vfront is 2) + ((0.67099 -0.744625)) + ((0.664438 -0.302803)))) + ((p.ph_vlng is 0) + ((0.630028 0.113815)) + ((0.632794 -0.128733))))) + ((ph_ctype is r) + ((0.367169 -0.854509)) + ((0.94334 -0.216179)))))) + ((n.ph_ctype is f) + ((ph_vlng is 0) + ((1.3089 0.46195)) + ((R:SylStructure.parent.syl_codasize < 1.3) + ((1.07673 0.657169)) + ((pp.ph_vlng is 0) + ((0.972319 1.08222)) + ((1.00038 1.46257))))) + ((p.ph_vlng is l) + ((1.03617 0.785204)) + ((p.ph_vlng is a) + ((R:SylStructure.parent.position_type is final) + ((1.00681 0.321168)) + ((0.928115 0.950834))) + ((ph_vlng is 0) + ((pos_in_syl < 0.1) + ((R:SylStructure.parent.position_type is final) + ((0.863682 -0.167374)) + ((nn.ph_vheight is 0) + ((p.ph_ctype is f) + ((0.773591 -0.00374425)) + ((R:SylStructure.parent.syl_out < 1.1) + ((0.951802 0.228448)) + ((1.02282 0.504252)))) + ((1.09721 0.736476)))) + ((R:SylStructure.parent.position_type is final) + ((1.04302 0.0590974)) + ((0.589208 -0.431535)))) + ((n.ph_ctype is 0) + ((1.27879 1.00642)) + ((ph_vlng is s) + ((R:SylStructure.parent.asyl_in < 1.4) + ((0.935787 0.481652)) + ((0.9887 0.749861))) + ((R:SylStructure.parent.syl_out < 1.1) + ((R:SylStructure.parent.position_type is final) + 
((0.921307 0.0696307)) + ((0.83675 0.552212))) + ((0.810076 -0.0479225)))))))))) + ((ph_ctype is s) + ((n.ph_ctype is s) + ((0.706959 -1.0609)) + ((p.ph_ctype is n) + ((0.850614 -0.59933)) + ((n.ph_ctype is r) + ((0.665947 0.00698725)) + ((n.ph_ctype is 0) + ((R:SylStructure.parent.position_type is initial) + ((0.762889 -0.0649044)) + ((0.723956 -0.248899))) + ((R:SylStructure.parent.sub_phrases < 1.4) + ((0.632957 -0.601987)) + ((0.889114 -0.302401))))))) + ((ph_ctype is f) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((R:SylStructure.parent.syl_out < 1.1) + ((0.865267 0.164636)) + ((0.581827 -0.0989051))) + ((nn.ph_vfront is 2) + ((0.684459 -0.316836)) + ((0.778854 -0.0961191)))) + ((R:SylStructure.parent.syl_out < 1.1) + ((p.ph_ctype is s) + ((0.837964 -0.429437)) + ((0.875304 -0.0652743))) + ((0.611071 -0.635089)))) + ((p.ph_ctype is r) + ((R:SylStructure.parent.syl_out < 1.1) + ((0.762012 0.0139361)) + ((0.567983 -0.454845))) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((ph_ctype is l) + ((1.18845 0.809091)) + ((R:SylStructure.parent.position_type is initial) + ((ph_ctype is n) + ((0.773548 -0.277092)) + ((1.01586 0.281001))) + ((p.ph_ctype is 0) + ((1.06831 0.699145)) + ((0.924189 0.241873))))) + ((R:SylStructure.parent.syl_break is 0) + ((ph_ctype is n) + ((0.592321 -0.470784)) + ((0.778688 -0.072112))) + ((n.ph_ctype is s) + ((1.08848 0.0733489)) + ((1.25674 0.608371)))))))))) + ((pos_in_syl < 0.7) + ((p.ph_vlng is 0) + ((R:SylStructure.parent.position_type is mid) + ((ph_ctype is 0) + ((ph_vheight is 2) + ((0.456225 -0.293282)) + ((0.561529 -0.0816115))) + ((0.6537 -0.504024))) + ((ph_ctype is s) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((1.31586 0.98395)) + ((R:SylStructure.parent.position_type is single) + ((0.816869 0.634789)) + ((R:SylStructure.parent.syl_out < 4.4) + ((1.05578 0.479029)) + ((R:SylStructure.parent.asyl_in < 0.4) + ((1.11813 0.143214)) + ((0.87178 
0.406834)))))) + ((n.ph_ctype is n) + ((R:SylStructure.parent.last_accent < 0.6) + ((0.838154 -0.415599)) + ((0.924024 0.110288))) + ((seg_onsetcoda is coda) + ((nn.ph_vfront is 2) + ((0.670096 0.0314187)) + ((n.ph_ctype is f) + ((1.00363 0.693893)) + ((R:SylStructure.parent.syl_out < 6) + ((0.772363 0.215675)) + ((0.920313 0.574068))))) + ((R:SylStructure.parent.position_type is final) + ((0.673837 -0.458142)) + ((R:SylStructure.parent.sub_phrases < 2.8) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((0.894817 0.304628)) + ((ph_ctype is n) + ((0.787302 -0.23094)) + ((R:SylStructure.parent.asyl_in < 1.2) + ((ph_ctype is f) + ((R:SylStructure.parent.last_accent < 0.5) + ((1.12278 0.326954)) + ((0.802236 -0.100616))) + ((0.791255 -0.0919132))) + ((0.95233 0.219053))))) + ((R:SylStructure.parent.position_type is initial) + ((ph_ctype is f) + ((1.0616 0.216118)) + ((0.703216 -0.00834086))) + ((ph_ctype is f) + ((1.22277 0.761763)) + ((0.904811 0.332721)))))))))) + ((ph_vheight is 0) + ((p.ph_vlng is s) + ((0.873379 0.217178)) + ((n.ph_ctype is r) + ((0.723915 1.29451)) + ((n.ph_ctype is 0) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((R:SylStructure.parent.sub_phrases < 4) + ((seg_coda_fric is 0) + ((p.ph_vlng is l) + ((0.849154 0.945261)) + ((0.633261 0.687498))) + ((0.728546 0.403076))) + ((0.850962 1.00255))) + ((0.957999 1.09113))) + ((0.85771 0.209045))))) + ((ph_vheight is 2) + ((0.803401 -0.0544067)) + ((0.681353 0.256045))))) + ((n.ph_ctype is f) + ((ph_ctype is s) + ((p.ph_vlng is 0) + ((0.479307 -0.9673)) + ((0.700477 -0.351397))) + ((ph_ctype is f) + ((0.73467 -0.6233)) + ((R:SylStructure.parent.syl_break is 0) + ((p.ph_ctype is s) + ((0.56282 0.266234)) + ((p.ph_ctype is r) + ((0.446203 -0.302281)) + ((R:SylStructure.parent.sub_phrases < 2.7) + ((ph_ctype is 0) + ((0.572016 -0.0102436)) + ((0.497358 -0.274514))) + ((0.545477 0.0482177))))) + ((ph_vlng is s) + ((0.805269 0.888495)) + ((ph_ctype is n) + ((0.869854 0.653018)) + 
((R:SylStructure.parent.sub_phrases < 2.2) + ((0.735031 0.0612886)) + ((0.771859 0.346637)))))))) + ((R:SylStructure.parent.syl_codasize < 1.4) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.3) + ((R:SylStructure.parent.position_type is initial) + ((0.743458 0.0411808)) + ((1.13068 0.613305))) + ((pos_in_syl < 1.2) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 1) + ((1.11481 0.175467)) + ((0.937893 -0.276407))) + ((0.74264 -0.550878)))) + ((pos_in_syl < 3.4) + ((seg_onsetcoda is coda) + ((ph_ctype is r) + ((n.ph_ctype is s) + ((0.714319 -0.240328)) + ((p.ph_ctype is 0) + ((0.976987 0.330352)) + ((1.1781 -0.0816682)))) + ((ph_ctype is l) + ((n.ph_ctype is 0) + ((1.39137 0.383533)) + ((0.725585 -0.324515))) + ((ph_vheight is 3) + ((ph_vlng is d) + ((0.802626 -0.62487)) + ((n.ph_ctype is r) + ((0.661091 -0.513869)) + ((R:SylStructure.parent.position_type is initial) + ((R:SylStructure.parent.parent.word_numsyls < 2.4) + ((0.482285 0.207874)) + ((0.401601 -0.0204711))) + ((0.733755 0.397372))))) + ((n.ph_ctype is r) + ((p.ph_ctype is 0) + ((pos_in_syl < 1.2) + ((0.666325 0.271734)) + ((nn.ph_vheight is 0) + ((0.642401 -0.261466)) + ((0.783684 -0.00956571)))) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.692225 -0.381895)) + ((0.741921 -0.0898767)))) + ((nn.ph_vfront is 2) + ((ph_ctype is s) + ((0.697527 -1.12626)) + ((n.ph_ctype is s) + ((ph_vlng is 0) + ((R:SylStructure.parent.sub_phrases < 2.4) + ((0.498719 -0.906926)) + ((0.635342 -0.625651))) + ((0.45886 -0.385089))) + ((0.848596 -0.359702)))) + ((p.ph_vlng is a) + ((p.ph_ctype is 0) + ((0.947278 0.216904)) + ((0.637933 -0.394349))) + ((p.ph_ctype is r) + ((R:SylStructure.parent.syl_break is 0) + ((0.529903 -0.860573)) + ((0.581378 -0.510488))) + ((ph_vlng is 0) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((seg_onset_stop is 0) + ((R:SylStructure.parent.syl_break is 0) + ((p.ph_vlng is d) + ((0.768363 0.0108428)) + ((ph_ctype is s) + ((0.835756 -0.035054)) + 
((ph_ctype is f) + ((p.ph_vlng is s) + ((0.602016 -0.179727)) + ((0.640126 -0.297341))) + ((0.674628 -0.542602))))) + ((ph_ctype is s) + ((0.662261 -0.60496)) + ((0.662088 -0.432058)))) + ((R:SylStructure.parent.syl_out < 4.4) + ((0.582448 -0.389079)) + ((ph_ctype is s) + ((0.60413 -0.73564)) + ((0.567153 -0.605444))))) + ((R:SylStructure.parent.R:Syllable.p.syl_break is 2) + ((0.761115 -0.827377)) + ((ph_ctype is n) + ((0.855183 -0.275338)) + ((R:SylStructure.parent.syl_break is 0) + ((0.788288 -0.802801)) + ((R:SylStructure.parent.syl_codasize < 2.2) + ((0.686134 -0.371234)) + ((0.840184 -0.772883))))))) + ((pos_in_syl < 1.2) + ((R:SylStructure.parent.syl_break is 0) + ((n.ph_ctype is n) + ((0.423592 -0.655006)) + ((R:SylStructure.parent.syl_out < 4.4) + ((0.595269 -0.303751)) + ((0.478433 -0.456882)))) + ((0.688133 -0.133182))) + ((seg_onset_stop is 0) + ((1.27464 0.114442)) + ((0.406837 -0.167545)))))))))))) + ((ph_ctype is r) + ((0.462874 -0.87695)) + ((R:SylStructure.parent.R:Syllable.n.syl_onsetsize < 0.2) + ((0.645442 -0.640572)) + ((0.673717 -0.321322))))) + ((0.61008 -0.925472)))))))) +;; RMSE 0.8085 Correlation is 0.5899 Mean (abs) Error 0.6024 (0.5393) + + +)) + +(provide 'gswdurtreeZ) diff --git a/lib/holmes_phones.scm b/lib/holmes_phones.scm new file mode 100644 index 0000000..29e38ed --- /dev/null +++ b/lib/holmes_phones.scm @@ -0,0 +1,118 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. 
;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. ;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. ;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; A definition of the Holmes phone set used by the Donovan LPC +;; diphone synthesizer, the rest of the synthesis process will +;; typically use mrpa phones and map to these. 
+;; +;; Hmm not sure I've got the right mapping (as usual) + +(defPhoneSet + holmes + ;;; Phone Features + (;; vowel or consonant + (vc + -) + ;; vowel length: short long diphthong schwa + (vlng s l d a 0) + ;; vowel height: high mid low + (vheight 1 2 3 - 0) + ;; vowel frontness: front mid back + (vfront 1 2 3 - 0) + ;; lip rounding + (vrnd + - 0) + ;; consonant type: stop fricative affricate nasal lateral approximant + (ctype s f a n l r 0) + ;; place of articulation: labial alveolar palatal labio-dental + ;; dental velar glottal + (cplace l a p b d v g 0) + ;; consonant voicing + (cvox + - 0) + ) + ;; Phone set members + ( + ;; Note these features were set by awb so they are wrong !!! + (ee + l 1 1 - 0 0 0) ;; beet + (i + s 1 1 - 0 0 0) ;; bit + (ai + d 2 1 - 0 0 0) ;; gate + (e + s 2 1 - 0 0 0) ;; get + (aa + s 3 1 - 0 0 0) ;; fat + (ar + l 3 3 - 0 0 0) ;; father + (aw + l 3 3 + 0 0 0) ;; lawn + (oa + d 2 2 - 0 0 0) ;; lone + (oo + s 1 3 + 0 0 0) ;; full + (uu + l 1 3 + 0 0 0) ;; fool + (o + s 2 3 + 0 0 0) + (er + l 2 2 - 0 0 0) ;; murder + (a + a 2 2 - 0 0 0) ;; about + (u + s 2 3 - 0 0 0) ;; but + (ie + d 3 2 - 0 0 0) ;; hide + (ou + d 3 2 + 0 0 0) ;; how + (oi + d 3 3 + 0 0 0) ;; toy + (eer + d 2 1 - 0 0 0) + (air + d 1 1 - 0 0 0) + (oor + d 3 1 + 0 0 0) +;; (yu + l 2 3 + 0 0 +) ;; you ??? + + (p - 0 0 0 0 s l -) + (b - 0 0 0 0 s l +) + (t - 0 0 0 0 s a -) + (d - 0 0 0 0 s a +) + (k - 0 0 0 0 s v -) + (g - 0 0 0 0 s v +) + (f - 0 0 0 0 f b -) + (v - 0 0 0 0 f b +) + (th - 0 0 0 0 f d -) + (dh - 0 0 0 0 f d +) + (s - 0 0 0 0 f a -) + (z - 0 0 0 0 f a +) + (sh - 0 0 0 0 f p -) + (zh - 0 0 0 0 f p +) + (h - 0 0 0 0 f g -) + (m - 0 0 0 0 n l +) + (n - 0 0 0 0 n a +) + (ng - 0 0 0 0 n v +) + (ch - 0 0 0 0 a p -) + (j - 0 0 0 0 a p +) + (l - 0 0 0 0 l a +) + (w - 0 0 0 0 r l +) + (y - 0 0 0 0 r p +) + (r - 0 0 0 0 r a +) +;; (wh - 0 - - + l l -) ;; ?? 
+;; (wh - 0 - - + l l +) ;; map to w + (# - 0 0 0 0 0 0 -) + ) + ) + +(PhoneSet.silences '(#)) + +(provide 'holmes_phones) diff --git a/lib/hts.scm b/lib/hts.scm new file mode 100644 index 0000000..9cc8f45 --- /dev/null +++ b/lib/hts.scm @@ -0,0 +1,522 @@ +;; ---------------------------------------------------------------- ;; +;; Nagoya Institute of Technology and ;; +;; Carnegie Mellon University ;; +;; Copyright (c) 2002 ;; +;; All Rights Reserved. ;; +;; ;; +;; Permission is hereby granted, free of charge, to use and ;; +;; distribute this software and its documentation without ;; +;; restriction, including without limitation the rights to use, ;; +;; copy, modify, merge, publish, distribute, sublicense, and/or ;; +;; sell copies of this work, and to permit persons to whom this ;; +;; work is furnished to do so, subject to the following conditions: ;; +;; ;; +;; 1. The code must retain the above copyright notice, this list ;; +;; of conditions and the following disclaimer. ;; +;; ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; ;; +;; 3. Original authors' names are not deleted. ;; +;; ;; +;; 4. The authors' names are not used to endorse or promote ;; +;; products derived from this software without specific prior ;; +;; written permission. ;; +;; ;; +;; NAGOYA INSTITUTE OF TECHNOLOGY, CARNEGIE MELLON UNIVERSITY AND ;; +;; THE CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH ;; +;; REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF ;; +;; MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL NAGOYA INSTITUTE ;; +;; OF TECHNOLOGY, CARNEGIE MELLON UNIVERSITY NOR THE CONTRIBUTORS ;; +;; BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ;; +;; ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR ;; +;; PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER ;; +;; TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR ;; +;; PERFORMANCE OF THIS SOFTWARE. 
;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Generic HTS support code and specific features ;; +;; http://hts.ics.nitech.ac.jp ;; +;; Author : Alan W Black ;; +;; Date : August 2002 (and April 2004) ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; ;; +;; Still has language specific features in here, that will have to ;; +;; move out to the voices ;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defvar hts_synth_pre_hooks nil) +(defvar hts_synth_post_hooks nil) +(defvar hts_engine_params nil) + +(defvar hts_duration_stretch 0) +(defvar hts_f0_mean 0) +(defvar hts_f0_std 1) +(defvar hts_fw_factor 0.42) +(defvar hts_total_length 0.0) +(defvar hts_uv_threshold 0.5) +(defvar hts_use_phone_align 0) + +(defSynthType HTS + (let ((featfile (make_tmp_filename)) + (mcepfile (make_tmp_filename)) + (f0file (make_tmp_filename)) + (wavfile (make_tmp_filename)) + (labfile (make_tmp_filename))) + + (apply_hooks hts_synth_pre_hooks utt) + + (set! 
hts_output_params + (list + (list "-labelfile" featfile) + (list "-om" mcepfile) + (list "-of" f0file) + (list "-or" wavfile) + (list "-od" labfile)) + ) + + (hts_dump_feats utt hts_feats_list featfile) + + (HTS_Synthesize utt) + + (delete-file featfile) + (delete-file mcepfile) + (delete-file f0file) + (delete-file wavfile) + (delete-file labfile) + + (apply_hooks hts_synth_post_hooks utt) + utt) +) + +(define (hts_feats_output ofd s) + "This is bad as it makes decisions about what the feats are" +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; SEGMENT + +; boundary + (format ofd "%10.0f %10.0f " + (* 10000000 (item.feat s "segment_start")) + (* 10000000 (item.feat s "segment_end"))) + +; pp.name + (format ofd "%s" (if (string-equal "0" (item.feat s "p.p.name")) + "x" (item.feat s "p.p.name"))) +; p.name + (format ofd "^%s" (if (string-equal "0" (item.feat s "p.name")) + "x" (item.feat s "p.name"))) +; c.name + (format ofd "-%s" (if (string-equal "0" (item.feat s "name")) + "x" (item.feat s "name"))) +; n.name + (format ofd "+%s" (if (string-equal "0" (item.feat s "n.name")) + "x" (item.feat s "n.name"))) +; nn.name + (format ofd "=%s" (if (string-equal "0" (item.feat s "n.n.name")) + "x" (item.feat s "n.n.name"))) + +; position in syllable (segment) + (format ofd "@") + (format ofd "%s" (if (string-equal "pau" (item.feat s "name")) + "x" (+ 1 (item.feat s "pos_in_syl")))) + (format ofd "_%s" (if (string-equal "pau" (item.feat s "name")) + "x" (- (item.feat s "R:SylStructure.parent.R:Syllable.syl_numphones") + (item.feat s "pos_in_syl")))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; SYLLABLE + +;; previous syllable + +; p.stress + (format ofd "/A:%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "p.R:SylStructure.parent.R:Syllable.stress") + (item.feat s "R:SylStructure.parent.R:Syllable.p.stress"))) +; p.accent + (format ofd "_%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "p.R:SylStructure.parent.R:Syllable.accented") + (item.feat s 
"R:SylStructure.parent.R:Syllable.p.accented"))) +; p.length + (format ofd "_%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "p.R:SylStructure.parent.R:Syllable.syl_numphones") + (item.feat s "R:SylStructure.parent.R:Syllable.p.syl_numphones"))) +;; current syllable + +; c.stress + (format ofd "/B:%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.R:Syllable.stress"))) +; c.accent + (format ofd "-%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.R:Syllable.accented"))) +; c.length + (format ofd "-%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.R:Syllable.syl_numphones"))) + +; position in word (syllable) + (format ofd "@%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (+ 1 (item.feat s "R:SylStructure.parent.R:Syllable.pos_in_word")))) + (format ofd "-%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (- + (item.feat s "R:SylStructure.parent.parent.R:Word.word_numsyls") + (item.feat s "R:SylStructure.parent.R:Syllable.pos_in_word")))) + +; position in phrase (syllable) + (format ofd "&%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (+ 1 + (item.feat s "R:SylStructure.parent.R:Syllable.syl_in")))) + (format ofd "-%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (+ 1 + (item.feat s "R:SylStructure.parent.R:Syllable.syl_out")))) + +; position in phrase (stressed syllable) + (format ofd "#%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (+ 1 + (item.feat s "R:SylStructure.parent.R:Syllable.ssyl_in")))) + (format ofd "-%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (+ 1 + (item.feat s "R:SylStructure.parent.R:Syllable.ssyl_out")))) + +; position in phrase (accented syllable) + (format ofd "$%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (+ 1 + (item.feat s "R:SylStructure.parent.R:Syllable.asyl_in")))) + (format ofd "-%s" + (if 
(string-equal "pau" (item.feat s "name")) + "x" + (+ 1 + (item.feat s "R:SylStructure.parent.R:Syllable.asyl_out")))) + +; distance from stressed syllable + (format ofd "!%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.R:Syllable.lisp_distance_to_p_stress"))) + (format ofd "-%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.R:Syllable.lisp_distance_to_n_stress"))) + +; distance from accented syllable + (format ofd ";%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.R:Syllable.lisp_distance_to_p_accent"))) + (format ofd "-%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.R:Syllable.lisp_distance_to_n_accent"))) + +; name of the vowel of current syllable + (format ofd "|%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.R:Syllable.syl_vowel"))) + +;; next syllable + (format ofd "/C:%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "n.R:SylStructure.parent.R:Syllable.stress") + (item.feat s "R:SylStructure.parent.R:Syllable.n.stress"))) +; n.accent + (format ofd "+%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "n.R:SylStructure.parent.R:Syllable.accented") + (item.feat s "R:SylStructure.parent.R:Syllable.n.accented"))) +; n.length + (format ofd "+%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "n.R:SylStructure.parent.R:Syllable.syl_numphones") + (item.feat s "R:SylStructure.parent.R:Syllable.n.syl_numphones"))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +; WORD + +;;;;;;;;;;;;;;;;;; +;; previous word + +; p.gpos + (format ofd "/D:%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "p.R:SylStructure.parent.parent.R:Word.gpos") + (item.feat s "R:SylStructure.parent.parent.R:Word.p.gpos"))) +; p.length (syllable) + (format ofd "_%s" + (if (string-equal "pau" (item.feat s "name")) + 
(item.feat s "p.R:SylStructure.parent.parent.R:Word.word_numsyls") + (item.feat s "R:SylStructure.parent.parent.R:Word.p.word_numsyls"))) + +;;;;;;;;;;;;;;;;; +;; current word + +; c.gpos + (format ofd "/E:%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.parent.R:Word.gpos"))) +; c.length (syllable) + (format ofd "+%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.parent.R:Word.word_numsyls"))) + +; position in phrase (word) + (format ofd "@%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (+ 1 (item.feat s "R:SylStructure.parent.parent.R:Word.pos_in_phrase")))) + (format ofd "+%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.parent.R:Word.words_out"))) + +; position in phrase (content word) + (format ofd "&%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (+ 1 (item.feat s "R:SylStructure.parent.parent.R:Word.content_words_in")))) + (format ofd "+%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.parent.R:Word.content_words_out"))) + +; distance from content word in phrase + (format ofd "#%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.parent.R:Word.lisp_distance_to_p_content"))) + (format ofd "+%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.parent.R:Word.lisp_distance_to_n_content"))) + +;;;;;;;;;;;;;; +;; next word + +; n.gpos + (format ofd "/F:%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "n.R:SylStructure.parent.parent.R:Word.gpos") + (item.feat s "R:SylStructure.parent.parent.R:Word.n.gpos"))) +; n.length (syllable) + (format ofd "_%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "n.R:SylStructure.parent.parent.R:Word.word_numsyls") + (item.feat s "R:SylStructure.parent.parent.R:Word.n.word_numsyls"))) + 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +; PHRASE + +;;;;;;;;;;;;;;;;;;;; +;; previous phrase + +; length of previous phrase (syllable) + (format ofd "/G:%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "p.R:SylStructure.parent.parent.R:Phrase.parent.lisp_num_syls_in_phrase") + (item.feat s "R:SylStructure.parent.parent.R:Phrase.parent.p.lisp_num_syls_in_phrase"))) + +; length of previous phrase (word) + (format ofd "_%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "p.R:SylStructure.parent.parent.R:Phrase.parent.lisp_num_words_in_phrase") + (item.feat s "R:SylStructure.parent.parent.R:Phrase.parent.p.lisp_num_words_in_phrase"))) + +;;;;;;;;;;;;;;;;;;;; +;; current phrase + +; length of current phrase (syllable) + (format ofd "/H:%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.parent.R:Phrase.parent.lisp_num_syls_in_phrase"))) + +; length of current phrase (word) + (format ofd "=%s" + (if (string-equal "pau" (item.feat s "name")) + "x" + (item.feat s "R:SylStructure.parent.parent.R:Phrase.parent.lisp_num_words_in_phrase"))) + +; position in major phrase (phrase) + (format ofd "@%s" + (+ 1 (item.feat s "R:SylStructure.parent.R:Syllable.sub_phrases"))) + (format ofd "=%s" + (- + (item.feat s "lisp_total_phrases") + (item.feat s "R:SylStructure.parent.R:Syllable.sub_phrases"))) + +; type of tobi endtone of current phrase + (format ofd "|%s" + (item.feat s "R:SylStructure.parent.parent.R:Phrase.parent.daughtern.R:SylStructure.daughtern.tobi_endtone")) + +;;;;;;;;;;;;;;;;;;;; +;; next phrase + +; length of next phrase (syllable) + (format ofd "/I:%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s "n.R:SylStructure.parent.parent.R:Phrase.parent.lisp_num_syls_in_phrase") + (item.feat s "R:SylStructure.parent.parent.R:Phrase.parent.n.lisp_num_syls_in_phrase"))) + +; length of next phrase (word) + (format ofd "=%s" + (if (string-equal "pau" (item.feat s "name")) + (item.feat s 
"n.R:SylStructure.parent.parent.R:Phrase.parent.lisp_num_words_in_phrase") + (item.feat s "R:SylStructure.parent.parent.R:Phrase.parent.n.lisp_num_words_in_phrase"))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +; UTTERANCE + +; length (syllable) + (format ofd "/J:%s" (item.feat s "lisp_total_syls")) + +; length (word) + (format ofd "+%s" (item.feat s "lisp_total_words")) + +; length (phrase) + (format ofd "-%s" (item.feat s "lisp_total_phrases")) + + (format ofd "\n") + +) + +(define (hts_dump_feats utt feats ofile) + (let ((ofd (fopen ofile "w"))) + (mapcar + (lambda (s) + (hts_feats_output ofd s)) + (utt.relation.items utt 'Segment)) + (fclose ofd) + )) + + +;; +;; Extra features +;; From Segment items refer by +;; +;; R:SylStructure.parent.parent.R:Phrase.parent.lisp_num_syls_in_phrase +;; R:SylStructure.parent.parent.R:Phrase.parent.lisp_num_words_in_phrase +;; lisp_total_words +;; lisp_total_syls +;; lisp_total_phrases +;; +;; The last three will act on any item + +(define (distance_to_p_content i) + (let ((c 0) (rc 0 ) (w (item.relation.prev i "Phrase"))) + (while w + (set! c (+ 1 c)) + (if (string-equal "1" (item.feat w "contentp")) + (begin + (set! rc c) + (set! w nil)) + (set! w (item.prev w))) + ) + rc)) + +(define (distance_to_n_content i) + (let ((c 0) (rc 0) (w (item.relation.next i "Phrase"))) + (while w + (set! c (+ 1 c)) + (if (string-equal "1" (item.feat w "contentp")) + (begin + (set! rc c) + (set! w nil)) + (set! w (item.next w))) + ) + rc)) + +(define (distance_to_p_accent i) + (let ((c 0) (rc 0 ) (w (item.relation.prev i "Syllable"))) + (while (and w (member_string (item.feat w "syl_break") '("0" "1"))) + (set! c (+ 1 c)) + (if (string-equal "1" (item.feat w "accented")) + (begin + (set! rc c) + (set! w nil)) + (set! w (item.prev w))) + ) + rc)) + +(define (distance_to_n_accent i) + (let ((c 0) (rc 0 ) (w (item.relation.next i "Syllable"))) + (while (and w (member_string (item.feat w "p.syl_break") '("0" "1"))) + (set! 
c (+ 1 c)) + (if (string-equal "1" (item.feat w "accented")) + (begin + (set! rc c) + (set! w nil)) + (set! w (item.next w))) + ) + rc)) + +(define (distance_to_p_stress i) + (let ((c 0) (rc 0 ) (w (item.relation.prev i "Syllable"))) + (while (and w (member_string (item.feat w "syl_break") '("0" "1"))) + (set! c (+ 1 c)) + (if (string-equal "1" (item.feat w "stress")) + (begin + (set! rc c) + (set! w nil)) + (set! w (item.prev w))) + ) + rc)) + +(define (distance_to_n_stress i) + (let ((c 0) (rc 0 ) (w (item.relation.next i "Syllable"))) + (while (and w (member_string (item.feat w "p.syl_break") '("0" "1"))) + (set! c (+ 1 c)) + (if (string-equal "1" (item.feat w "stress")) + (begin + (set! rc c) + (set! w nil)) + (set! w (item.next w))) + ) + rc)) + +(define (num_syls_in_phrase i) + (apply + + + (mapcar + (lambda (w) + (length (item.relation.daughters w 'SylStructure))) + (item.relation.daughters i 'Phrase)))) + +(define (num_words_in_phrase i) + (length (item.relation.daughters i 'Phrase))) + +(define (total_words w) + (length + (utt.relation.items (item.get_utt w) 'Word))) + +(define (total_syls s) + (length + (utt.relation.items (item.get_utt s) 'Syllable))) + +(define (total_phrases s) + (length + (utt.relation_tree (item.get_utt s) 'Phrase))) + +(provide 'hts) diff --git a/lib/init.scm b/lib/init.scm new file mode 100644 index 0000000..90bccb7 --- /dev/null +++ b/lib/init.scm @@ -0,0 +1,157 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Initialisation file -- loaded before anything else +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;;; Basic siod library (need this before load_library or require works) +(load (path-append libdir "siod.scm")) + +(defvar home-directory (or (getenv "HOME") "/") + "home-directory + Place looked at for .festivalrc etc.") + +;;; User startup initialization, can be used to override load-path +;;; to allow alternate basic modules to be loaded. 
+(if (probe_file (path-append home-directory ".siodvarsrc")) + (load (path-append home-directory ".siodvarsrc"))) + +(if (probe_file (path-append home-directory ".festivalvarsrc")) + (load (path-append home-directory ".festivalvarsrc"))) + +;;; A chance to set various variables to a local setting e.g. +;;; lexdir, voices_dir audio etc etc. +(if (probe_file (path-append libdir "sitevars.scm")) + (load (path-append libdir "sitevars.scm"))) + +;;; CSTR siod extensions +(require 'cstr) + +;;; Festival specific definitions +(require 'festival) + +;;; Dealing with module descriptions +(require 'module_description) + +;;; Web related definitions +(require 'web) + +;;; Utterance types and support +(require 'synthesis) + +;;; Some default parameters +(Parameter.def 'Wavefiletype 'riff) + +;;; Set default audio method +(cond + ((member 'nas *modules*) + (Parameter.def 'Audio_Method 'netaudio)) + ((member 'esd *modules*) + (Parameter.def 'Audio_Method 'esdaudio)) + ((member 'sun16audio *modules*) + (Parameter.def 'Audio_Method 'sun16audio)) + ((member 'freebsd16audio *modules*) + (Parameter.def 'Audio_Method 'freebsd16audio)) + ((member 'linux16audio *modules*) + (Parameter.def 'Audio_Method 'linux16audio)) + ((member 'irixaudio *modules*) + (Parameter.def 'Audio_Method 'irixaudio)) + ((member 'macosxaudio *modules*) + (Parameter.def 'Audio_Method 'macosxaudio)) + ((member 'win32audio *modules*) + (Parameter.def 'Audio_Method 'win32audio)) + ((member 'os2audio *modules*) + (Parameter.def 'Audio_Method 'os2audio)) + ((member 'mplayeraudio *modules*) + (Parameter.def 'Audio_Method 'mplayeraudio)) + (t ;; can't find direct support so guess that /dev/audio for 8k ulaw exists + (Parameter.def 'Audio_Method 'sunaudio))) +;;; If you have an external program to play audio add its definition +;;; in siteinit.scm + +;;; The audio spooler doesn't work under Windows so redefine audio_mode +(if (member 'mplayeraudio *modules*) + (define (audio_mode param) param) +) + +;;; Intonation 
+(require 'intonation) + +;;; Duration +(require 'duration) + +;;; A large lexicon +(require 'lexicons) +(require 'pauses) + +;;; Part of speech prediction +(require 'pos) + +;;; Phrasing (dependent on pos) +(require 'phrase) + +;;; Postlexical rules +(require 'postlex) + +;;; Different voices +(require 'voices) ;; sets voice_default +(require 'languages) + +;;; Some higher level functions +(require 'token) +(require 'tts) + +;;; +;;; Local site initialization, if the file exists load it +;;; +(if (probe_file (path-append libdir "siteinit.scm")) + (load (path-append libdir "siteinit.scm"))) + +;;; User initialization, if a user has a personal customization +;;; file, load it +(if (probe_file (path-append home-directory ".siodrc")) + (load (path-append home-directory ".siodrc"))) + +(if (probe_file (path-append home-directory ".festivalrc")) + (load (path-append home-directory ".festivalrc"))) + +;;; Default voice (have to do something cute so autoloads still work) +(eval (list voice_default)) + +(provide 'init) + + + + + diff --git a/lib/intonation.scm b/lib/intonation.scm new file mode 100644 index 0000000..8062e03 --- /dev/null +++ b/lib/intonation.scm @@ -0,0 +1,187 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. 
Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Basic Intonation modules. These call appropriate sub-modules +;;; depending on the chosen intonation methods +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;;; These modules should predict intonation events/labels +;;; based on information in the phrase and word streams + +; to detect prespecified accents (feature "accent" in 'Word relation) +; AS 5/29/00 + +(define (tobi_accent_prespecified utt) + (let ((tobi_found nil) + (words (utt.relation.items utt 'Word))) + + (while (and words (not tobi_found)) +; feature "accent" might be prespecified on words or tokens, AS 05/29/00 + (if (item.feat.present (car words) 'accent) + (set! tobi_found t) +; if Token relation exists, check tokens as well + (if (not (null (item.parent (item.relation (car words) 'Token)))) + (if (item.feat.present (item.parent (item.relation (car words) 'Token)) 'accent) + (set! tobi_found t) + (set! words (cdr words))) + (set! words (cdr words))))) + tobi_found)) + +(set! 
int_accent_cart_tree_no_accent +'((NONE))) + +(define (Intonation utt) +"(Intonation utt) +Select between different intonation modules depending on the Parameter +Int_Method. Currently offers three types: Simple, hats on each content +word; ToBI, a tree method for predicting ToBI accents; and Default a +really bad method with a simple downward sloping F0. This is the first +of a two-stage intonation prediction process. This adds accent-like +features to syllables, the second, Int_Targets generates the F0 contour +itself. [see Intonation]" + +; AS 5/29/00: Hack to avoid prediction of further accent labels +; on utterance chunks that have already been annotated with +; accent labels +; use CART that doesn't assign any labels when using Intonation_Tree + +(if (tobi_accent_prespecified utt) + (progn + (set! int_accent_cart_tree_save int_accent_cart_tree) + (set! int_accent_cart_tree int_accent_cart_tree_no_accent) + (Intonation_Tree utt) + (set! int_accent_cart_tree int_accent_cart_tree_save)) + + (let ((rval (apply_method 'Int_Method utt))) + (Parameter.get 'Int_Method) + (cond + (rval rval) ;; new style + ((eq 'Simple (Parameter.get 'Int_Method)) + (Intonation_Simple utt)) + ((eq 'ToBI (Parameter.get 'Int_Method)) + (format t "Using Intonation_Tree") + (Intonation_Tree utt)) + ((eq 'General (Parameter.get 'Int_Method)) + (Intonation_Simple utt)) ;; yes this is a duplication + (t + (Intonation_Default utt)))))) + + +;;; These modules should create an actual F0 contour based on the +;;; the existing intonational events/labels etc +;;; Specifically this is called after durations have been predicted + +(define (Int_Targets utt) +"(Int_Targets utt) +The second stage in F0 prediction. This generates F0 targets +related to segments using one of three methods, a simple hat, +linear regression based on ToBI markings, and a simple declining +slope. 
This second part deals with actual F0 values and durations, +while the previous section only deals with accent (and boundary tone) +assignment. [see Intonation]" + (let ((rval (apply_method 'Int_Target_Method utt))) + (cond + (rval rval) ;; new style + ((eq 'Simple (Parameter.get 'Int_Method)) + (Int_Targets_Simple utt)) + ((eq 'ToBI (Parameter.get 'Int_Method)) + (Int_Targets_LR utt)) + ((eq 'General (Parameter.get 'Int_Method)) + (Int_Targets_General utt)) + (t + (Int_Targets_Default utt))))) + +;;; +;;; A tree that adds accents (H) to all content words +;;; simple but better than nothing at all +;;; +(set! simple_accent_cart_tree + ' + ((R:SylStructure.parent.gpos is content) + ((stress is 1) + ((Accented)) + ((position_type is single) + ((Accented)) + ((NONE)))) + ((NONE)))) + +(defvar duffint_params '((start 130) (end 110)) + "duffint_params +Default parameters for Default (duff) intonation target generation. +This is an assoc list of parameters. Two parameters are supported: +start specifies the start F0 in Hertz for an utterance, and end specifies +the end.") + +;;; +;;; For simple testing, this function adds fixed duration and +;;; monotone intonation to a set of phones +;;; +(defvar FP_F0 120 +"FP_F0 +In using Fixed_Prosody as used in Phones type utterances and hence +SayPhones, this is the value in Hertz for the monotone F0.") +(defvar FP_duration 100 +"FP_duration +In using Fixed_Prosody as used in Phones type utterances and hence +SayPhones, this is the fixed value in ms for phone durations.") + +(define (Fixed_Prosody utt) +"(Fixed_Prosody UTT) +Add fixed duration and fixed monotone F0 to the segments in UTT. +Uses values of FP_duration and FP_F0 as fixed values." + (let (utt1 + (dur_stretch (Parameter.get 'Duration_Stretch)) + (orig_duffint_params duffint_params)) + (Parameter.set 'Duration_Stretch (/ FP_duration 100.0)) + (set! duffint_params (list (list 'start FP_F0) (list 'end FP_F0))) + + (set! utt1 (Duration_Default utt)) + (set! 
utt1 (Int_Targets_Default utt1)) + + ;; Reset Parameter values back + (Parameter.set 'Duration_Stretch dur_stretch) + (set! duffint_params orig_duffint_params) + + utt1 + ) +) + +(define (segment_dpitch seg) +"(segment_dpitch SEG) +Returns delta pitch, this pitch minus previous pitch." + (- + (parse-number (item.feat seg 'seg_pitch)) + (parse-number (item.feat seg 'R:Segment.p.seg_pitch)))) + +(provide 'intonation) diff --git a/lib/java.scm b/lib/java.scm new file mode 100644 index 0000000..e6f514e --- /dev/null +++ b/lib/java.scm @@ -0,0 +1,39 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1998 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Functions specific to supporting a Java client +;;; + +;; none required yet + +(provide 'java) diff --git a/lib/klatt_durs.scm b/lib/klatt_durs.scm new file mode 100644 index 0000000..8f3864c --- /dev/null +++ b/lib/klatt_durs.scm @@ -0,0 +1,85 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Phone duration info for Klatt rules, for mrpa phone set + +(set! duration_klatt_params +'( +(a 230.0 80.0) +(aa 240.0 100.0) +(@ 120.0 60.0) +(@@ 180.0 80.0) +(ai 250.0 150.0) +(au 240.0 100.0) +(b 85.0 60.0) +(ch 70.0 50.0) +(d 75.0 50.0) +(dh 50.0 30.0) +(e 150.0 70.0) +(e@ 270.0 130.0) +(ei 180.0 100.0) +(f 100.0 80.0) +(g 80.0 60.0) +(h 80.0 20.0) +(i 135.0 40.0) +(i@ 230.0 100.0) +(ii 155.0 55) +(jh 70.0 50.0) +(k 80.0 60.0) +(l 80.0 40.0) +(m 70.0 60.0) +(n 60.0 50.0) +(ng 95.0 60.0) +(o 240.0 130.0) +(oi 280.0 150.0) +(oo 240.0 130.0) +(ou 220.0 80.0) +(p 90.0 50.0) +(r 80.0 30.0) +(s 105.0 60.0) +(sh 105.0 80.0) +(t 75.0 50.0) +(th 90.0 60.0) +(u 210.0 70.0) +(u@ 230.0 110.0) +(uh 160.0 60.0) +(uu 230.0 150.0) +(v 60.0 40.0) +(w 80.0 60.0) +(y 80.0 40.0) +(z 75.0 40.0) +(zh 70.0 40.0) +(# 100.0 100.0) +)) + +(provide 'klatt_durs) diff --git a/lib/languages.scm b/lib/languages.scm new file mode 100644 index 0000000..9382ad3 --- /dev/null +++ b/lib/languages.scm @@ -0,0 +1,120 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Specification of voices and some major choices of synthesis +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; This should use some sort of database description for voices so +;;; new voices will become automatically available. +;;; + +(define (language_british_english) +"(language_british_english) +Set up language parameters for British English." 
+ (require 'voices) + ;; Will get more elaborate, with different choices of voices in language + + (set! male1 voice_rab_diphone) + (set! male2 voice_don_diphone) + (if (symbol-bound? 'voice_gsw_diphone) + (set! male3 voice_gsw_diphone)) + (if (symbol-bound? 'voice_gsw_450) + (set! male4 voice_gsw_450)) + + (male1) + (Parameter.set 'Language 'britishenglish) +) + +(define (language_american_english) +"(language_american_english) +Set up language parameters for American English." + + (if (symbol-bound? 'voice_kal_diphone) + (set! female1 voice_kal_diphone)) + (set! male1 voice_ked_diphone) + + (male1) + (Parameter.set 'Language 'americanenglish) +) + +(define (language_scots_gaelic) +"(language_scots_gaelic) +Set up language parameters for Scots Gaelic." + (error "Scots Gaelic not yet supported.") + + (Parameter.set 'Language 'scotsgaelic) +) + +(define (language_welsh) +"(language_welsh) +Set up language parameters for Welsh." + + (set! male1 voice_welsh_hl) + + (male1) + (Parameter.set 'Language 'welsh) +) + +(define (language_castillian_spanish) +"(language_castillian_spanish) +Set up language parameters for Castillian Spanish." + + (voice_el_diphone) + (set! male1 voice_el_diphone) + + (Parameter.set 'Language 'spanish) +) + +(define (select_language language) + (cond + ((or (equal? language 'britishenglish) + (equal? language 'english)) ;; we all know it's the *real* English + (language_british_english)) + ((equal? language 'americanenglish) + (language_american_english)) + ((equal? language 'scotsgaelic) + (language_scots_gaelic)) + ((equal? language 'welsh) + (language_welsh)) + ((equal? language 'spanish) + (language_castillian_spanish)) + ((equal? 
language 'klingon) + (language_klingon)) + (t + (print "Unsupported language, using English") + (language_british_english)))) + +(defvar language_default language_british_english) + +(provide 'languages) diff --git a/lib/lexicons.scm b/lib/lexicons.scm new file mode 100644 index 0000000..574c8fa --- /dev/null +++ b/lib/lexicons.scm @@ -0,0 +1,274 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. 
;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Definition of various lexicons +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;;; If there exists a subdirectory of the lib-path called dicts then that +;;; is used as the lexicon directory by default. If it doesn't exist +;;; we set lexdir to the directory in CSTR where our lexicons are. +;;; In non-CSTR installations where lexicons are not in lib/dicts, +;;; you should set lexdir in sitevars.scm + +(defvar lexdir + (if (probe_file (path-append libdir "dicts")) + (path-append libdir "dicts/") + ;; else we'll guess we're in the CSTR filespace + (path-as-directory "/projects/festival/lib/dicts/")) + "lexdir + The directory where the lexicon(s) are, by default.") + +(require 'pos) ;; for part of speech mapping + +(define (setup_cstr_lex) +"(setup_cstr_lex) +Define and set up the CSTR lexicon. The CSTR lexicon consists +of about 25,000 entries in the mrpa phone set. A large number of +specific local entries are also added to the addenda." + (if (not (member_string "mrpa" (lex.list))) + (begin + (lex.create "mrpa") + (lex.set.compile.file (path-append lexdir "cstrlex.out")) + (lex.set.phoneset "mrpa") + (lex.set.lts.method 'lts_rules) + (lex.set.lts.ruleset 'nrl) + (lex.set.pos.map english_pos_map_wp39_to_wp20) + (mrpa_addenda) + (lex.add.entry + '("previous" nil (((p r ii) 1) ((v ii) 0) ((@ s) 0)))) + (lex.add.entry + '("audio" () (((oo d) 1) ((ii) 0) ((ou) 0)))) + (lex.add.entry + '("modules" () (((m o d) 1) ((uu l s) 0)))) + ))) + +(define (setup_oald_lex) +"(setup_oald_lex) +Define and set up the CUVOALD lexicon. This is derived from the +Computer Users Version of the Oxford Advanced Learners' Dictionary +of Current English. This version includes a trained set of letter +to sound rules which have also been used to reduce the actual lexicon +size by over half, for those entries that the lts model gets exactly +the same." 
+ (if (not (member_string "oald" (lex.list))) + (load (path-append lexdir "oald/oaldlex.scm")))) + +(define (setup_cmu_lex) + "(setup_cmu_lex) +Lexicon derived from the CMU lexicon (cmudict-0.4), around 100,000 entries, +in the radio phoneset (sort of darpa-like). Includes letter to sound +rule model trained from this data, and uses the lexical stress predictor +from OALD." + (if (not (member_string "cmu" (lex.list))) + (load (path-append lexdir "cmu/cmulex.scm")))) + +(define (setup_cmumt_lex) + "(setup_cmumt_lex) +Lexicon derived from the CMU lexicon (cmudict-0.4), around 100,000 entries, +in the radio phoneset (sort of darpa-like). Includes letter to sound +rule model trained from this data, and uses the lexical stress predictor +from OALD." + (if (not (member_string "cmumt" (lex.list))) + (load (path-append lexdir "cmu_mt/cmumtlex.scm")))) + +(define (setup_cmu6_lex) + "(setup_cmu6_lex) +Lexicon derived from the CMU lexicon (cmudict-0.6), around 100,000 entries, +in the radio phoneset (sort of darpa-like). Includes letter to sound +rule model trained from this data, the format of this lexicon is suitable +for the UniSyn metrical phonology modules. That is the entries are +not syllabified," + (if (not (member_string "cmu6" (lex.list))) + (load (path-append lexdir "cmu6/cmu6lex.scm")))) + +(define (setup_moby_lex) +"(setup_moby_lexicon) +Define and setup the MOBY lexicon. This is derived from the public +domain version of the Moby (TM) Pronunciator II lexicon. It can be +converted automatically to British English mrpa phoneset which of +course is sub-optimal. It contains around 120,000 entries and has part +of speech information for homographs." 
+ (if (not (member_string "moby" (lex.list))) + (begin + (lex.create "moby") + ; (lex.set.compile.file (path-append lexdir "mobylex.out")) + (lex.set.compile.file "/home/awb/src/mobypron/mobylex.out") + (lex.set.phoneset "mrpa") + (lex.set.lts.method 'lts_rules) + (lex.set.lts.ruleset 'nrl) + (lex.set.pos.map english_pos_map_wp39_to_wp20) + (lex.add.entry + '("a" dt (((@) 0)))) + (lex.add.entry + '("the" dt (((dh @) 0)))) + (lex.add.entry + '("taylor" n (((t ei) 1) ((l @) 0)))) + (lex.add.entry + '("who" prp ((( h uu ) 0)))) + (mrpa_addenda)))) + +(define (setup_beep_lex) + "(setup_beep_lex) +Lexicon derived from the British English Example Pronunciation dictionary +(BEEP) from Tony Robinson ajr@eng.cam.ac.uk. Around 160,000 entries." + (if (not (member_string "beep" (lex.list))) + (begin + (lex.create "beep") + (lex.set.compile.file (path-append lexdir "beep_lex.out")) + (lex.set.phoneset "mrpa") + (lex.set.lts.method 'lts_rules) + (lex.set.lts.ruleset 'nrl) + (lex.set.pos.map english_pos_map_wp39_to_wp20) + (lex.add.entry + '("taylor" nil (((t ei) 1) ((l @) 0)))) + (mrpa_addenda)))) + +;;; The nrl letter to sound rules produce mrpa phone set so we need +;;; to do some fancy things to make them work for American English +(define (f2b_lts word features) +"(f2b_lts WORD FEATURES) +Letter to sound rule system for f2b (American English), uses the NRL +LTS ruleset and maps the result to the radio phone set." + '("unknown" nil (((ah n) 0) ((n ow n) 1))) +) + +;;; A CART tree for predicting lexical stress for strings of phones +;;; generated by the LTS models. This was actually trained from +;;; OALD as that's the only lexicon with stress and part of speech information. +;;; It was trained in a phoneset independent way and may be used by either +;;; OALD or CMU models (and probably MOBY and OGI lex too). 
+;;; On held out data it gets +;;; 0 7390 378 7768 [7390/7768] 95.134 +;;; 1 512 8207 8719 [8207/8719] 94.128 +;;; 7902 8585 +;;; total 16487 correct 15597.000 94.602% +;;; +(set! english_stress_tree +'((sylpos < 1.7) + ((1)) + ((ph_vlng is a) + ((0)) + ((ph_vheight is 1) + ((num2end < 1.5) + ((ph_vfront is 1) + ((ph_vlng is s) ((0)) ((pos is v) ((1)) ((0)))) + ((pos is n) ((0)) ((sylpos < 2.2) ((1)) ((0))))) + ((ph_vlng is l) + ((1)) + ((ph_vfront is 1) + ((num2end < 2.4) + ((0)) + ((pos is a) + ((num2end < 3.3) ((sylpos < 2.3) ((1)) ((0))) ((0))) + ((sylpos < 3.2) + ((num2end < 3.3) ((0)) ((pos is v) ((1)) ((0)))) + ((0))))) + ((0))))) + ((num2end < 1.5) + ((pos is n) + ((0)) + ((sylpos < 2.4) + ((pos is v) + ((1)) + ((ph_vlng is d) + ((ph_vheight is 2) ((ph_vfront is 1) ((1)) ((0))) ((0))) + ((1)))) + ((ph_vlng is d) + ((sylpos < 3.3) + ((pos is v) + ((ph_vheight is 2) ((ph_vfront is 1) ((0)) ((1))) ((0))) + ((0))) + ((0))) + ((ph_vheight is 2) + ((1)) + ((ph_vrnd is +) ((1)) ((ph_vlng is l) ((0)) ((1)))))))) + ((ph_vlng is d) + ((pos is v) + ((sylpos < 2.4) ((1)) ((0))) + ((ph_vfront is 2) + ((pos is n) + ((num2end < 2.4) + ((ph_vrnd is +) + ((0)) + ((sylpos < 2.2) ((1)) ((ph_vheight is 2) ((1)) ((0))))) + ((sylpos < 2.4) ((ph_vheight is 2) ((0)) ((1))) ((0)))) + ((1))) + ((ph_vheight is 2) ((1)) ((ph_vfront is 1) ((0)) ((1)))))) + ((pos is n) + ((num2end < 2.4) + ((ph_vfront is 3) + ((sylpos < 2.3) ((1)) ((ph_vlng is l) ((1)) ((0)))) + ((1))) + ((1))) + ((1))))))))) + +(define (lex_user_unknown_word word feats) + "(lex_user_unknown_word WORD FEATS) +Function called by lexicon when 'function type letter to sound rules +is defined. It is the user's responsibility to define this function +when they want to deal with unknown words themselves." 
+ (error "lex_user_unknown_word: has not been defined by user")) + +(define (Word utt) +"(Word utt) +Construct (synthesis specific) syllable/segments from Word relation +using current lexicon and specific module." 
+ (let ((rval (apply_method 'Word_Method utt))) + (cond + (rval rval) ;; new style + (t + (Classic_Word utt))))) + +(define (find_oovs vocab oovs) + (let ((fd (fopen vocab "r")) + (ofd (fopen oovs "w")) + (e 0) + (oov 0) + (entry)) + + (while (not (equal? (set! entry (readfp fd)) (eof-val))) + (set! e (+ 1 e)) + (if (not (lex.lookup_all entry)) + (begin + (set! oov (+ 1 oov)) + (format ofd "%l\n" (lex.lookup entry nil)))) + ) + (format t ";; %d words %d oov %2.2f oov_rate\n" + e oov (/ (* oov 100.0) e)) + ) +) + + +(provide 'lexicons) + diff --git a/lib/lts.scm b/lib/lts.scm new file mode 100644 index 0000000..23c2dad --- /dev/null +++ b/lib/lts.scm @@ -0,0 +1,212 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1998 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Functions specific to supporting a trained LTS rules +;;; + +(define (lts_rules_predict word feats) + (let ((dcword (downcase word)) + (syls) (phones)) + (if (string-matches dcword "[a-z]*") + (begin + (set! phones + (cdr (reverse (cdr (reverse (lts_predict dcword)))))) + (set! phones (add_lex_stress word feats phones)) + (set! syls (lex.syllabify.phstress phones)) +;; (set! syls (add_lex_stress word syls)) + ) + (set! syls nil)) + (format t "word %l phones %l\n" word syls) + (list word nil syls))) + +;(define (add_lex_stress word syls) +; (cond +; ((> (length syls) 1) +; (set-car! (cdr (nth (- (length syls) 2) syls)) 1)) +; ((word-is-content word english_guess_pos) +; (set-car! (cdr (car syls)) 1))) +; syls) + +(define (word-is-content word guess_pos) + (cond + ((null guess_pos) + t) + ((member_string word (cdr (car guess_pos))) + nil) + (t + (word-is-content word (cdr guess_pos))))) + +(defvar lts_pos nil) + +(define (lts_predict word rules) + "(lts_predict word rules) +Return list of phones related to word using CART trees." 
+ (let ((utt (make_let_utt (enworden (wordexplode word))))) + (predict_phones utt rules) + (cdr (reverse (cdr (reverse ;; remove #'s + (mapcar + (lambda (p) (intern (item.name p))) + (utt.relation.items utt 'PHONE)))))) + ) +) + +(define (wordexplode lets) + (if (consp lets) + lets + (symbolexplode lets))) + +(define (make_let_utt letters) +"(make_let_utt letters) +Build an utterance from these letters." + (let ((utt (Utterance Text ""))) + (utt.relation.create utt 'LTS) + (utt.relation.create utt 'LETTER) + (utt.relation.create utt 'PHONE) + ;; Create letter stream + (mapcar + (lambda (l) + (let ((lsi (utt.relation.append utt 'LETTER))) + (item.set_feat lsi "pos" lts_pos) + (item.set_name lsi l))) + letters) + utt)) + +(define (predict_phones utt rules) + "(predict_phones utt rules) +Predict phones using CART." + (add_new_phone utt (utt.relation.first utt 'LETTER) '#) + (mapcar + (lambda (lsi) + (let ((tree (car (cdr (assoc_string (item.name lsi) rules))))) + (if (not tree) + (format t "failed to find tree for %s\n" (item.name lsi)) + (let ((p (wagon_predict lsi tree))) +; (format t "predict %s %s\n" (item.name lsi) p) + (cond + ((string-matches p ".*-.*-.*-.*") ; a quad one + (add_new_phone utt lsi (string-before p "-")) + (add_new_phone utt lsi (string-before (string-after p "-") "-")) + (add_new_phone utt lsi (string-before (string-after (string-after p "-") "-") "-")) + (add_new_phone utt lsi (string-after (string-after (string-after p "-") "-") "-"))) + ((string-matches p ".*-.*-.*") ; a triple one + (add_new_phone utt lsi (string-before p "-")) + (add_new_phone utt lsi (string-before (string-after p "-") "-")) + (add_new_phone utt lsi (string-after (string-after p "-") "-"))) + ((string-matches p ".*-.*");; a double one + (add_new_phone utt lsi (string-before p "-")) + (add_new_phone utt lsi (string-after p "-"))) + (t + (add_new_phone utt lsi p))))))) + (reverse (cdr (reverse (cdr (utt.relation.items utt 'LETTER)))))) + (add_new_phone utt (utt.relation.last utt 
'LETTER) '#) + utt) + +(define (add_new_phone utt lsi p) + "(add_new_phone utt lsi p) +Add new phone linking to letter, ignoring it if it is _epsilon_." + (if (not (equal? p '_epsilon_)) + (let ((psi (utt.relation.append utt 'PHONE))) + (item.set_name psi p) + (item.relation.append_daughter + (utt.relation.append utt 'LTS lsi) + 'LTS psi) + ))) + +(define (enworden lets) + (cons '# (reverse (cons '# (reverse lets))))) + +;;; Lexical stress assignment +;;; + +(define (add_lex_stress word pos phones tree) + "(add_lex_stress word pos phones tree) +Predict lexical stress by decision tree." + (let ((utt (Utterance Text "")) + (si) + (nphones)) + (utt.relation.create utt 'Letter) + (set! si (utt.relation.append utt 'Letter)) + (item.set_feat si 'pos pos) + (item.set_feat si 'numsyls (count_syls phones)) + (item.set_feat si 'sylpos 1) + (set! nphones (add_lex_stress_syl phones si tree)) +; (format t "%l\n" phones) +; (format t "%l\n" nphones) + nphones)) + +(define (count_syls phones) + (cond + ((null phones) 0) + ((string-matches (car phones) "[aeiou@].*") + (+ 1 (count_syls (cdr phones)))) + (t (count_syls (cdr phones))))) + +(define (add_lex_stress_syl phones si tree) + "(add_lex_stress_syl phones si tree) +Add lexical stressing." + (cond + ((null phones) nil) + ((string-matches (car phones) "[aeiou@].*") + (item.set_feat si 'phone (car phones)) + (item.set_feat si 'name (car phones)) + (item.set_feat si 'num2end + (- (+ 1 (item.feat si 'numsyls)) + (item.feat si 'sylpos))) + (set! 
stress (wagon_predict si tree)) + (item.set_feat si 'sylpos + (+ 1 (item.feat si 'sylpos))) + (cons + (if (not (string-equal stress "0")) + (string-append (car phones) stress) + (car phones)) + (add_lex_stress_syl (cdr phones) si tree))) + (t + (cons + (car phones) + (add_lex_stress_syl (cdr phones) si tree))))) + +;;; Morphological analysis + + +;(define (wfst_stemmer) +; (wfst.load 'stemmer "/home/awb/projects/morpho/engstemmer.wfst") +; (wfst.load 'stemmerL "/home/awb/projects/morpho/engstemmerL.wfst") +; t) + +;(define (stem word) +; (wfst.transduce 'stemmer (enworden (symbolexplode word)))) + +;(define (stemL word) +; (wfst.transduce 'stemmerL (enworden (symbolexplode word)))) + +(provide 'lts) diff --git a/lib/lts_build.scm b/lib/lts_build.scm new file mode 100644 index 0000000..63567d9 --- /dev/null +++ b/lib/lts_build.scm @@ -0,0 +1,723 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1998 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Functions for building LTS rules sets from lexicons +;;; +;;; + +(defvar pl-table nil) + +(define (allaligns phones letters) + "(cummulate phones lets) +Aligns all possible ways for these strings." + (cond + ((null letters) + ;; (wrongly) assume there are never less letters than phones + (if phones + (format t "wrong end: %s\n" word)) + nil) + ((null phones) + nil) + (t + (if (< (length phones) (length letters)) + (begin + (cummulate '_epsilon_ (car letters)) + (allaligns phones (cdr letters)))) + (cummulate (car phones) (car letters)) + (allaligns (cdr phones) (cdr letters))))) + +(define (valid-pair phone letter) + "(valid-pair phone letter) +If predefined to be valid." + (let ((entry1 (assoc_string letter pl-table))) + (if entry1 + (assoc_string phone (cdr entry1)) + nil))) + +(define (valid-pair-e phone nphone letter) + "(valid-pair-e phone letter) +Special cases for when epsilon may be inserted before letter." + (let ((ll (assoc_string letter pl-table)) + (pp (intern (string-append phone "-" nphone)))) + (assoc_string pp (cdr ll)))) + +(define (find-aligns phones letters) + "(find-aligns phones letters) +Find all feasible alignments." + (let ((r nil)) + (cond + ((and (null (cdr phones)) (null (cdr letters)) + (equal? (car phones) (car letters)) + (equal? 
'# (car phones))) + (list (list (cons '# '#)))) ;; valid end match + (t + (if (valid-pair '_epsilon_ (car letters)) + (set! r (mapcar + (lambda (p) + (cons (cons '_epsilon_ (car letters)) p)) + (find-aligns phones (cdr letters))))) + (if (valid-pair (car phones) (car letters)) + (set! r + (append r + (mapcar + (lambda (p) + (cons (cons (car phones) (car letters)) p)) + (find-aligns (cdr phones) (cdr letters)))))) + ;; Hmm, change this to always check doubles + (if (valid-pair-e (car phones) (car (cdr phones)) (car letters)) + (set! r + (append r + (mapcar + (lambda (p) + (cons (cons (intern (format nil "%s-%s" + (car phones) + (car (cdr phones)))) + (car letters)) p)) + (find-aligns (cdr (cdr phones)) + (cdr letters)))))) + r)))) + +(define (findallaligns phones letters) + (let ((a (find-aligns phones letters))) + (if (null a) + (begin + (set! failedaligns (+ 1 failedaligns)) + (format t "failed: %l %l\n" letters phones))) + a)) + +(define (cummulate phone letter) + "(cummulate phone letter) +record the alignment of this phone and letter." + (if (or (equal? phone letter) + (and (not (equal? phone '#)) + (not (equal? letter '#)))) + (let ((entry1 (assoc_string letter pl-table)) + score) + (if (equal? phone '_epsilon_) + (set! score 0.1) + (set! score 1)) + (if entry1 + (let ((entry2 (assoc_string phone (cdr entry1)))) + (if entry2 + (set-cdr! entry2 (+ score (cdr entry2))) + (set-cdr! entry1 (cons (cons phone 1) (cdr entry1))))) + (set! pl-table + (cons + (cons letter + (list (cons phone score))) + pl-table))) + t))) + +(define (score-pair phone letter) +"(score-pair phone letter) +Give score for this particular phone letter pair." 
+ (let ((entry1 (assoc_string letter pl-table))) + (if entry1 + (let ((entry2 (assoc_string phone (cdr entry1)))) + (if entry2 + (cdr entry2) + 0)) + 0))) + +(define (cummulate-aligns aligns) + (mapcar + (lambda (a) + (mapcar + (lambda (p) + (cummulate (car p) (cdr p))) + a)) + aligns) + t) + +(define (cummulate-pairs trainfile) + "(cummulate-pairs trainfile) +Build accumulation table from allowable alignments in trainfile." + (set! failedaligns 0) + (set! allaligns 0) + (if (not pl-table) + (set! pl-table + (mapcar + (lambda (l) + (cons (car l) (mapcar (lambda (c) (cons c 0)) (cdr l)))) + allowables))) + (let ((fd (fopen trainfile "r")) + (c 0) (d 0) + (entry)) + (while (not (equal? (set! entry (readfp fd)) (eof-val))) + (if (equal? c 1000) + (begin + (format t "ENTRY: %d %l\n" (set! d (+ 1000 d)) entry) + (set! c 0))) + (set! word (car entry)) + (cummulate-aligns + (findallaligns + (enworden (car (cdr (cdr entry)))) + (enworden (wordexplode (car entry))))) + (set! allaligns (+ 1 allaligns)) + (format t "aligned %d\n" allaligns) + (set! c (+ 1 c))) + (fclose fd) + (format t "failedaligns %d/%d\n" failedaligns allaligns) + )) + +(define (find_best_alignment phones letters) + "(find_best_alignment phones letters) +Find the alignment containing the most frequent alignment pairs." + ;; hackily do this as a global + (set! fba_best_score 0) + (set! fba_best nil) + (find-best-align phones letters nil 0) + fba_best +) + + +(define (find-best-align phones letters path score) + "(find-best-align phones letters path score) +Search all feasible alignments, keeping the best scoring one in fba_best." + (cond + ((null letters) + (if (> score fba_best_score) + (begin + (set! fba_best_score score) + (set! 
fba_best (reverse path)))) + nil) + (t + (if (valid-pair '_epsilon_ (car letters)) + (find-best-align phones (cdr letters) + (cons (cons '_epsilon_ (car letters)) path) + (+ score (score-pair '_epsilon_ (car letters))))) + (if (valid-pair (car phones) (car letters)) + (find-best-align (cdr phones) (cdr letters) + (cons (cons (car phones) (car letters))path) + (+ score (score-pair (car phones) (car letters))))) + (if (valid-pair-e (car phones) (car (cdr phones)) (car letters)) + (find-best-align (cdr (cdr phones)) (cdr letters) + (cons (cons (intern (format nil "%s-%s" + (car phones) + (car (cdr phones)))) + (car letters)) + path) + (+ score (score-pair + (intern (format nil "%s-%s" + (car phones) + (car (cdr phones)))) + (car letters)))))))) + +(define (align_and_score phones letters path score) + "(align_and_score phones lets) +Aligns all possible ways for these strings." + (cond + ((null letters) + (if (> score fba_best_score) + (begin + (set! fba_best_score score) + (set! fba_best (reverse path)))) + nil) + (t + (if (< (length phones) (length letters)) + (align_and_score + phones + (cdr letters) + (cons '_epsilon_ path) + (+ score + (score-pair '_epsilon_ (car letters))))) + (align_and_score + (cdr phones) + (cdr letters) + (cons (car phones) path) + (+ score + (score-pair (car phones) (car letters))))))) + +(define (aligndata file ofile) + (let ((fd (fopen file "r")) + (ofd (fopen ofile "w")) + (c 1) + (entry)) + (while (not (equal? (set! entry (readfp fd)) (eof-val))) + (set! lets (enworden (wordexplode (car entry)))) + (set! bp (find_best_alignment + (enworden (car (cdr (cdr entry)))) + lets)) + (if (not bp) + (format t "align failed: %l\n" entry) + (save_info (car (cdr entry)) bp ofd)) + (set! 
c (+ 1 c))) + (fclose fd) + (fclose ofd))) + +(define (enworden lets) + (cons '# (reverse (cons '# (reverse lets))))) + +(define (wordexplode lets) + (if (consp lets) + lets + (symbolexplode lets))) + +(define (save_info pos bp ofd) + "(save_info pos bp ofd) +Cut out one expensive step and 50M of diskspace and just save it +in a simpler format." + (format ofd "( ( ") + (mapcar + (lambda (l) + (if (not (string-equal "#" (cdr l))) + (format ofd "%l " (cdr l)))) + bp) + (format ofd ") %l" pos) + (mapcar + (lambda (l) + (if (not (string-equal "#" (car l))) + (format ofd " %s" (car l)))) + bp) + (format ofd " )\n")) + +(define (normalise-table pl-table) + "(normalise-table pl-table) +Change scores into probabilities." + (mapcar + (lambda (s) + (let ((sum (apply + (mapcar cdr (cdr s))))) + (mapcar + (lambda (p) + (if (equal? sum 0) + (set-cdr! p 0) + (set-cdr! p (/ (cdr p) sum)))) + (cdr s)))) + pl-table) + t) + +(define (save-table pre) + (normalise-table pl-table) + (set! fd (fopen (string-append pre "pl-tablesp.scm") "w")) + (format fd "(set! pl-table '\n") + (pprintf pl-table fd) + (format fd ")\n") + (fclose fd) + t) + +(define (build-feat-file alignfile featfile) +"(build-feat-file alignfile featfile) +Build a feature file from the given align file. The feature +file contain predicted phone, and letter with 3 preceding and +3 succeeding letters." + (let ((fd (fopen alignfile "r")) + (ofd (fopen featfile "w")) + (entry) + (pn) + (sylpos 1)) + (while (not (equal? (set! entry (readfp fd)) (eof-val))) +;; (format t "read: %l\n" entry) + (set! lets (append '(0 0 0 0 #) (wordexplode (car entry)) + '(# 0 0 0 0))) + (set! phones (cdr (cdr entry))) + (set! 
pn 5) + (mapcar + (lambda (p) + (format ofd + "%s %s %s %s %s %s %s %s %s %s %s\n" + p + (nth (- pn 4) lets) + (nth (- pn 3) lets) + (nth (- pn 2) lets) + (nth (- pn 1) lets) + (nth pn lets) + (nth (+ pn 1) lets) + (nth (+ pn 2) lets) + (nth (+ pn 3) lets) + (nth (+ pn 4) lets) + (cond + ((not (consp (car (cdr entry)))) + (car (cdr entry))) + ((not (consp (caar (cdr entry)))) + (caar (cdr entry))) + (t nil)) + ;; sylpos + ;; numsyls + ;; num2end + ) + (set! pn (+ 1 pn))) + phones)) + (fclose fd) + (fclose ofd)) +) + +(define (merge_models name filename allowables) +"(merge_models name filename) +Merge the models into a single list of cart trees as a variable +named by name, in filename." + (require 'cart_aux) + (let (trees fd) + (set! trees nil) + (set! lets (mapcar car allowables)) + (while lets + (if (probe_file (format nil "lts.%s.tree" (car lets))) + (begin + (format t "%s\n" (car lets)) + (set! tree (car (load (format nil "lts.%s.tree" (car lets)) t))) + (set! tree (cart_simplify_tree2 tree nil)) + (set! trees + (cons (list (car lets) tree) trees)))) + (set! lets (cdr lets))) + (set! trees (reverse trees)) + (set! fd (fopen filename "w")) + (format fd ";; LTS rules \n") + (format fd "(set! %s '(\n" name) + (mapcar + (lambda (tree) (pprintf tree fd)) + trees) + (format fd "))\n") + (fclose fd)) +) + +(define (lts_testset file cartmodels) + "(lts_testset file cartmodels) +Test an aligned lexicon file against a set of cart trees. Prints out +The number of letters correct (for each letter), total number of +letters correct and the total number of words correct. cartmodels is +the structure as saved by merge_models." + (let ((fd (fopen file "r")) + (entry) + (wordcount 0) + (correctwords 0) + (phonecount 0) + (correctphones 0)) + (while (not (equal? (set! entry (readfp fd)) (eof-val))) + (let ((letters (enworden (wordexplode (car entry)))) + (phones (enworden (cdr (cdr entry)))) + (pphones)) + (set! wordcount (+ 1 wordcount)) + (set! 
pphones (gen_cartlts letters (car (cdr entry)) cartmodels)) +; (set! pphones +; (or ; unwind-protect +; (gen_vilts letters (car (cdr entry)) +; cartmodels wfstname) +; nil)) + (if (equal? (ph-normalize pphones) (ph-normalize phones)) + (set! correctwords (+ 1 correctwords)) + (or nil + (format t "failed %l %l %l %l\n" (car entry) (car (cdr entry)) phones pphones))) + (count_correct_letters ;; exclude #, cause they're always right + (cdr letters) + (cdr phones) + (cdr pphones)) + (set! phonecount (+ (length (cdr (cdr letters))) phonecount)) + )) + (fclose fd) + (mapcar + (lambda (linfo) + (format t "%s %d correct %d (%2.2f)\n" + (car linfo) (car (cdr linfo)) + (car (cdr (cdr linfo))) + (/ (* (car (cdr (cdr linfo))) 100) (car (cdr linfo)))) + (set! correctphones (+ correctphones (car (cdr (cdr linfo)))))) + correct_letter_table) + (format t "phones %d correct %d (%2.2f)\n" + phonecount correctphones (/ (* correctphones 100) phonecount)) + (format t "words %d correct %d (%2.2f)\n" + wordcount correctwords (/ (* correctwords 100) wordcount)) + (format t "tree model has %d nodes\n" + (apply + (mapcar (lambda (a) (cart_tree_node_count (car (cdr a)))) + cartmodels))) + )) + +(define (cart_tree_node_count tree) + "(tree_node_count tree) +Count the number nodes (questions and leafs) in the given CART tree." + (cond + ((cdr tree) + (+ 1 + (cart_tree_node_count (car (cdr tree))) + (cart_tree_node_count (car (cdr (cdr tree)))))) + (t + 1))) + +(defvar correct_letter_table + (mapcar + (lambda (l) (list l 0 0)) + '(a b c d e f g h i j k l m n o p q r s t u v w x y z)) + "correct_letter_table +List used to cummulate the number of correct (and incorrect) letter to +phone predictions. 
This list will be extended if there are more letters +in your alphabet. It does take a fairly Western European +view of the alphabet, but you can change this yourself if necessary.") + +(define (count_correct_letters lets phs pphs) + "(count_correct_letters lets phs pphs) +Count which letters have the correct phone prediction. Accumulate this +in a per-letter table." + (cond + ((or (null phs) (null pphs) (null lets)) + (format t "misaligned entry\n") + nil) + ((and (null (cdr lets)) (null (cdr phs)) (null (cdr pphs))) + nil) ;; omit final # + (t + (let ((letinfo (assoc_string (car lets) correct_letter_table))) + (if (not letinfo) + (set! correct_letter_table + (append correct_letter_table + (list (set! letinfo (list (car lets) 0 0)))))) + (set-car! (cdr letinfo) (+ 1 (car (cdr letinfo)))) ;; total + (if (equal? (car phs) (car pphs)) ;; correct + (set-car! (cdr (cdr letinfo)) (+ 1 (car (cdr (cdr letinfo)))))) + (count_correct_letters (cdr lets) (cdr phs) (cdr pphs)))))) + +(define (ph-normalize ph) + (cond + ((null ph) nil) + ((string-equal "_epsilon_" (car ph)) + (ph-normalize (cdr ph))) + ((string-matches (car ph) ".*-.*") + (cons + (string-before (car ph) "-") + (cons + (string-after (car ph) "-") + (ph-normalize (cdr ph))))) + (t + (cons (car ph) (ph-normalize (cdr ph)))))) + +(define (make_let_utt_p letters pos) +"(make_let_utt_p letters pos) +Build an utterance from these letters." + (let ((utt (Utterance Text ""))) + (utt.relation.create utt 'LTS) + (utt.relation.create utt 'LETTER) + (utt.relation.create utt 'PHONE) + ;; Create letter stream + (mapcar + (lambda (l) + (let ((lsi (utt.relation.append utt 'LETTER))) + (item.set_name lsi l) + (item.set_feat lsi "pos" pos))) + letters) + utt)) + +(define (gen_vilts letters pos cartmodels ngram) + "(gen_vilts letters pos cartmodels ngram) +Use CART plus ngrams in Viterbi search." + (require 'lts) + (let ((utt (make_let_utt_p letters pos))) + (set!
gen_vit_params + (list + (list 'Relation "LETTER") + (list 'return_feat "phone") + (list 'p_word "#") + (list 'pp_word "0") + (list 'ngramname ngram) +; (list 'wfstname ngram) + (list 'cand_function 'lts_cand_function))) + (Gen_Viterbi utt) + (mapcar + (lambda (lsi) + (intern (item.feat lsi "phone"))) + (utt.relation.items utt 'LETTER)))) + +(define (gen_cartlts letters pos cartmodels) + "(get_cartlts letters cartmodels) +Generate the full list of predicted phones, including +epsilon and unexpanded multi-phones." + (require 'lts) + (let ((utt (make_let_utt_p letters pos))) + (enworden + (mapcar + (lambda (lsi) + (let ((tree (car (cdr (assoc_string (item.name lsi) cartmodels)))) + (p)) + (if (not tree) + (begin + (format t "failed to find tree for %s\n" (item.name lsi)) + nil) + (begin + (set! p (wagon_predict lsi tree)) + (item.set_feat lsi "val" p) + p)))) + (reverse (cdr (reverse (cdr (utt.relation.items utt 'LETTER))))))))) + +(define (reduce_lexicon entryfile exceptionfile lts_function) + "(reduce_lexicon entryfile exceptionfile lts_function) +Look up each word in entryfile using the current lexicon, if the entry +doesn't match save it in the exception file. This is a way of reducing +the lexicon based on a letter to sound model (and lexical stress +model, if appropriate)." + (let ((fd (fopen entryfile "r")) + (ofd (fopen exceptionfile "w")) + (entry) + (wordcount 0) + (correctwords 0)) + (while (not (equal? (set! entry (readfp fd)) (eof-val))) + (if (and (consp entry) + (> (length entry) 1)) + (let ((lts (lts_function (car entry) (car (cdr entry)))) + (encount (lex.entrycount (car entry)))) + (set! wordcount (+ 1 wordcount)) + (if (and (equal? (nth 2 entry) (nth 2 lts)) + (< encount 2)) + (set! 
correctwords (+ 1 correctwords)) + (format ofd "%l\n" entry)) + ))) + (fclose fd) + (fclose ofd) + (format t "words %d correct %d (%2.2f)\n" + wordcount correctwords (/ (* correctwords 100) wordcount)) + )) + +(define (dump-flat-entries infile outfile ltype) + (let ((ifd (fopen infile "r")) + (ofd (fopen outfile "w")) + clength + entry) +; (set! entry (readfp ifd)) +; (if (or (consp entry) (not (string-equal entry "MNCL"))) +; (begin +; (format t "Expected MNCL at start of file: not a compiled lexicon\n") +; (exit))) + (while (not (equal? (set! entry (readfp ifd)) (eof-val))) + (cond + ((not (consp entry)) + t) ;; not an entry + ((string-equal ltype "utf8") + (set! clength (length (utf8explode (car entry))))) + (t + (set! clength (length (car entry))))) + (cond + ((not (consp entry)) + t) ;; not an entry + ((and ;(string-matches (car entry) "...*") + ;(< clength 14) + (not (string-matches (car entry) ".*'.*")) ;; no quotes + (car (cddr entry))) ;; non-nil pronounciation + (begin + (cond + ((string-equal ltype "utf8") + (format ofd + "( %l %l (" + (utf8explode (car entry)) + (cadr entry))) + ((string-equal ltype "asis") + (format ofd + "( \"%s\" %l (" + (car entry) + (cadr entry))) + (t + (format ofd + "( \"%s\" %l (" + (downcase (car entry)) + (cadr entry)))) + (if (consp (car (car (cddr entry)))) + (begin ;; it is syllabified) + (mapcar + (lambda (syl) + (mapcar + (lambda (seg) + (cond + ((string-matches seg "[aeiouAEIOU@].*") + (format ofd "%s " (string-append seg (cadr syl)))) + (t + (format ofd "%s " seg)))) + (car syl))) + (car (cddr entry)))) + (begin ;; it is already flat + (mapcar + (lambda (p) + (format ofd "%s " p)) + (car (cddr entry))) + )) + (format ofd "))\n"))) + (t nil))) + (fclose ifd) + (fclose ofd))) + +(define (dump-lets-phones infile) + "(dump-lets-phones infile) +Dump all the letters to alllets.out and phones to allphones.out for processing. +This expects an external script to sort and uniquify them. 
This is done +in scheme so we can get utf8/non-utf8 to be easy." + (let ((ifd (fopen infile "r")) + (lfd (fopen "alllets.out" "w")) + (apfd (fopen "allphones.out" "w")) + (pfd (fopen "let2phones.out" "w")) + entry) + (while (not (equal? (set! entry (readfp ifd)) (eof-val))) + (mapcar + (lambda (l) + (format lfd "%s\n" l) + (format pfd "%s " l) + (mapcar + (lambda (p) (format pfd "%s " p)) + (car (cddr entry))) + (format pfd "\n")) + (wordexplode (car entry))) + (mapcar + (lambda (p) (format apfd "%s " p)) + (car (cddr entry))) + (format apfd "\n") + ) + (fclose ifd) + (fclose lfd) + (fclose pfd) + (fclose apfd) + t)) + +(define (dump-flat-entries-all infile outfile) + "(dump-flat-entries-all infile outfile) +Do this for *all* entries not just ones with more than three chars." + (let ((ifd (fopen infile "r")) + (ofd (fopen outfile "w")) + entry) + (readfp ifd) ;; skip "MNCL" + (while (not (equal? (set! entry (readfp ifd)) (eof-val))) + (if (consp entry) + (begin + (format ofd + "( \"%s\" %s (" + (downcase (car entry)) + (cadr entry)) + (mapcar + (lambda (syl) + (mapcar + (lambda (seg) + (cond +; ((string-equal seg "ax") +; (format ofd "%s " seg)) + ((string-matches seg "[aeiouAEIOU@].*") + (format ofd "%s " (string-append seg (cadr syl)))) + (t + (format ofd "%s " seg)))) + (car syl))) + (car (cddr entry))) + (format ofd "))\n")))) + (fclose ifd) + (fclose ofd))) + +(provide 'lts_build) + diff --git a/lib/mbrola.scm b/lib/mbrola.scm new file mode 100644 index 0000000..77d1e42 --- /dev/null +++ b/lib/mbrola.scm @@ -0,0 +1,103 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Support for MBROLA as an external module. +;;; + +;;; You might want to set this in your sitevars.scm +(defvar mbrola_progname "/cstr/external/mbrola/mbrola" + "mbrola_progname + The program name for mbrola.") +(defvar mbrola_database "fr1" + "mbrola_database + The name of the MBROLA database to use during MBROLA Synthesis.") + +(define (MBROLA_Synth utt) + "(MBROLA_Synth UTT) + Synthesize using MBROLA as external module. Basically dump the info + from this utterance.
Call MBROLA and reload the waveform into utt. + [see MBROLA]" + (let ((filename (make_tmp_filename)) + ) + (save_segments_mbrola utt filename) + (system (string-append mbrola_progname " " + mbrola_database " " + filename " " + filename ".au")) + (utt.import.wave utt (string-append filename ".au")) + (apply_hooks after_synth_hooks utt) + (delete-file filename) + (delete-file (string-append filename ".au")) + utt)) + +(define (save_segments_mbrola utt filename) + "(save_segments_mbrola UTT FILENAME) + Save segment information in MBROLA format in filename. The format is + phone duration (ms) [% position F0 target]*. [see MBROLA]" + (let ((fd (fopen filename "w"))) + (mapcar + (lambda (segment) + (save_seg_mbrola_entry + (item.feat segment 'name) + (item.feat segment 'segment_start) + (item.feat segment 'segment_duration) + (mapcar + (lambda (targ_item) + (list + (item.feat targ_item "pos") + (item.feat targ_item "f0"))) + (item.relation.daughters segment 'Target)) ;; list of targets + fd)) + (utt.relation.items utt 'Segment)) + (fclose fd))) + +(define (save_seg_mbrola_entry name start dur targs fd) + "(save_seg_mbrola_entry ENTRY NAME START DUR TARGS FD) + Entry contains, (name duration num_targs start 1st_targ_pos 1st_targ_val)." 
+ (format fd "%s %d " name (nint (* dur 1000))) + (if targs ;; if there are any targets + (mapcar + (lambda (targ) ;; targ_pos and targ_val + (let ((targ_pos (car targ)) + (targ_val (car (cdr targ)))) + + (format fd "%d %d " + (nint (* 100 (/ (- targ_pos start) dur))) ;; % pos of target + (nint (parse-number targ_val))) ;; target value + )) + targs)) + (terpri fd) + (terpri fd) +) + +(provide 'mbrola) diff --git a/lib/mettree.scm b/lib/mettree.scm new file mode 100644 index 0000000..638ded1 --- /dev/null +++ b/lib/mettree.scm @@ -0,0 +1,88 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1998 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Some (experimental) data for investigating metrical trees +;;; + +;;; Set up generation of metrical tree, this includes getting +;;; a syntactic parse +;;; +;;; Use as +;;; (set! utt1 (metsynth (Utterance Text "For afternoon tea"))) +;;; (utt.relation_tree utt1 'MetricalTree) + +(require 'scfg) +(set! scfg_grammar (load (path-append libdir "scfg_wsj_wp20.gram") t)) + +(define (mettext utt) + (Initialize utt) + (Text utt) + (Token_POS utt) + (Token utt) + (POS utt) + (print "here1") + (Phrasify utt) + (print "here2") + (ProbParse utt) + (print "here3") + (auto_metrical_tree utt) +) + +(define (metsynth utt) + (mettext utt) + (Wave_Synth utt) +) + +;;; Assumed everything is using Roger diphones + +;;(lex.create "cmu_mettree") +;;;(lex.set.phoneset "radio_phones") +;;(lex.set.phoneset "radio_phones") + +(define (setup_cmu_mettree_lex) + "(setup_cmu_mettreelex) +Lexicon derived from the CMU lexicon (cmudict-0.1), around 100,000 entries, +in the radio phoneset (sort of darpa-like)." 
+ (if (not (member_string "cmu_mettree" (lex.list))) + (begin + (print "making cmu lexicon") + (lex.create "cmu_mettree") + (lex.set.compile.file (path-append lexdir "cmu_mettree_lex.out")) + (lex.set.phoneset "radio") + (require 'lts__us) ;; US English letter to sound rules + (lex.set.lts.method 'lts_rules) + (lex.set.lts.ruleset 'nrl_us)))) + +(provide 'mettree) + + diff --git a/lib/module_description.scm b/lib/module_description.scm new file mode 100644 index 0000000..0cf426f --- /dev/null +++ b/lib/module_description.scm @@ -0,0 +1,117 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Handle module descriptions. +;;; + +(defvar *module-descriptions* nil + "*module-descriptions* + An association list recording the description objects for proclaimed + modules.") + +(define (set_module_description mod desc) + "(set_module_description MOD DESC) + Set the description for the module named MOD." + (let ((entry (assoc mod *module-descriptions*))) + (if entry + (set-cdr! entry (cons desc nil)) + (set! *module-descriptions* (cons (cons mod (cons desc nil)) + *module-descriptions*)) + ) + ) + ) + +(define (module_description mod) + "(module_description MOD) + Returns the description record of the module named by symbol MOD" + (let ((entry (assoc mod *module-descriptions*))) + (if entry + (car (cdr entry)) + nil + ) + ) + ) + +(defmac (proclaim form) + "(proclaim NAME &opt DESCRIPTION...) + Announce the availability of a module NAME. DESCRIPTION + is a description in a fixed format." + (let ((name (car (cdr form))) + (description (cdr form)) + ) + (list 'proclaim-real (list 'quote name) (list 'quote description)) + ) + ) + +(define (proclaim-real name description) + (set!
*modules* (cons name *modules*)) +; (if description +; (set_module_description name (create_module_description description)) +; ) + ) + +(define (describe_module mod) + "(describe_module MOD) + Describe the module named by the symbol MOD." + + (let ((entry (module_description mod))) + (format t "---------------------\n") + (if entry + (print_module_description entry) + (format t "No description for %l\n" mod) + ) + (format t "---------------------\n") + ) + ) + +(define (describe_all_modules) + "(describe_all_modules) + Print descriptions of all proclaimed modules" + (format t "---------------------\n") + (let ((p *module-descriptions*)) + (while p + (print_module_description (car (cdr (car p)))) + (format t "---------------------\n") + (set! p (cdr p)) + ) + ) + ) + +(proclaim + module_description 1.1 + "CSTR" "Richard Caley " + ( "Handle module descriptions from C++ and from Scheme." + ) + ) + +(provide 'module_description) diff --git a/lib/mrpa_allophones.scm b/lib/mrpa_allophones.scm new file mode 100644 index 0000000..fbabf36 --- /dev/null +++ b/lib/mrpa_allophones.scm @@ -0,0 +1,111 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. ;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. 
The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. ;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. ;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; A definition of the extended mrpa phone set used for some diphone sets +;; + +(defPhoneSet + mrpa_allophones + ;;; Phone Features + (;; vowel or consonant + (vc + -) + ;; vowel length: short long dipthong schwa + (vlng s l d a 0) + ;; vowel height: high mid low + (vheight 1 2 3 -) + ;; vowel frontness: front mid back + (vfront 1 2 3 -) + ;; lip rounding + (vrnd + -) + ;; consonant type: stop fricative affricative nasal liquid + (ctype s f a n l 0) + ;; place of articulation: labial alveolar palatal labio-dental + ;; dental velar + (cplace l a p b d v 0) + ;; consonant voicing + (cvox + -) + ) + ;; Phone set members + ( + (uh + s 2 3 - 0 0 +) + (e + s 2 1 - 0 0 +) + (a + s 3 1 - 0 0 +) + (o + s 3 3 - 0 0 +) + (i + s 1 1 - 0 0 +) + (u + s 1 3 + 0 0 +) + (ii + l 1 1 - 0 0 +) + (uu + l 2 3 + 0 0 +) + (oo + l 3 2 - 0 0 +) + (aa + l 3 1 - 0 0 +) + (@@ + l 2 2 - 0 0 +) + (ai + d 3 1 - 0 0 +) + (ei + d 2 1 - 0 0 +) + (oi + d 3 3 - 0 0 +) + (au + d 3 3 + 0 0 +) + (ou + d 3 3 + 0 0 +) + (e@ + d 2 1 - 0 0 +) + (i@ + d 1 1 - 0 0 +) + (u@ + d 3 1 - 0 0 +) + (@ + a - - - 0 0 +) + (p - 0 - - + s l -) + (t - 0 - - + s a -) + (k - 0 - - + s p -) + (b - 0 - 
- + s l +) + (d - 0 - - + s a +) + (g - 0 - - + s p +) + (s - 0 - - + f a -) + (z - 0 - - + f a +) + (sh - 0 - - + f p -) + (zh - 0 - - + f p +) + (f - 0 - - + f b -) + (v - 0 - - + f b +) + (th - 0 - - + f d -) + (dh - 0 - - + f d +) + (ch - 0 - - + a a -) + (jh - 0 - - + a a +) + (h - 0 - - + a v -) + (m - 0 - - + n l +) + (n - 0 - - + n d +) + (ng - 0 - - + n v +) + (l - 0 - - + l d +) + (ll - 0 - - + l d +) + (y - 0 - - + l a +) + (r - 0 - - + l p +) + (w - 0 - - + l l +) + (# - 0 - - - 0 0 -) + ) + ) + +(PhoneSet.silences '(#)) + +(provide 'mrpa_allophones) diff --git a/lib/mrpa_durs.scm b/lib/mrpa_durs.scm new file mode 100644 index 0000000..86b14ca --- /dev/null +++ b/lib/mrpa_durs.scm @@ -0,0 +1,136 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; mrpa average phoneme durations from gsw 450 +;;; +(set! phoneme_durations +'( +(u 0.067) +(i@ 0.146) +(h 0.067) +(uu 0.105) +(uh 0.090) +(v 0.053) +(oo 0.145) +(i 0.060) +(jh 0.097) +(ii 0.095) +(w 0.066) +(k 0.088) +(+ 0.036) +(y 0.051) +(l 0.067) +(zh 0.080) +(ng 0.072) +(m 0.070) +(z 0.079) +(## 0.256) +(au 0.162) +(a 0.118) +(n 0.065) +(o 0.102) +(ai 0.156) +(b 0.071) +(ou 0.129) +(ch 0.119) +(p 0.094) +(oi 0.165) +(# 0.040) +(e@ 0.131) +(d 0.052) +(dh 0.032) +(e 0.091) +(r 0.062) +(sh 0.101) +(@@ 0.149) +(ei 0.131) +(f 0.091) +(s 0.093) +(g 0.066) +(u@ 0.120) +(aa 0.173) +(t 0.073) +(th 0.080) +(@ 0.054) +)) + +(set! 
gsw_durs +'( +(# 0.200 0.100) +(h 0.061 0.028) +(i@ 0.141 0.061) +(u 0.067 0.024) +(uu 0.107 0.044) +(uh 0.087 0.025) +(v 0.051 0.019) +(oo 0.138 0.046) +(i 0.058 0.023) +(ii 0.092 0.035) +(w 0.054 0.023) +(jh 0.094 0.024) +(k 0.089 0.034) +(y 0.048 0.025) +(l 0.056 0.026) +(zh 0.077 0.030) +(ng 0.064 0.024) +(m 0.063 0.021) +(z 0.072 0.029) +(a 0.120 0.036) +(au 0.171 0.046) +(n 0.059 0.025) +(ou 0.134 0.039) +(b 0.073 0.021) +(o 0.094 0.037) +(ai 0.137 0.047) +(ch 0.128 0.039) +(oi 0.183 0.050) +(p 0.101 0.032) +(e@ 0.144 0.061) +(d 0.048 0.021) +(dh 0.031 0.016) +(e 0.092 0.035) +(r 0.053 0.025) +(sh 0.108 0.031) +(f 0.095 0.033) +(@@ 0.147 0.035) +(ei 0.130 0.042) +(s 0.102 0.037) +(u@ 0.140 0.057) +(th 0.093 0.050) +(g 0.064 0.021) +(aa 0.155 0.045) +(t 0.070 0.034) +(@ 0.046 0.020) +)) + +(provide 'mrpa_durs) diff --git a/lib/mrpa_phones.scm b/lib/mrpa_phones.scm new file mode 100644 index 0000000..84e2c17 --- /dev/null +++ b/lib/mrpa_phones.scm @@ -0,0 +1,114 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; ;; +;; Centre for Speech Technology Research ;; +;; University of Edinburgh, UK ;; +;; Copyright (c) 1996,1997 ;; +;; All Rights Reserved. ;; +;; ;; +;; Permission is hereby granted, free of charge, to use and distribute ;; +;; this software and its documentation without restriction, including ;; +;; without limitation the rights to use, copy, modify, merge, publish, ;; +;; distribute, sublicense, and/or sell copies of this work, and to ;; +;; permit persons to whom this work is furnished to do so, subject to ;; +;; the following conditions: ;; +;; 1. The code must retain the above copyright notice, this list of ;; +;; conditions and the following disclaimer. ;; +;; 2. Any modifications must be clearly marked as such. ;; +;; 3. Original authors' names are not deleted. ;; +;; 4. 
The authors' names are not used to endorse or promote products ;; +;; derived from this software without specific prior written ;; +;; permission. ;; +;; ;; +;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;; THIS SOFTWARE. ;; +;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; A definition of the mrpa phone set +;; + +(defPhoneSet + mrpa + ;;; Phone Features + (;; vowel or consonant + (vc + -) + ;; vowel length: short long dipthong schwa + (vlng s l d a 0) + ;; vowel height: high mid low + (vheight 1 2 3 0) + ;; vowel frontness: front mid back + (vfront 1 2 3 0) + ;; lip rounding + (vrnd + - 0) + ;; consonant type: stop fricative affricate nasal lateral approximant + (ctype s f a n l r 0) + ;; place of articulation: labial alveolar palatal labio-dental + ;; dental velar glottal + (cplace l a p b d v g 0) + ;; consonant voicing + (cvox + - 0) + ) + ;; Phone set members + ( + (uh + s 2 3 - 0 0 0) + (e + s 2 1 - 0 0 0) + (a + s 3 1 - 0 0 0) + (o + s 2 3 + 0 0 0) + (i + s 1 1 - 0 0 0) + (u + s 1 3 + 0 0 0) + (ii + l 1 1 - 0 0 0) + (uu + l 1 3 + 0 0 0) + (oo + l 3 3 + 0 0 0) + (aa + l 3 3 - 0 0 0) + (@@ + l 2 2 - 0 0 0) + (ai + d 3 2 - 0 0 0) + (ei + d 2 1 - 0 0 0) + (oi + d 3 3 + 0 0 0) + (au + d 3 2 + 0 0 0) + (ou + d 2 2 - 0 0 0) + (e@ + d 2 1 - 0 0 0) + (i@ + d 1 1 - 0 0 0) + (u@ + d 3 1 + 0 0 0) + (@ + a 2 2 - 0 0 0) + (p - 0 0 0 0 s l -) + (t - 0 0 0 0 s a -) + (k - 0 0 0 0 s v -) + (b - 0 0 0 0 s l +) + (d - 0 
0 0 0 s a +) + (g - 0 0 0 0 s v +) + (s - 0 0 0 0 f a -) + (z - 0 0 0 0 f a +) + (sh - 0 0 0 0 f p -) + (zh - 0 0 0 0 f p +) + (f - 0 0 0 0 f b -) + (v - 0 0 0 0 f b +) + (th - 0 0 0 0 f d -) + (dh - 0 0 0 0 f d +) + (ch - 0 0 0 0 a p -) + (jh - 0 0 0 0 a p +) + (h - 0 0 0 0 f g -) + (m - 0 0 0 0 n l +) + (n - 0 0 0 0 n a +) + (ng - 0 0 0 0 n v +) + (l - 0 0 0 0 l a +) + (y - 0 0 0 0 r p +) + (r - 0 0 0 0 r a +) + (w - 0 0 0 0 r l +) + (# - 0 0 0 0 0 0 -) + ) + ) + +(PhoneSet.silences '(#)) + +(provide 'mrpa_phones) + + + + diff --git a/lib/multisyn/Makefile b/lib/multisyn/Makefile new file mode 100644 index 0000000..ceed122 --- /dev/null +++ b/lib/multisyn/Makefile @@ -0,0 +1,46 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 2004 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +# # +# Makefile for lib/multisyn directory # +# # +########################################################################### +TOP=../.. +DIRNAME=lib/multisyn + +PHONESETS = radio_phones_multisyn.scm +GENERAL = multisyn.scm multisyn_pauses.scm target_cost.scm +OTHERS = send_xwaves.scm + +FILES=Makefile $(PHONESETS) $(GENERAL) $(OTHERS) + +include $(TOP)/config/common_make_rules diff --git a/lib/multisyn/multisyn.scm b/lib/multisyn/multisyn.scm new file mode 100644 index 0000000..dcdd6a8 --- /dev/null +++ b/lib/multisyn/multisyn.scm @@ -0,0 +1,195 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 2003, 2004 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. 
The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Multisyn top level scheme code (Korin Richmond and Rob Clark) +;;; + +; Requires +(require_module 'UniSyn) +(require_module 'MultiSyn) +(require 'multisyn_pauses) +(require 'target_cost) + +;; use a global parameter to specify which UnitSelection voice +;; to use to synthesise a given utterance for now, because the +;; standard Festival synthesis mainline doesn't accept a voice +;; parameter. (This should be set to the current voice object) +(defvar currentMultiSynVoice nil) +(defvar relp t) +(defvar flattenVoice nil) + +; extract utt list from a .data file +(define (load_utt_list filename) +"(load_utt_list filename) +Loads a festvox .data file and extracts an utterance list." +(let (l entries) + (set! entries (load filename t)) + (mapcar + (lambda (d) + (set! l (cons (car d) l)) + t) + entries) +l)) + +;; SynthType definition, main entry point.
+ +(defSynthType MultiSyn + ;(print "Multisyn unit selection synthesis") + (defvar MultiSyn_module_hooks nil) + (Param.def "unisyn.window_name" "hanning") + (Param.def "unisyn.window_factor" 1.0) + ;; Unisyn requires these to be set. + (set! us_abs_offset 0.0) + (set! us_rel_offset 0.0) + + (apply_hooks MultiSyn_module_hooks utt) ;; for processing of diphone names + + ;; find appropriate unit sequence and put synthesis + ;; parameters in the Unit relation of the utterance structure + (voice.getUnits currentMultiSynVoice utt) + + ;(print "doing concat") + (us_unit_concat utt) + + ;(print "doing raw concat") + + (utt.relation.create utt 'SourceSegments) + + (set! do_prosmod (du_voice.prosodic_modification currentMultiSynVoice)) + + (if do_prosmod + (begin + (if (not (member 'f0 (utt.relationnames utt))) + (targets_to_f0 utt)) + ;; temporary fix + (if (utt.relation.last utt 'Segment) + (set! pm_end (+ (item.feat (utt.relation.last utt 'Segment) "end") 0.02)) + (set! pm_end 0.02)) + (us_f0_to_pitchmarks utt 'f0 'TargetCoef pm_end) + (us_mapping utt 'segment_single)) + (begin + (utt.copy_relation utt 'SourceCoef 'TargetCoef) + (us_mapping utt "linear"))) + + + ;(print "generating wave") +;; specify something else if you don't want lpc + (us_generate_wave utt 'lpc) +) + + +; target cost scheme code +(define (targetcost it1 it2) + (Default_Target_Cost it1 it2)) + +; Evil function which writes the functions to actually load and switch new voices. +(define (make_voice_definition name srate config_function backoff_rules data_dir config) + "(make_voice_definition NAME SRATE CONFIG_FUNCTION BACKOFF_RULES DATA_DIR CONFIG) +Create the function definitions to load and unload a voice."
+ (let ((voice_name (string-append "voice_" name)) + (free_name (string-append "free_voice_" name)) + (pre_config_function (string-append config_function "_pre")) + (voice_variable (upcase (string-append "voice_" name)))) + + (eval (list 'defvar (intern voice_variable) nil)) + + (eval (list 'define (list (intern voice_name)) + (list 'if (intern pre_config_function) + (list (intern pre_config_function) (intern voice_variable))) + (list 'if (list 'null (intern voice_variable)) + (list 'set! (intern voice_variable) + (list 'multisyn_load_voice_modules + (list 'quote name) + srate + (list 'quote backoff_rules) + data_dir + (list 'quote config)))) + (list (intern config_function) (intern voice_variable)) + (list 'set! 'current-voice (list 'quote name)) + (list 'define_current_voice_reset) + (list 'set! 'currentMultiSynVoice (intern voice_variable)) + )) + + (eval (list 'define + (list (intern free_name)) + (list 'cond + (list (list 'null (intern voice_variable)) + (list 'error "Voice not currently loaded!")) + (list (list 'eq? 'currentMultiSynVoice (intern voice_variable)) + (list 'error "Can't free current voice!")) + (list 't (list 'set! (intern voice_variable) 'nil)))))) + nil) + +(define (multisyn_load_voice_modules name srate backoff_rules base_dir module_list) +"(multisyn_load_voice_modules name srate backoff_rules base_dir module_list) +Load the voice modules in MODULE_LIST into a new voice and return it." +(let (voice) + (mapcar + (lambda (module_entry) + (let ((dirs (car module_entry)) + (utt_list (load_utt_list (path-append base_dir + (cadr module_entry))))) + (if (null voice) + (set!
voice (make_du_voice utt_list dirs srate)) + (voice.addModule voice utt_list dirs srate)))) + module_list) + (voice.setName voice name) + (if flattenVoice + (du_voice.setTargetCost voice "flat") + (du_voice.setTargetCost voice t)) + (du_voice.setJoinCost voice t) + (format t "Please wait: Initialising multisyn voice.\n") + (voice.init voice) + (format t " Voice loaded successfully!\n") + (du_voice.set_ob_pruning_beam voice 0.25) + (du_voice.set_pruning_beam voice 0.25) + (du_voice.setDiphoneBackoff voice backoff_rules) +voice)) + + + + +(define (define_current_voice_reset) +"(define_current_voice_reset) +Re-define (current_voice_reset) correctly." + (eval (list 'define + (list 'current_voice_reset) + (list 'multisyn_reset_globals)))) + +(define (multisyn_reset_globals) +"(multisyn_reset_globals) +Reset multisyn specific global variables." +(Param.set 'unisyn.window_symmetric 1)) + + +(provide 'multisyn) diff --git a/lib/multisyn/multisyn_pauses.scm b/lib/multisyn/multisyn_pauses.scm new file mode 100644 index 0000000..9ea457b --- /dev/null +++ b/lib/multisyn/multisyn_pauses.scm @@ -0,0 +1,102 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 2003, 2004 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. 
;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Multisyn Pause module (Rob Clark and Korin Richmond) +;;; +;;; + +(defvar BB_Pause "B_300") +(defvar B_Pause "B_150") +(defvar mB_Pause "B_150") ; shouldn't be used + +(define (MultiSyn_Pauses utt) + "(MultiSyn_Pauses UTT) +Predict pause insertion in a Multisyn unit selection utterance structure." + (let ((words (utt.relation.items utt 'Word)) lastword tpname) + (if words + (begin + (insert_initial_pause utt) ;; always have a start pause + (set! lastword (car (last words))) + (mapcar + (lambda (w) + (let ((pbreak (item.feat w "pbreak")) + (emph (item.feat w "R:Token.parent.EMPH"))) + (cond + ((string-equal pbreak "BB") + (unitselection_pause_insert w BB_Pause)) + ((string-equal pbreak "mB") + (unitselection_pause_insert w mB_Pause)) + ((string-equal pbreak "B") + (unitselection_pause_insert w B_Pause))))) + words) + ;; The embarrassing bit.
Remove any words labelled as punc or fpunc + (mapcar + (lambda (w) + (let ((pos (item.feat w "pos"))) + (if (or (string-equal "punc" pos) + (string-equal "fpunc" pos)) + (let ((pbreak (item.feat w "pbreak")) + (wp (item.relation w 'Phrase))) + (if (and (string-matches pbreak "BB?") + (item.relation.prev w 'Word)) + (item.set_feat + (item.relation.prev w 'Word) "pbreak" pbreak)) + (item.relation.remove w 'Word) + ;; can't refer to w as we've just deleted it + (item.relation.remove wp 'Phrase))))) + words))) + (utt.relation.print utt 'Word) + (utt.relation.print utt 'Segment) + utt)) + +(define (unitselection_pause_insert word pause) + "(pause_insert word pause) + Insert segments needed for a pause." +(let ((silence (car (cadr (car (PhoneSet.description '(silences)))))) + (seg (item.relation (find_last_seg word) 'Segment)) + pause_item) + (format t " inserting pause after: %s.\n" (item.name seg)) + (format t " Inserting pause\n") +; if next seg is not silence insert one. + (if (or (not (item.next seg)) + (not (string-equal (item.name (item.next seg)) silence))) + (item.insert seg (list silence) 'after)) +; insert pause after that if not the end. + (if (item.next (item.next seg)) + (begin + (set! pause_item (item.insert (item.next seg) (list pause) 'after)) +;if next seg after that is not silence add one. + (if (not (string-equal (item.name (item.next pause_item)) silence)) + (item.insert pause_item (list silence) 'after)))))) + +(provide 'multisyn_pauses) diff --git a/lib/multisyn/radio_phones_multisyn.scm b/lib/multisyn/radio_phones_multisyn.scm new file mode 100644 index 0000000..1c6af01 --- /dev/null +++ b/lib/multisyn/radio_phones_multisyn.scm @@ -0,0 +1,136 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997,2003, 2004 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; A definition of the radio phone set used in the BU RADIO FM +;;; corpus, some people call this the darpa set. 
This one +;;; has the closures removed and pauses added for multisyn +;;; + +(defPhoneSet + radio_multisyn + ;;; Phone Features + (;; vowel or consonant + (vc + -) + ;; vowel length: short long diphthong schwa + (vlng s l d a 0) + ;; vowel height: high mid low + (vheight 1 2 3 0) + ;; vowel frontness: front mid back + (vfront 1 2 3 0) + ;; lip rounding + (vrnd + - 0) + ;; consonant type: stop fricative affricate nasal lateral approximant + (ctype s f a n l r 0) + ;; place of articulation: labial alveolar palatal labio-dental + ;; dental velar glottal + (cplace l a p b d v g 0) + ;; consonant voicing + (cvox + - 0) + ) + ;; Phone set members + ( + ;; multisyn extras + (# - 0 0 0 0 0 0 -) ;; silence ... + (B_10 - 0 0 0 0 0 0 -) ;; Pauses + (B_20 - 0 0 0 0 0 0 -) ;; Pauses + (B_30 - 0 0 0 0 0 0 -) ;; Pauses + (B_40 - 0 0 0 0 0 0 -) ;; Pauses + (B_50 - 0 0 0 0 0 0 -) ;; Pauses + (B_100 - 0 0 0 0 0 0 -) ;; Pauses + (B_150 - 0 0 0 0 0 0 -) ;; Pauses + (B_200 - 0 0 0 0 0 0 -) ;; Pauses + (B_250 - 0 0 0 0 0 0 -) ;; Pauses + (B_300 - 0 0 0 0 0 0 -) ;; Pauses + (B_400 - 0 0 0 0 0 0 -) ;; Pauses + + ;; Note these features were set by awb so they are wrong !!! + (aa + l 3 3 - 0 0 0) ;; father + (ae + s 3 1 - 0 0 0) ;; fat + (ah + s 2 2 - 0 0 0) ;; but + (ao + l 3 3 + 0 0 0) ;; lawn + (aw + d 3 2 - 0 0 0) ;; how + (ax + a 2 2 - 0 0 0) ;; about + (axr + a 2 2 - r a +) + (ay + d 3 2 - 0 0 0) ;; hide + (b - 0 0 0 0 s l +) + (ch - 0 0 0 0 a p -) + (d - 0 0 0 0 s a +) + (dh - 0 0 0 0 f d +) + (dx - a 0 0 0 s a +) ;; ??
+ (eh + s 2 1 - 0 0 0) ;; get + (el + s 0 0 0 l a +) + (em + s 0 0 0 n l +) + (en + s 0 0 0 n a +) + (er + a 2 2 - r 0 0) ;; always followed by r (er-r == axr) + (ey + d 2 1 - 0 0 0) ;; gate + (f - 0 0 0 0 f b -) + (g - 0 0 0 0 s v +) + (hh - 0 0 0 0 f g -) + (hv - 0 0 0 0 f g +) + (ih + s 1 1 - 0 0 0) ;; bit + (iy + l 1 1 - 0 0 0) ;; beet + (jh - 0 0 0 0 a p +) + (k - 0 0 0 0 s v -) + (l - 0 0 0 0 l a +) + (m - 0 0 0 0 n l +) + (n - 0 0 0 0 n a +) + (nx - 0 0 0 0 n d +) ;; ??? + (ng - 0 0 0 0 n v +) + (ow + d 2 3 + 0 0 0) ;; lone + (oy + d 2 3 + 0 0 0) ;; toy + (p - 0 0 0 0 s l -) + (r - 0 0 0 0 r a +) + (s - 0 0 0 0 f a -) + (sh - 0 0 0 0 f p -) + (t - 0 0 0 0 s a -) + (th - 0 0 0 0 f d -) + (uh + s 1 3 + 0 0 0) ;; full + (uw + l 1 3 + 0 0 0) ;; fool + (v - 0 0 0 0 f b +) + (w - 0 0 0 0 r l +) + (y - 0 0 0 0 r p +) + (z - 0 0 0 0 f a +) + (zh - 0 0 0 0 f p +) + (pau - 0 0 0 0 0 0 -) + (h# - 0 0 0 0 0 0 -) + (brth - 0 0 0 0 0 0 -) + ) +) + +(PhoneSet.silences '(# pau h# brth)) + +(provide 'radio_phones_multisyn) + + + + diff --git a/lib/multisyn/send_xwaves.scm b/lib/multisyn/send_xwaves.scm new file mode 100644 index 0000000..f498324 --- /dev/null +++ b/lib/multisyn/send_xwaves.scm @@ -0,0 +1,318 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 2003, 2004 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. 
Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; xwaves interface for festival for multisyn (Rob Clark) +;;; +;;; This is never loaded by default. +;;; You'd need to change the paths here for this to currently work outside of CSTR. +;;; If anyone else ends up using it let me know and I'll make it more robust. +;;; + +;; Send commands to xwaves + +(defvar send_xwaves_command "/cstr/linux/entropic/esps531.linux/bin/send_xwaves") +(defvar spectrogram_command "/cstr/linux/entropic/esps531.linux/bin/sgram") +(defvar data_path "/projects/cougar/data/cstr/nina") + +(set! xw_object_count 0) +(set! xw_active_list nil) + +;; +;; Display a synthesised utterance +;; +(define (xwaves_display_utterance utt) +"(xwaves_display_utterance utt) +Display join and target information for an utterance." + (let ((units (utt.relation.items utt 'Unit)) + (object (xw_name_object)) + wavfile specfile segfile diphfile joinfile targfile sourcefile timefile) + + (set! wavfile (xw_make_tmp_filename object)) + (set! specfile (xw_make_tmp_filename object)) + (set!
segfile (xw_make_tmp_filename object)) + (set! diphfile (xw_make_tmp_filename object)) + (set! joinfile (xw_make_tmp_filename object)) + (set! targfile (xw_make_tmp_filename object)) + (set! sourcefile (xw_make_tmp_filename object)) + (set! timefile (xw_make_tmp_filename object)) + + ; display resulting waveform + (utt.save.wave utt wavfile 'riff) + (xwaves_show_general object wavfile 1500 200 10 10) + ; display resulting spectrogram + (xw_genspec wavfile specfile) + (xwaves_show_general object specfile 1500 400 10 260) + ; segments + (utt.save.unit_selection_segs utt segfile) + (xwaves_show_labels object segfile specfile) + ; Unit information + (utt.save.unit_selection_info utt diphfile joinfile targfile sourcefile timefile) + (xwaves_show_labels object timefile specfile) + (xwaves_show_labels object sourcefile specfile) + (xwaves_show_labels object targfile specfile) + (xwaves_show_labels object joinfile specfile) + (xwaves_show_labels object diphfile specfile) + ; mark files + (xw_register_active object (list wavfile specfile segfile diphfile joinfile targfile sourcefile timefile)) +)) + +;; +;; Edit a diphone source +;; + +(define (xwaves_edit_diphone utt id) + "(xwaves_edit_diphone utt id) +Access the source diphone for label correction." +(let ((diphone nil) + segfilename + wavefilename + (utt (Utterance Text nil)) + segs + (seg nil) + (start 0) + end) + + ;; find unit. + (mapcar + (lambda (unit) + (if (string-equal (format nil "_%s" id) (item.feat unit id)) + (set! diphone unit))) + (utt.relation.items utt 'Unit)) + (if (null diphone) + (error (format nil "Diphone with id _%s not found in utterance." id))) + (set! uttname (item.feat diphone "source_utt")) + (set! end (item.feat diphone "source_end")) + + (set! segfilename (format nil "%s/lab/%s.lab" data_path uttname)) + (set! wavefilename (format nil "%s/wav/%s.wav" data_path uttname)) + (utt.relation.load utt 'Segment segfilename) + + (set! segs (utt.relation.items utt 'Segment)) + (while (and segs + (not (equal?
(item.feat (car segs) "end") end))) + (set! segs (cdr segs))) + + ;; if null seg ... + + (if (item.prev diphone) + (set! start (item.feat seg "start")) + (set! start 0)) + + +)) + + + + + +;; +;; Interface with xwaves. +;; + + +(define (xwaves_show_general object file width height xpos ypos) +"(xwaves_show_general object file width height xpos ypos) +Display a wave or track file." + (xw_send (format nil "make name %s file %s width %d height %d loc_x %d loc_y %d" object file width height xpos ypos))) + +(define (xwaves_show_wave object file) +"(xwaves_show_wave object file) +Display a waveform." + (xwaves_show_general object file 1500 200 10 10)) + +(define (xwaves_show_labels object file attachto) +"(xwaves_show_labels object file attachto) +Display a label file." + (xw_send (format nil "send make signal %s name %s file %s color 125" attachto object file)) + (xw_send "send activate fields 1 2 3 4 5")) + + +(define (xwaves_attach_xlabel) +"(xwaves_attach_xlabel) +Attach xlabel to xwaves." + (xw_send "attach function xlabel")) + +(define (xwaves_set_markers object left right) +"(xwaves_set_markers object left right) +Set the markers." + (xw_send (format nil "%s set l_marker_time %f" object left)) + (xw_send (format nil "%s set r_marker_time %f" object right))) + +(define (xwaves_bracket_markers object file) +"(xwaves_bracket_markers object file) +Bracket markers." + (xw_send (format nil "%s bracket file %s " object file))) + +(define (xwaves_close_windows object) +"(xwaves_close_windows object) +Close currently open windows related to object or all if nil." +(cond + ((null object) + (xw_send "kill")) + (t + (xw_send (format nil "kill name %s" object)))) +(xw_clear_active_list object)) + + +(define (xwaves_wait) +"(xwaves_wait) +Wait for xwaves continue signal." + (xw_send "pause")) + + +;; +;; Object naming +;; +(define (xw_name_object) +"(xw_name_object) +Generate a name for this object." +(let (name) + (set! name (string-append "obj" xw_object_count)) + (set!
xw_object_count (+ xw_object_count 1)) + name)) + +;; +;; Temp file lists +;; + +(define (xw_clear_active_list object) +"(xw_clear_active_list) +Clear active list of specified object, or all if nil." +(let (new_active_list) +(mapcar + (lambda (objectlist) + (cond + ((or (null object) + (string-equal object (car objectlist))) + (mapcar + (lambda (file) + (delete-file file)) + (cadr objectlist))) + (t + (set! new_active_list (cons objectlist new_active_list))))) + xw_active_list) +(set! xw_active_list new_active_list)) +nil) + + +(define (xw_register_active object flist) + "(xw_register_active object flist) +Adds an object and its filenames to the active list." + (set! xw_active_list (cons (cons object (list flist)) xw_active_list)) + nil) + +(define (xw_make_tmp_filename object) + "(xw_make_tmp_filename) +make tmp file name which incorporates object name." +(format nil "%s_%s" (make_tmp_filename) object)) + + +;; +;; Low level xwaves stuff. +;; + +(define (xw_genspec wavfile specfile) +"(xw_genspec wavfile specfile) +Generate a spectrogram file." + (system (format nil "%s -dHAMMING -o8 -E0.94 -S2 -w8 %s %s\n" spectrogram_command wavfile specfile))) + +(define (xw_send command) +"(xw_send command) +Send a command to xwaves." + (system (format nil "%s %s\n" send_xwaves_command command))) + + + +;; +;; General Festival stuff. +;; + + +(define (utt.save.unit_selection_segs utt filename) +"(utt.save.unit_selection_segs utt filename) + Save unit selection segments of UTT in a FILE in xlabel format." + (let ((fd (fopen filename "w"))) + (format fd "#\n") + (mapcar + (lambda (info) + (format fd "%2.4f 100 %s\n" (car info) (car (cdr info)))) + (utt.features utt 'Segment '(source_end name))) + (fclose fd) + utt)) + +(define (utt.save.unit_selection_info utt diphfile joinfile targfile sourcefile timefile) +"(utt.save.unit_selection_info utt diphfile joinfile targfile sourcefile timefile) + Save stuff in xlabel format." 
+ (let ((fdd (fopen diphfile "w")) + (fdj (fopen joinfile "w")) + (fdt (fopen targfile "w")) + (fds (fopen sourcefile "w")) + (fdx (fopen timefile "w")) + real_join) + (format fdd "#\n") + (format fdj "#\n") + (format fdt "#\n") + (format fds "#\n") + (format fdx "#\n") + (mapcar + (lambda (unit) + (set! real_join "") + (if (item.next unit) + (if (not (string-equal (item.feat unit 'source_utt) + (item.feat (item.next unit) 'source_utt))) + (set! real_join "*"))) + (format fdd "%2.4f 100 %s %s\n" + (item.feat unit 'end) + (item.feat unit 'name) + real_join) + (format fdj "%2.4f 100 %s\n" + (item.feat unit 'end) + (if (item.next unit) + (item.feat (item.next unit) 'join_cost) + 0)) + (format fdt "%2.4f 100 %s\n" + (item.feat unit 'end) + (item.feat unit 'target_cost)) + (format fds "%2.4f 100 %s\n" + (item.feat unit 'end) + (item.feat unit 'source_utt)) + (format fdx "%2.4f 100 %s\n" + (item.feat unit 'end) + (item.feat unit 'source_end))) + (utt.relation.items utt 'Unit)) + (fclose fdd) + (fclose fdj) + (fclose fdt) + (fclose fds) + (fclose fdx) + utt)) diff --git a/lib/multisyn/target_cost.scm b/lib/multisyn/target_cost.scm new file mode 100644 index 0000000..fc5d223 --- /dev/null +++ b/lib/multisyn/target_cost.scm @@ -0,0 +1,410 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 2003, 2004 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. 
The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Multisyn scheme target cost (Rob Clark and Korin Richmond) +;;; +;;; + +(define (Default_Target_Cost targ cand) +"(Default_Target_Cost targ cand) +A Default Target Cost function." +(let ((cost 0)) + (mapcar + (lambda (row) + (set! cost (+ cost (tc_eval_row row targ cand)))) + target_matrix) + (set! cost (/ cost target_matrix_weight)) + cost)) + + +(define (tc_eval_row row targ cand) + "(tc_eval_row row targ cand) +Evaluate a target matrix row." +(let ((weight (car row)) + (func (cadr row)) + (result 0)) + (set! result (* weight (eval (list func targ cand)))) + result)) + +;; +;; Target cost Matrix +;; '(weight function) + +(define (get_matrix_weight m) + (let ((w 0)) + (mapcar + (lambda (x) + (set! w (+ w (car x)))) + m) + w)) + + +(set! test_matrix_max_weight 1) +(set! 
test_matrix +'( + (10 tc_stress ) + (5 tc_syl_pos ) + (5 tc_word_pos) + (6 tc_partofspeech) + (7 tc_phrase_pos) + (4 tc_left_context) + (3 tc_right_context) + (25 tc_bad_f0) ;; set to equal 1/3 of total cost (so high because interaction with join) +; (0 tc_segment_score) ;; was 4. turned off until utterances are built for this. + (10 tc_bad_duration) ;; was 6 +)) + +(set! test_matrix_weight (* test_matrix_max_weight (get_matrix_weight test_matrix))) + +(set! target_matrix test_matrix) +(set! target_matrix_weight test_matrix_weight) + + + +;; +;; tc_stress +;; +;; Compares stress on any vowel which forms part of the diphone. Stress +;; conditions must match for a zero target cost. +;; + +(define (tc_stress targ cand) +"(tc_stress targ cand) +Target Cost stressed. 0 - stress patterns match [ compares: 0 unstressed vs. > 0 stressed ] + 1 - stress mismatch. +" +(let ((c 0) + cand_stress targ_stress) + ;(format t "my_is_vowel %l\n" (my_is_vowel targ)) + ;(format t "phone_is_silence %l\n" (phone_is_silence (item.feat targ 'name))) + ;; For first segment + (if (and (not (phone_is_silence (item.feat targ 'name))) + (my_is_vowel targ)) + (begin + (set! cand_stress (item.feat cand "R:SylStructure.parent.stress")) + (set! targ_stress (item.feat targ "R:SylStructure.parent.stress")) + (if (or (and (eq? cand_stress 0) (> targ_stress 0)) + (and (eq? targ_stress 0) (> cand_stress 0))) + (set! c 1)))) + ;; For second segment + ;(format t "n.my_is_vowel %l\n" (my_is_vowel (item.next targ))) + ;(format t "n.phone_is_silence %l\n" (phone_is_silence (item.feat targ 'n.name))) + (if (and (not (phone_is_silence (item.feat targ 'n.name))) + (my_is_vowel (item.next targ))) + (begin + (set! cand_stress (item.feat cand "n.R:SylStructure.parent.stress")) + (set! targ_stress (item.feat targ "n.R:SylStructure.parent.stress")) + (if (or (and (eq? cand_stress 0) (> targ_stress 0)) + (and (eq? targ_stress 0) (> cand_stress 0))) + (set!
c 1)))) +; (format t "tc_stress: %l\n" c) +c)) + + +;; +;; tc_syl_position +;; +;; Find and compare diphone position in syllabic structure. +;; Values are: inter - diphone crosses syllable boundary. +;; initial - diphone is syllable initial. +;; medial - diphone is syllable medial +;; final - diphone is syllable final +;; returns 0 for a match 1 for a mismatch. +;; +(define (tc_syl_pos targ cand) +"(tc_syl_pos targ cand) +Score position in syllable." +(let ((targ_pos "medial") + (cand_pos "medial") + (targ_syl (get_syl targ)) + (targ_next_syl (get_syl (item.next targ))) + (cand_syl (get_syl cand)) + (cand_next_syl (get_syl (item.next cand)))) + ;; target + (cond + ((not (equal? targ_syl targ_next_syl)) + (set! targ_pos "inter")) + ((not (equal? targ_syl (get_syl (item.prev targ)))) + (set! targ_pos "initial")) + ((not (equal? targ_next_syl (get_syl (item.next (item.next targ))))) + (set! targ_pos "final"))) + ;; candidate + (cond + ((not (equal? cand_syl cand_next_syl)) + (set! cand_pos "inter")) + ((not (equal? cand_syl (get_syl (item.prev cand)))) + (set! cand_pos "initial")) + ((not (equal? cand_next_syl (get_syl (item.next (item.next cand))))) + (set! cand_pos "final"))) +; (format t "targ_syl: %l cand_syl %l\n" targ_pos cand_pos) + (if (equal? targ_pos cand_pos) 0 1))) + +;; +;; tc_word_position +;; +;; Find and compare diphone position in word structure +;; Values are: inter - diphone crosses word boundary. +;; initial - diphone is word initial. +;; medial - diphone is word medial +;; final - diphone is word final +;; returns 0 for a match 1 for a mismatch. +;; +(define (tc_word_pos targ cand) +"(tc_word_pos targ cand) +Score position in word." +(let ((targ_pos "medial") + (cand_pos "medial") + (targ_word (get_word targ)) + (targ_next_word (get_word (item.next targ))) + (cand_word (get_word cand)) + (cand_next_word (get_word (item.next cand)))) + ;; target + (cond + ((not (equal? targ_word targ_next_word)) + (set! targ_pos "inter")) + ((not (equal? 
targ_word (get_word (item.prev targ)))) + (set! targ_pos "initial")) + ((not (equal? targ_next_word (get_word (item.next (item.next targ))))) + (set! targ_pos "final"))) + ;; candidate + (cond + ((not (equal? cand_word cand_next_word)) + (set! cand_pos "inter")) + ((not (equal? cand_word (get_word (item.prev cand)))) + (set! cand_pos "initial")) + ((not (equal? cand_next_word (get_word (item.next (item.next cand))))) + (set! cand_pos "final"))) +; (format t "targ_word: %l cand_word %l\n" targ_pos cand_pos) + (if (equal? targ_pos cand_pos) 0 1))) + + + +;; +;; tc_phrase_position +;; +;; Position (of word) in phrase +;; initial/medial/final +;; +;; 0 - match, 1 - mismatch +;; +(define (tc_phrase_pos targ cand) +"(tc_phrase_pos targ cand) + Score position in phrase." +(let ((targ_word (get_word targ)) + (cand_word (get_word cand))) + (cond + ((and (null targ_word) + (null cand_word)) + 0) + ((or (null targ_word) + (null cand_word)) + 1) + ((string-equal (item.feat targ_word 'pbreak) + (item.feat cand_word 'pbreak)) + 0) + (t 1)))) + +;; +;; tc_partofspeech +;; +;; +;; +(define (tc_partofspeech targ cand) +"(tc_partofspeech targ cand) + Score part of speech." +(let ((targ_word (get_word targ)) + (cand_word (get_word cand)) + targ_pos cand_pos) +(if targ_word + (set! targ_pos (simple_pos (item.feat targ_word 'pos)))) +(if cand_word + (set! cand_pos (simple_pos (item.feat cand_word 'pos)))) + ;(format t "targ_pos %l cand_pos %l\n" targ_pos cand_pos) + (if (equal? targ_pos cand_pos) 0 1))) + +(define (score_contexts targ_context cand_context) + "(score_contexts targ_context cand_context) +If both context items are nil, then score is 0. +If both context items are not nil, and are the same, then +score is 0. Otherwise, score is 1." + (if (and targ_context cand_context) + (if (equal? (item.feat targ_context "name") + (item.feat cand_context "name")) + 0 + 1) + (if (and (equal? targ_context nil) + (equal? 
cand_context nil)) + 0 + 1))) + + +(define (tc_left_context targ cand) +"(tc_left_context targ cand) +Score left phonetic context." +(let ((targ_context (item.prev targ)) + (cand_context (item.prev cand))) + (score_contexts targ_context cand_context))) + +;; +;; tc_right_context +;; +;; +;; +(define (tc_right_context targ cand) +"(tc_right_context targ cand) +Score right phonetic context." +(let ((targ_context (item.next (item.next targ))) + (cand_context (item.next (item.next cand)))) + (score_contexts targ_context cand_context))) + + +;; +;; tc_segment_score +;; +;; This currently thresholds based on looking at the distributions of the scores. +;; A nice exp function may be better. +(define (tc_segment_score targ cand) +"(tc_segment_score targ cand) +A bad alignment score makes a bad segment." +(let ((score 0)) + (if (not (phone_is_silence (item.feat cand "name"))) + (set! score (+ score (item.feat cand 'score)))) + (if (not (phone_is_silence (item.feat (item.next cand) "name"))) + (set! score (+ score (item.feat (item.next cand) 'score)))) + (cond + ((> score -4000) ;2000 (x2) is 7.5% + 0) + ((> score -5000) ;2500 (x2) is 5.0% + 0.5) + (t 1)))) + +;; +;; tc_bad_duration +;; +;; If the segment is marked as having a weird duration penalise it. +;; We allow bad_dur to be set on the target so resynthesis works +;; and so you could ask for really long/short segments. +;; +(define (tc_bad_duration targ cand) + (if (equal? (item.feat targ "bad_dur") + (item.feat cand "bad_dur")) + 0 + 1)) + + +;; +;; tc_bad_f0 +;; +;; If the candidate is deemed to have an inappropriate f0, then penalise it. +;; +;; Specifically, if the targ/cand segment type is expected to be voiced, then +;; an f0 of zero is bad (results from poor pitch tracking). In such a case, +;; the join cost would then favour other units with f0 (since the euclidean +;; distance between two zeros is very small ;) +;; We want to avoid that.
+;; +;; Presumably, we also want to penalise cases where supposedly voiceless +;; candidates have an f0 != 0 (either a consequence of bad pitch tracking +;; or bad labelling) but that's not done here yet... +;; +;; (the function itself has been implemented in C for convenience, and +;; this stub is left here just for this note ;) + +(define (tc_bad_f0 targ cand) + (let ((score (temp_tc_bad_f0 targ cand)) + (name (format nil "%s_%s" + (item.feat targ "name") + (item.feat (item.next targ) "name")))) + (if (not (equal? score 0.0)) + (format t "f0 score for %s is %f\n" name score)) + score)) + +;; +;; Is a segment a vowel? ( ph_is_a_vowel doesn't seem to work) +;; +(define (my_is_vowel seg) + (if seg + (if (equal? (item.feat seg 'ph_vc) "+") + t + nil))) + + + +;; get the syllable from sylstructure in normal utterance +;; +(define (get_syl seg) + (let (syl) + (if seg + (set! syl (item.relation.parent seg 'SylStructure))) + syl)) + +;; get the word from sylstructure in normal utterance +;; +(define (get_word seg) + (let ((syl (get_syl seg)) + word) + (if syl + (set! word (item.parent syl))) + word)) + + +;; simple pos +;; +(define (simple_pos pos) +(let (spos) + (cond + ((member_string pos '(vbd vb vbn vbz vbp vbg)) + (set! spos "v")) + ((member_string pos '(nn nnp nns nnps fw sym ls)) + (set! spos "n")) + ((member_string pos '(dt gin prp cc of to cd md pos wdt wp wrb ex uh pdt)) + (set! spos "func")) + ((member_string pos '(jj jjr jjs 1 2 rb rp rbr rbs)) + (set! spos "other"))) + spos)) + + +;; debugging + +(define (test_target_cost utt1 utt2) +(let ((segs1 (utt.relation.items utt1 'Segment)) + (segs2 (utt.relation.items utt2 'Segment)) + (tc 0)) + (while (and segs1 segs2) + (set! tc (Default_Target_Cost (car segs1) (car segs2))) + (format t "targ: %l cand: %l cost: %l\n" (item.name (car segs1)) (item.name (car segs2)) tc) + (set! segs1 (cdr segs1)) + (set! 
segs2 (cdr segs2))))) + + +(provide 'target_cost) diff --git a/lib/ogimarkup-mode.scm b/lib/ogimarkup-mode.scm new file mode 100644 index 0000000..2bca41a --- /dev/null +++ b/lib/ogimarkup-mode.scm @@ -0,0 +1,191 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. 
;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; An example tts text mode for reading OGI's CSLU toolkit mark up +;;; +;;; Note not all tokens do something in festival but all are removed +;;; from the actual text +;;; + +(defvar ogimarkup_eou_tree +'((n.name matches "<.*") + ((1)) +((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; A significant break (2 nls) + ((1)) + ((punc in ("?" ":" "!")) + ((1)) + ((punc is ".") + ;; This is to distinguish abbreviations vs periods + ;; These are heuristics + ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)") ;; an abbreviation + ((n.whitespace is " ") + ((0)) ;; if abbrev single space isn't enough for break + ((n.name matches "[A-Z].*") + ((1)) + ((0)))) + ((n.whitespace is " ") ;; if it doesn't look like an abbreviation + ((n.name matches "[A-Z].*") ;; single space and non-cap is no break + ((1)) + ((0))) + ((1)))) + ((0))))))) + +(define (ogimarkup_init_func) + "Called on starting ogimarkup text mode." + (set! ogimarkup_in_tag nil) + (set! ogimarkup_tagtokens "") + (set! ogimarkup_previous_t2w_func token_to_words) + (set! english_token_to_words ogimarkup_token_to_words) + (set! token_to_words ogimarkup_token_to_words) + (set! ogimarkup_previous_eou_tree eou_tree) + (set! eou_tree ogimarkup_eou_tree)) + +(define (ogimarkup_exit_func) + "Called on exit ogimarkup text mode." + (Parameter.set 'Duration_Stretch 1.0) + (set! token_to_words ogimarkup_previous_t2w_func) + (set! english_token_to_words ogimarkup_previous_t2w_func) + (set! eou_tree ogimarkup_previous_eou_tree)) + +(define (ogimarkup_token_to_words token name) + "(ogimarkup_token_to_words token name) +OGI markup specific token to word rules. Tags may have optional +argument e.g. or which means the tag may be over +a number of tokens." + (let (tag (arg nil) (rval nil)) + (cond + ((string-matches name "<.*") + (set! ogimarkup_tagtokens "") + (set! 
tag (string-after name "<")) + (if (string-matches tag ".*>$") + (set! tag (string-before tag ">")) + (if (string-matches (set! arg (item.feat token "n.name")) + ".*>$") + (set! arg (string-before arg ">")))) + (set! ogimarkup_in_tag tag) + (cond + ((string-equal tag "slow") + (Parameter.set 'Duration_Stretch 1.3)) + ((string-equal tag "SLOW") + (Parameter.set 'Duration_Stretch 2.0)) + ((string-equal tag "normal") + (Parameter.set 'Duration_Stretch 1.0)) + ((string-matches tag "FAST") + (Parameter.set 'Duration_Stretch 0.5)) + ((string-matches tag "fast") + (Parameter.set 'Duration_Stretch 0.8)) + ((string-matches tag"spell") + ;; This ain't really right as we'll get an utterance break here + (set! rval (symbolexplode arg))) + ((string-matches tag "phone") + ;; This ain't really right as we'll get an utterance break here + (item.set_feat token "token_pos" "digits") ;; canonical phone number + (set! rval (ogimarkup_previous_t2w_func token arg))) + ((string-matches tag "male") + (if (and (member 'OGIresLPC *modules*) + (symbol-bound? 'voice_aec_diphone)) + (voice_aec_diphone) + (voice_kal_diphone))) + ((string-matches tag "Male") + (if (and (member 'OGIresLPC *modules*) + (symbol-bound? 'voice_mwm_diphone)) + (voice_mwm_diphone) + (voice_cmu_us_rms_cg))) + ((string-matches tag "MALE") + (if (and (member 'OGIresLPC *modules*) + (symbol-bound? 'voice_jph_diphone)) + (voice_jph_diphone) + (voice_rab_diphone))) + ((string-matches tag "FT") + t) ;; do nothing until the end of this tag + ((string-matches (downcase tag) "female") + ;; only one female voice so map female Female FEMALE to it + (if (and (member 'OGIresLPC *modules*) + (symbol-bound? 'voice_tll_diphone)) + (voice_tll_diphone) + (voice_cmu_us_slt_arctic_hts)))) + (if (string-matches name ".*>$") + (set! ogimarkup_in_tag nil)) + rval ;; mostly nil + ) + ((string-matches name ".*>$") + (set! 
ogimarkup_tagtokens + (string-append + ogimarkup_tagtokens + (ogimarkup_get_token_string token t))) ;; delete final > + (if (string-equal ogimarkup_in_tag "FT") + (ogimarkup_festival_eval ogimarkup_tagtokens)) + (set! ogimarkup_in_tag nil) ;; end of tag + nil) + (ogimarkup_in_tag + (set! ogimarkup_tagtokens + (string-append + ogimarkup_tagtokens + (ogimarkup_get_token_string token nil))) + nil) ;; still in tag + (t ;; for all other cases + (ogimarkup_previous_t2w_func token name))))) + +(set! tts_text_modes + (cons + (list + 'ogimarkup ;; mode name + (list ;; ogimarkup mode params + (list 'init_func ogimarkup_init_func) + (list 'exit_func ogimarkup_exit_func))) + tts_text_modes)) + +(define (ogimarkup_get_token_string token delend) + "(ogimarkup_get_token_string TOKEN DELEND) +Return string for token including whitespace and punctuation. If DELEND +is true remove > from the name." + (string-append + (item.feat token "whitespace") + (item.feat token "prepunctuation") + (if delend + (string-before + (item.feat token "name") ">") + (item.feat token "name")) + (if (string-equal "0" (item.feat token "punc")) + "" + (item.feat token "punc")))) + +(define (ogimarkup_festival_eval tagtokens) +"(ogimarkup_festival_eval TAGTOKENS) +Take a string of the tokens within the tag and read an s-expression from +it and then evaluate it." + (let ((com "") (command nil)) + (set! command (read-from-string tagtokens)) + (eval command))) + +(provide 'ogimarkup-mode) diff --git a/lib/pauses.scm b/lib/pauses.scm new file mode 100644 index 0000000..18af2a9 --- /dev/null +++ b/lib/pauses.scm @@ -0,0 +1,242 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Predicting pause insertion + +(define (Pauses utt) +"(Pauses utt) +Insert pauses where required." + (let ((rval (apply_method 'Pause_Method utt))) + (cond + (rval rval) ;; new style + (t + (Classic_Pauses utt)))) + (Pause_optional_deleting_B_X utt)) + +(define (Classic_Pauses utt) + "(Classic_Pauses UTT) +Predict pause insertion." + (let ((words (utt.relation.items utt 'Word)) lastword tpname) + (if words + (begin + (insert_initial_pause utt) ;; always have a start pause + (set! 
lastword (car (last words))) + (mapcar + (lambda (w) + (let ((pbreak (item.feat w "pbreak")) + (emph (item.feat w "R:Token.parent.EMPH"))) + (cond + ((or (string-equal "B" pbreak) + (string-equal "BB" pbreak)) + (insert_pause utt w)) +; ((string-equal emph "1") +; (insert_pause utt w)) + ((equal? w lastword) + (insert_pause utt w))))) + words) + ;; The embarrassing bit. Remove any words labelled as punc or fpunc + (mapcar + (lambda (w) + (let ((pos (item.feat w "pos"))) + (if (or (string-equal "punc" pos) + (string-equal "fpunc" pos)) + (let ((pbreak (item.feat w "pbreak")) + (wp (item.relation w 'Phrase))) + (if (and (string-matches pbreak "BB?") + (item.relation.prev w 'Word)) + (item.set_feat + (item.relation.prev w 'Word) "pbreak" pbreak)) + (item.relation.remove w 'Word) + ;; can't refer to w as we've just deleted it + (item.relation.remove wp 'Phrase))))) + words) + ;; 12/01/2006 V.Strom: Even more embarrassing: Delete all silences + ;; that are followed by a silence. These silence sequences + ;; emerge if 'punc of phrase-final words consists of more than one + ;; character, e.g. period+quote. That in turn causes problems in + ;; build_utts: the 2nd silence ends up with no features but its name, + ;; because there is no corresponding 2nd silence in the phone + ;; segmentation to align with. + ;; This should be fixed in the functions below, but it is easier for + ;; me to clean up at the end: + (set! sil (car (car (cdr (car (PhoneSet.description '(silences))))))) + (set! seg (item.next(utt.relation.first utt 'Segment))) + (while seg + (if(and(equal? sil (item.name seg)) + (equal? sil (item.name (item.prev seg)))) + (item.delete (item.prev seg))) + (set! seg (item.next seg))))) + utt)) + +(define (insert_pause utt word) +"(insert_pause UTT WORDITEM) +Insert a silence segment after the last segment in WORDITEM in UTT." 
+ (let ((lastseg (find_last_seg word)) + (silence (car (car (cdr (car (PhoneSet.description '(silences)))))))) + (if lastseg + (item.relation.insert + lastseg 'Segment (list silence) 'after)))) + +(define (insert_initial_pause utt) +"(insert_initial_pause UTT) +Always have an initial silence if the utterance is non-empty. +Insert a silence segment before the first segment in UTT." + (let ((firstseg (car (utt.relation.items utt 'Segment))) + (silence (car (car (cdr (car (PhoneSet.description '(silences)))))))) + (if firstseg + (item.relation.insert + firstseg 'Segment (list silence) 'before)))) + +(define (insert_final_pause utt) +"(insert_final_pause UTT) +Always have a final silence if the utterance is non-empty." + (let ((lastseg (utt.relation.last utt 'Segment)) + (silence (car (car (cdr (car (PhoneSet.description '(silences)))))))) + (set! silence (format nil "%l" silence)) ; to make the symbol a string + ;(format t "silence is %l\n" silence) + ;(format t "lastseg is %l\n" (item.name lastseg)) + (if lastseg + (if (not(equal? (item.name lastseg) silence)) + (begin + (format t "inserted final pause %s\n" silence) + (item.relation.insert lastseg 'Segment (list silence) 'after)))))) + + +(define (find_last_seg word) +;;; Find the segment that is at the end of this word +;;; If this word is punctuation it might not have any segments +;;; so we have to check back until we find a word with a segment in it + (cond + ((null word) + nil) ;; there are no segs (don't think this can happen) + (t + (let ((lsyl (item.relation.daughtern word 'SylStructure))) + (if lsyl + (item.relation.daughtern lsyl 'SylStructure) + (find_last_seg (item.relation.prev word 'Word))))))) + +(define (Unisyn_Pauses utt) + "(Unisyn_Pauses UTT) +Predict pause insertion in a Unisyn utterance structure." + (let ((words (utt.relation.items utt 'Word)) lastword tpname) + (if words + (begin + (us_insert_initial_pause utt) ;; always have a start pause + (set! 
lastword (car (last words))) + (mapcar + (lambda (w) + (let ((pbreak (item.feat w "pbreak")) + (emph (item.feat w "R:Token.parent.EMPH"))) + (cond + ((or (string-equal "B" pbreak) + (string-equal "BB" pbreak)) + (us_insert_pause utt w)) +; ((string-equal emph "1") +; (us_insert_pause utt w)) + ((equal? w lastword) + (us_insert_pause utt w))))) + words) + ;; The embarrassing bit. Remove any words labelled as punc or fpunc + (mapcar + (lambda (w) + (let ((pos (item.feat w "pos"))) + (if (or (string-equal "punc" pos) + (string-equal "fpunc" pos)) + (let ((pbreak (item.feat w "pbreak")) + (wp (item.relation w 'Phrase))) + (if (and (string-matches pbreak "BB?") + (item.relation.prev w 'Word)) + (item.set_feat + (item.relation.prev w 'Word) "pbreak" pbreak)) + (item.relation.remove w 'Word) + ;; can't refer to w as we've just deleted it + (item.relation.remove wp 'Phrase))))) + words))) + utt)) + +(define (us_insert_pause utt word) +"(us_insert_pause UTT WORDITEM) +Insert a silence segment after the last segment in WORDITEM in UTT." + (let ((lastseg (us_find_last_seg word)) + (silence "pau")) + (if lastseg + (item.relation.insert + lastseg 'Segment (list silence) 'after)))) + +(define (us_insert_initial_pause utt) +"(us_insert_initial_pause UTT) +Always have an initial silence if the utterance is non-empty. +Insert a silence segment before the first segment in UTT." 
+ (let ((firstseg (utt.relation.first utt 'Segment)) + (silence "pau")) + (if firstseg + (item.relation.insert + firstseg 'Segment (list silence) 'before)))) + +(define (us_find_last_seg word) +;;; Find the segment that is at the end of this word +;;; If this word is punctuation it might not have any segments +;;; so we have to check back until we find a word with a segment in it + (cond + ((null word) + nil) ;; there are no segs (don't think this can happen) + (t + (if (item.daughtern_to (item.relation word 'WordStructure) 'Syllable) + (item.daughtern_to + (item.relation + (item.daughtern_to (item.relation word 'WordStructure) 'Syllable) + 'SylStructure) + 'Segment) + (us_find_last_seg (item.relation.prev word 'Word)))))) + +(define (Pause_optional_deleting_B_X utt) +"(Pause_optional_deleting_B_X utt) + +Delete all phone symbols starting with 'B_' from the Segment relation +(e.g. a B_150 is a 150ms pause) if symbol 'Pause_delete_B_X is defined. +" +; The B_X never occur in the phone segmentation but are predicted by +; some pause methods, in particular the default I used to produce the +; .utt files for the 2009 test sentences for the Blizzard Challenge. +; Some participants complained about them and I had to fix it quickly. + (if (symbol-bound? 'Pause_delete_B_X) + (let(seg ) + (set! seg (item.next(utt.relation.first utt 'Segment))) + (while seg + (set! next_seg (item.next seg)) + ;(format t "segment %l\n" (item.name seg)) + (if(string-matches (item.name seg) "B_[0-9]*") + (item.delete seg)) + (set! seg next_seg))))) + +(provide 'pauses) diff --git a/lib/phoneset.scm b/lib/phoneset.scm new file mode 100644 index 0000000..19d9b84 --- /dev/null +++ b/lib/phoneset.scm @@ -0,0 +1,134 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1999 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. 
;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Author: Alan W Black +;;; Date: April 1999 +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Support code for phone set definitions +;;; + +(defmac (defPhoneSet form) + (list 'defPhoneSet_real + (list 'quote (cadr form)) + (list 'quote (car (cddr form))) + (list 'quote (cadr (cddr form))))) + +(define (defPhoneSet_real name featdefs phones) + "(defPhoneSet NAME FEATTYPES PHONES) +Define a phone set with given name, feature types and +list of phones. This also selects name as the current phoneset." + (let (info) + (if (not (eq? 'Features (car featdefs))) + (begin + ;; Old format that has the same number of phone features for + ;; all phones + (set! info + (mapcar + (lambda (ph) + (let ((fvs + (mapcar + list + (mapcar car featdefs) + (cdr ph)))) + (ps_check_fvals + (cons (car ph) (cons (list 'type t) fvs)) + (cons t fvs)) + (list (car ph) fvs))) + phones))) + ;; else + ;; New format where types are specified so phones may have + ;; different features + (set! info + (mapcar + (lambda (ph) + (let ((fvs + (cons + (list 'type (cadr ph)) + (mapcar + list + (mapcar car (cdr (assoc (cadr ph) (cdr featdefs)))) + (cddr ph))))) + (ps_check_fvals + (cons (car ph) fvs) + (assoc (cadr ph) (cdr featdefs))) + (list (car ph) fvs))) + (cdr phones)))) + (Param.set + (string-append "phonesets." name) + info) + (PhoneSet.select name) + (list name info))) + +(define (ps_check_fvals fvs featdefs) + "(ps_check_fvals fvs featdefs) +Check that feature values in a phone definition are in the defined +set of possibles." 
+ (mapcar + (lambda (fp) + (let ((def (cdr (assoc (car fp) (cdr featdefs))))) + (cond + ((not def) + (error "Phoneset definition: phone has no defined type" fvs)) + ((not (member_string (car (cdr fp)) def)) + (error + (format nil "Phoneset definition: phone feature %l is undefined" fp) fvs))))) + (cdr (cdr fvs)))) + +(define (PhoneSet.select name) + "(PhoneSet.select name) +Select named phoneset as current." + (if (feats.present Param (string-append "phonesets." name)) + (Param.set "phoneset" (Param.get (string-append "phonesets." name))) + (error "no phoneset defined: " name))) + +(define (PhoneSet.description name) + "(PhoneSet.description) +Return (lisp) representation of current phoneset." + (feats.tolisp (Param.get "phoneset"))) + +(define (PhoneSet.list) + "(PhoneSet.list) +List of the names of the currently defined phonesets." + ;; This isn't a particularly efficient way to get the answer + (mapcar car (feats.tolisp (Param.get "phonesets")))) + +(define (PhoneSet.silences sils) + "(PhoneSet.silences SILLIST) +Define the silence phones for the currently selected phoneset." + (Param.set "phoneset.silences" sils)) + +(provide 'phoneset) + + + + diff --git a/lib/phrase.scm b/lib/phrase.scm new file mode 100644 index 0000000..bbabba6 --- /dev/null +++ b/lib/phrase.scm @@ -0,0 +1,171 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. 
The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Phrase boundary prediction. +;;; +;;; Two methods supported, if POS is enabled we use ngrams for that +;;; otherwise we use a CART tree +;;; +;;; Models trained from the IBM/Lancaster Spoken English Corpus and +;;; Boston University's FM Radio Corpus. + +;;; +;;; Here's a very simple CART tree for predicting phrase breaks +;;; based on punctuation only +;;; +(set! simple_phrase_cart_tree +' +((lisp_token_end_punc in ("?" "." ":")) + ((BB)) + ((lisp_token_end_punc in ("'" "\"" "," ";")) + ((B)) + ((n.name is 0) ;; end of utterance + ((BB)) + ((NB)))))) + +(define (token_end_punc word) + "(token_end_punc UTT WORD) +If punctuation at end of related Token and if WORD is last word +in Token return punc, otherwise 0." 
+ (if (item.relation.next word "Token") + "0" + (item.feat word "R:Token.parent.punc"))) + +;;; This is a simple CART tree used after boundaries are predicted +;;; by the probabilistic method to get two levels of break +(set! english_phrase_type_tree +'((pbreak is NB) + ((num_break is 1) + ((mB)) + ((R:Token.parent.EMPH is 1) + ((NB)) + ((n.R:Token.parent.EMPH is 1) + ((NB)) + ((NB))))) + ((pbreak is BB) + ((BB)) + ((pbreak is mB) + ((mB)) + ((name in ("." "!" "?"));; only (potentially) change Bs to BBs + ((BB)) + ((B))))))) + +(set! f2b_phrase_cart_tree +' +((gpos is punc) + (((1 0.00238095) (3 0) (4 0.997619) B)) + (((4 0.00238095) (3 0) (1 0.997619) NB)))) + +;;; For more detailed prediction of phrase breaks we use POS and +;;; probability distribution of breaks +;;; These models were trained using data from the Lancaster/IBM +;;; Spoken English Corpus + +(require 'pos) ;; for part of speech map + +(defvar pbreak_ngram_dir libdir + "pbreak_ngram_dir + The directory containing the ngram models for predicting phrase + breaks. By default this is the standard library directory.") + +(defvar english_phr_break_params + (list + ;; The name and filename of the ngram with the a priori ngram model + ;; for predicting phrase breaks in the Phrasify module. This model should + ;; predict probability distributions for B and NB given some context of + ;; part of speech tags. + (list 'pos_ngram_name 'english_break_pos_ngram) + (list 'pos_ngram_filename + (path-append pbreak_ngram_dir "sec.ts20.quad.ngrambin")) + ;; The name and filename of the ngram containing the a posteriori ngram + ;; for predicting phrase breaks in the Phrasify module. This model should + ;; predict probability distributions for B and NB given previous B and + ;; NBs. + (list 'break_ngram_name 'english_break_ngram) + (list 'break_ngram_filename + (path-append pbreak_ngram_dir "sec.B.hept.ngrambin")) + ;; A weighting factor for breaks in the break/non-break ngram. 
+ (list 'gram_scale_s 0.59) + ;; When Phrase_Method is prob_models, this tree, if set, is used to + ;; potentially predict phrase type. At least some prob_models only + ;; predict B or NB, this tree may be used to change some Bs into + ;; BBs. If it is nil, the pbreak value predicted by prob_models + ;; remains the same. + (list 'phrase_type_tree english_phrase_type_tree) + ;; A list of tags used in identifying breaks. Typically B and NB (and + ;; BB). This should be the alphabet of the ngram identified in + ;; break_ngram_name + (list 'break_tags '(B NB)) + (list 'pos_map english_pos_map_wp39_to_wp20) + ) + "english_phr_break_params +Parameters for English phrase break statistical model.") + +(defvar phr_break_params nil + "phr_break_params +Parameters for phrase break statistical model. This is typically set by +a voice selection function to the parameters for a particular model.") + +;;; +;;; Declaration of some features +;;; + +(def_feature_docstring + 'Word.pbreak + "Word.pbreak + Result from statistical phrasing module, may be B or NB denoting + phrase break or non-phrase break after the word.") + +(def_feature_docstring + 'Word.pbreak_score + "Word.pbreak_score + Log likelihood score from statistical phrasing module, for pbreak + value.") + +(def_feature_docstring + 'Word.blevel + "Word.blevel + A crude translation of phrase break into ToBI like phrase level. + Values may be 0,1,2,3,4.") + +(define (Phrasify utt) +"(Phrasify utt) +Construct phrasify over Words module." + (let ((rval (apply_method 'Phrasify_Method utt))) + (cond + (rval rval) ;; new style + (t + (Classic_Phrasify utt))))) + + +(provide 'phrase) diff --git a/lib/pos.scm b/lib/pos.scm new file mode 100644 index 0000000..2b678ef --- /dev/null +++ b/lib/pos.scm @@ -0,0 +1,225 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; A part of speech tagger +;;; + +(set! 
english_guess_pos + '((in of for in on that with by at from as if that against about + before because if under after over into while without + through new between among until per up down) + (to to) + (det the a an no some this that each another those every all any + these both neither no many) + (md will may would can could should must ought might) + (cc and but or plus yet nor) + (wp who what where how when) + (pps her his their its our their its mine) + (aux is am are was were has have had be) + (punc "." "," ":" ";" "\"" "'" "(" "?" ")" "!") + )) + +(defvar guess_pos english_guess_pos + "guess_pos + An assoc-list of simple part of speech tag to list of words in that + class. This basically only contains closed class words; all other + words may be assumed to be content words. This was built from information + in the f2b database and is used by the ffeature gpos.") + +;;; A more elaborate part of speech tagger using ngrams works but +;;; at present requires a large list of a priori probabilities +;;; to work. If that file exists on your system we'll use it otherwise +;;; POS is guessed by the lexicon + +;;; These models were built from the Penn TreeBank, WSJ corpus + +(defvar pos_model_dir lexdir + "pos_model_dir + The directory containing the various models for the POS module. By + default this is the same directory as lexdir. The directory should + contain two models: a part of speech lexicon with reverse log probabilities + and an ngram model for the same part of speech tag set.") + +(defvar pos_p_start_tag "punc" + "pos_p_start_tag + This variable's value is the tag most likely to appear before + the start of a sentence. It is used when looking for pos context + before an utterance. Typically it should be some type of punctuation + tag.") + +(defvar pos_pp_start_tag "n" + "pos_pp_start_tag + This variable's value is the tag most likely to appear before + pos_p_start_tag and any position preceding that. It is typically + some type of noun tag. 
This is used to provide pos context for + early words in an utterance.") + +(defvar pos_supported nil + "pos_supported + If set to non-nil use part of speech prediction, if nil just get + pos information from the lexicon.") + +(defvar pos_ngram_name nil + "pos_ngram_name + The name of a loaded ngram containing the a posteriori ngram model for + predicting part of speech. The a priori model is held as a + lexicon called poslex.") + +(defvar pos_map nil + "pos_map + If set this should be a reverse assoc-list mapping one part of speech + tag set to another. It is used after using the defined POS models to + map the pos feature on each word to a new tagset.") + +;;; +;;; All the names here don't really allow multiple versions +;;; they should be prefixed with english_ +;;; + +(if (probe_file (path-append pos_model_dir "wsj.wp39.poslexR")) + (begin + (lex.create "english_poslex") + (lex.set.compile.file + (path-append pos_model_dir "wsj.wp39.poslexR")) + (lex.set.phoneset "mrpa") + (lex.set.lts.method nil) + (set! pos_lex_name "english_poslex") + (set! pos_p_start_tag "punc") + (set! pos_pp_start_tag "nn") + ;; wp39 + (lex.add.entry '("_OOV_" ((nnp -2.9144) (jj -2.7357) (nn -3.5787) + (nns -3.4933) (vbn -3.2486) (vbg -2.9419) + (vb -3.5471) (vbd -3.7896) (vbz -3.7820) + (rb -4.1940) (vbp -3.2755) (nnps -2.1605)) + ())) + (lex.add.entry '("_number_" + ((cd -0.35202) (jj -4.1083) (nns -6.4488) (nnp -7.3595)) + () )) + (lex.add.entry '("," ((punc -0.88488)) () )) + (lex.add.entry '("." 
((punc -1.1104)) () )) + (lex.add.entry '(":" ((punc -4.4236)) () )) + (lex.add.entry '("``" ((punc -2.7867)) () )) + (lex.add.entry '("`" ((punc -2.7867)) () )) + (lex.add.entry '("'" ((punc -2.7867)) () )) + (lex.add.entry '("\"" ((punc -2.7867)) () )) + ;; wp17 +;; (lex.add.entry '("_OOV_" ((n -3.4109) (j -2.7892) (v -3.7426)) ())) +; (lex.add.entry '("_OOV_" ((n -1.968) (j -2.351) (v -2.287)) ())) +; (lex.add.entry '("_number_" ((j -0.35202)) ())) +; (lex.add.entry '("," ((punc -0.88359)) () )) +; (lex.add.entry '("." ((punc -1.1101)) () )) +; (lex.add.entry '(":" ((punc -4.4236)) () )) +; (lex.add.entry '("``" ((punc -2.7867)) () )) +; (lex.add.entry '("`" ((punc -2.7867)) () )) +; (lex.add.entry '("'" ((punc -2.7867)) () )) +; (lex.add.entry '("\"" ((punc -2.7867)) () )) + ;; wp22 +; (lex.add.entry '("_OOV_" ((n -3.4109) (j -2.7892) (v -3.7426)) ())) +; (lex.add.entry '("_number_" ((cd -0.35202) (j -4.1908) (n -7.3890)) ())) +; (lex.add.entry '("," ((punc -0.88359)) () )) +; (lex.add.entry '("." ((punc -1.1101)) () )) +; (lex.add.entry '(":" ((punc -4.4236)) () )) +; (lex.add.entry '("``" ((punc -2.7867)) () )) + ;; wp18 +; (lex.add.entry '("_OOV_" ((n -3.4109) (j -2.7892) (v -3.7426)) ())) +; (lex.add.entry '("_number_" ((j -0.35202)) ())) +; (lex.add.entry '("`" ((punc -6.539) ) () )) +; (lex.add.entry '("``" ((punc -2.399) ) () )) +; (lex.add.entry '("," ((punc -0.480) ) () )) +; (lex.add.entry '("." ((fpunc -0.012) ) () )) +; (lex.add.entry '(":" ((punc -4.100) ) () )) + + (ngram.load 'english_pos_ngram + (path-append pos_model_dir "wsj.wp39.tri.ngrambin")) +; (ngram.load 'english_pos_ngram +; (path-append pos_model_dir "wsj.wp45.tri.ngram")) + (set! pos_supported t) + ) + (set! 
pos_supported nil)) + +(setq english_pos_map_wp39_to_wp20 + '( + (( vbd vb vbn vbz vbp vbg ) v) + (( nn nnp nns nnps fw sym ls ) n) + (( dt ) dt) + (( punc fpunc ) punc) + (( in ) in) + (( jj jjr jjs 1 2 ) j) + (( prp ) prp) + (( rb rp rbr rbs ) r) + (( cc ) cc) + (( of ) of) + (( to ) to) + (( cd ) cd) + (( md ) md) + (( pos ) pos) + (( wdt ) wdt) + (( wp ) wp) + (( wrb ) wrb) + (( ex ) ex) + (( uh ) uh) + (( pdt ) pdt) + )) + +(defvar pos_map nil + "pos_map +A reverse assoc list of predicted pos tags to some other tag set. Note +using this changes the pos tag loosing the actual predicted value. Rather +than map here you may find it more appropriate to map tags sets locally +in the modules that use them (e.g. phrasing and lexicons).") + +;;(setq pos_map_remap +;; '( +;; (( fpunc ) punc) +;; (( of ) in))) + +(def_feature_docstring 'Word.pos + "Word.pos + Part of speech tag value returned by the POS tagger module.") + +(def_feature_docstring 'Word.pos_score + "Word.pos_score + Part of speech tag log likelihood from Viterbi search.") + +(define (POS utt) +"(POS utt) +Apply part of speech tagging (and possible parsing too) to Word +relation." + (let ((rval (apply_method 'POS_Method utt))) + (cond + (rval rval) ;; new style + (t + (Classic_POS utt))))) + + +(provide 'pos) diff --git a/lib/postlex.scm b/lib/postlex.scm new file mode 100644 index 0000000..7fb038b --- /dev/null +++ b/lib/postlex.scm @@ -0,0 +1,587 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Postlexical rules +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Modifed for CSTR HTS Voice Library ;; +;; Author : Junichi Yamagishi (jyamagis@inf.ed.ac.uk) ;; +;; Date : Sept 2008 ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + + +(define (PostLex utt) +"(PostLex utt) +Apply post lexical rules to segment stream. 
These may be almost +arbitrary rules as specified by the particular voice, through the +postlex_rules_hooks variable. A number of standard post lexical rule +sets are provided including reduction, possessives etc. These +rules are also used to mark standard segments with their cluster +information used in creating diphone names." +(let ((rval (apply_method 'PostLex_Method utt))) + (cond + (rval rval) ;; new style + (t ;; should only really need this one + (apply_hooks postlex_rules_hooks utt))) + utt +)) + +(define (Classic_PostLex utt) + "(Classic_PostLex utt) +Apply post lexical rules (both builtin and those specified in +postlex_rules_hooks)." + (Builtin_PostLex utt) ;; haven't translated all the rules yet + (apply_hooks postlex_rules_hooks utt) + utt +) + +(defvar postlex_rules_hooks nil +"postlex_rules_hooks +A function or list of functions which encode post lexical rules. +This will be voice specific, though some rules will be shared across +languages.") + +;;; Mapping of full vowels to reduced vowels, this should be part +;;; of the phoneset definitions +(defvar postlex_vowel_reduce_table + '((mrpa + ((uh @) (i @) (a @) (e @) (u @) (o @) (oo @))) + (radio + ((ah ax el en em) + (ih ax) +; (er axr ax) +; (iy ih) +; (ey ax) + (aa ax) + (ae ax) + (eh ax)))) +"postlex_vowel_reduce_table +Mapping of vowels to their reduced form. This is an assoc list of +phoneset name to an assoc list of full vowel to reduced form.") + +(defvar postlex_vowel_reduce_cart_tree nil +"postlex_vowel_reduce_cart_tree +CART tree for vowel reduction.") + +(defvar postlex_vowel_reduce_cart_tree_hand + '((stress is 0) + ((p.syl_break < 2) + ((syl_break < 2) + ((1)) + ((0))) + ((0))) + ((0))) +"postlex_vowel_reduce_cart_tree_hand +A CART tree for vowel reduction. 
This is hand-written.") + +(defvar postlex_vowel_reduce_cart_data +' +((R:SylStructure.parent.gpos is cc) + (((0 0.993548) (1 0.00645161) 0)) + ((p.R:SylStructure.parent.gpos is md) + (((0 0.903226) (1 0.0967742) 0)) + ((p.R:SylStructure.parent.gpos is det) + ((n.R:SylStructure.parent.gpos is content) + ((last_accent < 2.5) + ((next_accent < 2.5) + ((next_accent < 1.2) + ((n.syl_break is 4) + (((0 0.967213) (1 0.0327869) 0)) + ((syl_break is 4) + (((0 0.952381) (1 0.047619) 0)) + ((n.syl_break is 4) + (((0 0.953488) (1 0.0465116) 0)) + ((position_type is single) + (((0 0.947368) (1 0.0526316) 0)) + ((accented is 0) + ((n.accented is 0) + (((0 0.857143) (1 0.142857) 0)) + (((0 0.415385) (1 0.584615) 1))) + (((0 0.974359) (1 0.025641) 0))))))) + (((0 0.968254) (1 0.031746) 0))) + (((0 0.969697) (1 0.030303) 0))) + (((0 0.976744) (1 0.0232558) 0))) + (((0 0.990291) (1 0.00970874) 0))) + ((next_accent < 108.5) + ((p.R:SylStructure.parent.gpos is pps) + (((0 0.828947) (1 0.171053) 0)) + ((R:SylStructure.parent.gpos is det) + ((accented is 0) + (((0 0.0599572) (1 0.940043) 1)) + (((0 0.949367) (1 0.0506329) 0))) + ((p.R:SylStructure.parent.gpos is cc) + (((0 0.880952) (1 0.119048) 0)) + ((p.R:SylStructure.parent.gpos is wp) + (((0 0.875) (1 0.125) 0)) + ((p.R:SylStructure.parent.gpos is in) + ((n.syl_break is 4) + (((0 0.961538) (1 0.0384615) 0)) + ((next_accent < 2.5) + ((syl_break is 4) + (((0 0.95122) (1 0.0487805) 0)) + ((next_accent < 1.2) + ((accented is 0) + ((n.stress is 0) + (((0 0.788462) (1 0.211538) 0)) + ((R:SylStructure.parent.R:Word.p.gpos is content) + (((0 0.863636) (1 0.136364) 0)) + ((position_type is single) + (((0 0.729167) (1 0.270833) 0)) + (((0 0.4) (1 0.6) 1))))) + (((0 0.983871) (1 0.016129) 0))) + (((0 0.96) (1 0.04) 0)))) + (((0 0.963636) (1 0.0363636) 0)))) + ((position_type is single) + ((syl_break is 4) + (((0 0.993865) (1 0.00613497) 0)) + ((p.R:SylStructure.parent.gpos is to) + (((0 0.984375) (1 0.015625) 0)) + ((syl_break is 1) + 
((accented is 0) + ((n.R:SylStructure.parent.gpos is in) + (((0 0.869565) (1 0.130435) 0)) + ((R:SylStructure.parent.gpos is content) + (((0 0.861789) (1 0.138211) 0)) + ((p.R:SylStructure.parent.gpos is content) + ((p.syl_break is 4) + (((0 0.858065) (1 0.141935) 0)) + ((R:SylStructure.parent.gpos is in) + ((p.syl_break is 1) + ((n.R:SylStructure.parent.gpos is det) + (((0 0.659574) (1 0.340426) 0)) + ((p.stress is 0) + (((0 0.422222) (1 0.577778) 1)) + (((0 0.582278) (1 0.417722) 0)))) + ((n.accented is 0) + ((n.R:SylStructure.parent.gpos is content) + (((0 0.65) (1 0.35) 0)) + ((p.stress is 0) + (((0 0.464286) (1 0.535714) 1)) + (((0 0.538462) (1 0.461538) 0)))) + (((0 0.803279) (1 0.196721) 0)))) + ((n.R:SylStructure.parent.gpos is det) + (((0 0.952381) (1 0.047619) 0)) + ((n.syl_break is 4) + (((0 0.833333) (1 0.166667) 0)) + ((p.stress is 0) + ((p.syl_break is 1) + ((n.syl_break is 1) + (((0 0.740741) (1 0.259259) 0)) + ((R:SylStructure.parent.gpos is aux) + (((0 0.478261) (1 0.521739) 1)) + (((0 0.769231) (1 0.230769) 0)))) + (((0 0.755556) (1 0.244444) 0))) + (((0 0.797619) (1 0.202381) 0))))))) + (((0 0.870968) (1 0.129032) 0))))) + (((0 0.983806) (1 0.0161943) 0))) + (((0 0.977778) (1 0.0222222) 0))))) + ((next_accent < 21.6) + ((p.stress is 0) + ((R:SylStructure.parent.R:Word.p.gpos is md) + (((0 0.961538) (1 0.0384615) 0)) + ((position_type is mid) + (((0 0.977612) (1 0.0223881) 0)) + ((n.R:SylStructure.parent.gpos is det) + (((0 0.916667) (1 0.0833333) 0)) + ((R:SylStructure.parent.R:Word.n.gpos is 0) + (((0 0.915493) (1 0.084507) 0)) + ((R:SylStructure.parent.R:Word.n.gpos is pps) + (((0 0.884615) (1 0.115385) 0)) + ((n.stress is 0) + ((n.syl_break is 4) + (((0 0.986755) (1 0.013245) 0)) + ((p.syl_break is 4) + (((0 0.977011) (1 0.0229885) 0)) + ((n.syl_break is 4) + (((0 0.965517) (1 0.0344828) 0)) + ((last_accent < 1.2) + ((last_accent < 0.1) + (((0 0.910448) (1 0.0895522) 0)) + ((next_accent < 1.2) + ((R:SylStructure.parent.R:Word.n.gpos is in) + 
(((0 0.82) (1 0.18) 0)) + ((n.syl_break is 0) + ((R:SylStructure.parent.R:Word.p.gpos is content) + (((0 0.819672) (1 0.180328) 0)) + (((0 0.444444) (1 0.555556) 1))) + (((0 0.785714) (1 0.214286) 0)))) + (((0 0.836364) (1 0.163636) 0)))) + (((0 0.962025) (1 0.0379747) 0)))))) + ((stress is 0) + ((n.syl_break is 4) + (((0 0.21875) (1 0.78125) 1)) + ((R:SylStructure.parent.R:Word.p.gpos is aux) + (((0 0.259259) (1 0.740741) 1)) + ((p.syl_break is 1) + (((0 0.243094) (1 0.756906) 1)) + ((R:SylStructure.parent.R:Word.p.gpos is det) + (((0 0.290323) (1 0.709677) 1)) + ((R:SylStructure.parent.R:Word.p.gpos is in) + (((0 0.3) (1 0.7) 1)) + ((syl_break is 1) + (((0 0.289157) (1 0.710843) 1)) + ((p.syl_break is 4) + (((0 0.352941) (1 0.647059) 1)) + ((n.syl_break is 0) + (((0 0.311475) (1 0.688525) 1)) + ((syl_break is 4) + (((0 0.4) (1 0.6) 1)) + (((0 0.581395) (1 0.418605) 0))))))))))) + (((0 1) (1 0) 0))))))))) + ((stress is 0) + ((R:SylStructure.parent.R:Word.n.gpos is 0) + (((0 0.121212) (1 0.878788) 1)) + ((next_accent < 2.4) + ((R:SylStructure.parent.gpos is content) + ((position_type is mid) + (((0 0.176895) (1 0.823105) 1)) + ((p.syl_break is 1) + (((0 0.229167) (1 0.770833) 1)) + ((syl_break is 4) + (((0 0.242775) (1 0.757225) 1)) + ((p.syl_break is 0) + ((n.R:SylStructure.parent.gpos is in) + (((0 0.253521) (1 0.746479) 1)) + ((R:SylStructure.parent.R:Word.p.gpos is in) + (((0 0.262774) (1 0.737226) 1)) + ((last_accent < 2.1) + ((n.R:SylStructure.parent.gpos is aux) + (((0 0.304348) (1 0.695652) 1)) + ((next_accent < 1.2) + ((n.R:SylStructure.parent.gpos is cc) + (((0 0.291667) (1 0.708333) 1)) + ((syl_break is 1) + ((n.syl_break is 4) + (((0 0.344828) (1 0.655172) 1)) + ((R:SylStructure.parent.R:Word.p.gpos is det) + (((0 0.364706) (1 0.635294) 1)) + ((n.syl_break is 4) + (((0 0.384615) (1 0.615385) 1)) + ((last_accent < 1.2) + ((p.accented is 0) + (((0 0.584906) (1 0.415094) 0)) + ((n.accented is 0) + ((R:SylStructure.parent.R:Word.p.gpos is content) + (((0 
0.41) (1 0.59) 1)) + (((0 0.6) (1 0.4) 0))) + (((0 0.333333) (1 0.666667) 1)))) + (((0 0.380952) (1 0.619048) 1)))))) + ((p.accented is 0) + (((0 0.183673) (1 0.816327) 1)) + ((n.R:SylStructure.parent.gpos is content) + ((n.stress is 0) + (((0 0.295455) (1 0.704545) 1)) + ((R:SylStructure.parent.R:Word.p.gpos is content) + ((n.syl_break is 1) + (((0 0.5) (1 0.5) 0)) + (((0 0.40625) (1 0.59375) 1))) + (((0 0.333333) (1 0.666667) 1)))) + (((0 0.2) (1 0.8) 1)))))) + (((0 0.3) (1 0.7) 1)))) + (((0 0.302326) (1 0.697674) 1))))) + (((0 0.25) (1 0.75) 1)))))) + (((0 0.173913) (1 0.826087) 1))) + (((0 0.166667) (1 0.833333) 1)))) + (((0 1) (1 0) 0)))) + (((0 0.2) (1 0.8) 1))))))))) + (((0 0.15) (1 0.85) 1))))))) + +(defvar postlex_mrpa_r_cart_tree +'((name is r) + ((R:Segment.n.ph_vc is -) + ((delete)) + ((nil))) + ((nil))) +"postlex_mrpa_r_cart_tree +For removing final R when not between vowels.") + + +;; Changed this to actually work... (Rob 09/12/04) +;; Changed this to delete the syllable when schwa is unnecessary (awb 19/07/04) +(define (postlex_apos_s_check utt) + "(postlex_apos_s_check UTT) +Deal with possessive s for English (American and British). Delete +schwa of 's if previous is not an alveolar or palatal fricative or affricate, and +change voiced to unvoiced s if previous is not voiced." 
+ (mapcar + (lambda (syl) + ; word is 's + (if (string-equal "'s" (item.feat + syl "R:SylStructure.parent.name")) + (begin + ;; de-voice if last phone of previous word is unvoiced + (if (string-equal + "-" + (item.feat syl "p.R:SylStructure.daughtern.ph_cvox")) + (item.set_name + (item.relation.daughtern syl 'SylStructure) + "s")) ;; change it from "z" to "s" + ; if the previous seg is an alveolar or palatal, + ; fricative or affricate don't delete schwa otherwise delete it + (if (and + (member_string + (item.feat syl "p.R:SylStructure.daughtern.ph_ctype") '(f a)) + (member_string + (item.feat syl "p.R:SylStructure.daughtern.ph_cplace") '(a p))) + (begin + t) + (begin + ;; delete the schwa + (item.delete (item.relation.daughter1 syl 'SylStructure)) + ;; attach orphaned s/z to previous word + (item.relation.append_daughter + (item.prev syl) + 'SylStructure + (item.relation.daughtern syl 'SylStructure)) + ;; delete the now empty syllable + (item.delete syl)))))) + ;; never happens if 's is first in an utterance + (cdr (utt.relation.items utt 'Syllable))) + utt) + +;; Changed this to work the other way round, too. Volker 10/08/06 +(define (postlex_the_vs_thee utt) +"(postlex_the_vs_thee utt) +Unreduce the schwa in \"the\" when a vowel follows. +Reduce the vowel in \"the\" when no vowel follows (this +requires a lexicon entry for \"the\" with feature \"reduced\", +otherwise there will be no reduction)." +(let ((fullform (cadr (car (caar (cdr (cdar (lex.lookup_all 'thee))))))) + (reducedform (cadr(car(caar(cddr(lex.lookup 'the '(reduced))))))) + seg) + + (mapcar + (lambda (word) + (if (string-equal "the" (downcase (item.feat word "name"))) + (begin + (set! 
seg (item.relation (item.daughtern (item.relation.daughtern word 'SylStructure)) 'Segment)) + (if (string-equal "+" (item.feat (item.next seg) 'ph_vc)) + (item.set_feat seg 'name fullform) + (item.set_feat seg 'name reducedform))))) + (utt.relation.items utt 'Word))) +utt) + +(define (postlex_the_vs_thee_changeflag utt) +"(postlex_the_vs_thee_changeflag utt) +Flag variant of postlex_the_vs_thee: set feature \"reducable\" on the last +segment of \"the\" to 0 when a vowel follows (keep the full vowel) and to +1 when no vowel follows (allow reduction). The segment name itself is +not changed here." +(let ((fullform (cadr (car (caar (cdr (cdar (lex.lookup_all 'thee))))))) + (reducedform (cadr(car(caar(cddr(lex.lookup 'the '(reduced))))))) + seg) + + (mapcar + (lambda (word) + (if (string-equal "the" (downcase (item.feat word "name"))) + (begin + (set! seg (item.relation (item.daughtern (item.relation.daughtern word 'SylStructure)) 'Segment)) + (if (string-equal "+" (item.feat (item.next seg) 'ph_vc)) + (item.set_feat seg 'reducable 0) + (item.set_feat seg 'reducable 1))))) + (utt.relation.items utt 'Word))) +utt) + + +;; For Multisyn voices only. Volker 14/08/06 +(define (postlex_a utt) +"(postlex_a utt) +If POS of \"a\" is \"nn\" and segment feature \"reducable\" is 1, set it to 0. +This is a bugfix, but still requires the target cost function to add a +penalty if a candidate is reducable but the target is not. expro_target_cost +does that." +(let(seg) + (mapcar + (lambda(word) +;; (format t "%s\t%s\n" (item.feat word 'name)(item.feat word 'pos)) + (if(and(string-equal "a" (downcase (item.feat word "name"))) + (string-equal "nn" (item.feat word "pos"))) + (begin + (set! 
seg (item.relation (item.daughtern (item.relation.daughtern word +'SylStructure)) 'Segment)) +;; (format t "should not be reducable\n") + (if (eq 1 (parse-number (item.feat seg 'reducable))) + (item.set_feat seg 'reducable 0)))) + ) + (utt.relation.items utt 'Word))) +utt) + + + +(define (postlex_unilex_vowel_reduction utt) +"(postlex_unilex_vowel_reduction utt) +Perform vowel reduction based on unilex specification of what can be reduced." +(let () + (mapcar + (lambda (seg) + (if (and (eq? (parse-number (item.feat seg "reducable")) 1) + (not (> (parse-number (item.feat seg "R:SylStructure.parent.stress")) 0))) + (if (not (and (seg_word_final seg) + (string-equal (item.feat (item.next seg) 'ph_vc) "+"))) + (item.set_feat seg "name" (item.feat seg "reducedform"))))) + (utt.relation.items utt 'Segment))) +utt) + + + + +(define (seg_word_final seg) +"(seg_word_final seg) +Is this segment word final?" + (let ((this_seg_word (item.parent (item.relation.parent seg 'SylStructure))) + (silence (car (cadr (car (PhoneSet.description '(silences)))))) + next_seg_word) + (if (item.next seg) + (set! next_seg_word (item.parent (item.relation.parent (item.next seg) 'SylStructure)))) + (if (or (equal? this_seg_word next_seg_word) + (string-equal (item.feat seg "name") silence)) + nil + t))) + + + +;; imported from postlex_intervoc_r.scm Volker 14/08/06 +(define (postlex_intervoc_r utt) +"(postlex_intervoc_r UTT) + +Remove any word-final /r/ which is phrase-final or not going +to be inter-vocalic i.e. the following word does not start +with a vowel. + +NOTE: in older versions of unilex-rpx.out for Festival, there +is no word-final /r/. + +" +(let (word next_word last_phone following_phone) + (set! word (utt.relation.first utt 'Word)) + + (while word + (set! next_word (item.next word)) + (set! last_phone (item.daughtern + (item.daughtern(item.relation word 'SylStructure)))) + (if next_word + (begin + + (set! 
following_phone (item.daughter1 + (item.daughter1 + (item.relation next_word 'SylStructure)))) + ; last_phone and following_phone should always be defined at this point, + ; but since the upgrade to Fedora and characters no longer being in ISO + ; but in UTF8, the pound sterling is no longer treated correctly. + ; Probably (Token utt) should be fixed. + + (if (and following_phone last_phone) + (begin + (format t "%s\t%s %s %s %s\n" (item.name word) + (item.name last_phone) + (item.name following_phone) + (item.feat following_phone 'ph_vc) + (item.feat word 'pbreak)) + (if(and(equal? "r" (item.name last_phone)) + (or(not(equal? "NB" (item.feat word 'pbreak))) + (equal? "-" (item.feat following_phone 'ph_vc)))) + (begin + (format t "\t\t\t/r/ in \"%s %s\" deleted\n" + (item.name word)(item.name next_word)) + (item.delete last_phone)))))) + (if(and last_phone (equal? "r" (item.name last_phone))) + (begin + (format t "\t\t\tutterance-final /r/ deleted\n") + (item.delete last_phone))) + ) + + (set! word (item.next word)))) + utt) + + +(define (postlex_stop_deletion utt) +"(postlex_stop_deletion utt) + +Delete any stop or affricate (phone which has a closure) +immediately followed by another stop or affricate. + +Also save the identity of the deleted phone for the +context cost functions. Consider: + +backtrack /b a k t r a k/ -> /b a t r a k/ +(actually Jenny reduces : /b a k_cl k t_cl t r a k/ -> /b a k_cl t r a k/) +If we then look for a diphone /a t/ we want to favour +candidates coming from the same context i.e. which +are actually a reduced /a k t/. In the database, +the 1st /a/ gets the feature right_context=k and the +/t/ gets the feature left_context=k. + +" +(let(seg next_seg prev_seg) + (set! seg (utt.relation.first utt 'Segment)) + (while seg + (set! prev_seg (item.prev seg)) + (if prev_seg + (begin + ;(format t "%s %s %s\n" (item.name seg) + ; (item.feat seg 'ph_ctype) + ; (item.feat seg 'p.ph_ctype)) + (if(and(or(equal? 
"s" (item.feat seg 'ph_ctype)) + (equal? "a" (item.feat seg 'ph_ctype))) + (or(equal? "s" (item.feat seg 'p.ph_ctype)) + (equal? "a" (item.feat seg 'p.ph_ctype))) + ; When there are 3 stops in a row, and after the 1st has been + ; deleted, this prevents the 2nd to be deleted as well: + (equal? 0 (item.feat prev_seg 'left_context))) + (begin + (set! prev_prev_seg (item.prev prev_seg)) + (format t "postlex_stop_deletion: %s in %s\n" + (item.name prev_seg) + (item.name(item.parent(item.relation.parent prev_seg + 'SylStructure)))) + (if prev_prev_seg + (begin + ;(format t "setting left_context of %s and right context of %s to %s\n" + ; (item.name seg) + ; (item.name prev_prev_seg) + ; (item.name prev_seg)) + (item.set_feat seg 'left_context (item.name prev_seg)) + (item.set_feat prev_prev_seg 'right_context (item.name prev_seg)))) + (if(and(item.next seg) + (equal? (item.name seg) (item.name prev_seg))) + (begin + ;(format t "setting left_context of %s to %s\n" + ; (item.name (item.next seg) + ; (item.name prev_seg)) + + (item.set_feat (item.next seg) 'left_context (item.name prev_seg)))) + (item.delete prev_seg))))) + (set! seg (item.next seg)))) +utt) + +(provide 'postlex) diff --git a/lib/radio_phones.scm b/lib/radio_phones.scm new file mode 100644 index 0000000..7c6b524 --- /dev/null +++ b/lib/radio_phones.scm @@ -0,0 +1,122 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; A definition of the radio phone set used in the BU RADIO FM +;;; corpus, some people call this the darpa set. 
This one +;;; has the closures removed +;;; + +(defPhoneSet + radio + ;;; Phone Features + (;; vowel or consonant + (vc + -) + ;; vowel length: short long dipthong schwa + (vlng s l d a 0) + ;; vowel height: high mid low + (vheight 1 2 3 0) + ;; vowel frontness: front mid back + (vfront 1 2 3 0) + ;; lip rounding + (vrnd + - 0) + ;; consonant type: stop fricative affricate nasal lateral approximant + (ctype s f a n l r 0) + ;; place of articulation: labial alveolar palatal labio-dental + ;; dental velar glottal + (cplace l a p b d v g 0) + ;; consonant voicing + (cvox + - 0) + ) + ;; Phone set members + ( + ;; Note these features were set by awb so they are wrong !!! + (aa + l 3 3 - 0 0 0) ;; father + (ae + s 3 1 - 0 0 0) ;; fat + (ah + s 2 2 - 0 0 0) ;; but + (ao + l 3 3 + 0 0 0) ;; lawn + (aw + d 3 2 - 0 0 0) ;; how + (ax + a 2 2 - 0 0 0) ;; about + (axr + a 2 2 - r a +) + (ay + d 3 2 - 0 0 0) ;; hide + (b - 0 0 0 0 s l +) + (ch - 0 0 0 0 a p -) + (d - 0 0 0 0 s a +) + (dh - 0 0 0 0 f d +) + (dx - a 0 0 0 s a +) ;; ?? + (eh + s 2 1 - 0 0 0) ;; get + (el + s 0 0 0 l a +) + (em + s 0 0 0 n l +) + (en + s 0 0 0 n a +) + (er + a 2 2 - r 0 0) ;; always followed by r (er-r == axr) + (ey + d 2 1 - 0 0 0) ;; gate + (f - 0 0 0 0 f b -) + (g - 0 0 0 0 s v +) + (hh - 0 0 0 0 f g -) + (hv - 0 0 0 0 f g +) + (ih + s 1 1 - 0 0 0) ;; bit + (iy + l 1 1 - 0 0 0) ;; beet + (jh - 0 0 0 0 a p +) + (k - 0 0 0 0 s v -) + (l - 0 0 0 0 l a +) + (m - 0 0 0 0 n l +) + (n - 0 0 0 0 n a +) + (nx - 0 0 0 0 n d +) ;; ??? 
+ (ng - 0 0 0 0 n v +) + (ow + d 2 3 + 0 0 0) ;; lone + (oy + d 2 3 + 0 0 0) ;; toy + (p - 0 0 0 0 s l -) + (r - 0 0 0 0 r a +) + (s - 0 0 0 0 f a -) + (sh - 0 0 0 0 f p -) + (t - 0 0 0 0 s a -) + (th - 0 0 0 0 f d -) + (uh + s 1 3 + 0 0 0) ;; full + (uw + l 1 3 + 0 0 0) ;; fool + (v - 0 0 0 0 f b +) + (w - 0 0 0 0 r l +) + (y - 0 0 0 0 r p +) + (z - 0 0 0 0 f a +) + (zh - 0 0 0 0 f p +) + (pau - 0 0 0 0 0 0 -) + (h# - 0 0 0 0 0 0 -) + (brth - 0 0 0 0 0 0 -) + ) +) + +(PhoneSet.silences '(pau h# brth)) + +(provide 'radio_phones) + + + + diff --git a/lib/sable-latin.ent b/lib/sable-latin.ent new file mode 100644 index 0000000..f068020 --- /dev/null +++ b/lib/sable-latin.ent @@ -0,0 +1,171 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/lib/sable-mode.scm b/lib/sable-mode.scm new file mode 100644 index 0000000..a11d80c --- /dev/null +++ b/lib/sable-mode.scm @@ -0,0 +1,560 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1998 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. 
The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Festival (1.3.X) support for SABLE 0.2 the SGML/XML based mark up ;; +;;; language. ;; +;;; ;; +;;; This is XML version requiring Edinburgh's LTG's rxp XML parser as ;; +;;; distributed with Festival ;; +;;; ;; + +(require_module 'rxp) + +;;(set! auto-text-mode-alist +;; (cons +;; (cons "\\.sable$" 'sable) +;; auto-text-mode-alist)) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ;; ;; + ;; Remember where to find these two XML entities. 
;; + ;; ;; + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + + +(xml_register_id "-//SABLE//DTD SABLE speech mark up//EN" + (path-append libdir "Sable.v0_2.dtd") + ) + +(xml_register_id "-//SABLE//ENTITIES Added Latin 1 for SABLE//EN" + (path-append libdir "sable-latin.ent") + ) + +;; (print (xml_registered_ids)) + +(defvar SABLE_RXDOUBLE "-?\\(\\([0-9]+\\.[0-9]*\\)\\|\\([0-9]+\\)\\|\\(\\.[0-9]+\\)\\)\\([eE][---+]?[0-9]+\\)?") + +(defvar sable_pitch_base_map + '((highest 1.2) + (high 1.1) + (medium 1.0) + (default 1.0) + (low 0.9) + (lowest 0.8))) +(defvar sable_pitch_med_map + '((highest 1.2) + (high 1.1) + (medium 1.0) + (default 1.0) + (low 0.9) + (lowest 0.8))) +(defvar sable_pitch_range_map + '((largest 1.2) + (large 1.1) + (medium 1.0) + (default 1.0) + (small 0.9) + (smallest 0.8))) +(defvar sable_rate_speed_map + '((fastest 1.5) + (fast 1.2) + (medium 1.0) + (default 1.0) + (slow 0.8) + (slowest 0.6))) +(defvar sable_volume_level_map + '((loudest 2.0) + (loud 1.5) + (default 1.0) + (medium 1.0) + (quiet 0.5))) + +(define (sable_init_globals) + (set! utts nil) + (set! sable_omitted_mode nil) + (set! sable_word_features_stack nil) + (set! sable_pitch_context nil) + (set! sable_vol_context nil) + (set! sable_vol_type 'no_change) + (set! sable_vol_factor 1.0) + (set! sable_current_language 'britishenglish) + (set! sable_unsupported_language nil) + (set! sable_language_stack nil) + (set! sable_current_speaker 'voice_kal_diphone) + (set! sable_speaker_stack nil) +) + +(define (sable_token_to_words token name) + "(sable_token_to_words utt token name) +SABLE mode token specific analysis." 
+ (cond + ((or sable_omitted_mode sable_unsupported_language) + ;; don't say anything (whole utterance) + nil) + ((string-equal "1" (item.feat token "done_sable_sub")) + ;; to catch recursive calls this when splitting up sub expressions + (sable_previous_token_to_words token name)) + ((and (not (string-equal "0" (item.feat token "sable_sub"))) + (string-equal "0" (item.feat token "p.sable_sub"))) + (let (words (sub (item.feat token "sable_sub"))) + (item.set_feat token "done_sable_sub" "1") + (set! words + (apply append + (mapcar + (lambda (w) + (set! www (sable_previous_token_to_words token w)) + www) + (read-from-string sub)))) + (item.set_feat token "done_sable_sub" "0") + words)) + ((string-equal "1" (item.feat token "sable_ignore")) + ;; don't say anything (individual word) + nil) + ((string-equal "1" (item.feat token "sable_ipa")) + ;; Each token is an IPA phone + (item.set_feat token "phonemes" (sable-map-ipa name)) + (list name)) + ((string-equal "1" (item.feat token "sable_literal")) + ;; Only deal with spell here + (let ((subwords) (subword)) + (item.set_feat token "pos" token.letter_pos) + (mapcar + (lambda (letter) + ;; might be symbols or digits + (set! subword (sable_previous_token_to_words token letter)) + (if subwords + (set! subwords (append subwords subword)) + (set! 
subwords subword))) + (symbolexplode name)) + subwords)) + ((not (string-equal "0" (item.feat token "token_pos"))) + ;; bypass the prediction stage, if English + (if (member_string (Parameter.get 'Language) + '(britishenglish americanenglish)) + (builtin_english_token_to_words token name) + (sable_previous_token_to_words token name))) + ;; could be others here later + (t + (sable_previous_token_to_words token name)))) + +(defvar sable_elements +'( + ("(SABLE" (ATTLIST UTT) + (eval (list sable_current_speaker)) ;; so we know what state we start in + (sable_setup_voice_params) + nil + ) + (")SABLE" (ATTLIST UTT) + (xxml_synth UTT) ;; Synthesize the remaining tokens + nil + ) + ;; Utterance break elements + ("(LANGUAGE" (ATTLIST UTT) + ;; Status: probably complete + (xxml_synth UTT) + (set! sable_language_stack + (cons + (list sable_current_language sable_unsupported_language) + sable_language_stack)) + ;; Select a new language + (let ((language (upcase (car (xxml_attval "ID" ATTLIST))))) + (cond + ((or (string-equal language "SPANISH") + (string-equal language "ES")) + (set! sable_current_language 'spanish) + (set! sable_unsupported_language nil) + (select_language 'spanish)) + ((or (string-equal language "ENGLISH") + (string-equal language "EN")) + (set! sable_current_language 'britishenglish) + (set! sable_unsupported_language nil) + (select_language 'britishenglish)) + (t ;; skip languages you don't know + ;; BUG: if current language isn't English this won't work + (apply_hooks tts_hooks + (eval (list 'Utterance 'Text + (string-append "Some text in " language)))) + (set! sable_unsupported_language t))) + nil)) + (")LANGUAGE" (ATTLIST UTT) + (xxml_synth UTT) + (set! sable_unsupported_language (car (cdr (car sable_language_stack)))) + (set! sable_current_language (car (car sable_language_stack))) + (set! 
sable_language_stack (cdr sable_language_stack)) + (if (not sable_omitted_mode) + (begin + (select_language sable_current_language) + (sable_setup_voice_params))) + nil) + ("(SPEAKER" (ATTLIST UTT) + ;; Status: GENDER/AGE ignored, should be done by sable-def-speaker + ;; function to define Festival voices to SABLE + (xxml_synth UTT) + (set! sable_speaker_stack (cons sable_current_speaker sable_speaker_stack)) + (cond + ((not (equal? sable_current_language 'britishenglish)) + (print "SABLE: chosen unknown voice, current voice unchanged")) + ((equal? (car (xxml_attval "NAME" ATTLIST)) 'male1) + (set! sable_current_speaker 'voice_kal_diphone) + (voice_kal_diphone)) + ((equal? (car (xxml_attval "NAME" ATTLIST)) 'male2) + (set! sable_current_speaker 'voice_cmu_us_rms_cg) + (voice_cmu_us_rms_cg)) + ((equal? (car (xxml_attval "NAME" ATTLIST)) 'male3) + (set! sable_current_speaker 'voice_ked_diphone) + (voice_ked_diphone)) + ((equal? (car (xxml_attval "NAME" ATTLIST)) 'male4) + (set! sable_current_speaker 'voice_rab_diphone) + (voice_rab_diphone)) + ((equal? (car (xxml_attval "NAME" ATTLIST)) 'male5) + (set! sable_current_speaker 'voice_cmu_us_awb_cg) + (voice_cmu_us_awb_cg)) + ((equal? (car (xxml_attval "NAME" ATTLIST)) 'female1) + (set! sable_current_speaker 'voice_cmu_us_slt_arctic_hts) + (voice_us1_mbrola)) + (t + (set! sable_current_speaker (intern (string-append "voice_" (car (xxml_attval "NAME" ATTLIST))))) + (eval (list sable_current_speaker)))) + (sable_setup_voice_params) + nil) + (")SPEAKER" (ATTLIST UTT) + (xxml_synth UTT) + (set! sable_utt UTT) + (set! sable_current_speaker (car sable_speaker_stack)) + (set! 
sable_speaker_stack (cdr sable_speaker_stack)) + (eval (list sable_current_speaker)) + (sable_setup_voice_params) + nil) + ("BREAK" (ATTLIST UTT) + ;; Status: probably complete + ;; may cause an utterance break + (let ((level (upcase (car (xxml_attval "LEVEL" ATTLIST))))) + (cond + ((null UTT) nil) + ((string-equal "LARGE" level) + (xxml_synth UTT) + nil) + (t + (let ((last_token (utt.relation.last UTT 'Token))) + (if last_token + (item.set_feat last_token "pbreak" "B")) + UTT))))) + ("(DIV" (ATTLIST UTT) + ;; Status: probably complete + (xxml_synth UTT) + nil) + ("AUDIO" (ATTLIST UTT) + ;; Status: MODE (background) ignored, only insertion supported + ;; mime type of file also ignored, as is LEVEL + (let ((tmpfile (make_tmp_filename))) + ;; ignoring mode-background (and will for some time) + ;; ignoring level option + (xxml_synth UTT) ;; synthesizing anything ready to be synthesized + (get_url (car (xxml_attval "SRC" ATTLIST)) tmpfile) + (apply_hooks tts_hooks + (eval (list 'Utterance 'Wave tmpfile))) + (delete-file tmpfile) + nil)) + ("(EMPH" (ATTLIST UTT) + ;; Status: nesting makes no difference, levels ignored + ;; Festival is particularly bad at adding specific emphasis + ;; that's what happens when you use statistical methods that + ;; don't include any notion of emphasis + ;; This is *not* recursive and only one level of EMPH supported + (sable_push_word_features) + (set! xxml_word_features + (cons (list "dur_stretch" 1.6) + (cons + (list "EMPH" "1") xxml_word_features))) + UTT) + (")EMPH" (ATTLIST UTT) + (set! xxml_word_features (sable_pop_word_features)) + UTT) + ("(PITCH" (ATTLIST UTT) + ;; Status: probably complete + ;; At present festival requires an utterance break here + (xxml_synth UTT) + (set! 
sable_pitch_context (cons int_lr_params sable_pitch_context)) + (let ((base (sable_interpret_param + (car (xxml_attval "BASE" ATTLIST)) + sable_pitch_base_map + (cadr (assoc 'target_f0_mean int_lr_params)) + sable_pitch_base_original)) + (med (sable_interpret_param + (car (xxml_attval "MED" ATTLIST)) + sable_pitch_med_map + (cadr (assoc 'target_f0_mean int_lr_params)) + sable_pitch_med_original)) + (range (sable_interpret_param + (car (xxml_attval "RANGE" ATTLIST)) + sable_pitch_range_map + (cadr (assoc 'target_f0_std int_lr_params)) + sable_pitch_range_original)) + (oldmean (cadr (assoc 'target_f0_mean int_lr_params)))) + ;; Festival (if it supports anything) supports mean and std + ;; so we treat base as med if med doesn't seem to do anything + (if (equal? med oldmean) + (set! med base)) + (set! int_lr_params + (cons + (list 'target_f0_mean med) + (cons + (list 'target_f0_std range) + int_lr_params))) + nil)) + (")PITCH" (ATTLIST UTT) + (xxml_synth UTT) + (set! int_lr_params (car sable_pitch_context)) + (set! sable_pitch_context (cdr sable_pitch_context)) + nil) + ("(RATE" (ATTLIST UTT) + ;; Status: can't deal with absolute word per minute SPEED. + (sable_push_word_features) + ;; can't deal with words per minute value + (let ((rate (sable_interpret_param + (car (xxml_attval "SPEED" ATTLIST)) + sable_rate_speed_map + (sable_find_fval "dur_stretch" xxml_word_features 1.0) + sable_rate_speed_original))) + (set! xxml_word_features + (cons (list "dur_stretch" (/ 1.0 rate)) xxml_word_features)) + UTT)) + (")RATE" (ATTLIST UTT) + (set! xxml_word_features (sable_pop_word_features)) + UTT) + ("(VOLUME" (ATTLIST UTT) + ;; Status: probably complete + ;; At present festival requires an utterance break here + (xxml_synth UTT) + (set! 
sable_vol_context (cons (list sable_vol_type sable_vol_factor) + sable_vol_context)) + (let ((level (sable_interpret_param + (car (xxml_attval "LEVEL" ATTLIST)) + sable_volume_level_map + sable_vol_factor + 1.0))) + (cond + ((string-matches (car (xxml_attval "LEVEL" ATTLIST)) ".*%") + (set! sable_vol_type 'relative)) + ((string-matches (car (xxml_attval "LEVEL" ATTLIST)) SABLE_RXDOUBLE) + (set! sable_vol_type 'absolute)) + (t + (set! sable_vol_type 'relative))) + (set! sable_vol_factor level)) + nil) + (")VOLUME" (ATTLIST UTT) + (xxml_synth UTT) + (set! sable_vol_type (car (car sable_vol_context))) + (set! sable_vol_factor (car (cdr (car sable_vol_context)))) + (set! sable_vol_context (cdr sable_vol_context)) + nil) + ("(ENGINE" (ATTLIST UTT) + ;; Status: probably complete + (xxml_synth UTT) + (if (string-matches (car (xxml_attval "ID" ATTLIST)) "festival.*") + (let ((datastr "")) + (mapcar + (lambda (c) (set! datastr (string-append datastr " " c))) + (xxml_attval "DATA" ATTLIST)) + (apply_hooks tts_hooks (eval (list 'Utterance 'Text datastr))) + (set! sable_omitted_mode t)) ;; ignore contents + ;; else + ;; it's not relevant to me + ) + nil) + (")ENGINE" (ATTLIST UTT) + (xxml_synth UTT) + (set! sable_omitted_mode nil) + nil) + ("MARKER" (ATTLIST UTT) + ;; Status: does nothing + ;; Can't support this without low-level control of audio spooler + (format t "SABLE: marker \"%s\"\n" + (car (xxml_attval "MARK" ATTLIST))) + UTT) + ("(PRON" (ATTLIST UTT) + ;; Status: IPA currently ignored + (sable_push_word_features) + ;; IPA is ignored; SUB text is spoken in place of the enclosed words + (let ((ipa (xxml_attval "IPA" ATTLIST)) + (sub (xxml_attval "SUB" ATTLIST))) + (cond + (ipa + (format t "SABLE: ipa ignored\n") + (set! xxml_word_features + (cons (list "sable_ignore" "1") xxml_word_features))) + (sub + (set! xxml_word_features + (cons (list "sable_sub" (format nil "%l" sub)) + xxml_word_features)) + (set! 
xxml_word_features + (cons (list "sable_ignore" "1") xxml_word_features)))) + UTT)) + (")PRON" (ATTLIST UTT) + (set! xxml_word_features (sable_pop_word_features)) + UTT) + ("(SAYAS" (ATTLIST UTT) + ;; Status: only a few of the types are dealt with + (sable_push_word_features) + (set! sable_utt UTT) + ;; MODETYPE is read but not used; only MODE is interpreted + (let ((mode (downcase (car (xxml_attval "MODE" ATTLIST)))) + (modetype (car (xxml_attval "MODETYPE" ATTLIST)))) + (cond + ((string-equal mode "literal") + (set! xxml_word_features + (cons (list "sable_literal" "1") xxml_word_features))) + ((string-equal mode "phone") + (set! xxml_word_features + (cons (list "token_pos" "digits") xxml_word_features))) + ((string-equal mode "ordinal") + (set! xxml_word_features + (cons (list "token_pos" "ordinal") xxml_word_features))) + ((string-equal mode "cardinal") + (set! xxml_word_features + (cons (list "token_pos" "cardinal") xxml_word_features))) + (t + ;; blindly trust festival to get it right + t)) + UTT)) + (")SAYAS" (ATTLIST UTT) + (set! xxml_word_features (sable_pop_word_features)) + UTT) + + +)) + +(define (sable_init_func) + "(sable_init_func) +Initialisation for SABLE mode" + (sable_init_globals) + (voice_kal_diphone) + (set! sable_previous_elements xxml_elements) + (set! xxml_elements sable_elements) + (set! sable_previous_token_to_words english_token_to_words) + (set! english_token_to_words sable_token_to_words) + (set! token_to_words sable_token_to_words)) + +(define (sable_exit_func) + "(sable_exit_func) +Exit function for SABLE mode" + (set! xxml_elements sable_previous_elements) + (set! token_to_words sable_previous_token_to_words) + (set! english_token_to_words sable_previous_token_to_words)) + +(define (sable_push_word_features) +"(sable_push_word_features) +Save current word features on stack." + (set! 
sable_word_features_stack + (cons xxml_word_features sable_word_features_stack))) + +(define (sable_adjust_volume utt) + "(sable_adjust_volume utt) +Amplify or attenuate the signal based on the value of sable_vol_factor +and sable_vol_type (absolute or relative)." + (set! utts (cons utt utts)) + (cond + ((equal? sable_vol_type 'no_change) + utt) + ((equal? sable_vol_type 'absolute) + (utt.wave.rescale utt sable_vol_factor 'absolute)) + ((equal? sable_vol_type 'relative) + (utt.wave.rescale utt sable_vol_factor)) + (t + (format stderr "SABLE: volume unknown type \"%s\"\n" sable_vol_type) + utt)) + utt) + +(define (sable_pop_word_features) +"(sable_pop_word_features) +Pop word features from stack." + (let ((r (car sable_word_features_stack))) + (set! sable_word_features_stack (cdr sable_word_features_stack)) + r)) + +(define (sable_find_fval feat flist def) + (cond + ((null flist) def) + ((string-equal feat (car (car flist))) + (car (cdr (car flist)))) + (t + (sable_find_fval feat (cdr flist) def)))) + +(define (sable_interpret_param ident map original current) +"(sable_interpret_param IDENT MAP ORIGINAL CURRENT) +If IDENT is in map return ORIGINAL times value in map, otherwise +treat IDENT of the form +/-N% and modify CURRENT accordingly." + (let ((mm (assoc ident map))) + (cond + (mm + (* original (car (cdr mm)))) + ((string-matches ident SABLE_RXDOUBLE) + (parse-number ident)) + ((string-matches ident ".*%") + (+ current (* current (/ (parse-number (string-before ident "%")) + 100.0)))) +;; ((string-matches ident ".*%") +;; (* current (/ (parse-number (string-before ident "%")) 100.0))) + ((not ident) current) + (t + (format stderr "SABLE: modifier \"%s\" not of float, tag or +/-N\n" + ident) + current)))) + +(define (sable_setup_voice_params) +"(sable_setup_voice_params) +Set up original values for various voice parameters." + (set! sable_pitch_base_original (cadr (assoc 'target_f0_mean int_lr_params))) + (set! 
sable_pitch_med_original (cadr (assoc 'target_f0_mean int_lr_params))) + (set! sable_pitch_range_original (cadr (assoc 'target_f0_std int_lr_params))) + (set! sable_rate_speed_original 1.0) + (if (and after_synth_hooks (not (consp after_synth_hooks))) + (set! after_synth_hooks + (cons after_synth_hooks (list sable_adjust_volume))) + (set! after_synth_hooks + (append after_synth_hooks (list sable_adjust_volume)))) +) + +;;; Declare the new mode to Festival +(set! tts_text_modes + (cons + (list + 'sable ;; mode name + (list + (list 'init_func sable_init_func) + (list 'exit_func sable_exit_func) + '(analysis_type xml) + )) + tts_text_modes)) + +(provide 'sable-mode) diff --git a/lib/scfg.scm b/lib/scfg.scm new file mode 100644 index 0000000..6716e5b --- /dev/null +++ b/lib/scfg.scm @@ -0,0 +1,62 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Some functions for manipulating a SCFG parse tree + +(require_module 'parser) + +(define (scfg_simplify tree) + "(scfg_simplify tree) +Output only the bracketing and the bottom level pos and words." + (cond + ((not tree) nil) + ((car (cdr (assoc 'pos (car (cdr (car tree)))))) + ;; terminal node + (list + (car (cdr (assoc 'pos (car (cdr (car tree)))))) + (car (car tree)))) + (t + (cons + (car (car tree)) + (mapcar scfg_simplify (cdr tree)))))) + +(define (scfg_simplify_relation_tree trees) + (mapcar scfg_simplify trees)) + +(defvar scfg_eos_tree eou_tree + "scfg_eos_tree +In MultiProbParse this CART tree is used to define end of sentence +within an utterance. It is applied to the token relation. +By default it is set to eou_tree.") + +(provide 'scfg) diff --git a/lib/scfg_wsj_wp20.gram b/lib/scfg_wsj_wp20.gram new file mode 100644 index 0000000..2df5414 --- /dev/null +++ b/lib/scfg_wsj_wp20.gram @@ -0,0 +1,523 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;-*-mode:scheme-*- +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; A Stochastic context free grammar for the wp20 tag set with 19 +;;; nonterminals +;;; +;;; This was trained from 10,000 sentences (00-04) of the UPenn WSJ tree +;;; bank using the inside-outside algorithm seeded with the bracketing from +;;; the treebank. The implementation is the scfg_ suite in the +;; speech tools and is based on the paper "Inside-Outside +;;; Reestimation from partially bracketed corpora", F Pereira and +;;; Y. Schabes. 
pp 128-135, 30th ACL, Newark, Delaware 1992. +;;; +;;; This grammar with 19 nonterminals was trained for 174 passes +;;; using a fifth of training data each time. It was tested against +;;; independent data both bracketed and unbracketed. After training, +;;; all rules with a probability less than 1.0e-6 were pruned. +;;; +;;; On an unseen test set of 686 sentences (from wsj/05/) this gets +;;; 92.2397% bracketing accuracy and 29.5918% sentences fully correct +;;; +;;; previous best 15_20 grammar +;;; 90.2377% bracketing accuracy and 24.7813% sentences fully correct +;;; +;;; Training this grammar took a long time. This is best grammar +;;; by testing grammars varying the number of non-terminals from 11-25 +;;; as the number of NTs increases the time for training also increases +;;; This 19_20 grammar took 20 days on a Sun Ultra 1 140, but I also +;;; had to search 11-18 to confirm this is best, which was done with a +;;; collection of Ultra 140s 170s and Pentium Pros (Linux and FreeBSD) +;;; +(0.00593452 NT00 NT00 NT00) +(0.0319023 NT00 NT00 NT13) +(0.00105452 NT00 NT00 NT18) +(0.00061816 NT00 NT02 NT10) +(0.000399698 NT00 NT02 NT12) +(0.0383818 NT00 NT05 NT00) +(0.00011458 NT00 NT06 NT03) +(0.00164298 NT00 NT06 NT17) +(0.00153884 NT00 NT07 NT07) +(0.00118244 NT00 NT07 NT12) +(0.00171642 NT00 NT07 NT13) +(0.00031308 NT00 NT07 NT17) +(0.0949408 NT00 NT09 NT18) +(0.000932166 NT00 NT10 NT03) +(0.000150288 NT00 NT10 NT17) +(0.0152371 NT00 NT12 NT18) +(0.73409 NT00 NT14 NT13) +(0.0403652 NT00 NT14 NT18) +(0.000195643 NT00 NT16 NT07) +(0.0134222 NT00 NT18 NT13) +(0.015624 NT00 NT18 NT18) +(0.00251118 NT01 NT01 NT07) +(0.00354571 NT01 NT01 NT11) +(0.22337 NT01 NT01 NT16) +(0.0467048 NT01 NT02 NT05) +(0.000518329 NT01 NT04 NT01) +(0.000100574 NT01 NT06 NT05) +(0.0480904 NT01 NT07 NT05) +(0.000358197 NT01 NT11 NT11) +(0.00278007 NT01 NT16 NT05) +(0.000179198 NT01 NT16 NT15) +(0.00140099 NT01 n) +(0.00228587 NT01 v) +(0.524988 NT01 dt) +(0.00128028 NT01 in) +(0.0660845 NT01 
j) +(0.0131026 NT01 cd) +(0.00584238 NT01 r) +(0.0548382 NT01 prp) +(0.000445004 NT01 wdt) +(0.00135794 NT01 wp) +(0.000195991 NT01 wrb) +(0.000264526 NT02 NT01 NT01) +(0.00243627 NT02 NT01 NT02) +(0.613543 NT02 NT01 NT07) +(0.00180865 NT02 NT01 NT11) +(0.0042804 NT02 NT01 NT16) +(0.0392418 NT02 NT02 NT07) +(0.026104 NT02 NT02 NT12) +(0.000916683 NT02 NT02 NT16) +(0.00158862 NT02 NT04 NT01) +(0.000206161 NT02 NT04 NT02) +(0.00343189 NT02 NT04 NT16) +(0.000417113 NT02 NT07 NT05) +(0.0988457 NT02 NT07 NT07) +(0.000931386 NT02 NT07 NT11) +(0.00073236 NT02 NT07 NT12) +(0.000153421 NT02 NT10 NT13) +(0.00163484 NT02 NT11 NT02) +(0.0379562 NT02 NT11 NT07) +(0.0149 NT02 NT11 NT11) +(0.00105811 NT02 NT11 NT12) +(0.000175184 NT02 NT16 NT02) +(0.0403395 NT02 NT16 NT07) +(0.00297703 NT02 NT16 NT12) +(0.0875026 NT02 n) +(0.00496719 NT02 v) +(0.000409658 NT02 dt) +(0.00239978 NT02 j) +(0.010203 NT02 r) +(0.000194628 NT02 pdt) +(0.000377009 NT03 NT04 NT02) +(0.11551 NT03 NT08 NT13) +(0.347629 NT03 NT09 NT13) +(0.484911 NT03 NT10 NT13) +(0.00188291 NT03 NT11 NT12) +(0.0495461 NT03 NT17 NT13) +(0.00918797 NT04 NT03 NT05) +(0.000303954 NT04 NT04 NT02) +(0.00284848 NT04 NT04 NT04) +(0.00710115 NT04 NT04 NT12) +(0.000597744 NT04 NT04 NT15) +(0.000377075 NT04 NT04 NT16) +(0.00130088 NT04 NT09 NT05) +(0.00175428 NT04 NT10 NT13) +(0.000127716 NT04 NT15 NT04) +(0.00013648 NT04 NT15 NT06) +(0.00045093 NT04 NT15 NT07) +(0.000626479 NT04 NT15 NT16) +(0.000563588 NT04 NT16 NT15) +(0.0232089 NT04 NT17 NT05) +(0.000138094 NT04 NT17 NT15) +(0.00094009 NT04 n) +(0.671108 NT04 v) +(0.0150619 NT04 punc) +(0.00056566 NT04 dt) +(0.144629 NT04 r) +(0.00270621 NT04 prp) +(0.0449587 NT04 to) +(0.0543755 NT04 md) +(0.00839747 NT04 wdt) +(0.00813689 NT04 wp) +(0.000560496 NT05 NT07 NT05) +(0.000901219 NT05 NT15 NT07) +(0.180172 NT05 punc) +(0.533041 NT05 cc) +(0.285244 NT05 pos) +(0.00164003 NT06 NT00 NT13) +(0.00222915 NT06 NT01 NT06) +(0.275903 NT06 NT01 NT07) +(0.00191616 NT06 NT01 NT11) +(0.00316549 
NT06 NT01 NT12) +(0.000730143 NT06 NT01 NT14) +(0.000559842 NT06 NT02 NT06) +(0.0236744 NT06 NT02 NT07) +(0.00284929 NT06 NT02 NT09) +(0.155052 NT06 NT02 NT12) +(0.00387995 NT06 NT02 NT14) +(0.0161403 NT06 NT02 NT18) +(0.000110944 NT06 NT04 NT01) +(0.00237845 NT06 NT04 NT02) +(0.00625142 NT06 NT04 NT06) +(0.00118802 NT06 NT04 NT08) +(0.000132901 NT06 NT04 NT10) +(0.000192545 NT06 NT04 NT11) +(0.000199118 NT06 NT06 NT01) +(0.0081704 NT06 NT06 NT12) +(0.00198439 NT06 NT06 NT14) +(0.000889455 NT06 NT06 NT18) +(0.00142038 NT06 NT07 NT05) +(0.0820095 NT06 NT07 NT07) +(0.000112894 NT06 NT07 NT09) +(0.0220243 NT06 NT07 NT12) +(0.000133911 NT06 NT07 NT14) +(0.00100807 NT06 NT07 NT17) +(0.000191764 NT06 NT08 NT13) +(0.000340112 NT06 NT10 NT08) +(0.000126776 NT06 NT10 NT09) +(0.0136266 NT06 NT10 NT12) +(0.00867414 NT06 NT10 NT13) +(0.00341334 NT06 NT10 NT18) +(0.00154851 NT06 NT11 NT12) +(0.00104947 NT06 NT12 NT12) +(0.000219189 NT06 NT14 NT05) +(0.00313879 NT06 NT14 NT13) +(0.000745073 NT06 NT15 NT02) +(0.000433144 NT06 NT15 NT06) +(0.000159867 NT06 NT15 NT16) +(0.00124313 NT06 NT16 NT02) +(0.00918606 NT06 NT16 NT07) +(0.00373496 NT06 NT16 NT12) +(0.014053 NT06 NT18 NT13) +(0.0155714 NT06 n) +(0.00123379 NT06 punc) +(0.0152764 NT06 dt) +(0.00123486 NT06 j) +(0.00359625 NT06 r) +(0.212966 NT06 prp) +(0.00199168 NT06 cc) +(0.0383471 NT06 wdt) +(0.0182587 NT06 wp) +(0.00204833 NT06 wrb) +(0.0109929 NT06 ex) +(0.0011995 NT07 NT05 NT16) +(0.119588 NT07 NT07 NT07) +(0.000353596 NT07 NT07 NT11) +(0.000177793 NT07 NT07 NT12) +(0.00101956 NT07 NT11 NT11) +(0.000357614 NT07 NT15 NT01) +(0.00084812 NT07 NT15 NT06) +(0.0182872 NT07 NT16 NT07) +(0.00018607 NT07 NT16 NT11) +(0.856315 NT07 n) +(0.000736333 NT07 v) +(0.000645479 NT08 NT00 NT09) +(0.000990156 NT08 NT01 NT02) +(0.0410251 NT08 NT01 NT07) +(0.0013863 NT08 NT01 NT09) +(0.000242552 NT08 NT01 NT12) +(0.00174478 NT08 NT01 NT14) +(0.000596656 NT08 NT01 NT16) +(0.00130945 NT08 NT02 NT07) +(0.166303 NT08 NT02 NT09) +(0.0143253 NT08 
NT02 NT12) +(0.0113813 NT08 NT02 NT14) +(0.000597887 NT08 NT02 NT16) +(0.0133053 NT08 NT03 NT09) +(0.0109076 NT08 NT03 NT17) +(0.000211313 NT08 NT04 NT01) +(0.0105796 NT08 NT04 NT02) +(0.00440181 NT08 NT04 NT04) +(0.00203737 NT08 NT04 NT06) +(0.213275 NT08 NT04 NT08) +(0.0781169 NT08 NT04 NT09) +(0.0190657 NT08 NT04 NT10) +(0.00319326 NT08 NT04 NT12) +(0.000693766 NT08 NT04 NT15) +(0.00112226 NT08 NT04 NT16) +(0.00117025 NT08 NT06 NT02) +(0.00807496 NT08 NT06 NT08) +(0.0183971 NT08 NT06 NT09) +(0.00127343 NT08 NT06 NT14) +(0.0322725 NT08 NT06 NT17) +(0.00396897 NT08 NT07 NT07) +(0.0154729 NT08 NT07 NT09) +(0.000708139 NT08 NT07 NT10) +(0.00186499 NT08 NT07 NT11) +(0.000701346 NT08 NT07 NT14) +(0.0116278 NT08 NT08 NT09) +(0.0965117 NT08 NT10 NT09) +(0.000142086 NT08 NT10 NT12) +(0.000210725 NT08 NT10 NT14) +(0.00336223 NT08 NT11 NT07) +(0.00183799 NT08 NT11 NT09) +(0.00109249 NT08 NT11 NT11) +(0.000880671 NT08 NT11 NT12) +(0.0032493 NT08 NT12 NT08) +(0.0372072 NT08 NT12 NT09) +(0.00113127 NT08 NT12 NT12) +(0.00892231 NT08 NT15 NT02) +(0.00383754 NT08 NT15 NT06) +(0.000528365 NT08 NT15 NT07) +(0.0060705 NT08 NT15 NT08) +(0.00853698 NT08 NT15 NT10) +(0.0349777 NT08 NT15 NT14) +(0.000202857 NT08 NT16 NT06) +(0.00709689 NT08 NT16 NT07) +(0.000240097 NT08 NT16 NT08) +(0.0401819 NT08 NT16 NT09) +(0.00124754 NT08 NT16 NT14) +(0.00862498 NT08 n) +(0.0115193 NT08 v) +(0.000974267 NT08 in) +(0.0169837 NT08 j) +(0.00626434 NT08 r) +(0.00437851 NT08 prp) +(0.0062359 NT09 NT01 NT07) +(0.000165196 NT09 NT01 NT14) +(0.00151872 NT09 NT02 NT04) +(0.000660061 NT09 NT02 NT15) +(0.000434321 NT09 NT02 NT16) +(0.00805872 NT09 NT03 NT09) +(0.000180982 NT09 NT04 NT08) +(0.050609 NT09 NT04 NT09) +(0.000307442 NT09 NT04 NT15) +(0.00281491 NT09 NT04 NT17) +(0.000295911 NT09 NT06 NT15) +(0.00133828 NT09 NT07 NT11) +(0.0235741 NT09 NT12 NT09) +(0.00121997 NT09 NT12 NT12) +(0.00391762 NT09 NT15 NT01) +(0.173027 NT09 NT15 NT02) +(0.000462089 NT09 NT15 NT06) +(0.0276663 NT09 NT15 NT07) +(0.210483 
NT09 NT15 NT08) +(0.000177004 NT09 NT15 NT09) +(0.243402 NT09 NT15 NT10) +(0.0174403 NT09 NT15 NT11) +(0.00646962 NT09 NT15 NT12) +(0.155174 NT09 NT15 NT14) +(0.00930502 NT09 NT15 NT17) +(0.000311399 NT09 NT16 NT02) +(0.0052031 NT09 NT16 NT07) +(0.00742336 NT09 NT16 NT09) +(0.000409254 NT09 in) +(0.0019424 NT09 j) +(0.0393282 NT09 r) +(0.00016039 NT09 prp) +(0.00268682 NT10 NT01 NT07) +(0.00173594 NT10 NT01 NT09) +(0.00550051 NT10 NT01 NT10) +(0.00269002 NT10 NT01 NT11) +(0.00881491 NT10 NT01 NT12) +(0.0158503 NT10 NT02 NT02) +(0.00229071 NT10 NT02 NT07) +(0.00765082 NT10 NT02 NT09) +(0.00102327 NT10 NT02 NT11) +(0.474288 NT10 NT02 NT12) +(0.0119086 NT10 NT02 NT14) +(0.000270767 NT10 NT02 NT15) +(0.00425023 NT10 NT02 NT16) +(0.0533347 NT10 NT04 NT02) +(0.00286524 NT10 NT04 NT06) +(0.0687658 NT10 NT04 NT10) +(0.0157381 NT10 NT04 NT12) +(0.000809508 NT10 NT05 NT12) +(0.00188343 NT10 NT06 NT04) +(0.000155481 NT10 NT06 NT09) +(0.00569591 NT10 NT06 NT14) +(0.00233367 NT10 NT06 NT17) +(0.000189475 NT10 NT07 NT05) +(0.018548 NT10 NT07 NT07) +(0.00472354 NT10 NT07 NT09) +(0.0121145 NT10 NT07 NT11) +(0.0698482 NT10 NT07 NT12) +(0.000402661 NT10 NT07 NT16) +(0.00183044 NT10 NT07 NT17) +(0.00166519 NT10 NT10 NT02) +(0.015445 NT10 NT10 NT09) +(0.019208 NT10 NT10 NT12) +(0.000942866 NT10 NT10 NT18) +(0.00149941 NT10 NT11 NT01) +(0.00624706 NT10 NT11 NT02) +(0.0381755 NT10 NT11 NT11) +(0.00754256 NT10 NT11 NT12) +(0.00139213 NT10 NT15 NT02) +(0.000523505 NT10 NT15 NT06) +(0.0015256 NT10 NT15 NT10) +(0.00119525 NT10 NT15 NT12) +(0.00683524 NT10 NT16 NT02) +(0.000398591 NT10 NT16 NT04) +(0.0701558 NT10 NT16 NT07) +(0.00198721 NT10 NT16 NT11) +(0.0075364 NT10 NT16 NT12) +(0.0186618 NT10 n) +(0.000591828 NT10 uh) +(0.157827 NT11 NT11 NT11) +(0.0422576 NT11 NT15 NT11) +(0.00247895 NT11 NT15 NT16) +(0.000257833 NT11 dt) +(0.754818 NT11 cd) +(0.0421123 NT11 r) +(0.00236916 NT12 NT01 NT07) +(0.000118511 NT12 NT02 NT16) +(0.00638739 NT12 NT04 NT02) +(0.0055731 NT12 NT04 NT04) +(0.0340903 
NT12 NT04 NT12) +(0.00102031 NT12 NT04 NT15) +(0.00143793 NT12 NT04 NT16) +(0.000102621 NT12 NT04 NT17) +(0.0032774 NT12 NT06 NT04) +(0.000366976 NT12 NT07 NT07) +(0.00218153 NT12 NT07 NT11) +(0.0117989 NT12 NT11 NT07) +(0.00303601 NT12 NT12 NT12) +(0.0747798 NT12 NT13 NT03) +(0.000232806 NT12 NT15 NT01) +(0.341016 NT12 NT15 NT02) +(0.0190932 NT12 NT15 NT06) +(0.100931 NT12 NT15 NT07) +(0.193386 NT12 NT15 NT10) +(0.0142796 NT12 NT15 NT11) +(0.000915196 NT12 NT16 NT07) +(0.000299768 NT12 NT16 NT11) +(0.0135637 NT12 NT16 NT12) +(0.115493 NT12 n) +(0.00344871 NT12 v) +(0.0262404 NT12 punc) +(0.000493049 NT12 in) +(0.00235382 NT12 j) +(0.0192274 NT12 r) +(0.00199831 NT12 prp) +(0.000209376 NT13 NT11 NT15) +(0.00188858 NT13 NT13 NT03) +(0.540855 NT13 punc) +(0.00804226 NT13 cc) +(0.000413617 NT14 NT00 NT09) +(0.0218326 NT14 NT00 NT14) +(0.000451496 NT14 NT00 NT18) +(0.00149459 NT14 NT01 NT07) +(0.00384046 NT14 NT01 NT17) +(0.00138254 NT14 NT02 NT09) +(0.0525259 NT14 NT03 NT14) +(0.000893974 NT14 NT04 NT02) +(0.000175088 NT14 NT04 NT06) +(0.000478859 NT14 NT04 NT08) +(0.00086439 NT14 NT04 NT09) +(0.00529624 NT14 NT04 NT10) +(0.000476852 NT14 NT04 NT12) +(0.00549502 NT14 NT04 NT14) +(0.0281873 NT14 NT05 NT14) +(0.76715 NT14 NT06 NT17) +(0.00303311 NT14 NT07 NT07) +(0.00027137 NT14 NT07 NT09) +(0.000748841 NT14 NT07 NT12) +(0.0874896 NT14 NT07 NT17) +(0.00416962 NT14 NT09 NT14) +(0.00175999 NT14 NT10 NT09) +(0.000710869 NT14 NT11 NT17) +(0.000723932 NT14 NT12 NT07) +(0.00440147 NT14 NT12 NT14) +(0.000761726 NT14 NT14 NT09) +(0.00084762 NT14 NT14 NT17) +(0.000323644 NT14 NT15 NT02) +(0.00264492 NT14 NT15 NT14) +(0.000238841 NT14 NT16 NT07) +(0.000126025 NT14 NT16 NT09) +(0.000217731 NT14 r) +(0.00024161 NT14 wrb) +(0.000366989 NT15 NT04 NT04) +(0.00127143 NT15 NT04 NT15) +(0.00137902 NT15 NT11 NT07) +(0.000109067 NT15 NT15 NT04) +(0.00380199 NT15 NT15 NT06) +(0.000193842 NT15 NT15 NT15) +(0.000253898 NT15 NT15 NT16) +(0.00556123 NT15 v) +(0.0798535 NT15 punc) +(0.557206 
NT15 in) +(0.0519477 NT15 cc) +(0.170466 NT15 of) +(0.113587 NT15 to) +(0.0125211 NT15 wrb) +(0.00146961 NT15 pdt) +(0.000682686 NT16 NT01 NT16) +(0.000353409 NT16 NT02 NT02) +(0.0034721 NT16 NT02 NT05) +(0.00392739 NT16 NT04 NT04) +(0.0225952 NT16 NT04 NT16) +(0.00368407 NT16 NT05 NT16) +(0.000275916 NT16 NT06 NT05) +(0.0263102 NT16 NT07 NT05) +(0.00344251 NT16 NT07 NT12) +(0.00271063 NT16 NT07 NT16) +(0.000950873 NT16 NT10 NT13) +(0.0229124 NT16 NT11 NT07) +(0.0173136 NT16 NT11 NT11) +(0.0094147 NT16 NT11 NT16) +(0.00210054 NT16 NT13 NT03) +(0.000417271 NT16 NT15 NT01) +(0.0100377 NT16 NT15 NT11) +(0.000679194 NT16 NT16 NT05) +(0.00203961 NT16 NT16 NT11) +(0.00352444 NT16 NT16 NT12) +(0.0133536 NT16 NT16 NT16) +(0.0041124 NT16 n) +(0.0518387 NT16 v) +(0.0133556 NT16 punc) +(0.746857 NT16 j) +(0.0325454 NT16 cd) +(0.000994964 NT16 r) +(0.000325555 NT17 NT03 NT09) +(0.000431668 NT17 NT03 NT17) +(0.000283523 NT17 NT04 NT01) +(0.00308221 NT17 NT04 NT02) +(0.000106449 NT17 NT04 NT07) +(0.584517 NT17 NT04 NT08) +(0.0389749 NT17 NT04 NT09) +(0.00927257 NT17 NT04 NT10) +(0.000698039 NT17 NT04 NT11) +(0.0594712 NT17 NT04 NT14) +(0.000381951 NT17 NT04 NT16) +(0.248255 NT17 NT04 NT17) +(0.000264379 NT17 NT05 NT08) +(0.00194384 NT17 NT05 NT10) +(0.000308808 NT17 NT05 NT14) +(0.000271388 NT17 NT07 NT08) +(0.000131093 NT17 NT07 NT10) +(0.00011195 NT17 NT07 NT17) +(0.000462643 NT17 NT08 NT09) +(0.00153331 NT17 NT11 NT07) +(0.00214335 NT17 NT11 NT11) +(0.000307068 NT17 NT11 NT12) +(0.000550528 NT17 NT15 NT10) +(0.000125644 NT17 NT16 NT02) +(0.000474489 NT17 NT17 NT09) +(0.00032483 NT17 NT17 NT18) +(0.045027 NT17 v) +(0.00425503 NT18 NT07 NT18) +(0.978831 NT18 NT13 NT00) +(0.00130119 NT18 NT13 NT03) +(0.0155958 NT18 NT17 NT13) diff --git a/lib/sec.B.hept.ngrambin b/lib/sec.B.hept.ngrambin new file mode 100644 index 0000000..3434e0f Binary files /dev/null and b/lib/sec.B.hept.ngrambin differ diff --git a/lib/sec.ts20.quad.ngrambin b/lib/sec.ts20.quad.ngrambin new file mode 100644 
index 0000000..3b35f45 Binary files /dev/null and b/lib/sec.ts20.quad.ngrambin differ diff --git a/lib/singing-mode.scm b/lib/singing-mode.scm new file mode 100644 index 0000000..d3bd46b --- /dev/null +++ b/lib/singing-mode.scm @@ -0,0 +1,671 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Festival Singing Mode +;;; +;;; Written by Dominic Mazzoni +;;; Carnegie Mellon University +;;; 11-752 - "Speech: Phonetics, Prosody, Perception and Synthesis" +;;; Spring 2001 +;;; +;;; Extended by Milan Zamazal , 2006: +;;; - Slur support. +;;; - Czech support. +;;; - Some cleanup. +;;; - Print debugging information only when singing-debug is true. +;;; +;;; This code is public domain; anyone may use it freely. +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(require_module 'rxp) + +(xml_register_id "-//SINGING//DTD SINGING mark up//EN" + (path-append xml_dtd_dir "Singing.v0_1.dtd") + ) + +(xml_register_id "-//SINGING//ENTITIES Added Latin 1 for SINGING//EN" + (path-append xml_dtd_dir "sable-latin.ent") + ) + +;; Set this to t to enable debugging messages: +(defvar singing-debug nil) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; XML parsing functions +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; +;; singing_xml_targets +;; +;; This variable defines the actions that are to be taken when +;; parsing each of our XML tags: SINGING, PITCH, DURATION, and REST. +;; +;; When we get the pitch and duration of each token, we store them +;; in features of the token. Later our intonation and duration +;; functions access these features. +;; + +(defvar singing_xml_elements + '( + ("(SINGING" (ATTLIST UTT) + (set! singing_pitch_att_list nil) + (set! singing_dur_att_list nil) + (set! singing_global_time 0.0) + (set! singing_bpm (get-bpm ATTLIST)) + (set! 
singing_bps (/ singing_bpm 60.0)) + nil) + + (")SINGING" (ATTLIST UTT) + (xxml_synth UTT) ;; Synthesize the remaining tokens + nil) + + ("(PITCH" (ATTLIST UTT) + (set! singing_pitch_att_list ATTLIST) + UTT) + + (")PITCH" (ATTLIST UTT) + (let ((freq (get-freqs singing_pitch_att_list))) + (if singing-debug + (begin + (print "freqs") + (print freq))) + (singing-append-feature! UTT 'freq freq)) + UTT) + + ("(DURATION" (ATTLIST UTT) + (set! singing_dur_att_list ATTLIST) + UTT) + + (")DURATION" (ATTLIST UTT) + (let ((dur (get-durs singing_dur_att_list))) + (if singing-debug + (begin + (print "durs") + (print dur))) + (singing-append-feature! UTT 'dur dur)) + UTT) + + ("(REST" (ATTLIST UTT) + (let ((dur (get-durs ATTLIST))) + (if singing-debug + (begin + (print "rest durs") + (print dur))) + (singing-append-feature! UTT 'rest (caar dur))) + UTT) + )) + +;; +;; get-bpm +;; +;; Given the attribute list of a SINGING tag, returns the beats +;; per minute of the song from the BPM parameter. +;; + +(define (get-bpm atts) + (parse-number (car (car (cdr (assoc 'BPM atts)))))) + +;; +;; get-durs +;; +;; Given the attribute list of a DURATION tag, returns a list of +;; durations in seconds for the syllables of the word enclosed by +;; this tag. +;; +;; It first looks for a BEATS parameter, and converts these to +;; seconds using BPM, which was set in the SINGING tag. If this +;; is not present, it looks for the SECONDS parameter. +;; + +(define (get-durs atts) + (let ((seconds (car (car (cdr (assoc 'SECONDS atts))))) + (beats (car (car (cdr (assoc 'BEATS atts)))))) + (if (equal? beats 'X) + (mapcar (lambda (lst) (mapcar parse-number lst)) + (string->list seconds)) + (mapcar (lambda (lst) + (mapcar (lambda (x) (/ (parse-number x) singing_bps)) lst)) + (string->list beats))))) + +;; +;; get-freqs +;; +;; Given the attribute list of a PITCH tag, returns a list of +;; frequencies in Hertz for the syllables of the word enclosed by +;; this tag. 
+;; +;; It first looks for a NOTE parameter, which can contain a MIDI +;; note of the form "C4", "D#3", or "Ab6", and if this is not +;; present it looks for the FREQ parameter. +;; + +(define (get-freqs atts) + (let ((freqs (car (car (cdr (assoc 'FREQ atts))))) + (notes (car (car (cdr (assoc 'NOTE atts)))))) + (if (equal? notes 'X) + (mapcar (lambda (lst) (mapcar parse-number lst)) + (string->list freqs)) + (mapcar (lambda (lst) (mapcar note->freq lst)) + (string->list notes))))) + +;; +;; note->freq +;; +;; Converts a string representing a MIDI note such as "C4" and +;; turns it into a frequency. We use the convention that +;; A5=440 (some call this note A3). +;; + +(define (note->freq note) + (if singing-debug + (format t "note is %l\n" note)) + (set! note (format nil "%s" note)) + (if singing-debug + (print_string note)) + (let (l octave notename midinote thefreq) + (set! l (string-length note)) + (set! octave (substring note (- l 1) 1)) + (set! notename (substring note 0 (- l 1))) + (set! midinote (+ (* 12 (parse-number octave)) + (notename->midioffset notename))) + (set! thefreq (midinote->freq midinote)) + (if singing-debug + (format t "note %s freq %f\n" note thefreq)) + thefreq)) + +;; +;; midinote->freq +;; +;; Converts a MIDI note number (1 - 127) into a frequency. We use +;; the convention that 69 = "A5" =440 Hz. +;; + +(define (midinote->freq midinote) + (* 440.0 (pow 2.0 (/ (- midinote 69) 12)))) + +;; +;; notename->midioffset +;; +;; Utility function that looks up the name of a note like "F#" and +;; returns its offset from C. +;; + +(define (notename->midioffset notename) + (parse-number (car (cdr (assoc_string notename note_names))))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Pitch modification functions +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; +;; singing_f0_targets +;; +;; This function replaces the normal intonation function used in +;; festival. 
For each syllable, it extracts the frequency that +;; was calculated from the XML tags and stored in the token this +;; syllable comes from, and sets this frequency as both the start +;; and end f0 target. Really straightforward! +;; + +(defvar singing-last-f0 nil) +(define (singing_f0_targets utt syl) + "(singing_f0_targets utt syl)" + (let ((start (item.feat syl 'syllable_start)) + (end (item.feat syl 'syllable_end)) + (freqs (mapcar parse-number (syl->freq syl))) + (durs (syl->durations syl))) + (let ((total-durs (apply + durs)) + (total-time (- end start)) + (time start) + (prev-segment (item.prev (item.relation (item.daughter1 (item.relation syl 'SylStructure)) 'Segment))) + (last-f0 singing-last-f0)) + (if freqs + (begin + (set! singing-last-f0 (car (last freqs))) + (append (if (and last-f0 + prev-segment + (item.prev prev-segment) + (string-equal (item.feat prev-segment 'name) + (car (car (cdr (car (PhoneSet.description '(silences)))))))) + (let ((s (item.feat prev-segment "p.end")) + (e (item.feat prev-segment "end"))) + (list (list (+ s (* (- e s) 0.8)) last-f0) + (list (+ s (* (- e s) 0.9)) (car freqs))))) + (apply append + (mapcar (lambda (d f) + (let ((range (* (/ d total-durs) total-time)) + (old-time time)) + (set! time (+ time range)) + (let ((range-fraction (* 0.1 range))) + (list (list (+ old-time range-fraction) f) + (list (- time range-fraction) f))))) + durs freqs)))))))) + +;; +;; syl->freq +;; +;; Given a syllable, looks up the frequency in its token. The token +;; stores a list of all of the frequencies associated with its +;; syllables, so this syllable grabs the frequency out of the list +;; corresponding to its index within the word. (This assumes that +;; a frequency was given for each syllable, and that a token +;; corresponds directly to a word. Singing-mode is not guaranteed +;; to work at all if either of these things are not true.) 
+;; + +(define (syl->freq syl) + (let ((index (item.feat syl "R:Syllable.pos_in_word")) + (freqs (singing-feat syl "R:SylStructure.parent.R:Token.parent.freq"))) + (nth index freqs))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Duration modification functions +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; +;; singing_duration_method +;; +;; Calculates the duration of each phone in the utterance, in three +;; passes. Consult the three functions it calls, below, to see what +;; each one does. +;; + +(define (singing_duration_method utt) + (mapcar singing_adjcons_syllable (utt.relation.items utt 'Syllable)) + (singing_do_initial utt (car (utt.relation.items utt 'Token))) + (mapcar singing_do_syllable (utt.relation.items utt 'Syllable)) + (mapcar singing_fix_segment (utt.relation.items utt 'Segment)) + utt) + +;; +;; singing_adjcons_syllable +;; +;; First pass. Looks at the first phone of each syllable and +;; adjusts the starting time of this syllable such that the +;; perceived start time of the first phone is at the beginning +;; of the originally intended start time of the syllable. +;; +;; If this is not done, telling it to say the word "ta" at time +;; 2.0 actually doesn't "sound" like it says the "t" sound until +;; about 2.1 seconds. +;; +;; This function has a little bit of duplicated code from +;; singing_do_syllable, below - it could be modularized a little +;; better. 
+;; + +(define (singing_adjcons_syllable syl) + (let ((totlen (apply + (mapcar (lambda (s) + (get_avg_duration (item.feat s "name"))) + (item.leafs + (item.relation syl 'SylStructure))))) + (syldur (apply + (syl->durations syl))) + ;; figure out the offset of the first phone + (phone1 (item.daughter1 (item.relation syl 'SylStructure))) + (prevsyl (item.prev (item.relation syl 'Syllable)))) + (let ((offset (get_duration_offset (item.feat phone1 "name")))) + (if singing-debug + (format t "offset: %f\n" offset) ) + (if (< syldur totlen) + (set! offset (* offset (/ syldur totlen)))) + (if singing-debug + (format t "Want to adjust syl by %f\n" offset)) + (if prevsyl + (begin + (item.set_feat prevsyl 'subtractoffset offset) + (item.set_feat syl 'addoffset offset)))))) + +;; +;; singing_do_syllable +;; +;; Second pass. For each syllable, adds up the amount of time +;; that would normally be spent in consonants and vowels, based +;; on the average durations of these phones. Then, if the +;; intended length of this syllable is longer than this total, +;; stretch only the vowels; otherwise shrink all phones +;; proportionally. This function actually sets the "end" time +;; of each phone using a global "singing_global_time" variable. +;; +;; We also handle rests at this point, which are tagged onto the +;; end of the previous token. +;; + +(defvar singing-max-short-vowel-length 0.11) + +(define (singing_do_initial utt token) + (if (equal? (item.name token) "") + (let ((restlen (car (item.feat token 'rest)))) + (if singing-debug + (format t "restlen %l\n" restlen)) + (if (> restlen 0) + (let ((silence (car (car (cdr (assoc 'silences (PhoneSet.description))))))) + (set! 
singing_global_time restlen) + (item.relation.insert (utt.relation.first utt 'Segment) 'Segment + (list silence (list (list "end" singing_global_time))) + 'before)))))) + +(define (singing_do_syllable syl) + (let ((conslen 0.0) + (vowlen 0.0) + (segments (item.leafs (item.relation syl 'SylStructure)))) + ;; if there are no vowels, turn a middle consonant into a vowel; + ;; hopefully this works well for languages where syllables may be + ;; created by some consonants too + (let ((segments* segments) + (vowel-found nil)) + (while (and segments* (not vowel-found)) + (if (equal? "+" (item.feat (car segments*) "ph_vc")) + (set! vowel-found t) + (set! segments* (cdr segments*)))) + (if (not vowel-found) + (item.set_feat (nth (nint (/ (- (length segments) 1) 2)) + segments) + "singing-vc" "+"))) + ;; sum up the length of all of the vowels and consonants in + ;; this syllable + (mapcar (lambda (s) + (let ((slen (get_avg_duration (item.feat s "name")))) + (if (or (equal? "+" (item.feat s "ph_vc")) + (equal? "+" (item.feat s "singing-vc"))) + (set! vowlen (+ vowlen slen)) + (set! conslen (+ conslen slen))))) + segments) + (let ((totlen (+ conslen vowlen)) + (syldur (apply + (syl->durations syl))) + (addoffset (item.feat syl 'addoffset)) + (subtractoffset (item.feat syl 'subtractoffset)) + offset) + (set! offset (- subtractoffset addoffset)) + (if singing-debug + (format t "Vowlen: %f conslen: %f totlen: %f\n" vowlen conslen totlen)) + (if (< offset (/ syldur 2.0)) + (begin + (set! syldur (- syldur offset)) + (if singing-debug + (format t "Offset: %f\n" offset)))) + (if singing-debug + (format t "Syldur: %f\n" syldur)) + (if (> totlen syldur) + ;; if the total length of the average durations in the syllable is + ;; greater than the total desired duration of the syllable, stretch + ;; the time proportionally for each phone + (let ((stretch (/ syldur totlen))) + (mapcar (lambda (s) + (let ((slen (* stretch (get_avg_duration (item.feat s "name"))))) + (set! 
singing_global_time (+ slen singing_global_time)) + (item.set_feat s 'end singing_global_time))) + (item.leafs (item.relation syl 'SylStructure)))) + ;; otherwise, stretch the vowels and not the consonants + (let ((voweltime (- syldur conslen))) + (let ((vowelstretch (/ voweltime vowlen)) + (phones (mapcar car (car (cdar (PhoneSet.description '(phones))))))) + (mapcar (lambda (s) + (let ((slen (get_avg_duration (item.feat s "name")))) + (if (or (equal? "+" (item.feat s "ph_vc")) + (equal? "+" (item.feat s "singing-vc"))) + (begin + (set! slen (* vowelstretch slen)) + ;; If the sound is long enough, better results + ;; may be achieved by using longer versions of + ;; the vowels. + (if (> slen singing-max-short-vowel-length) + (let ((sname (string-append (item.feat s "name") ":"))) + (if (member_string sname phones) + (item.set_feat s "name" sname)))))) + (set! singing_global_time (+ slen singing_global_time)) + (item.set_feat s 'end singing_global_time))) + segments)))))) + (let ((restlen (car (syl->rest syl)))) + (if singing-debug + (format t "restlen %l\n" restlen)) + (if (> restlen 0) + (let ((lastseg (item.daughtern (item.relation syl 'SylStructure))) + (silence (car (car (cdr (assoc 'silences (PhoneSet.description)))))) + (singing_global_time* singing_global_time)) + (let ((seg (item.relation lastseg 'Segment)) + (extra-pause-length 0.00001)) + (set! singing_global_time (+ restlen singing_global_time)) + (item.insert seg (list silence (list (list "end" singing_global_time))) 'after) + ;; insert a very short extra pause to avoid after-effects, especially + ;; after vowels + (if (and seg + (equal? (item.feat seg "ph_vc") "+") + (< extra-pause-length restlen)) + (item.insert seg (list silence (list (list "end" (+ singing_global_time* + extra-pause-length)))) + 'after))))))) + +;; +;; singing_fix_segment +;; +;; Third pass. 
Finds any segments (phones) that we didn't catch earlier +;; (say if they didn't belong to a syllable, like silence) and sets them +;; to zero duration +;; + +(define (singing_fix_segment seg) + (if (equal? 0.0 (item.feat seg 'end)) + (if (equal? nil (item.prev seg)) + (item.set_feat seg 'end 0.0) + (item.set_feat seg 'end (item.feat (item.prev seg) 'end))) + (if singing-debug + (format t "segment: %s end: %f\n" (item.name seg) (item.feat seg 'end))))) + +;; returns the duration of a syllable (stored in its token) +(define (syl->durations syl) + (let ((index (item.feat syl "R:Syllable.pos_in_word")) + (durs (singing-feat syl "R:SylStructure.parent.R:Token.parent.dur"))) + (mapcar parse-number (nth index durs)))) + +;; returns the duration of the rest following a syllable +(define (syl->rest syl) + (let ((index (item.feat syl "R:Syllable.pos_in_word")) + (durs (singing-feat syl "R:SylStructure.parent.R:Token.parent.dur")) + (pauselen (singing-feat syl "R:SylStructure.parent.R:Token.parent.rest"))) + (if (equal? index (- (length durs) 1)) + (list (or pauselen 0.0)) + (list 0.0)))) + +;; get the average duration of a phone +(define (get_avg_duration phone) + (let ((pd (assoc_string phone phoneme_durations))) + (if pd + (car (cdr pd)) + 0.08))) + +;; get the duration offset of a phone (see the description above) +(define (get_duration_offset phone) + (parse-number (car (cdr (assoc_string phone phoneme_offsets*))))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Other utility functions +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define (char-quote string) + (if (member string '("*" "+" "?" 
"[" "]" ".")) + (string-append "[" string "]") + string)) + +(define (split-string string separator) + (if (string-matches string (string-append ".+" (char-quote separator) ".+")) + (cons (string-before string separator) + (split-string (string-after string separator) separator)) + ;; We have to convert the weird XML attribute value type to string + (list (string-append string "")))) + +(define (string->list string) + (mapcar (lambda (s) (split-string s "+")) (split-string string ","))) + +(define (singing-append-feature! utt feature value) + (let ((tokens (utt.relation.items utt 'Token))) + (if tokens + ;; we have to wrap value into a list to work around a Festival bug + (item.set_feat (car (last tokens)) feature (list value)) + (begin + (utt.relation.append utt 'Token '("" ((name "") (whitespace "") + (prepunctuation "") (punc "")))) + (item.set_feat (car (last (utt.relation.items utt 'Token))) feature (list value)))))) + +(define (singing-feat item feature) + (let ((value (item.feat item feature))) + (if (equal? value 0) + nil + (car value)))) + +(define (current-language) + (cadr (car (assoc 'language (voice.description current-voice))))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Initializing and exiting singing mode +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; +;; singing_init_func +;; + +(defvar singing_previous_eou_tree nil) + +(define (singing_init_func) + "(singing_init_func) - Initialization for Singing mode" + (if (not (symbol-bound? 'phoneme_durations)) + (set! phoneme_durations '())) + ;; use our intonation function + (Parameter.set 'Int_Method 'General) + (Parameter.set 'Int_Target_Method Int_Targets_General) + (set! int_general_params `((targ_func ,singing_f0_targets))) + (set! 
singing-last-f0 nil) + ;; use our duration function + (Parameter.set 'Duration_Method singing_duration_method) + ;; set phoneme corrections for the current language + (let ((language (cadr (assoc 'language + (cadr (voice.description current-voice)))))) + (set! phoneme_offsets* (cdr (assoc language phoneme_offsets)))) + ;; avoid splitting to multiple utterances with insertion of unwanted pauses + (set! singing_previous_eou_tree eou_tree) + (set! eou_tree nil) + ;; use our xml parsing function + (set! singing_previous_elements xxml_elements) + (set! xxml_elements singing_xml_elements)) + +;; +;; singing_exit_func +;; + +(define (singing_exit_func) + "(singing_exit_func) - Exit function for Singing mode" + (set! eou_tree singing_previous_eou_tree) + (set! xxml_elements singing_previous_elements)) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Data tables +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defvar note_names + '((C 0) + (C# 1) + (Db 1) + (D 2) + (D# 3) + (Eb 3) + (E 4) + (E# 5) + (Fb 4) + (F 5) + (F# 6) + (Gb 6) + (G 7) + (G# 8) + (Ab 8) + (A 9) + (A# 10) + (Bb 10) + (B 11) + (B# 12) + (Cb 11))) + +;; +;; The following list contains the offset into each phone that best +;; represents the perceptual onset of the phone. This is important +;; to know to get durations right in singing. For example, if the +;; offset for "t" is .060, and you want to start a "t" sound at +;; time 2.0 seconds, you should actually start the phone playing +;; at time 1.940 seconds in order for it to sound like the onset of +;; the "t" is really right at 2.0. +;; +;; These were derived empirically by looking at and listening to the +;; waveforms of each phone for mwm's voice. 
+;; + +(defvar phoneme_offsets + `((english (t 0.050) + (T 0.050) + (d 0.090) + (D 0.090) + (p 0.080) + (b 0.080) + (k 0.090) + (g 0.100) + (9r 0.050) ;; r + (l 0.030) + (f 0.050) + (v 0.050) + (s 0.040) + (S 0.040) + (z 0.040) + (Z 0.040) + (n 0.040) + (N 0.040) + (m 0.040) + (j 0.090) + (E 0.0) + (> 0.0) + (>i 0.0) + (aI 0.0) + (& 0.0) + (3r 0.0) + (tS 0.0) + (oU 0.0) + (aU 0.0) + (A 0.0) + (ei 0.0) + (iU 0.0) + (U 0.0) + (@ 0.0) + (h 0.0) + (u 0.0) + (^ 0.0) + (I 0.0) + (dZ 0.0) + (i: 0.0) + (w 0.0) + (pau 0.0) + (brth 0.0) + (h# 0.0) + ))) + +(defvar phoneme_offsets* nil) + +;; +;; Declare the new mode to Festival +;; + +(set! tts_text_modes + (cons `(singing ;; mode name + ((init_func ,singing_init_func) + (exit_func ,singing_exit_func) + (analysis_type xml))) + tts_text_modes)) + +(provide 'singing-mode) diff --git a/lib/siteinit.scm b/lib/siteinit.scm new file mode 100644 index 0000000..7e2adac --- /dev/null +++ b/lib/siteinit.scm @@ -0,0 +1,57 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. 
The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Site specific initialisation file +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; If festival's internal audio playing support doesn't work on your +;; machine you can make Festival use your own program to play waveform +;; files. Uncomment the following and change "play" to the name of +;; your local program that can play files + +;(Parameter.set 'Audio_Required_Format 'riff) +;(Parameter.set 'Audio_Command "afplay $FILE") +;(Parameter.set 'Audio_Method 'Audio_Command) + +;; If you want a voice different from the system installed default +;; uncomment the following line and change the name to the voice you +;; want + +;(set! voice_default 'voice_cmu_us_awb_arctic_hts) + +(provide 'siteinit) + + + + diff --git a/lib/soleml-mode.scm b/lib/soleml-mode.scm new file mode 100644 index 0000000..9856fb2 --- /dev/null +++ b/lib/soleml-mode.scm @@ -0,0 +1,336 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1998 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Support for an SGML based mark-up language used in the SOLE +;;; project. This is all still experimental. +;;; +;;; This currently treats one file as one utterance (to make dealing with +;;; the SOLE museum database easy) + +(set! soleml_word_features_stack nil) +(defvar sole_current_node nil) + +(define (soleml_token_to_words utt token name) + "(soleml_token_to_words utt token name) +SOLEML mode token specific analysis." 
+ (cond + + (t + (soleml_previous_token_to_words utt token name)))) + +(define (voice_soleml) +"(soleml_voice) +Speaker specific initialisation for SOLE museum data." + (voice_rab_diphone) + ;; Utterances only come at end of file + (set! eou_tree '((0))) +) + +(defvar soleml_elements +'( + ("(SOLEML" (ATTLIST UTT) + ;; required to identify type + (voice_soleml) ;; so we know what state we start in + (set! soleml_utt (Utterance Tokens nil)) + (utt.stream.create soleml_utt 'Token) + (utt.relation.create soleml_utt 'SOLEML) + (set! sole_current_node + (utt.relation_append soleml_utt 'SOLEML (cons "sole-ml" ATTLIST))) + soleml_utt + ) + (")SOLEML" (ATTLIST UTT) + ;; required to identify end token + ;; Don't really want to synthesize this + ;; (xxml_synth UTT) ;; Synthesis the remaining tokens + (set! soleml_utt UTT) + UTT + ) + ;; Utterance break elements + ("(LANGUAGE" (ATTLIST UTT) + ;; Select a new language + (select_language (car (xxml_attval "NAME" ATTLIST))) + UTT) + ("(VOICE" (ATTLIST UTT) + ;;(xxml_synth UTT) + ;; Select a new voice + (cond + ((equal? (car (xxml_attval "NAME" ATTLIST)) 'male1) + (voice_soleml_diphone)) + ((equal? (car (xxml_attval "NAME" ATTLIST)) 'male2) + (voice_soleml_diphone)) + ((equal? (car (xxml_attval "NAME" ATTLIST)) 'male3) + (voice_soleml_diphone)) + (t + (print "SOLEML: selecting unknown voice") + (voice_soleml_diphone))) + UTT) + ;; phrase-boundary // mark on token (??) + ;; punct-elem // mark on token + ;; sem-elem + ;; text-elem // ignore + ;; rhet-elem has nucleus and satellite + ;; anaphora-elem + ;; syn-elem + ;; info-struct-elem + ;; other-elem + ("(PUNCT-ELEM" (ATTLIST UTT) + (soleml_push_word_features) + (set! xxml_word_features + (cons (list "punct-elem" "1") + (soleml_conv_attlist ATTLIST))) + UTT) + (")PUNCT-ELEM" (ATTLIST UTT) + (set! 
xxml_word_features (soleml_pop_word_features)) + UTT) + ("(PHRASE-BOUNDARY" (ATTLIST UTT) + (if (string-equal "4" (car (xxml_attval "STRENGTH" ATTLIST))) + (begin +;; (xxml_synth UTT) + UTT) + (let ((last_token (car (last (utt.stream UTT 'Token))))) + (if last_token + (item.set_feat last_token "pbreak" "B")) + UTT))) + ;; For each recursive element simply build a new node + ("(RHET-ELEM" (ATTLIST UTT) + (let ((sdesc (list 'rhet-elem (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")RHET-ELEM" (ATTLIST UTT) + (set! sole_current_node (node.parent sole_current_node)) + UTT) + ("(RHET-EMPH" (ATTLIST UTT) + (let ((sdesc (list 'rhet-emph (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")RHET-EMPH" (ATTLIST UTT) + (set! sole_current_node (node.parent sole_current_node)) + UTT) + ("(ANAPHORA-ELEM" (ATTLIST UTT) + (let ((sdesc (list 'anaphora-elem (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")ANAPHORA-ELEM" (ATTLIST UTT) + (set! sole_current_node (node.parent sole_current_node)) + UTT) + ("(SYN-ELEM" (ATTLIST UTT) + (let ((sdesc (list 'syn-elem (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")SYN-ELEM" (ATTLIST UTT) + (set! sole_current_node (node.parent sole_current_node)) + UTT) + ("(CONNECTIVE" (ATTLIST UTT) + (let ((sdesc (list 'connective (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")CONNECTIVE" (ATTLIST UTT) + (set! sole_current_node (node.parent sole_current_node)) + UTT) + ("(TEXT-ELEM" (ATTLIST UTT) + (let ((sdesc (list 'text-elem (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")TEXT-ELEM" (ATTLIST UTT) + (set! 
sole_current_node (node.parent sole_current_node)) + UTT) + ("(SEM-ELEM" (ATTLIST UTT) + (let ((sdesc (list 'sem-elem (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")SEM-ELEM" (ATTLIST UTT) + (set! sole_current_node (node.parent sole_current_node)) + UTT) + ("(INFO-STRUCT-ELEM" (ATTLIST UTT) + (let ((sdesc (list 'info-struct-elem (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")INFO-STRUCT-ELEM" (ATTLIST UTT) + (set! sole_current_node (node.parent sole_current_node)) + UTT) + ("(OTHER-ELEM" (ATTLIST UTT) + (let ((sdesc (list 'other-elem (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")OTHER-ELEM" (ATTLIST UTT) + (set! sole_current_node (node.parent sole_current_node)) + UTT) + ("(NUCLEUS" (ATTLIST UTT) + (let ((sdesc (list 'nucleus (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")NUCLEUS" (ATTLIST UTT) + (set! sole_current_node (node.parent sole_current_node)) + UTT) + ("(SATELLITE" (ATTLIST UTT) + (let ((sdesc (list 'satellite (soleml_conv_attlist ATTLIST)))) + (set! sole_current_node + (node.append_daughter sole_current_node sdesc)) + UTT)) + (")SATELLITE" (ATTLIST UTT) + (set! sole_current_node (node.parent sole_current_node)) + UTT) + ;; Other control functions (probably not used in SOLE) + ("(CALL" (ATTLIST UTT) +;; (xxml_synth UTT) + (if (string-matches (car (xxml_attval "ENGID" ATTLIST)) "festival.*") + (let ((comstr "")) + (mapcar + (lambda (c) (set! 
comstr (string-append comstr " " c))) + (xxml_attval "COMMAND" ATTLIST)) + (eval (read-from-string comstr)))) + UTT) + ("(DEFINE" (ATTLIST UTT) +;; (xxml_synth UTT) + (if (not (string-equal "NATIVE" (car (xxml_attval "SCHEME" ATTLIST)))) + (format t "DEFINE: unsupported SCHEME %s, definition ignored\n" + (car (xxml_attval "SCHEME" ATTLIST))) + (lex.add.entry + (list + (car (xxml_attval "WORDS" ATTLIST)) ;; head form + nil ;; pos + (lex.syllabify.phstress (xxml_attval "PRONS" ATTLIST))))) + UTT) + ("(SOUND" (ATTLIST UTT) +;; (xxml_synth UTT) + (if (not soleml_omitted_mode) + (apply_hooks tts_hooks + (eval (list 'Utterance 'Wave + (car (xxml_attval "SRC" ATTLIST)))))) + UTT) + ("(EMPH" (ATTLIST UTT) + ;; Festival is particularly bad at adding specific emphasis + ;; that's what happens when you use statistical methods that + ;; don't include any notion of emphasis + ;; This is *not* recursive + (soleml_push_word_features) + (set! xxml_word_features + (cons (list "EMPH" "1") xxml_word_features)) + UTT) + (")EMPH" (ATTLIST UTT) + (set! xxml_word_features (soleml_pop_word_features)) + UTT) + ("(WORD" (ATTLIST UTT) + ;; a word in-line + (let ((name (xxml_attval "NAME" ATTLIST)) + (pos (xxml_attval "POS" ATTLIST)) + (accent (xxml_attval "ACCENT" ATTLIST)) + (tone (xxml_attval "TONE" ATTLIST)) + (phonemes (xxml_attval "PHONEMES" ATTLIST)) + token) + (utt.item.insert UTT 'Token) ;; add new Token + (set! token (utt.stream.tail UTT 'Token)) + (item.set_name token (car name)) + (if pos (item.set_feat token "pos" (car pos))) + (if accent (item.set_feat token "accent" (car accent))) + (if tone (item.set_feat token "tone" (car tone))) + (if phonemes (item.set_feat token "phonemes" + (format nil "%l" phonemes))) + UTT)) +)) + +(define (soleml_init_func) + "(soleml_init_func) +Initialisation for SOLEML mode" + (voice_soleml) + (set! soleml_previous_elements xxml_elements) + (set! xxml_elements soleml_elements) + (set! xxml_token_hooks soleml_token_function) + (set! 
soleml_previous_token_to_words english_token_to_words) + (set! english_token_to_words soleml_token_to_words) + (set! token_to_words soleml_token_to_words)) + +(define (soleml_exit_func) + "(soleml_exit_func) +Exit function for SOLEML mode" + (set! xxml_elements soleml_previous_elements) + (set! token_to_words soleml_previous_token_to_words) + (set! english_token_to_words soleml_previous_token_to_words)) + +(define (soleml_token_function si) +"(soleml_token_function si) +This is called for each token found." + (node.append_daughter sole_current_node si)) + +(define (soleml_push_word_features) +"(soleml_push_word_features) +Save current word features on stack." + (set! soleml_word_features_stack + (cons xxml_word_features soleml_word_features_stack))) + +(define (soleml_pop_word_features) +"(soleml_pop_word_features) +Pop word features from stack." + (let ((r (car soleml_word_features_stack))) + (set! soleml_word_features_stack (cdr soleml_word_features_stack)) + r)) + +(define (soleml_conv_attlist alist) +"(soleml_conv_attlist alist) +Flatten alist arguments." + (cond + ((null alist) nil) + ((null (car (cdr (car alist)))) + (soleml_conv_attlist (cdr alist))) + ((equal? (length (car (cdr (car alist)))) 1) + (cons + (list (car (car alist)) (car (car (cdr (car alist))))) + (soleml_conv_attlist (cdr alist)))) + (t + (cons + (list (car (car alist)) (format nil "%l" (car (cdr (car alist))))) + (soleml_conv_attlist (cdr alist)))))) + +(set! 
tts_text_modes + (cons + (list + 'soleml ;; mode name + (list ;; email mode params + (list 'init_func soleml_init_func) + (list 'exit_func soleml_exit_func) + '(analysis_type xxml) + (list 'filter + (format nil "%s -D %s " sgml_parse_progname libdir)))) + tts_text_modes)) + +(provide 'soleml-mode) diff --git a/lib/speech.properties b/lib/speech.properties new file mode 100644 index 0000000..507a519 --- /dev/null +++ b/lib/speech.properties @@ -0,0 +1,2 @@ +# Register speech engines +cstr.festival.EngineCentral=cstr.festival.jsapi.EngineCentral diff --git a/lib/synthesis.scm b/lib/synthesis.scm new file mode 100644 index 0000000..69c5d56 --- /dev/null +++ b/lib/synthesis.scm @@ -0,0 +1,443 @@ + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ;; ;; + ;; Centre for Speech Technology Research ;; + ;; University of Edinburgh, UK ;; + ;; Copyright (c) 1996,1997 ;; + ;; All Rights Reserved. ;; + ;; ;; + ;; Permission is hereby granted, free of charge, to use and distribute ;; + ;; this software and its documentation without restriction, including ;; + ;; without limitation the rights to use, copy, modify, merge, publish, ;; + ;; distribute, sublicense, and/or sell copies of this work, and to ;; + ;; permit persons to whom this work is furnished to do so, subject to ;; + ;; the following conditions: ;; + ;; 1. The code must retain the above copyright notice, this list of ;; + ;; conditions and the following disclaimer. ;; + ;; 2. Any modifications must be clearly marked as such. ;; + ;; 3. Original authors' names are not deleted. ;; + ;; 4. The authors' names are not used to endorse or promote products ;; + ;; derived from this software without specific prior written ;; + ;; permission. 
;; + ;; ;; + ;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; + ;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; + ;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; + ;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; + ;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; + ;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; + ;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; + ;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; + ;; THIS SOFTWARE. ;; + ;; ;; + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ;; ;; + ;; Author: Richard Caley (rjc@cstr.ed.ac.uk) ;; + ;; Date: Fri Aug 15 1997 ;; + ;; ------------------------------------------------------------------- ;; + ;; New synthesis mainline. ;; + ;; ;; + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ;; ;; + ;; Hooks to add to the synthesis process. 
;; + ;; ;; + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defvar default_before_synth_hooks nil + "default_before_synth_hooks + The default list of functions to be run on all synthesized utterances + before synthesis starts.") + +(defvar before_synth_hooks default_before_synth_hooks + "before_synth_hooks + List of functions to be run on synthesised utterances before synthesis + starts.") + +(defvar default_after_analysis_hooks nil + "default_after_analysis_hooks + The default list of functions to be run on all synthesized utterances + after analysis but before synthesis.") + +(defvar after_analysis_hooks default_after_analysis_hooks + "after_analysis_hooks + List of functions to be applied after analysis and before synthesis.") + +(defvar default_after_synth_hooks nil + "default_after_synth_hooks + The default list of functions to be run on all synthesized utterances + after Wave_Synth. This will normally be nil but if for some reason you + need to change the gain or rescale *all* waveforms you could set the + function here, in your siteinit.scm.") + +(defvar after_synth_hooks default_after_synth_hooks + "after_synth_hooks + List of functions to be applied after all synthesis modules have been + applied. This is primarily designed to allow waveform manipulation, + particularly resampling and volume changes.") + +(defvar default_access_strategy 'ondemand + "default_access_strategy + How to access units from databases.") + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ;; ;; + ;; Macro to define utterance types. ;; + ;; ;; + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defmac (defUttType form) + (list 'defUttType_real + (list 'quote (cadr form)) + (list 'quote (cddr form)))) + +(defvar UttTypes nil + "UttTypes + List of types and functions used by the utt.synth function to call + appropriate methods.") + +(define (defUttType_real type form) + "(defUttType TYPE . 
BODY) + Define a new utterance type. TYPE is an atomic type that is specified + as the first argument to the function Utterance. BODY is evaluated + with argument utt, when utt.synth is called with an utterance of type + TYPE. You almost always require the function Initialize first. + [see Utterance types]" + ;;; Yes I am cheating a bit with the macro/function name. + ;;; should check about redefining and the syntax of the forms + (set! UttTypes + (cons + (cons type form) + UttTypes)) + type) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ;; ;; + ;; Macro to define synthesis types. ;; + ;; ;; + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defmac (defSynthType form) + (list 'defSynthType_real + (list 'quote (cadr form)) + (list 'quote (cddr form)))) + +(defvar SynthTypes nil + "SynthTypes + List of synthesis types and functions used by the utt.synth function to + call appropriate methods for wave synthesis.") + +(define (defSynthType_real type form) + "(defSynthType TYPE . BODY) + Define a new wave synthesis type. TYPE is an atomic type that + identifies the type of synthesis. BODY is evaluated with argument + utt, when utt.synth is called with an utterance of type TYPE. + [see Utterance types]" + + (set! 
SynthTypes + (cons + (cons type form) + SynthTypes)) + type) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Some actual Utterance type definitions +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defUttType Words + (Initialize utt) + (POS utt) + (Phrasify utt) + (Word utt) + (Pauses utt) + (Intonation utt) + (PostLex utt) + (Duration utt) + (Int_Targets utt) + (Wave_Synth utt) + ) + +(defUttType Text + (Initialize utt) + (Text utt) + (Token_POS utt) + (Token utt) + (POS utt) + (Phrasify utt) + (Word utt) + (Pauses utt) + (Intonation utt) + (PostLex utt) + (Duration utt) + (Int_Targets utt) + (Wave_Synth utt) + ) + +(defUttType Tokens ;; This is used in tts_file, Tokens will be preloaded + (Token_POS utt) ;; when utt.synth is called + (Token utt) + (POS utt) + (Phrasify utt) + (Word utt) + (Pauses utt) + (Intonation utt) + (PostLex utt) + (Duration utt) + (Int_Targets utt) + (Wave_Synth utt) + ) + +(defUttType Concept ;; rather gradious name for when information has + (POS utt) ;; been preloaded (probably XML) to give a word + (Phrasify utt) ;; relation (SOLE uses this) + (Word utt) + (Pauses utt) + (Intonation utt) + (PostLex utt) + (Duration utt) + (Int_Targets utt) + (Wave_Synth utt) + ) + +(defUttType Phrase + (Initialize utt) + (Token_POS utt) + (Token utt) + (POS utt) + (Phrasify utt) + (Word utt) + (Pauses utt) + (Intonation utt) + (PostLex utt) + (Duration utt) + (Int_Targets utt) + (Wave_Synth utt) + ) + +(defUttType Segments + (Initialize utt) + (Wave_Synth utt) + ) + +(defUttType Phones + (Initialize utt) + (Fixed_Prosody utt) + (Wave_Synth utt) + ) + +(defUttType SegF0 + (Wave_Synth utt) + ) + +(defUttType Wave + (Initialize utt)) + + + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ;; ;; + ;; And some synthesis types. 
;; + ;; ;; + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(defSynthType Taylor + (Taylor_Synthesize utt) + ) + +(defSynthType UniSyn + (defvar UniSyn_module_hooks nil) + (Param.def "unisyn.window_name" "hanning") + (Param.def "unisyn.window_factor" 1.0) + (Parameter.def 'us_sigpr 'lpc) + + (apply_hooks UniSyn_module_hooks utt) ;; for processing of diphone names + (us_get_diphones utt) + (us_unit_concat utt) + + (if (not (member 'f0 (utt.relationnames utt))) + (targets_to_f0 utt)) +;; temporary fix + (if (utt.relation.last utt 'Segment) + (set! pm_end (+ (item.feat (utt.relation.last utt 'Segment) "end") 0.02)) + (set! pm_end 0.02)) + + (us_f0_to_pitchmarks utt 'f0 'TargetCoef pm_end) + (us_mapping utt 'segment_single) + (cond + ((string-equal "td_psola" (Parameter.get 'us_sigpr)) + ;; Not in standard distribution, so has to be separate function + (us_tdpsola_synthesis utt 'analysis_period)) + (t + ;; All the rest + (us_generate_wave utt (Parameter.get 'us_sigpr) + 'analysis_period))) +) + +(defSynthType None + ;; do nothing + utt + ) + +(defSynthType Standard + (print "synth method: Standard") + + (let ((select (Parameter.get 'SelectionMethod))) + (if select + (progn + (print "select") + (apply select (list utt)) + ) + ) + ) + + (let ((join (Parameter.get 'JoiningMethod))) + (if join + (progn + (print "join") + (apply join (list utt)) + ) + ) + ) + + (let ((impose (Parameter.get 'ImposeMethod))) + (if impose + (progn + (print "impose") + (apply impose (list utt)) + ) + ) + ) + + (let ((power (Parameter.get 'PowerSmoothMethod))) + (if power + (progn + (print "power") + (apply power (list utt)) + ) + ) + ) + + (let ((wavesynthesis (Parameter.get 'WaveSynthesisMethod))) + (if wavesynthesis + (progn + (print "synthesis") + (apply wavesynthesis (list utt)) + ) + ) + ) + ) + +(defSynthType Minimal + (print "synth method: Minimal") + + (let ((select (Parameter.get 'SelectionMethod))) + (if select + (progn + (print "select") + (apply 
select (list utt)) + ) + ) + ) + + (let ((wavesynthesis (Parameter.get 'WaveSynthesisMethod))) + (if wavesynthesis + (progn + (print "synthesis") + (apply wavesynthesis (list utt "Unit" "Join" "Wave")) + ) + ) + ) + ) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ;; ;; + ;; Finally the actual driver function. ;; + ;; ;; + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define (utt.synth utt) + + "(utt.synth UTT) + The main synthesis function. Given UTT it will apply the + functions specified for UTT's type, as defined with defUttType + and then those demanded by the voice. After modules have been + applied after_synth_hooks are applied to allow extra manipulation. + [see Utterance types]" + + (apply_hooks before_synth_hooks utt) + + (let ((type (utt.type utt))) + (let ((definition (assoc type UttTypes))) + (if (null? definition) + (error "Unknown utterance type" type) + (let ((body (eval (cons 'lambda + (cons '(utt) (cdr definition)))))) + (body utt))))) + + (apply_hooks after_synth_hooks utt) + utt) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ;; ;; + ;; And a couple of utility expressions. ;; + ;; ;; + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define (SayText text) +"(SayText TEXT) +TEXT, a string, is rendered as speech." + (utt.play (utt.synth (eval (list 'Utterance 'Text text))))) + +(define (SynthText text) +"(SynthText TEXT) +TEXT, a string, is synthesized but not played; the utterance is returned." + (utt.synth (eval (list 'Utterance 'Text text)))) + +(define (SayPhones phones) +"(SayPhones PHONES) +PHONES is a list of phonemes. This uses the Phones type utterance +to synthesize and play the given phones. Fixed duration specified in +FP_duration and fixed monotone F0 (FP_F0) are used to generate +prosody." 
+ (utt.play (utt.synth (eval (list 'Utterance 'Phones phones))))) + + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ;; ;; + ;; This is the standard synthesis function. The Wave Synthesis may be ;; + ;; more than a simple module ;; + ;; ;; + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + + +(define (Wave_Synth utt) +"(Wave_Synth UTT) + Generate waveform from information in UTT, at least a Segment stream + must exist. The actual form of synthesis used depends on the Parameter + Synth_Method. If it is a function that is applied. If it is atom it + should be a SynthType as defined by defSynthType + [see Utterance types]" + (apply_hooks after_analysis_hooks utt) + (let ((method_val (Parameter.get 'Synth_Method))) + (cond + ((null method_val) + (error "Undefined Synth_Method")) + ((and (symbol? method_val) (symbol-bound? method_val)) + ;; Wish there was a function? + (apply (symbol-value method_val) (list utt))) + ((member (typeof method_val) '(subr closure)) + (apply method_val (list utt))) + (t ;; its a defined synthesis type + (let ((synthesis_modules (assoc_string method_val SynthTypes))) + (if (null? synthesis_modules) + (error (format nil "Undefined SynthType %s\n" method_val)) + (let ((body (eval (cons 'lambda + (cons '(utt) (cdr synthesis_modules)))))) + (body utt))))))) + utt) + +(provide 'synthesis) + + + diff --git a/lib/tilt.scm b/lib/tilt.scm new file mode 100644 index 0000000..92dbec6 --- /dev/null +++ b/lib/tilt.scm @@ -0,0 +1,972 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Author: Alan W Black, Kurt Dusterhoff, Janet Hitzeman +;;; Date: April 1999 +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Tilt intonation modules, accent/boundary preditions and F0 generation +;;; The F0 generation is done using models as described in +;;; Dusterhoff, K. and Black, A. (1997). 
"Generating F0 contours for +;;; speech synthesis using the Tilt intonation theory" +;;; (http://www.cstr.ed.ac.uk/awb/papers/esca-int97.ps) +;;; Proceedings of ESCA Workshop of Intonation, pp 107-110, September, +;;; Athens, Greece. +;;; +;;; Intonation_Tilt assigns accents and boundaries by a CART tree +;;; the c and sil nodes are derived directly duration creation +;;; +;;; Int_Targets_Tilt generates the F0 using the CART trees as +;;; described in the paper referenced above. +;;; +;;; THIS CONTAINS *VERY* EXPERIMENTAL CODE +;;; it requires a thoroughly clean up and probably split into +;;; multiple files + +(defvar int_tilt_params nil + "int_tilt_params +Parameters for tilt intonation model.") + +(Parameter.def 'tilt_method 'cart) + +(define (Intonation_Tilt utt) + "(Intonation_Tilt utt) +Assign accent and boundary IntEvents to each syllable, and fill in +spaces with silence and connections." + (let (accent boundary) + ;; Create basic intonation relations + (utt.relation.create utt 'Intonation) + (utt.relation.create utt 'IntonationSyllable) + (mapcar + (lambda (syl) + ;; If first syllable in phrase add phrase_start + (if (string-equal "pau" + (item.feat syl "R:SylStructure.daughter1_to.Segment.p.name")) + (tilt_add_intevent utt syl 'phrase_start)) + + (set! accent (wagon_predict syl tilt_a_cart_tree)) + (set! 
boundary (wagon_predict syl tilt_b_cart_tree)) +; (format t "%s: accent %s boundary %s\n" +; (item.feat syl "R:WordStructure.root.name") +; accent +; boundary) + (if (not (string-equal accent "0")) + (tilt_add_intevent utt syl accent)) + (if (not (string-equal boundary "0")) + (if (and (string-equal boundary "afb") + (not (string-equal accent "0"))) + (tilt_add_intevent utt syl "fb") ;; can't have a/afb + (tilt_add_intevent utt syl boundary))) + + ;; If last syllable in phrase add phrase_end + (if (string-equal "pau" + (item.feat syl "R:SylStructure.daughtern_to.Segment.n.name")) + (tilt_add_intevent utt syl 'phrase_end))) + (utt.relation.items utt 'Syllable)) +;; (utt.relation.print utt 'Intonation) + utt)) + +(define (tilt_add_intevent utt syl name) +"(tilt_add_intevent utt syl name) +Add a new IntEvent related to syl with name." + (let (ie) + (set! ie (utt.relation.append utt 'Intonation (list name))) + (if (not (item.relation syl 'IntonationSyllable)) + (utt.relation.append utt 'IntonationSyllable syl)) + (item.relation.append_daughter syl 'IntonationSyllable ie) + (if (not (string-matches name "phrase_.*")) + (item.set_feat ie "int_event" 1)) + ie)) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; F0 generation through tilt parameters and F0 rendering +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +(define (Int_Targets_Tilt utt) + "(Int_Targets_Tilt utt) +Assign Tilt parameters to each IntEvent and then generate the +F0 contour and assign targets." + (utt.relation.set_feat utt "Intonation" "intonation_style" "tilt") + (tilt_assign_parameters utt) +; (tilt_F0_and_targets utt) ;; this has to be C++, sorry +; (tilt_map_f0_range utt) + (tilt_to_f0 utt "f0") + (tilt_validate utt) + utt +) + +(define (tilt_validate utt) + "(tilt_validate utt) +Checks that the predicted tilt parameters fall within reasonable +limits and modifies them where possible to be more reasonable." 
+ (mapcar + (lambda (ie) + (cond + ((string-equal (item.name ie) "phrase_end") + ;; check previous event does overflow segments + ) + (t + t)) + ) + (utt.relation.items utt 'Intonation)) +) + +(define (tilt_map_f0_range utt) + "(tilt_map_f0_range utt) +In order for better trained models to be used for voices which don't +have the necessary data to train models from, the targets may be mapped +to a different pitch range. Note this is not optimal as pitch ranges +don't map that easily, but the results can sometimes be better than +using a less sophisticated F0 generation model. The method used +is to define the mean and standard deviation of the speaker the +model was trained on and the mean and standard deviation of the +desired speaker. Mapping is by converting the actual F0 value +to zscores (distance from mean in number of stddev) and back into +the other domain. The variable int_tilt_params is used to find +the values." + (let ((target_f0_mean (car (cdr (assoc 'target_f0_mean int_tilt_params)))) + (target_f0_std (car (cdr (assoc 'target_f0_std int_tilt_params)))) + (model_f0_std (car (cdr (assoc 'model_f0_std int_tilt_params)))) + (model_f0_mean (car (cdr (assoc 'model_f0_mean int_tilt_params))))) + (if target_f0_mean ;; only if one is specified + (mapcar (lambda (targ) + (item.set_name + targ + (+ target_f0_mean + (* target_f0_std + (/ (- (parse-number (item.name targ)) + model_f0_mean) + model_f0_std))))) + (utt.relation.leafs utt 'Target))))) + +(define (tilt_assign_parameters utt) + "(tilt_assign_parameters utt) +Assign tilt parameters to IntEvents. Depending on the value +of the Parameter tilt_method, uses wagon trees (cart) or linear +regression models (lr)." + (let ((method (Parameter.get 'tilt_method))) + (cond + ((equal? method 'cart) + (tilt_assign_parameters_wagon utt)) + ((equal? 
method 'lr) + (tilt_assign_parameters_lr utt)) + (t + (error "Tilt: unknown tilt param prediction method: " method))))) + +(define (tilt_assign_parameters_wagon utt) + "(tilt_assign_parameters_wagon utt) +Assign parameters (start_f0, tilt, amplitude, peak_pos and duration) +to each IntEvent. Uses Wagon trees to predict values" + (mapcar + (lambda (ie) + (let ((param_trees (cdr (assoc_string (item.name ie) + tilt_param_trees)))) + (item.set_feat ie "time_path" "IntonationSyllable") + (if (string-equal "1" (item.feat ie "int_event")) + (item.set_function ie "time" "unisyn_tilt_event_position") + (item.set_function ie "time" "unisyn_tilt_phrase_position")) + (cond + ((null param_trees) + (format stderr "Tilt: unknown Intonation type %s, ignored\n" + (item.name ie)) + ;; *need* to assign default values + (item.set_feat ie "ev.f0" 100) + (item.set_feat ie "tilt.amp" 20.0) + (item.set_feat ie "tilt.dur" 0.25) + (item.set_feat ie "tilt.tilt" -0.2) + (item.set_feat ie "rel_pos" 0.0) + ) + (t + (tilt_assign_params_wagon ie param_trees))))) + (utt.relation.items utt 'Intonation))) + +(define (tilt_assign_params_wagon ie trees) + "(tilt_assign_params_wagon ie trees) +Assign the named parameters to ie using the trees and names in +trees." + (mapcar + (lambda (tree) + (let ((val (wagon_predict ie (car (cdr tree))))) + (item.set_feat ie (car tree) val))) + trees)) + +(define (tilt_assign_parameters_lr utt) + "(tilt_assign_parameters_lr utt) +Assign parameters (start_f0, tilt, amplitude, peak_pos and duration) +to each IntEvent. 
Prediction is by linear regression models." + (mapcar + (lambda (ie) + (let ((param_lrmodels (cdr (assoc_string (item.name ie) + tilt_param_lrmodels)))) + (cond + ((null param_lrmodels) + (format stderr "Tilt: unknown IntEvent type %s, ignored\n" + (item.name ie)) + ;; *need* to assign default values + (item.set_feat ie "ev.f0" 100) + (item.set_feat ie "tilt.amp" 20.0) + (item.set_feat ie "tilt.dur" 0.25) + (item.set_feat ie "tilt.tilt" -0.2) + (item.set_feat ie "rel_pos" 0.0) + ) + (t + (tilt_assign_params_lr ie param_lrmodels))))) + (utt.relation.items utt 'IntEvent))) + +(define (tilt_assign_params_lr ie lrmodels) + "(tilt_assign_params_lr ie lrmodels) +Assign the named parameters to ie using the LR models and names in +lrmodels." + (mapcar + (lambda (lrm) + (let ((val (lr_predict ie (cdr lrm)))) + (item.set_feat ie (car lrm) val))) + lrmodels)) + +(define (utt.save.tilt_events utt filename) +"(utt.save.tilt_events UTT FILENAME) +Save tilt events in UTT to FILENAME in a format suitable for +ev_synth." 
+ (let ((fd (fopen filename "w"))) + (format fd "#\n") + (mapcar + (lambda (ie) + (let ((name (item.name ie))) + (cond + ((or (string-equal name "sil") + (string-equal name "c")) + (format fd " %2.4f 100 %s; tilt: %2.6f\n" + (item.feat ie 'end) + name + (item.feat ie "tilt_start_f0"))) + (t ;; accent or boundary + (format fd " %2.4f 100 %s; tilt: %2.6f %2.6f %2.6f %2.6f %2.6f\n" + (item.feat ie 'end) + name + (item.feat ie "ev.f0") + (item.feat ie "tilt.amp") + (item.feat ie "tilt.dur") + (item.feat ie "tilt.tilt") + (item.feat ie "rel_pos")))))) + (utt.relation.items utt 'IntEvent)) + (fclose fd) + utt)) + + +;;;;; +;;; Some features which should be pruned +;;;;; + +(def_feature_docstring 'Syllable.lisp_time_to_next_vowel + "Syllable.lisp_time_to_next_vowel syl + The time from vowel_start to next vowel_start") +(define (time_to_next_vowel syl) + "(time_to_next_vowel syl) + The time from vowel_start to next vowel_start" + (let (ttnv) + (if (string-equal "0" (item.feat syl "n.vowel_start")) + (set! ttnv 0.00) + (set! ttnv (- (item.feat syl "n.vowel_start") + (item.feat syl "vowel_start")))) + ttnv)) + +(def_feature_docstring 'Syllable.lisp_next_stress + "Syllable.lisp_next_stress + Number of syllables to next stressed syllable. 0 if this syllable is + stressed. It is effectively assumed the syllable after the last syllable + is stressed.") +(define (next_stress syl) + (cond + ((null syl) 0) + ((string-equal (item.feat syl 'stress_num) "1") + 0) + (t + (+ 1 (next_stress (item.relation.next syl 'Syllable)))))) + +(def_feature_docstring 'Syllable.lisp_last_stress + "Syllable.lisp_last_stress + Number of syllables from previous stressed syllable. 0 if this syllable + is stressed. 
It is effectively assumed that the syllable before the + first syllable is stressed.") +(define (last_stress syl) + (cond + ((null syl) 0) + ((string-equal (item.feat syl 'stress_num) "1") + 0) + (t + (+ 1 (last_stress (item.relation.prev syl 'Syllable)))))) + + +(def_feature_docstring 'SylStructure.lisp_length_to_last_seg + "SylStructure.lisp_length_to_last_seg + Length from start of the vowel to start of last segment of syllable.") +(define (length_to_last_seg syl) + (- (item.feat syl "daughtern_to.Segment.start") + (item.feat syl "vowel_start"))) + +(def_feature_docstring 'SylStructure.lisp_get_rhyme_length + "Syllable.lisp_get_rhyme_length + Length from start of the vowel to end of syllable.") +(define (get_rhyme_length syl) + (- (item.feat syl 'end) + (item.feat syl 'vowel_start))) + +(def_feature_docstring 'SylStructure.lisp_get_onset_length + "Syllable.lisp_get_onset_length + Length from start of syllable to start of vowel.") +(define (get_onset_length syl) + (cond + ((< (- (item.feat syl 'vowel_start) + (item.feat syl 'start)) + 0.000) + 0.000) ;; just in case + (t + (- (item.feat syl 'vowel_start) + (item.feat syl 'start))))) + +(def_feature_docstring 'Syllable.lisp_tilt_accent + "Syllable.lisp_tilt_accent + Returns \"a\" if there is a tilt accent related to this syllable, 0 + otherwise.") +(define (tilt_accent syl) + (let ((events (item.relation.daughters syl 'IntonationSyllable)) + (r "0")) + (mapcar + (lambda (i) + (if (member_string (item.name i) tilt_accent_list) + (set! r "a"))) + events) + r)) + +(def_feature_docstring 'Syllable.lisp_tilt_boundary + "Syllable.lisp_tilt_boundary + Returns boundary label if there is a tilt boundary related to this +syllable, 0 otherwise.") +(define (tilt_boundary syl) + (let ((events (item.relation.daughters syl 'IntonationSyllable)) + (r "0")) + (mapcar + (lambda (i) + (let ((name (item.name i))) + (if (member_string name tilt_boundary_list) + (cond + ((string-matches name "a.*") + (set! 
r (string-after name "a"))) + ((string-matches name "m.*") + (set! r (string-after name "m"))) + (t + (set! r name)))))) + events) + r)) + +(def_feature_docstring 'Syllable.lisp_tilt_accented + "Syllable.lisp_tilt_accented + Returns 1 if there is a tilt accent related to this syllable, 0 + otherwise.") +(define (tilt_accented syl) + (let ((events (item.relation.daughters syl 'IntonationSyllable)) + (r "0")) + (mapcar + (lambda (i) + (if (member_string (item.name i) tilt_accent_list) + (set! r "1"))) + events) + r)) + +(def_feature_docstring 'Syllable.lisp_tilt_boundaried + "Syllable.lisp_tilt_boundaried + Returns 1 if there is a tilt boundary related to this syllable, 0 + otherwise.") +(define (tilt_boundaried syl) + (let ((events (item.relation.daughters syl 'IntonationSyllable)) + (r "0")) + (mapcar + (lambda (i) + (if (member_string (item.name i) tilt_boundary_list) + (set! r "1"))) + events) + r)) + +(def_feature_docstring 'SylStructure.lisp_vowel_height + "SylStructure.lisp_vowel_height syl +Classifies vowels as high, low or mid") +(define (vowel_height syl) + (let ((vh (item.feat syl "daughtern.daughter1.daughter1.df.height"))) + vh) +) + +(def_feature_docstring 'SylStructure.lisp_vowel_frontness + "SylStructure.vowel_frontness syl +Classifies vowels as front, back or mid") +(define (vowel_frontness syl) + (let ((vf (item.feat syl "daughtern.daughter1.daughter1.df.front"))) + vf) +) + +(def_feature_docstring 'SylStructure.lisp_vowel_length + "SylStructure.vowel_length syl +Returns the df.length feature of a syllable's vowel") +(define (vowel_length syl) + (let ((vl (item.feat syl "daughtern.daughter1.daughter1.df.length"))) + vl) +) + +(defvar sonority_vless_obst '("f" "h" "hh" "k" "p" "s" "sh" "t" "th" "ch") + "sonority_vless_obst +List of voiceless obstruents for use in sonority scaling (only good w/ radio_speech)" + ) +(defvar sonority_v_obst '("v" "b" "g" "z" "zh" "d" "dh" "jh") + "sonority_v_obst +List of voiced obstruents for use in sonority scaling 
(only good w/ radio_speech)" + ) +(defvar sonority_nas '("m" "n" "ng" "nx" "em" "en") + "sonority_nas +List of nasals (only good w/ radio_speech)" + ) +(defvar sonority_liq '("r" "l" "er" "el" "axr") + "sonority_liq +List of liquids (only good w/ radio_speech)" + ) +(defvar sonority_glides '("y" "w") + "sonority_glides +List of glides (only good w/ radio_speech)" + ) + +(def_feature_docstring 'SylStructure.lisp_sonority_scale_coda + "SylStructure.sonority_scale_coda syl +Returns value on sonority scale (1 -6, where 6 is most sonorous) +for the coda of a syllable, based on least sonorant portion.") +(define (sonority_scale_coda syl) + (let ((segs (item.daughters (item.daughtern (item.daughtern syl)))) + (scale 6)) + (mapcar + (lambda (seg) + (cond + ((member_string (item.name seg) sonority_vless_obst) + (if (> scale 1) + (set! scale 1))) + ((member_string (item.name seg) sonority_v_obst) + (if (> scale 2) + (set! scale 2))) + ((member_string (item.name seg) sonority_nas) + (if (> scale 3) + (set! scale 3))) + ((member_string (item.name seg) sonority_liq) + (if (> scale 4) + (set! scale 4))) + ((member_string (item.name seg) sonority_glides) + (if (> scale 5) + (set! scale 5))) + (t + (if (> scale 6) + (set! scale 6))) + ) + ) + segs) + scale)) + +(def_feature_docstring 'SylStructure.lisp_sonority_scale_onset + "SylStructure.sonority_scale_onset syl +Returns value on sonority scale (1 -6, where 6 is most sonorous) +for the onset of a syllable, based on least sonorant portion.") +(define (sonority_scale_onset syl) + (if (string-equal "Onset" (item.feat (item.daughter1 syl) "sylval")) + (let ((segs (item.daughters (item.daughter1 syl))) + (scale 6)) + (mapcar + (lambda (seg) + (cond + ((member_string (item.name seg) sonority_vless_obst) + (if (> scale 1) + (set! scale 1))) + ((member_string (item.name seg) sonority_v_obst) + (if (> scale 2) + (set! scale 2))) + ((member_string (item.name seg) sonority_nas) + (if (> scale 3) + (set! 
scale 3))) + ((member_string (item.name seg) sonority_liq) + (if (> scale 4) + (set! scale 4))) + ((member_string (item.name seg) sonority_glides) + (if (> scale 5) + (set! scale 5))) + (t (set! scale 6)) + ) + ) + segs) + scale) + 0)) + +(def_feature_docstring 'SylStructure.lisp_num_postvocalic_c + "SylStructure.lisp_num_postvocalic_c +Finds the number of postvocalic consonants in a syllable.") +(define (num_postvocalic_c syl) + "(num_postvocalic_c syl) +Finds the number of postvocalic consonants in a syllable." + (let (segs (npc 0)) + (set! segs (item.daughters (item.daughtern (item.daughtern syl)))) + (mapcar + (lambda (seg) + (set! npc (+ npc 1)) + ) + segs) + npc)) + + +(def_feature_docstring 'SylStructure.lisp_syl_numphones + "SylStructure.lisp_syl_numphones syl +Finds the number of segments in a syllable.") +(define (syl_numphones syl) + (length (mt_segs_from_syl syl)) + ) + +(def_feature_docstring 'Segment.lisp_pos_in_syl + "Segment.lisp_pos_in_syl seg +Finds the position in a syllable of a segment - returns a number.") +(define (pos_in_syl seg) + (let ((segments (mt_segs_from_syl + (item.relation (item.parent_to + (item.relation seg 'SylStructure) + 'Syllable) + 'SylStructure))) + (seg_count 1)) + (mapcar + (lambda (s) + (if (not (eqv? s seg)) + (set! seg_count (+ 1.0 seg_count)) + nil)) + segments) + seg_count)) + +(def_feature_docstring 'Intonation.lisp_peak_anchor_segment_type + "Intonation.peak_anchor_segment_type ie +Determines whether the segment anchor for a peak +is the first consonant of a syl - C0 -, the +vowel of a syl - V0 -, or segments after that +- C1->X,V1->X. If the segment is in a following syl, +the return value will be preceded by a 1 - e.g. 1V1") +(define (peak_anchor_segment_type ie) + (let ( syl peak_anchor_num numsegs peak_anchor_type) + (set! peak_anchor_num (peak_segment_anchor ie)) + + + (if (> 9 peak_anchor_num) + (set!
syl (item.relation + (item.parent (item.relation ie "IntonationSyllable")) + "SylStructure"))) + (if (> 9 peak_anchor_num) + (set! numsegs (item.feat syl "syl_numphones"))) + + (cond + ((< 9 peak_anchor_num) + (set! peak_anchor_type "none")) + ((> 0 peak_anchor_num) + (set! peak_anchor_type + (string-append + "-1" (get_anchor_value (item.prev syl) + (+ peak_anchor_num + (item.feat syl "p.syl_numphones")))))) + ((< peak_anchor_num numsegs) + (set! peak_anchor_type (get_anchor_value syl numsegs))) + ((> peak_anchor_num numsegs) + (set! peak_anchor_type + (string-append + "1" (get_anchor_value (item.next syl) (- peak_anchor_num numsegs))))) + (set! peak_anchor_type "none")) +; (format stderr "pat: %s\n" peak_anchor_type) + peak_anchor_type)) + +(define (get_anchor_value sylSyl seg_num) + "(get_anchor_value sylSyl seg_num) +Gets the c/v value of the segment within a syllable." + (let ((syl (item.relation sylSyl "SylStructure")) + (seg_val "none") segs (ccnt -1) (vcnt -1) (vpis 0)) + (set! segs (mt_segs_from_syl sylSyl)) + (mapcar + (lambda (seg) + (cond + ((string-equal "consonant" (item.feat seg "df.type")) + (set! vcnt (+ 1 vcnt)) + (set! vpis (item.feat seg "pos_in_syl"))) + (t + (set! ccnt (+ 1 ccnt)))) + (cond + ((and + (eq (- seg_num 1.0) (item.feat seg "pos_in_syl")) + ( string-equal "consonant" (item.feat seg "df.type"))) + (set! seg_val (string-append "V" vcnt))) + ((and + (eq (- seg_num 1.0) (item.feat seg "pos_in_syl")) + ( string-equal "vowel" (item.feat seg "df.type"))) + (set! seg_val (string-append "C" (- (item.feat seg "pos_in_syl") + vpis) "V" vcnt))) + (t nil)) + ) + segs) + seg_val)) + +(define (peak_segment_anchor ie) + "peak_segment_anchor ie +Determines what segment acts as the anchor for a peak. +Returns number of segments from start of accented syllable +to peak." 
+; (format stderr "accent: %s\n" +; (item.name ie)) + (let ((pk_pos (item.feat ie "position")) + (peak_seg_anchor 11)) + (if + (or + (string-equal "phrase_start" (item.name ie)) + (string-equal "phrase_end" (item.name ie)) + (string-equal "pause" (item.name ie))) + (set! peak_seg_anchor 10) + (set! peak_seg_anchor (find_peak_seg_anchor ie pk_pos))) + peak_seg_anchor)) + +(define (find_peak_seg_anchor ie pk_pos) + "find_peak_seg_anchor ie pk_pos +Part of the workings of peak_segment_anchor." + (let (( syl (item.relation + (item.parent (item.relation ie 'IntonationSyllable)) + 'SylStructure)) + (seg_anchor 11)) + (cond + ((not (eq 9.0 (segs_to_peak syl pk_pos))) + (set! seg_anchor (segs_to_peak syl pk_pos))) + + ((and (item.prev syl) + (not (eq 9.0 (segs_to_peak (item.prev syl) pk_pos)))) +; (format stderr "%s\n" (item.name (item.prev syl))) + (set! seg_anchor (* -1 + (- (+ 1 (item.feat syl "p.syl_numphones")) + (segs_to_peak (item.prev syl) pk_pos))))) + + ((and (item.next syl) + (> pk_pos (item.feat syl "n.start"))) +; (format stderr "%s\n" (item.name (item.next syl))) + (set! seg_anchor (+ 1 + (item.feat syl "syl_numphones") + (segs_to_peak (item.next syl) pk_pos)))) + (t + (format stderr "No seg anchor could be found\n"))) +; (format stderr "seg_anchor: %f\n" seg_anchor) + seg_anchor)) + +(define (segs_to_peak sylSyl pk_pos) + "(segs_to_peak sylSyl pk_pos) +Determines the number of segments from the start of a syllable +to an intonation peak" + (let ((syl (item.relation sylSyl "SylStructure")) + (segs_2_peak 9) segs) + (set! segs (mt_segs_from_syl syl)) + (mapcar + (lambda (seg) +; (format stderr "seg_end: %f pk: %f\n" (item.feat seg "end") +; pk_pos) + (if (eq 1.0 (peak_wi_seg seg pk_pos)) + (set! 
segs_2_peak (item.feat seg "pos_in_syl"))) +; (format stderr "segs_2_peak: %f\n" segs_2_peak) + ) + segs) + segs_2_peak)) + +(define (peak_wi_seg segment pk_pos) + "peak_wi_seg segment pk_pos +Finds if a peak occurs w/i a segment" + (let ((s_start (item.feat segment "start")) + (s_end (item.feat segment "end")) + (ret 0.0)) + (if (and (< s_start pk_pos) + (< pk_pos s_end)) + (set! ret 1.0) + nil) + ret)) + +(defvar tilt_accent_list '("a" "arb" "afb" "m" "mfb" "mrb") + "tilt_accent_list +List of events containing accents in tilt model.") +(defvar tilt_boundary_list '("rb" "arb" "afb" "fb" "mfb" "mrb") + "tilt_boundary_list +List of events containing boundaries in tilt model.") + +(def_feature_docstring 'Intonation.lisp_last_tilt_accent + "Intonation.lisp_last_tilt_accent + Returns the most recent tilt accent.") +(define (last_tilt_accent intev) + (let ((pie (item.relation.prev intev 'Intonation))) + (cond + ((not pie) + "0") + ((member_string (item.name pie) tilt_accent_list) + (item.name pie)) + (t (last_tilt_accent pie))))) + +(def_feature_docstring 'Intonation.lisp_next_tilt_accent + "Intonation.lisp_next_tilt_accent + Returns the next tilt accent.") +(define (next_tilt_accent intev) + (let ((nie (item.relation.next intev 'Intonation))) + (cond + ((not nie) "0") + ((member_string (item.name nie) tilt_accent_list) + (item.name nie)) + (t (next_tilt_accent nie))))) + +(def_feature_docstring 'Intonation.lisp_last_tilt_boundary + "Intonation.lisp_last_tilt_boundary + Returns the most recent tilt boundary.") +(define (last_tilt_boundary intev) + (let ((pie (item.relation.prev intev 'Intonation))) + (cond + ((not pie) "0") + ((member_string (item.name pie) tilt_boundary_list) + (item.name pie)) + (t (last_tilt_boundary pie))))) + +(def_feature_docstring 'Intonation.lisp_next_tilt_boundary + "Intonation.lisp_next_tilt_boundary + Returns the next tilt boundary.") +(define (next_tilt_boundary intev) + (let ((nie (item.relation.next intev 'Intonation))) + (cond + ((not 
nie) "0") + ((member_string (item.name nie) tilt_boundary_list) + (item.name nie)) + (t (next_tilt_boundary nie))))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Some basic functions for metrical tree structure +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +(define (mt_syl_from_seg seg) + (if seg + (item.root (item.relation seg 'SylStructure)) + nil)) +(define (mt_word_from_syl syl) + (if syl + (item.root (item.relation syl 'WordStructure)) + nil)) +(define (mt_word_from_seg seg) + (mt_word_from_syl (mt_syl_from_seg seg))) + +(define (mt_segs_from_syl s) + (cond + ((null s) nil) + ((member_string 'Segment (item.relations s)) + (list s)) + (t + (apply + append + (mapcar mt_segs_from_syl (item.relation.daughters s 'SylStructure)))))) + +(define (sylmtval s) + (let ((syl (mt_syl_from_seg s))) + (if syl + (item.feat syl "MetricalValue") + "0"))) + +(define (sylpmtval s) + (let ((syl (mt_syl_from_seg s))) + (if syl + (item.feat syl "R:MetricalTree.parent.MetricalValue") + "0"))) + +(define (mt_numsyls w) + (let ((s1 (item.daughter1_to (item.relation w 'WordStructure) 'Syllable)) + (sn (item.daughtern_to (item.relation w 'WordStructure) 'Syllable)) + (count 1)) + (while (and s1 (not (equal? s1 sn))) + (set! count (+ 1 count)) + (set! s1 (item.next s1))) + (if s1 + count + 0))) + +(define (mt_seg_numsyls s) + (let ((w (mt_word_from_seg s))) + (if w + (mt_numsyls w) + 0))) + + +;;; These functions should be sorted out some time + +;;; Difference between this syl and the next +;;; number of closing brackets, number of opening brackets +;;; difference between them + +(define (mt_close n) + "(mt_close n) +The number of constituents this is the end of. Effectively the +number of closing brackets after this word." + (if (or (not n) (item.next n)) + 0 + (+ 1 (mt_close (item.parent n))))) + +(define (mt_open n) + "(mt_open n) +The number of constituents this is the start of. Effectively the +number of opening brackets before this word."
+ (if (or (not n) (item.prev n)) + 0 + (+ 1 (mt_open (item.parent n))))) + +(define (mt_postype syl) + "(mt_postype syl) +Returns single, initial, final or middle." + (let ((w (mt_word_from_syl syl)) + (psw (mt_word_from_syl (item.relation.prev syl 'Syllable))) + (nsw (mt_word_from_syl (item.relation.next syl 'Syllable)))) + (cond + ((and (equal? w psw) + (equal? w nsw)) + 'middle) + ((and (not (equal? w psw)) + (not (equal? w nsw))) + 'single) + ((equal? w psw) + 'final) + (t + 'initial)))) + +(define (mt_accent syl) + "(mt_accent syl) +Accent or 0 if none." + (let ((a 0)) + (mapcar + (lambda (i) + (if (string-matches (item.name i) "^a.*") + (set! a "a"))) + (item.relation.daughters syl 'IntonationSyllable)) + a)) + +(define (mt_break syl) + "(mt_break syl) +Break or 0 if none." + (let ((a 0)) + (mapcar + (lambda (i) + (if (string-matches (item.name i) ".*b$") + (set! a (item.name i)))) + (item.relation.daughters syl 'IntonationSyllable)) + a)) + +(define (mt_ssyl_out s) + (cond + ((null s) 0) + ((not (string-equal + "0" (item.feat s "R:WordStructure.root.lisp_word_mt_break"))) + 0) + ((string-equal "s" (item.feat s "MetricalValue")) + (+ 1 (mt_ssyl_out (item.relation.next s 'Syllable)))) + (t + (mt_ssyl_out (item.relation.next s 'Syllable))))) + +(define (mt_num_s s) + "(mt_num_s s) +The number of s MetricalValues from here to a w or top." + (cond + ((null s) 0) + ((string-equal "w" (item.feat s "MetricalValue")) + 0) + (t + (+ 1 (mt_num_s (item.parent s)))))) + +(define (mt_num_w s) + "(mt_num_w s) +The number of w MetricalValues from here to a s or top." + (cond + ((null s) 0) + ((string-equal "s" (item.feat s "MetricalValue")) + 0) + (t + (+ 1 (mt_num_w (item.parent s)))))) + +(define (mt_strong s) + "(mt_strong s) +1 if all MetricalValues are s up to a word, 0 otherwise."
+ (cond + ((string-equal "w" (item.feat s "MetricalValue")) + "0") + ((member_string 'Word (item.relations s)) "1") + (t + (mt_strong (item.relation.parent s 'MetricalTree))))) + +(define (mt_lssp s) + "(mt_lssp s) +1 if last stressed syllable in phrase, 0 otherwise." + (if (and (string-equal "s" (item.feat s "MetricalValue")) + (equal? 0 (mt_ssyl_out s))) + "1" + "0")) + +(define (mt_fssw s) + "(mt_fssw s) +1 if first stressed syllable in word, 0 otherwise." + (if (and (string-equal "s" (item.feat s "MetricalValue")) + (mt_no_stress_before (item.relation.prev s 'Syllable))) + "1" + "0")) + +(define (mt_nfssw s) + "(mt_nfssw s) +1 if second or later stressed syllable in word, 0 otherwise." + (if (and (string-equal "s" (item.feat s "MetricalValue")) + (null (mt_no_stress_before (item.relation.prev s 'Syllable)))) + "1" + "0")) + +(define (mt_no_stress_before ss) + (cond + ((null ss) t) + ((not (string-equal + (item.feat ss "R:WordStructure.root.addr") + (item.feat (item.next ss) "R:WordStructure.root.addr"))) + t) + ((string-equal "s" (item.feat ss "MetricalValue")) + nil) + (t + (mt_no_stress_before (item.prev ss))))) + +(define (word_mt_break w) + (cond + ((string-equal "1" (item.feat w "sentence_end")) + "BB") + ((string-equal "1" (item.feat w "phrase_end")) + "B") + (t + "0"))) + +(provide 'tilt) diff --git a/lib/tobi.scm b/lib/tobi.scm new file mode 100644 index 0000000..f542113 --- /dev/null +++ b/lib/tobi.scm @@ -0,0 +1,338 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved.
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; A CART tree for predicting ToBI accents (learned from f2b) +;;; punctuation and minimal pos +;;; + +; NON !H* L+H L*+ +; NONE10265 583 66 40 0 0 10954 [10265/10954] 93.710 +; H* 650 1805 61 57 0 0 2573 [1805/2573] 70.152 +; !H* 317 241 125 42 0 0 725 [125/725] 17.241 +; L+H* 457 486 76 80 0 0 1099 [80/1099] 7.279 +; L* 45 113 14 4 0 0 176 [0/176] 0.000 +; L*+H 6 6 0 1 0 0 13 [0/13] 0.000 +; 11740 3234 342 224 0 0 +;total 15540 correct 12275.000 78.990% + +(set! 
f2b_int_accent_cart_tree +' +;; these first few lines are hand written to deal with emphasis (from ssml) +((R:SylStructure.parent.R:Token.parent.EMPH is 1) + (((NONE 0.0) (H* 1) (!H* 0.0) (L+H* 0) (L* 0) (L*+H 0) H*)) + ((n.R:SylStructure.parent.R:Token.parent.EMPH is 1) + (((NONE 1.0) (H* 0) (!H* 0.0) (L+H* 0) (L* 0) (L*+H 0) NONE)) + ((p.R:SylStructure.parent.R:Token.parent.EMPH is 1) + (((NONE 1.0) (H* 0) (!H* 0.0) (L+H* 0) (L* 0) (L*+H 0) NONE)) + +((ssyl_in is 10) + (((NONE 0.99726) (H* 0) (!H* 0.00273973) (L+H* 0) (L* 0) (L*+H 0) NONE)) + ((R:SylStructure.parent.gpos is to) + (((NONE 0.995984) (H* 0.00401606) (!H* 0) (L+H* 0) (L* 0) (L*+H 0) NONE)) + ((R:SylStructure.parent.gpos is cc) + (((NONE 0.987768) (H* 0.00611621) (!H* 0) (L+H* 0.00611621) (L* 0) (L*+H 0) NONE)) + ((ssyl_out is 10) + (((NONE 0.927273) (H* 0.0545455) (!H* 0) (L+H* 0.0181818) (L* 0) (L*+H 0) NONE)) + ((R:SylStructure.parent.gpos is in) + (((NONE 0.938322) (H* 0.0353618) (!H* 0.00493421) (L+H* 0.0197368) (L* 0.00164474) (L*+H 0) NONE)) + ((R:SylStructure.parent.gpos is wp) + (((NONE 0.895238) (H* 0.0857143) (!H* 0) (L+H* 0.0190476) (L* 0) (L*+H 0) NONE)) + ((R:SylStructure.parent.gpos is aux) + (((NONE 0.912281) (H* 0.0380117) (!H* 0.00584795) (L+H* 0.0350877) (L* 0.00584795) (L*+H 0.00292398) NONE)) + ((R:SylStructure.parent.gpos is det) + (((NONE 0.898004) (H* 0.0643016) (!H* 0.00332594) (L+H* 0.0332594) (L* 0) (L*+H 0.00110865) NONE)) + ((stress is 0) + (((NONE 0.978415) (H* 0.0144999) (!H* 0.00164772) (L+H* 0.00510793) (L* 0.000329544) (L*+H 0) NONE)) + ((R:SylStructure.parent.R:Word.p.gpos is 0) + (((NONE 0.209877) (H* 0.716049) (!H* 0) (L+H* 0.0617284) (L* 0.0123457) (L*+H 0) H*)) + ((R:SylStructure.parent.gpos is md) + (((NONE 0.693548) (H* 0.177419) (!H* 0.0322581) (L+H* 0.0967742) (L* 0) (L*+H 0) NONE)) + ((p.syl_break is 3) + ((syl_break is 1) + (((NONE 0.4375) (H* 0.416667) (!H* 0) (L+H* 0.135417) (L* 0.0104167) (L*+H 0) NONE)) + (((NONE 0.171171) (H* 0.666667) (!H* 0) (L+H* 
0.144144) (L* 0.018018) (L*+H 0) H*))) + ((pp.syl_break is 4) + ((R:SylStructure.parent.R:Word.pp.gpos is in) + (((NONE 0.0980392) (H* 0.803922) (!H* 0) (L+H* 0.0784314) (L* 0.0196078) (L*+H 0) H*)) + ((syl_out is 0) + (((NONE 0.0185185) (H* 0.796296) (!H* 0.037037) (L+H* 0.0925926) (L* 0.0555556) (L*+H 0) H*)) + ((R:SylStructure.parent.R:Word.n.gpos is in) + (((NONE 0.132353) (H* 0.676471) (!H* 0) (L+H* 0.161765) (L* 0.0294118) (L*+H 0) H*)) + ((syl_break is 0) + (((NONE 0.125) (H* 0.633929) (!H* 0.0133929) (L+H* 0.183036) (L* 0.0401786) (L*+H 0.00446429) H*)) + ((n.stress is 0) + (((NONE 0.364865) (H* 0.567568) (!H* 0) (L+H* 0.0540541) (L* 0.0135135) (L*+H 0) H*)) + ((p.syl_break is 0) + (((NONE 0.612903) (H* 0.290323) (!H* 0) (L+H* 0.0967742) (L* 0) (L*+H 0) NONE)) + (((NONE 0.32) (H* 0.44) (!H* 0.02) (L+H* 0.22) (L* 0) (L*+H 0) H*)))))))) + ((ssyl_in is 0) + (((NONE 0.167769) (H* 0.628926) (!H* 0.0214876) (L+H* 0.142975) (L* 0.0363636) (L*+H 0.00247934) H*)) + ((ssyl_out is 4) + (((NONE 0.490385) (H* 0.240385) (!H* 0.0961538) (L+H* 0.163462) (L* 0.00961538) (L*+H 0) NONE)) + ((pp.syl_break is 3) + ((R:SylStructure.parent.R:Word.p.gpos is content) + (((NONE 0.346154) (H* 0.346154) (!H* 0.0769231) (L+H* 0.192308) (L* 0.0384615) (L*+H 0) NONE)) + (((NONE 0.160714) (H* 0.571429) (!H* 0.0178571) (L+H* 0.178571) (L* 0.0714286) (L*+H 0) H*))) + ((syl_in is 2) + ((n.stress is 0) + ((R:SylStructure.parent.R:Word.p.gpos is in) + (((NONE 0.218182) (H* 0.618182) (!H* 0.0363636) (L+H* 0.0909091) (L* 0.0181818) (L*+H 0.0181818) H*)) + ((syl_out is 2) + (((NONE 0.0961538) (H* 0.634615) (!H* 0.0961538) (L+H* 0.134615) (L* 0.0384615) (L*+H 0) H*)) + ((R:SylStructure.parent.R:Word.p.gpos is content) + ((syl_out is 4) + (((NONE 0.56) (H* 0.12) (!H* 0.08) (L+H* 0.24) (L* 0) (L*+H 0) NONE)) + (((NONE 0.262821) (H* 0.378205) (!H* 0.121795) (L+H* 0.192308) (L* 0.0448718) (L*+H 0) H*))) + (((NONE 0.161905) (H* 0.590476) (!H* 0.0285714) (L+H* 0.171429) (L* 0.047619) (L*+H 0) H*))))) + 
((n.syl_break is 0) + (((NONE 0.551724) (H* 0.293103) (!H* 0) (L+H* 0.155172) (L* 0) (L*+H 0) NONE)) + (((NONE 0.408451) (H* 0.422535) (!H* 0.056338) (L+H* 0.112676) (L* 0) (L*+H 0) H*)))) + ((R:SylStructure.parent.R:Word.n.gpos is 0) + ((syl_break is 0) + (((NONE 0.105263) (H* 0.315789) (!H* 0.157895) (L+H* 0.421053) (L* 0) (L*+H 0) L+H*)) + (((NONE 0.641509) (H* 0.132075) (!H* 0.132075) (L+H* 0.0943396) (L* 0) (L*+H 0) NONE))) + ((syl_break is 1) + ((ssyl_in is 3) + (((NONE 0.638889) (H* 0.152778) (!H* 0.125) (L+H* 0.0833333) (L* 0) (L*+H 0) NONE)) + ((p.syl_break is 0) + (((NONE 0.551402) (H* 0.186916) (!H* 0.158879) (L+H* 0.0841122) (L* 0.0186916) (L*+H 0) NONE)) + ((n.stress is 0) + ((pp.syl_break is 0) + (((NONE 0.413043) (H* 0.184783) (!H* 0.152174) (L+H* 0.23913) (L* 0.0108696) (L*+H 0) NONE)) + (((NONE 0.2125) (H* 0.3375) (!H* 0.1875) (L+H* 0.2125) (L* 0.05) (L*+H 0) H*))) + (((NONE 0.449153) (H* 0.245763) (!H* 0.101695) (L+H* 0.20339) (L* 0) (L*+H 0) NONE))))) + ((syl_out is 4) + ((nn.syl_break is 0) + ((pp.syl_break is 0) + (((NONE 0.45614) (H* 0.210526) (!H* 0.192982) (L+H* 0.140351) (L* 0) (L*+H 0) NONE)) + (((NONE 0.288462) (H* 0.25) (!H* 0.0961538) (L+H* 0.346154) (L* 0) (L*+H 0.0192308) L+H*))) + (((NONE 0.163934) (H* 0.459016) (!H* 0.131148) (L+H* 0.245902) (L* 0) (L*+H 0) H*))) + ((syl_out is 5) + ((R:SylStructure.parent.R:Word.p.gpos is content) + (((NONE 0.372881) (H* 0.20339) (!H* 0.169492) (L+H* 0.220339) (L* 0.0338983) (L*+H 0) NONE)) + (((NONE 0.0961538) (H* 0.673077) (!H* 0.115385) (L+H* 0.0961538) (L* 0.0192308) (L*+H 0) H*))) + ((R:SylStructure.parent.R:Word.pp.gpos is in) + ((syl_in is 4) + (((NONE 0.352113) (H* 0.422535) (!H* 0.15493) (L+H* 0.0704225) (L* 0) (L*+H 0) H*)) + ((syl_in is 3) + (((NONE 0.290323) (H* 0.467742) (!H* 0.0806452) (L+H* 0.145161) (L* 0.016129) (L*+H 0) H*)) + ((pp.syl_break is 0) + (((NONE 0.465517) (H* 0.293103) (!H* 0.172414) (L+H* 0.0689655) (L* 0) (L*+H 0) NONE)) + ((R:SylStructure.parent.R:Word.p.gpos is 
content) + (((NONE 0.18) (H* 0.36) (!H* 0.28) (L+H* 0.14) (L* 0.04) (L*+H 0) H*)) + (((NONE 0.0877193) (H* 0.22807) (!H* 0.368421) (L+H* 0.298246) (L* 0.0175439) (L*+H 0) !H*)))))) + ((ssyl_out is 2) + ((p.syl_break is 0) + (((NONE 0.634921) (H* 0.174603) (!H* 0.0793651) (L+H* 0.111111) (L* 0) (L*+H 0) NONE)) + ((pp.syl_break is 0) + (((NONE 0.388889) (H* 0.148148) (!H* 0.148148) (L+H* 0.259259) (L* 0.0185185) (L*+H 0.037037) NONE)) + (((NONE 0.294118) (H* 0.137255) (!H* 0.215686) (L+H* 0.333333) (L* 0.0196078) (L*+H 0) L+H*)))) + ((R:SylStructure.parent.R:Word.pp.gpos is to) + (((NONE 0.0877193) (H* 0.350877) (!H* 0.210526) (L+H* 0.315789) (L* 0.0350877) (L*+H 0) H*)) + ((syl_break is 3) + ((pp.syl_break is 0) + (((NONE 0.478261) (H* 0.141304) (!H* 0.195652) (L+H* 0.184783) (L* 0) (L*+H 0) NONE)) + (((NONE 0.217822) (H* 0.366337) (!H* 0.257426) (L+H* 0.128713) (L* 0.029703) (L*+H 0) H*))) + ((syl_in is 7) + ((n.stress is 0) + ((R:SylStructure.parent.R:Word.n.gpos is content) + (((NONE 0.117647) (H* 0.220588) (!H* 0.441176) (L+H* 0.176471) (L* 0.0441176) (L*+H 0) !H*)) + (((NONE 0.415385) (H* 0.0461538) (!H* 0.2) (L+H* 0.246154) (L* 0.0923077) (L*+H 0) NONE))) + (((NONE 0.716981) (H* 0.113208) (!H* 0.0943396) (L+H* 0.0754717) (L* 0) (L*+H 0) NONE))) + ((R:SylStructure.parent.R:Word.n.gpos is cc) + (((NONE 0.292308) (H* 0.184615) (!H* 0.276923) (L+H* 0.246154) (L* 0) (L*+H 0) NONE)) + ((nn.syl_break is 3) + (((NONE 0.2) (H* 0.333333) (!H* 0.283333) (L+H* 0.15) (L* 0.0333333) (L*+H 0) H*)) + ((ssyl_in is 4) + (((NONE 0.383838) (H* 0.151515) (!H* 0.212121) (L+H* 0.20202) (L* 0.050505) (L*+H 0) NONE)) + ((p.syl_break is 0) + ((n.syl_break is 1) + (((NONE 0.526316) (H* 0.210526) (!H* 0.0921053) (L+H* 0.171053) (L* 0) (L*+H 0) NONE)) + ((ssyl_in is 3) + (((NONE 0.509804) (H* 0.0980392) (!H* 0.215686) (L+H* 0.156863) (L* 0.0196078) (L*+H 0) NONE)) + ((pp.syl_break is 0) + (((NONE 0.506667) (H* 0.173333) (!H* 0.106667) (L+H* 0.2) (L* 0.0133333) (L*+H 0) NONE)) + ((ssyl_in 
is 1) + (((NONE 0.1) (H* 0.4) (!H* 0.266667) (L+H* 0.188889) (L* 0.0444444) (L*+H 0) H*)) + (((NONE 0.326316) (H* 0.210526) (!H* 0.221053) (L+H* 0.189474) (L* 0.0526316) (L*+H 0) NONE)))))) + ((R:SylStructure.parent.R:Word.p.gpos is in) + (((NONE 0.0625) (H* 0.296875) (!H* 0.265625) (L+H* 0.328125) (L* 0.046875) (L*+H 0) L+H*)) + ((syl_in is 6) + (((NONE 0.271739) (H* 0.152174) (!H* 0.358696) (L+H* 0.184783) (L* 0.0326087) (L*+H 0) !H*)) + ((syl_out is 2) + (((NONE 0.111111) (H* 0.361111) (!H* 0.319444) (L+H* 0.138889) (L* 0.0555556) (L*+H 0.0138889) H*)) + ((syl_in is 4) + (((NONE 0.224) (H* 0.152) (!H* 0.328) (L+H* 0.24) (L* 0.056) (L*+H 0) !H*)) + ((n.stress is 0) + ((syl_in is 3) + (((NONE 0.0833333) (H* 0.333333) (!H* 0.233333) (L+H* 0.216667) (L* 0.133333) (L*+H 0) H*)) + (((NONE 0.283465) (H* 0.188976) (!H* 0.23622) (L+H* 0.204724) (L* 0.0708661) (L*+H 0.015748) NONE))) + (((NONE 0.305263) (H* 0.284211) (!H* 0.210526) (L+H* 0.178947) (L* 0.0210526) (L*+H 0) NONE)))))))))))))))))))))))))))))))))))))))) +) + +; NON L-L L-H H-L +; NONE13017 0 0 0 0 0 13017 [13017/13017] 100.000 +; H- 339 81 0 1 1 0 422 [81/422] 19.194 +; L- 223 52 0 5 0 0 280 [0/280] 0.000 +; L-L% 17 0 0 1057 96 0 1170 [1057/1170] 90.342 +; L-H% 16 0 0 457 139 0 612 [139/612] 22.712 +; H-L% 5 0 0 30 4 0 39 [0/39] 0.000 +; 13617 133 0 1550 240 0 +;total 15540 correct 14294.000 91.982% +(set! 
f2b_int_tone_cart_tree +'((lisp_syl_yn_question is 1) + (((H-H% 1.0) H-H%)) +((R:SylStructure.parent.gpos is cc) + (((NONE 0.996942) (H- 0.0030581) (L- 0) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((ssyl_in is 10) + (((NONE 0.989041) (H- 0.00273973) (L- 0) (L-L% 0.00273973) (L-H% 0.00547945) (H-L% 0) NONE)) + ((R:SylStructure.parent.gpos is md) + (((NONE 0.986014) (H- 0) (L- 0) (L-L% 0.00699301) (L-H% 0.00699301) (H-L% 0) NONE)) + ((p.old_syl_break is 4) + (((NONE 0.99462) (H- 0.00239091) (L- 0.00119546) (L-L% 0) (L-H% 0.00119546) (H-L% 0.000597729) NONE)) + ((R:SylStructure.parent.gpos is det) + (((NONE 0.984635) (H- 0.00512164) (L- 0.00384123) (L-L% 0.00384123) (L-H% 0.00256082) (H-L% 0) NONE)) + ((n.old_syl_break is 3) + (((NONE 0.981848) (H- 0.00495049) (L- 0.00330033) (L-L% 0.00660066) (L-H% 0.00330033) (H-L% 0) NONE)) + ((n.old_syl_break is 4) + (((NONE 0.986982) (H- 0.000591716) (L- 0.0100592) (L-L% 0.00118343) (L-H% 0.00118343) (H-L% 0) NONE)) + ((R:SylStructure.parent.gpos is in) + (((NONE 0.977865) (H- 0.00390625) (L- 0.00390625) (L-L% 0.0078125) (L-H% 0.00651042) (H-L% 0) NONE)) + ((old_syl_break is 4) + ((R:SylStructure.parent.R:Word.n.gpos is 0) + (((NONE 0) (H- 0.00892857) (L- 0) (L-L% 0.982143) (L-H% 0.00892857) (H-L% 0) L-L%)) + ((R:SylStructure.parent.R:Word.p.gpos is aux) + (((NONE 0) (H- 0) (L- 0) (L-L% 0.761905) (L-H% 0.238095) (H-L% 0) L-L%)) + ((R:SylStructure.parent.R:Word.n.gpos is det) + (((NONE 0) (H- 0) (L- 0) (L-L% 0.652542) (L-H% 0.347458) (H-L% 0) L-L%)) + ((ssyl_in is 4) + (((NONE 0) (H- 0) (L- 0) (L-L% 0.682243) (L-H% 0.313084) (H-L% 0.0046729) L-L%)) + ((syl_in is 6) + (((NONE 0) (H- 0) (L- 0.00649351) (L-L% 0.688312) (L-H% 0.298701) (H-L% 0.00649351) L-L%)) + ((R:SylStructure.parent.R:Word.n.gpos is aux) + (((NONE 0) (H- 0) (L- 0) (L-L% 0.464286) (L-H% 0.535714) (H-L% 0) L-H%)) + ((syl_in is 5) + (((NONE 0) (H- 0) (L- 0) (L-L% 0.666667) (L-H% 0.322034) (H-L% 0.0112994) L-L%)) + ((sub_phrases is 2) + (((NONE 0) (H- 0) (L- 0) (L-L% 
0.696429) (L-H% 0.267857) (H-L% 0.0357143) L-L%)) + ((R:SylStructure.parent.R:Word.p.gpos is det) + (((NONE 0) (H- 0) (L- 0) (L-L% 0.628866) (L-H% 0.350515) (H-L% 0.0206186) L-L%)) + ((sub_phrases is 0) + ((R:SylStructure.parent.R:Word.n.gpos is in) + ((n.old_syl_break is 0) + (((NONE 0) (H- 0) (L- 0) (L-L% 0.68254) (L-H% 0.31746) (H-L% 0) L-L%)) + (((NONE 0) (H- 0.0147059) (L- 0) (L-L% 0.338235) (L-H% 0.632353) (H-L% 0.0147059) L-H%))) + ((n.stress is 0) + (((NONE 0) (H- 0) (L- 0.0108303) (L-L% 0.599278) (L-H% 0.32491) (H-L% 0.064982) L-L%)) + (((NONE 0) (H- 0) (L- 0) (L-L% 0.386364) (L-H% 0.579545) (H-L% 0.0340909) L-H%)))) + (((NONE 0) (H- 0) (L- 0.00456621) (L-L% 0.652968) (L-H% 0.324201) (H-L% 0.0182648) L-L%)))))))))))) + ((R:SylStructure.parent.gpos is pps) + (((NONE 0.988764) (H- 0.011236) (L- 0) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((syl_in is 0) + (((NONE 0.984848) (H- 0.0126263) (L- 0.00252525) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((R:SylStructure.parent.gpos is content) + ((R:SylStructure.parent.R:Word.nn.gpos is 0) + (((NONE 0.967914) (H- 0.0106952) (L- 0.0213904) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((pp.old_syl_break is 4) + (((NONE 0.972315) (H- 0.0232558) (L- 0.00442968) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((syl_in is 1) + (((NONE 0.951163) (H- 0.0372093) (L- 0.0116279) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((nn.old_syl_break is 4) + (((NONE 0.956244) (H- 0.0127621) (L- 0.0291705) (L-L% 0) (L-H% 0) (H-L% 0.00182315) NONE)) + ((R:SylStructure.parent.R:Word.nn.gpos is in) + (((NONE 0.941919) (H- 0.0378788) (L- 0.020202) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((R:SylStructure.parent.R:Word.p.gpos is cc) + (((NONE 0.919643) (H- 0.0714286) (L- 0.00892857) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((nn.old_syl_break is 3) + (((NONE 0.927273) (H- 0.0472727) (L- 0.0254545) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((R:SylStructure.parent.R:Word.nn.gpos is cc) + (((NONE 0.921569) (H- 0.0588235) (L- 0.0196078) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((ssyl_in is 0) + (((NONE 
0.911591) (H- 0.0825147) (L- 0.00589391) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((R:SylStructure.parent.R:Word.nn.gpos is to) + (((NONE 0.912281) (H- 0.0350877) (L- 0.0526316) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((R:SylStructure.parent.R:Word.pp.gpos is to) + (((NONE 0.894737) (H- 0.0526316) (L- 0.0526316) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((R:SylStructure.parent.R:Word.p.gpos is in) + (((NONE 0.888554) (H- 0.0662651) (L- 0.0451807) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((R:SylStructure.parent.R:Word.pp.gpos is in) + (((NONE 0.875817) (H- 0.0718954) (L- 0.0522876) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((syl_in is 2) + (((NONE 0.869942) (H- 0.0867052) (L- 0.0433526) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((R:SylStructure.parent.R:Word.nn.gpos is aux) + (((NONE 0.854839) (H- 0.0967742) (L- 0.0483871) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((sub_phrases is 1) + (((NONE 0.836538) (H- 0.0721154) (L- 0.0913462) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((R:SylStructure.parent.R:Word.pp.gpos is det) + (((NONE 0.832402) (H- 0.0949721) (L- 0.0726257) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((ssyl_in is 4) + (((NONE 0.793103) (H- 0.103448) (L- 0.103448) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((n.old_syl_break is 0) + (((NONE 0.850816) (H- 0.0839161) (L- 0.0652681) (L-L% 0) (L-H% 0) (H-L% 0) NONE)) + ((R:SylStructure.parent.R:Word.n.gpos is content) + (((NONE 0.889447) (H- 0.0753769) (L- 0.0251256) (L-L% 0) (L-H% 0) (H-L% 0.0100503) NONE)) + ((old_syl_break is 3) + (((NONE 0) (H- 0.609023) (L- 0.390977) (L-L% 0) (L-H% 0) (H-L% 0) H-)) + (((NONE 1) (H- 0) (L- 0) (L-L% 0) (L-H% 0) (H-L% 0) NONE))))))))))))))))))))))) + (((NONE 0.978947) (H- 0.0131579) (L- 0.00789474) (L-L% 0) (L-H% 0) (H-L% 0) NONE))))))))))))))) + +) + +(defvar tobi_support_yn_questions t + "tobi_support_yn_questions +If set a crude final rise will be added at utterance that are judged +to be yesy/no questions. Namely ending in a ? 
and not starting with +a wh-word.") + +(define (first_word syl) + (let ((w (item.relation.parent syl 'SylStructure))) + (item.relation.first w 'Word))) + +(define (syl_yn_question syl) +"(syl_yn_question syl) +Return 1 if this is the last syllable in a yes-no question. Basically +if it ends in a question mark and doesn't start with a wh-word. This +isn't right but it depends on how much you want rising intonation." + (if (and + tobi_support_yn_questions + (member_string (item.feat syl "syl_break") '("4" "3")) + (not (member_string + (downcase (item.name (first_word syl))) + '("how" "why" "which" "who" "what" "where" "when"))) + (string-matches + (item.feat syl "R:SylStructure.parent.R:Token.parent.punc") + ".*\\?.*")) + "1" + "0")) + +(provide 'tobi) diff --git a/lib/tobi_rules.scm b/lib/tobi_rules.scm new file mode 100644 index 0000000..1b3e4e5 --- /dev/null +++ b/lib/tobi_rules.scm @@ -0,0 +1,1002 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission.
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Authors: Robert A. J. Clark and Alan W Black +;;; Modifications and Checking: +;;; Gregor Moehler (moehler@ims.uni-stuttgart.de) +;;; Matthew Stone (mdstone@cs.rutgers.edu) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Generate F0 points from tobi labels using rules given in: +;;; Jilka, Moehler & Dogil (forthcoming in Speech Communications) +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; *** Converted to new Relation architecture -- but not checked yet -- awb +;;; -> crude (beta) checking: gm in Dec. 98 +;;; +;;; -> fixed TAKEOVER bug that used time value +;;; as pitch target (!) - MDS 1/02 +;;; -> hacked around bunches of target overlap problems - MDS 1/02 +;;; -> added primitive pitch range controls +;;; +;;; Known problems and bugs: +;;; Can't currently use voicing intervals which cross syllable boundaries, +;;; so pre/post-nuclear tones are currently placed 0.2s before/after the +;;; nuclear tone even if no voicing occurs. Failing this they default to a +;;; percentage of the voicing for that syllable. +;;; +;;; Don't know about target points ahead of the current syllable. 
+;;; (As you need to know what comes before them to calculate them) +;;; So: post accent tones are placed 0.2 ahead if following syllable exists +;;; ends before 0.2 from starred target and is not accented +;;; The H-target of the H+!H* is 0.2 sec instead of 0.15 sec before +;;; starred tone. +;;; +;;; Multi-utterance input has not been tested. +;;; +;;; !H- does not generate any targets +;;; +;;; Unfortunately some other modules may decide to put pauses in the +;;; middle of a phrase +;;; +;;; valleys are not tested yet +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; To use this in a voice +;;; (require 'tobi_rules) +;;; And in the voice call +;;; (setup_tobi_f0_method) +;;; Set the following for your speaker's F0 range +;;; (Parameter.set 'Default_Topline 146) +;;; (Parameter.set 'Default_Start_Baseline 61) +;;; (Parameter.set 'Valley_Dip 75) + +;; level of debug printout +(set! printdebug 0) + +(define (setup_tobi_f0_method) + "(setup_tobi_f0_method) +Set up parameters for current voice to use the implementation +of ToBI labels to F0 targets by rule." + (Parameter.set 'Int_Method Intonation_Tree) + (Parameter.set 'Int_Target_Method Int_Targets_General) + (set! int_accent_cart_tree no_int_cart_tree) ; NONE always + (set! int_tone_cart_tree no_int_cart_tree) ; NONE always + (set! int_general_params + (list + (list 'targ_func tobi_f0_targets))) ; we will return a list of f0 targets here + + (Parameter.set 'Phrase_Method 'cart_tree) + (set! 
phrase_cart_tree tobi_label_phrase_cart_tree) ; redefines the phrasebreak tree + t) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;;;;; +;;;;;; Define and set the new f0 rules +;;;;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;;; Set global parameters +;;; You may want to reset these for different speakers + +(Parameter.set 'Default_Topline 146) ;146 +(Parameter.set 'Default_Start_Baseline 61) ;61 +(Parameter.set 'Current_Topline (Parameter.get 'Default_Topline)) +(Parameter.set 'Current_Start_Baseline (Parameter.get 'Default_Start_Baseline)) +(Parameter.set 'Current_End_Baseline (Parameter.get 'Current_Start_Baseline)) +(Parameter.set 'Downstep_Factor 0.70) +(Parameter.set 'Valley_Dip 75) +;;; function to add target points on a given syllable and fill in +;;; targets where necessary + +(define (tobi_f0_targets utt syl) + "(tobi_f0_targets UTT ITEM) + Returns a list of targets for the given syllable." + (if (and (>= printdebug 1) + (not(equal? 0 (item.feat syl "R:Intonation.daughter1.name")))) + (format t "### %l (%.2f %.2f) %l ptarg: %l ###\n" (item.name syl) + (item.feat syl "syllable_start")(item.feat syl "syllable_end") + (item.feat syl "R:Intonation.daughter1.name") (ttt_last_target_time syl))) + + ;; only continue if there is a Word related to this syllable + ;; I know there always should be, but there might be a bug elsewhere + (cond + ((not(equal? 0 (item.feat syl "R:SylStructure.parent.name"))) + + ; get current label. This assumes that there is only one accent and + ; one endtone on a syllable. Although there can be one of each. 
+ (let ((voicing (ttt_get_voice_times syl)) ; voicing interval + (pvoicing (ttt_get_voice_times ; previous voicing + (item.relation.prev syl 'Syllable))) + (nvoicing (ttt_get_voice_times ; next voicing + (item.relation.next syl 'Syllable)))) + + ; if first syl of phrase set Phrase_Start and Phrase_End parameters + ; and reset downstep (currently does so on big and little breaks.) + ; only assignes Default values at this stage + ; maybe trained from CART later - first steps now - MDS + ; following Moehler and Mayer, SSW 2001 + (if (eq 0 (item.feat syl 'syl_in)) ;; GM maybe something better needed here? + (progn + (Parameter.set 'Phrase_Start (item.feat syl 'R:SylStructure.parent.R:Phrase.last.word_start)) + (Parameter.set 'Phrase_End (item.feat syl 'R:SylStructure.parent.R:Phrase.last.word_end)) + (Parameter.set 'Current_Topline + (/ (* (wagon syl ttt_topline_tree) + (Parameter.get 'Default_Topline)) 100)) + (Parameter.set 'Current_Start_Baseline + (/ (* (wagon syl ttt_baseline_tree) + (Parameter.get 'Default_Start_Baseline)) 100)) + (Parameter.set 'Current_End_Baseline + (Parameter.get 'Current_Start_Baseline)) + (if (>= printdebug 3) + (begin + (print (format nil "new range: %f %f %f" + (Parameter.get 'Current_Topline) + (Parameter.get 'Current_Start_Baseline) + (Parameter.get 'Current_End_Baseline) )))) )) + + ; do stuff (should go only if there is an accent/boundary?) + (let ((new_targets + (ttt_to_targets syl (wagon syl ttt_starttone_tree) + voicing + pvoicing + nvoicing + 'Starttones))) + + (set! new_targets (append new_targets + (ttt_to_targets syl (wagon syl ttt_accent_tree) + voicing + pvoicing + nvoicing + 'Accents))) + + (set! new_targets (append new_targets + (ttt_to_targets syl (wagon syl ttt_endtone_tree) + voicing + pvoicing + nvoicing + 'Endtones))) + + (if (and(not(equal? 
new_targets nil)) + (>= printdebug 2)) + (begin + (format t ">> Targets: %l\n" new_targets) + (format t ">> LastTarget: %l\n" (last new_targets)) + )) + + new_targets))))) + + +;;; CART tree to specify no accents + +(set! no_int_cart_tree +' +((NONE))) + +;;; +;;; Relate phrasing to boundary tones. +;;; Added downstepped tones - MDS + +(set! tobi_label_phrase_cart_tree +' +((tone in ("L-" "H-" "!H-")) + ((B)) + ((tone in ("H-H%" "H-L%" "!H-L%" "L-L%" "L-H%")) + ((BB)) + ((NB))))) + +;;; +;;; The other functions +;;; + +;;; process a list of relative targets and convert to actual targets + +(define (ttt_to_targets syl rlist voicing pvoicing nvoicing type) + "Takes a list of target sets and returns a list of targets." + (if (or (and (>= printdebug 2) + rlist (atom (caar rlist)) + (not (equal? 'NONE (caar rlist))) (not (equal? '(NONE) (caar rlist)))) + (>= printdebug 3)) + (begin (print "Entering ttt_to_targets with:") + (print (format nil "rlist: %l vc: %l pvc: %l nvc: %l type: %s" rlist voicing pvoicing nvoicing type)))) +(cond + ;; nowt + ((eq (length rlist) 0) ()) + ;; a single target set + ((atom (car (car rlist))) + (cond + ((eq type 'Accents) + (ttt_accent_set_to_targets syl rlist voicing pvoicing nvoicing)) + ((eq type 'Starttones) + (ttt_bound_set_to_targets syl rlist voicing pvoicing)) + ((eq type 'Endtones) + (ttt_bound_set_to_targets syl rlist voicing pvoicing)) + (t (error "unknown target set encountered in ttt_to_targets")))) + ;; list of target sets + ((atom (car (car (car rlist)))) + (append (ttt_to_targets syl (cdr rlist) voicing pvoicing nvoicing type) + (ttt_to_targets syl (car rlist) voicing pvoicing nvoicing type))) + ;; error + (t (error "something strange has happened in ttt_to_targets")))) + + +;; process a starttone/endtone target set. + +(define (ttt_bound_set_to_targets syl tset voicing pvoicing) + "takes a start/endtone target set and returns a list of target points." 
+ (if (>= printdebug 3) (begin + (print "Entering ttt_bound_set_to_targets with:") + (pprintf (format nil "tset: %l vc: %l pvc: %l" tset voicing pvoicing)))) + (cond + ;; usually target given is NONE. (also ignore unknown!) + ((or (eq (car (car tset)) 'NONE) + (eq (car (car tset)) 'UNKNOWN)) + nil) + ;; a pair of target pairs + ((eq (length tset) 2) + (list (ttt_get_target (car tset) voicing) + (ttt_get_target (car (cdr tset)) voicing))) + ;; single target pair + ((eq (length tset) 1) + (cond + ;; an actual target pair + ((not (null (cdr (car tset)))) + (list (ttt_get_target (car tset) voicing))) + ;; a TAKEOVER marker + ((eq (car (car tset)) 'TAKEOVER) + (list (list (ttt_interval_percent voicing 0) + (ttt_last_target_value syl)))) + (t (error "unknown target pair in ttt_bound_set_to_targets")))) + (t (error "unknown target set type in ttt_bound_set_to_targets")))) + + +;; process an accent target set. + +(define (ttt_accent_set_to_targets syl tset voicing pvoicing nvoicing) + "takes a accent target set and returns a list of target points." + (if (>= printdebug 3) (begin + (print "Entering ttt_accent_set_to_targets with:") + (pprintf (format nil "tset: %l vc: %l pvc: %l nvc: %l" tset voicing pvoicing nvoicing)))) + (cond + ;; single target in set + ((null (cdr tset)) + (cond + ; target given is NONE. 
+ ((or (eq (car (car tset)) 'NONE) + (eq (car (car tset)) 'UNKNOWN)) nil) + ; V1 marker + ((eq (car (car tset)) 'V1) + (let ((target_time (+ (/ (- (next_accent_start syl) + (ttt_last_target_time syl)) + 2.0) + (ttt_last_target_time syl)))) + (list (list target_time (ttt_accent_pitch (Parameter.get 'Valley_Dip) target_time))))) + ; V2 marker + ((eq (car (car tset)) 'V2) + (let ((target_time (+ (ttt_last_target_time syl) 0.25))) + (list (list target_time (ttt_accent_pitch (Parameter.get 'Valley_Dip) target_time))))) + ; V3 marker + ((eq (car (car tset)) 'V3) + (let ((target_time (- (next_accent_start syl) 0.25))) + (list (list target_time (ttt_accent_pitch (Parameter.get 'Valley_Dip) target_time))))) + ; single target pair + (t (list (ttt_get_target (car tset) voicing))))) + ;; a pair of targets + ((length tset 2) + (cond + ;; a *ed tone with PRE type tone (as in L+H*) + ((eq (car (car tset)) 'PRE) + (let ((star_target (ttt_get_target (car (cdr tset)) voicing)) + (last_target (parse-number(ttt_last_target_time syl)))) + (cond + ; normal 0.2s case (currently doesn't check for voicing) + ((and (eqv? 0 (ip_initial syl)) + (> (- (car star_target) 0.2) last_target)) + (list (list (- (car star_target) 0.2) + (ttt_accent_pitch (car (cdr (car tset))) + (- (car star_target) 0.2))) ; the time + star_target)) + + ; 90% prev voiced if not before last target - Added back in MDS, + ; with parse-number added and new check for ip_initial + ((and (eqv? 
0 (ip_initial syl)) + (> (parse-number (ttt_interval_percent pvoicing 90)) + (parse-number (ttt_last_target_time syl)))) + (list (list (ttt_interval_percent pvoicing 90) + (ttt_accent_pitch (car (cdr (car tset))) + (ttt_interval_percent pvoicing 90))) + star_target)) + + ; otherwise (UNTESTED) [NOTE: Voicing for this syllable only] + (t + (list (list (ttt_interval_percent voicing 20) + (ttt_accent_pitch (car (cdr (car tset))) + (ttt_interval_percent voicing 20))) + star_target))))) + ; a *ed tone with POST type tone (as L*+H) + ((eq (car(car(cdr tset))) 'POST) + (let ((star_target (ttt_get_target (car tset) voicing)) + (next_target nil ) ; interesting problem + (next_syl (item.next syl))) + + (cond + ; normal 0.2s case (UNTESTED) + ((and (not (equal? next_syl nil)) + (eq 0 (item.feat next_syl "accented"))) + (cond + ((< (+ (car star_target) 0.2) (item.feat next_syl "syllable_end")) + (list star_target + (list (+ (car star_target) 0.2) + (ttt_accent_pitch (car (cdr (car (cdr tset)))) + (+ (car star_target) 0.2) )))) + (t + + (list star_target + (list (ttt_interval_percent nvoicing 90) + (ttt_accent_pitch (car (cdr (car (cdr tset)))) + (ttt_interval_percent nvoicing 90) )))))) + + ; 20% next voiced (BUG: Can't do this as the next target hasn't been + ; calculated yet!) + (nil nil) + ;otherwise (UNTESTED) + (t (list star_target + (list (ttt_interval_percent voicing 90) + (ttt_accent_pitch (car (cdr (car (cdr tset)))) + (ttt_interval_percent voicing 90) ))))))) + + (t + ;; This case didn't use to happen, but now must + ;; to avoid +H's clobbering endtones - MDS's hack. + (list (ttt_get_target (car tset) voicing) + (ttt_get_target (cadr tset) voicing))))) + + + ;; something else... + (t (error (format nil "unknown accent set in ttt_accent_set_to_targets: %l" tset))))) + + + +(define (ttt_get_target pair voicing) + "Returns actual target pair, usually for a stared tone." 
+ (if (>= printdebug 4) (begin + (print "Entering ttt_get_target with:") + (pprintf pair) (pprintf voicing))) + (list (ttt_interval_percent voicing (car pair)) + (ttt_accent_pitch (car (cdr pair)) + (ttt_interval_percent voicing (car pair))))) + +(define (ttt_accent_pitch value time) + "Converts an accent pitch entry to a pitch value." + (if (>= printdebug 4) (begin + (print "Entering ttt_accent_pitch with:") + (pprintf value))) + (cond + ;; a real value + ((number? value) + (ttt_interval_percent (list (ttt_get_current_baseline time) + (Parameter.get 'Current_Topline)) + value)) + ;; Downstep then Topline + ((eq value 'DHIGH) + (progn + (Parameter.set 'Current_Topline (+ (ttt_get_current_baseline time) + (* (Parameter.get 'Downstep_Factor) + (- (Parameter.get 'Current_Topline) + (ttt_get_current_baseline time))))) + (ttt_interval_percent (list (ttt_get_current_baseline time) + (Parameter.get 'Current_Topline)) + 100))) + + ;; Unknown + (t (error "Unknown accent pitch value encountered")))) + + +(define (ttt_get_current_baseline v) + "Returns the current declined baseline at time v." + (if (>= printdebug 4) (begin + (print "Entering ttt_get_current_baseline with:") + (pprintf v))) + (let ((h (Parameter.get 'Current_Start_Baseline)) + (l (Parameter.get 'Current_End_Baseline)) + (e (Parameter.get 'Phrase_End)) + (s (Parameter.get 'Phrase_Start))) + (- h (* (/ (- h l) (- e s)) (- v s))))) + +;;; find the time n% through an interval + +(define (ttt_interval_percent pair percent) + "Returns the time that is percent percent through the pair." 
+ (if (>= printdebug 4) (begin + (print "Entering ttt_interval_percent with:") + (pprintf (format nil "%l, %l" pair percent)))) + (cond + ; no pair given: just return nil + ((null pair) nil) + ; otherwise do the calculation + (t (let ((start (car pair)) + (end (car(cdr pair)))) + (+ start (* (- end start) (/ percent 100))))))) + + +;;; Getting start and end voicing times in a syllable + +(define (ttt_get_voice_times syl_item) + "Returns a pair of start time of first voiced phone in syllable and +end of last voiced phone in syllable, or nil if syllable is nil" + (cond + ((null syl_item) nil) + (t (let ((segs (item.relation.daughters syl_item "SylStructure"))) + (list + (item.feat (ttt_first_voiced segs) "segment_start") + (item.feat (ttt_first_voiced (reverse segs)) "end")))))) + +(define (ttt_first_voiced segs) + "Returns first segment that is voiced (vowel or voiced consonant) +returns last segment if all are unvoiced." + (cond + ((null (cdr segs)) + (car segs)) ;; last possibility + ((equal? "+" (item.feat (car segs) "ph_vc")) + (car segs)) + ((equal? "+" (item.feat (car segs) "ph_cvox")) + (car segs)) + (t + (ttt_first_voiced (cdr segs))))) + +;;; ttt_last_target has bifurcated into +;;; ttt_last_target_time and +;;; ttt_last_target_value +;;; to fix a place where f0 was set to last target time! +;;; - MDS + +(define (ttt_last_target_time syl) + "Returns the end of the most recent previous target +in the utterance or nil if there is not one present +" + (if (>= printdebug 3) + (begin (print "Entering ttt_last_target_time") + (print syl)) + ) + (let ((target (ttt_last_target syl))) + (if (null? target) + nil + (item.feat target "R:Target.daughter1.pos")))) + +(define (ttt_last_target_value syl) + "Returns the pitch of the most recent previous target +in the utterance or nil if there is not one present +" + (if (>= printdebug 3) + (begin (print "Entering ttt_last_target_value") + (print syl)) + ) + (let ((target (ttt_last_target syl))) + (if (null? 
target) + nil + (item.feat target "R:Target.daughter1.f0")))) + +;; Changed to scan through segments in the segment relation, +;; to catch (notional) targets on pauses. - MDS +;; +;;; associated segments are: +;;; - the segments in the word +;;; - subsequent segments not in the syllable structure +;;; and on the first word, preceding segments +;;; not in the syllable structure + +(define (ttt_collect_following seg accum) + (if (or (null? seg) + (not (null? (item.relation seg 'SylStructure)))) + accum + (ttt_collect_following (item.next seg) + (cons seg accum)))) + + +(define (ttt_last_target syl) + "Returns the most recent previous target +in the utterance or nil if there is not one present +" +(if (>= printdebug 3) + (begin (print "Entering ttt_last_target") + (print syl)) + ) + (let ((prev_syl (item.relation.prev syl 'Syllable))) + (cond +; ((symbol-bound? 'new_targets) (last (caar new_targets))) + ((null prev_syl) nil) + ((ttt_last_target_segs + (ttt_collect_following + (item.relation.next + (item.relation.daughtern prev_syl "SylStructure") + "Segment") + (reverse (item.relation.daughters prev_syl "SylStructure"))))) + ;list of segments of prev. syllable + ;in reverse order, with pauses + ;prepended. 
+ (t (ttt_last_target prev_syl))))) + +(define (ttt_last_target_segs segs) + "Returns the first target no earlier than seg +or nil if there is not one +" +(if (>= printdebug 4) + (begin (print "Entering ttt_last_target_segs with:") + (pprintf (format nil "%l" segs)) +)) + (cond + ((null segs) nil) + ((and (> (parse-number + (item.feat (car segs) "R:Target.daughter1.f0")) 0) + (eq 0 (item.feat (car segs) "R:SylStructure.parent.lisp_lh_condition")) + (eq 0 (item.feat (car segs) "R:SylStructure.parent.lisp_hl_condition")) + (eq 0 (item.feat (car segs) "R:SylStructure.parent.lisp_valley_condition"))) + (car segs)) + + (t (ttt_last_target_segs (cdr segs))))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;;;;; +;;;;;; CART TREES (ttt - tobi to target) +;;;;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;;; +;;; Return a list of target lists. A target list comprises of a list +;;; of related targets (ie for the L and H in L+H*), just to confuse +;;; matters each target is also a list! (pos pitch) +;;; + + +(set! ttt_endtone_tree ; BUG: does it check the current syl for last accent? 
+ ' + ((tobi_endtone is NONE) ; ususally none + ((((NONE)))) + ((tobi_endtone is "H-H%") ; H-H% + ((((100 120)))) + ((tobi_endtone is "L-L%") ; L-L% + ((((100 -20)))) + ((tobi_endtone is "L-H%") ; L-H% + ((lisp_last_accent > 2) + ((lisp_last_accent_type is "L*") + ((((0 25) (100 40)))) ; paper says 80 but AWB had 40 + ((((0 0) (100 40))))) + ((lisp_last_accent_type is "L*") + ((((100 40)))) + ((((50 0) (100 40)))))) + ((tobi_endtone is "H-L%") ; H-L% + ((lisp_last_accent_type is "L*") + ((tobi_accent is"L*") + ((((50 100) (100 100)))) + ((((0 100) (100 100))))) + ((((100 100))))) + ((tobi_endtone is "!H-L%") ; !H-L% + ((lisp_last_accent_type is "L*") + ((tobi_accent is"L*") + ((((50 DHIGH) (100 100)))) + ((((0 DHIGH) (100 100))))) + ((((100 DHIGH))))) + ((tobi_endtone is "H-") + ((((100 100)))) + ((tobi_endtone is "!H-") + ((((100 DHIGH)))) + ((tobi_endtone is "L-") + ((((100 0)))) + ((((UNKNOWN)))))))))))))) + +(set! ttt_starttone_tree + ' + ((lisp_ip_initial = 1) + ((tobi_endtone is "%H") + ((((0 100)))) + ((p.tobi_endtone in ("H-" "!H-" "L-")) + ((((TAKEOVER)))) ; takeover case + ((tobi_accent is NONE) + ((lisp_next_accent > 2) ; default cases (dep. on whether next target is low) + ((lisp_next_accent_type in ("L*" "L*+H" "L*+!H" "L+H*" "L+!H*" "L-" "L-H%" "L-L%")) + ((((0 50)(100 25)))) + ((((0 50)(100 75))))) + ((lisp_next_accent_type in ("L*" "L*+H" "L*+!H" "L+H*" "L+!H*" "L-" "L-H%" "L-L%")) + ((((0 30)))) + ((((0 70)))))) + ((tobi_accent in ("L*" "L*+H" "L*+!H" "L+H*" "L+!H*" "L-" "L-H%" "L-L%")) + ((((0 30)))) + ((((0 70)))))))) + ((((NONE)))))) ; otherwise (and usually) nothing. + +;; Redone after Jilka, Moehler and Dogil +;; - But treating one-syllable-ip's like +;; last-syllable-of-ip's in cases of +;; two tone switches per syllable (e.g. H* L-H%). +;; - And (hack) a 70% target for the initial +;; H*'s of phrases when the next accent is L+H* +;; - MDS + +(set! 
ttt_accent_tree + ' + ((tobi_accent is "H*" ) ; H* + ((lisp_ip_final = 1) + ((lisp_ip_one_syllable_case = 1) + ((((50 100)))) + ((((25 100))))) + ((lisp_hstar_weak_target = 1) + ((((60 70)))) + ((lisp_ip_initial = 1) + ((((85 100)))) + ((((60 100))))))) + + ((tobi_accent is "!H*" ) ; !H* + ((lisp_ip_final = 1) + ((lisp_ip_one_syllable_case = 1) + ((((50 DHIGH)))) + ((((25 DHIGH))))) + ((lisp_ip_initial = 1) + ((((85 DHIGH)))) + ((((60 DHIGH)))))) + + ((tobi_accent is "L*" ) ; L* + ((lisp_ip_final = 1) + ((lisp_ip_one_syllable_case = 1) + ((((50 0)))) + ((((25 0))))) + ((lisp_ip_initial = 1) + ((((85 0)))) + ((((60 0)))))) + + ((tobi_accent is "L+H*") ; L+H* + ((lisp_ip_final = 1) + ((lisp_ip_one_syllable_case = 1) + ((((PRE 20) (50 100)))) ; JMD estimated 70 + ((((PRE 20) (25 100))))) + ((lisp_ip_initial = 1) + ((((PRE 20) (90 100)))) + ((((PRE 20) (75 100)))))) + + ((tobi_accent is "L+!H*") ; L+!H* + ((lisp_ip_final = 1) + ((lisp_ip_one_syllable_case = 1) + ((((PRE 20) (70 DHIGH)))) + ((((PRE 20) (25 DHIGH))))) + ((lisp_ip_initial = 1) + ((((PRE 20) (90 DHIGH)))) + ((((PRE 20) (75 DHIGH)))))) + + ((tobi_accent is "L*+H") ; L*+H + ((lisp_ip_final = 1) + ((lisp_ip_one_syllable_case = 1) + ((((35 0) (80 100)))) ; POST would clobber endtones + ((((15 0) (40 100))))) ; POST would clobber endtones - MDS + ((lisp_ip_initial = 1) + ((((55 0) (POST 100)))) + ((((40 0) (POST 100)))))) + + ((tobi_accent is "L*+!H") ; L*+!H + ((lisp_ip_final = 1) + ((lisp_ip_one_syllable_case = 1) + ((((35 0) (80 DHIGH)))) ; POST would clobber endtones - MDS + ((((15 0) (40 DHIGH))))) ; POST would clobber endtones - MDS + ((lisp_ip_initial = 1) + ((((55 0) (POST DHIGH)))) + ((((40 0) (POST DHIGH)))))) + + ((tobi_accent is "H+!H*") ; H+!H* + ((lisp_ip_final = 1) + ((lisp_ip_one_syllable_case = 1) + ((((PRE 143) (60 DHIGH)))) ; the 143 is a hack to level out the downstep + ((((PRE 143) (20 DHIGH))))) + ((lisp_ip_initial = 1) + ((((PRE 143) (90 DHIGH)))) + ((((PRE 143) (60 DHIGH)))))) + + 
((lisp_lh_condition = 1) + ((((100 75)))) + ((lisp_lh_condition = 2) + ((((0 90)))) + ((lisp_hl_condition = 1) + ((((100 25)))) + ((lisp_valley_condition = 1) + ((((V1 85)))) + ((lisp_valley_condition = 2) + ((((V2 70)))) + ((lisp_valley_condition = 3) + ((((V3 70)))) + ((tobi_accent is NONE) ; usually we find no accent + ((((NONE)))) + ((((UNKNOWN)))))))))))))))))))) ; UNKNOWN TARGET FOUND + +;;; Cart tree to "predict" pitch range +;;; Right now just accesses a feature +;;; "register" following Moehler & Mayer 2001. +;;; Register must be one of +;;; H - primary high register (default): 133% lowest, 92% highest +;;; H-H - expanded high register: 134% lowest, 100% highest +;;; H-L - lowered high register: 128% lowest, 87% highest +;;; L - primary low register: 100% lowest, 73% highest +;;; L-L and HL-L - low compressed: 100% lowest, 66% highest +;;; HL - expanded register: 100% lowest, 84% highest +;;; HL-H - complete register: 100% lowest, 96% highest +;;; For their speaker, ,BASELINE was 42% of PEAK + +(set! ttt_topline_tree + ' + ((R:SylStructure.parent.register is "H") + (92) + ((R:SylStructure.parent.register is "H-H") + (100) + ((R:SylStructure.parent.register is "H-L") + (87) + ((R:SylStructure.parent.register is "L") + (73) + ((R:SylStructure.parent.register is "L-L") + (66) + ((R:SylStructure.parent.register is "HL") + (84) + ((R:SylStructure.parent.register is "HL-H") + (96) + (92))))))))) + +(set! ttt_baseline_tree + ' + ((R:SylStructure.parent.register is "H") + (133) + ((R:SylStructure.parent.register is "H-H") + (134) + ((R:SylStructure.parent.register is "H-L") + (128) + ((R:SylStructure.parent.register is "L") + (100) + ((R:SylStructure.parent.register is "L-L") + (100) + ((R:SylStructure.parent.register is "HL") + (100) + ((R:SylStructure.parent.register is "HL-H") + (100) + (133))))))))) + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;;;;; +;;;;;; Lisp Feature functions. 
+;;;;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define (valley_condition syl) +"(valley_condition syl) +Function to determine if a lowered target between two high target points +is needed in this syllable. +Returns: 0 - no target required + 1 - the single target case + 2 - the first of the two target case + 3 - the second of the two target case +" +(if (>= printdebug 4) + (begin (print "Entering valley_condition"))) +(cond + ((and (eq 0 (item.feat syl 'accented)) + (string-matches (next_accent_type syl) + "\\(H\\*\\|H\\-\\|H\\-L\\%\\|H\\-H\\%\\|\\!H\\*\\|\\!H\\-\\|\\!H\\-L\\%\\|\\!H\\-H\\%\\)") + (string-matches (last_accent_type syl) + "\\(H\\*\\|L\\+H\\*\\|L\\*\\+H\\\\|\\!H\\*\\|L\\+\\!H\\*\\|L\\*\\+\\!H\\)")) + ;GM: excluded %H (returns nil for last target) + (let ((nas (next_accent_start syl)) + (syls (item.feat syl 'syllable_start)) + (syle (item.feat syl 'syllable_end)) + (las (ttt_last_target_time syl))) + (if (>= printdebug 3) + (begin (print (format nil "nas: %l syls: %l syle %l las %l" nas syls syle las)))) + (cond + ((and (< (- nas las) 0.5) + (> (- nas las) 0.25) + (< syls (+ (/ (- nas las) 2.0) (ttt_last_target_time syl))) + (> syle (+ (/ (- nas las) 2.0) (ttt_last_target_time syl)))) 1) + ((and (> (- nas las) 0.5) + (< syls (+ (ttt_last_target_time syl) 0.25)) + (> syle (+ (ttt_last_target_time syl) 0.25))) 2) + ((and (> (- nas las) 0.5) + (< syls (- nas 0.25)) + (> syle (- nas 0.25))) 3) + (t 0)))) + (t 0))) + + + +(define (lh_condition syl) +"(lh_condition syl) +Function to determine the need for extra target points between an L and an H +Returns: 1 - first extra target required + 2 - second extra target required + 0 - no target required. 
+" +(if (>= printdebug 4) + (begin (print "Entering LH_condition"))) +(cond + ((and (eq 0 (item.feat syl 'accented)) + (string-matches (last_accent_type syl) "\\(L\\*\\)") + (string-matches (next_accent_type syl) + "\\(H\\*\\|H\\-\\|H\\-L\\%\\|H\\-H\\%\\)")) + (cond + ((and (eq 1 (last_accent syl)) + (< 2 (next_accent syl))) 1) + ((and (< 2 (last_accent syl)) + (eq 1 (next_accent syl))) 2) + (t 0))) + (t 0))) + +(define (hl_condition syl) +"(hl_condition syl) +Function to determine the need for extra target points between an H and an L +Returns: 1 - extra target required + 0 - no target required. +" +(if (>= printdebug 4) + (begin (print "Entering HL_condition"))) +(cond + ((and (eq 0 (item.feat syl 'accented)) + (string-matches (next_accent_type syl) + "\\(L\\*\\|L\\+H\\*\\|L\\+\\!H\\*\\|L\\*\\+H\\|L\\*\\+!H\\|L\\-\\|L\\-L\\%\\|L-H\\%\\)") + (string-matches (last_accent_type syl) + "\\(H\\*\\|L\\+H\\*\\|L\\*\\+H\\\\|\\!H\\*\\|L\\+\\!H\\*\\|L\\*\\+\\!H\\|\\%H\\)") + ;MDS: added !H's + (eq 1 (last_accent syl)) + + ;; fall faster! -MDS + (<= 2 (next_accent syl))) 1) + (t 0))) + +(define (next_accent syl) +"(next_accent syl) +Wrapper for c++ func ff_next_accent. +Returns the number of the syllables to the next accent in the following format. +0 - no next accent +1 - next syllable +2 - next next syllable +etc... 
+infinity - no previous syllable" +(if (>= printdebug 4) + (begin (print "Entering last_accent"))) +(cond + ((not (equal? "NONE" (item.feat syl 'tobi_accent))) 0) + ((equal? 0 (last_accent_type syl)) infinity) + (t (+ (item.feat syl 'last_accent) 1)))) + +(define (next_accent_type syl) +"(next_accent_type syl) +Returns the type of the next accent." +(cond + ((not (eq 0 (item.feat syl "n.R:Intonation.daughter1.name"))) + (item.feat syl "n.R:Intonation.daughter1.name")) + ((eq 0 (item.feat syl 'syl_out)) 0) ;;GM real ip_final would be better + (t (next_accent_type (item.relation.next syl 'Syllable))))) + +(define (last_accent_type syl) +"(last_accent_type syl) +Returns the type of the last (previous) accent." +(if (>= printdebug 4) + (begin (print "Entering last_accent_type"))) +(cond + ((not (equal? "NONE" (item.feat syl 'p.tobi_endtone))) + (item.feat syl 'R:Syllable.p.tobi_endtone)) + ((not (equal? "NONE" (item.feat syl 'p.tobi_accent))) + (item.feat syl 'R:Syllable.p.tobi_accent)) + ((eq 0 (item.feat syl 'syl_in)) 0) ;;GM real ip_initial would be better + (t (last_accent_type (item.prev syl 'Syllable))))) + +(define (next_accent_start syl) +"(next_accent_start syl) +Returns the start time of the vowel of next accented syllable" +(if (>= printdebug 4) + (begin (print "Entering next_accent_start"))) +(cond + ((not (eq 0 (item.feat syl "n.R:Intonation.daughter1.name"))) + (item.feat syl "R:Syllable.n.syllable_start")) ;;GM vowel start would be better + ((eq 0 (item.feat syl 'syl_out)) 0) + (t (next_accent_start (item.relation.next syl 'Syllable))))) + +; new features (not used yet) + +(define (ip_final syl) + "(ip_final SYL) + returns 1 if the syllable is the final syllable of an + ip (intermediate phrase)" + (cond + ((or (equal? 0 (item.feat syl "syl_out")) + (equal? "L-" (item.feat syl "tobi_endtone")) + (equal? "H-" (item.feat syl "tobi_endtone")) + (equal? 
"!H-" (item.feat syl "tobi_endtone"))) 1) + (t 0))) + +(define (ip_initial syl) + "(ip_initial SYL) + returns 1 if the syllable is the initial syllable of an + ip (intermediate phrase)" + (cond + ((equal? 0 (item.feat syl "syl_in")) + 1) + ((equal? 1 (ip_final (item.relation.prev syl 'Syllable))) + 1) + (t 0))) + +;; NEXT TWO FUNCTIONS ARE NEW - MDS +(define (ip_one_syllable_case syl) + "(ip_one_syllable_case SYL) + returns true if the syllable is the initial syllable of an + ip (intermediate phrase) and doesn't itself contain a complex + tone that starts opposite this syllable's accent" + (if (eqv? 0 (ip_initial syl)) + 0 + (let ((accent (item.feat syl "tobi_accent")) + (tone (item.feat syl "tobi_endtone"))) + (cond + ((and (equal? tone "L-H%") + (or (equal? accent "H*") + (equal? accent "!H*") + (equal? accent "L+H*") + (equal? accent "L+!H*") + (equal? accent "L*+H") + (equal? accent "L*+!H*") + (equal? accent "H+!H*"))) + 0) + ((and (or (equal? tone "H-L%") + (equal? tone "!H-L%")) + (equal? accent "L*")) + 0) + (t + 1))))) + +(define (hstar_weak_target syl) + (if (and (equal? 0 (item.feat syl 'asyl_in)) + (member (next_accent_type syl) + (list "L*" "L*+H" "L*+!H" "L+H*" "L+!H*"))) + 1 + 0)) + +(provide 'tobi_rules) diff --git a/lib/token.scm b/lib/token.scm new file mode 100644 index 0000000..e2c40a6 --- /dev/null +++ b/lib/token.scm @@ -0,0 +1,639 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Various tokenizing functions and customization + +(define (Token utt) + "(Token UTT) +Build a Word stream from the Token stream, analyzing compound words +numbers etc as tokens into words. Respects the Parameter Language +to choose the appropriate token to word module." 
+ (let ((rval (apply_method 'Token_Method utt)) ;; might be defined + (language (Parameter.get 'Language))) + (cond + (rval rval) ;; newer style + ((or (string-equal "britishenglish" language) + (string-equal "english" language) + (string-equal "americanenglish" language)) + (Token_English utt)) + ((string-equal "welsh" language) + (Token_Welsh utt)) + (t + (Token_Any utt))))) + +(define (remove_leadtrail_underscores name) + "(remove_leadtrail_underscores name) +Get rid of leading and trailing underscores that may be used for emphasis, +not this is called when there are underscores at the beginning and end but +there may not be an equal number of them." + (let ((se (symbolexplode name))) + (while (string-equal "_" (car se)) + (set! se (cdr se))) + (set! se (reverse se)) + (while (string-equal "_" (car se)) + (set! se (cdr se))) + (apply string-append (reverse se)))) + +(define (english_token_to_words token name) +"(english_token_to_words TOKEN NAME) +Returns a list of words for NAME from TOKEN. This allows the +user to customize various non-local, multi-word, context dependent +translations of tokens into words. If this function is unset only +the builtin translation rules are used, if this is set the builtin +rules are not used unless explicitly called. [see Token to word rules]" + (cond + ((string-matches name "[A-Z]*[\\$#\\\\Y£][0-9,]+\\(\\.[0-9]+\\)?") + ;; Some form of money (pounds or type of dollars) + (let (amount type currency) + (cond + ((string-matches name ".*\\$.*") + (set! amount (string-after name "$")) + (set! type (string-before name "$")) + (set! currency "dollar")) + ((string-matches name ".*£.*") + (set! amount (string-after name "£")) + (set! type (string-before name "£")) + (set! currency "pound")) + ((string-matches name ".*#.*") + (set! amount (string-after name "#")) + (set! type (string-before name "#")) + (set! currency "pound")) + ((string-matches name ".*Y[0-9].*") + (set! amount (string-after name "Y")) + (set! 
type (string-before name "Y")) + (set! currency "yen")) + ((string-matches name ".*\\\\.*") + (set! amount (string-after name "\\")) + (set! type (string-before name "\\")) + (set! currency "yen")) + (t + ;; who knows + (set! amount (string-after name "$")) + (set! type (string-before name "$")) + (set! currency "dollar"))) + (cond + ((string-matches (item.feat token "n.name") + ".*illion.?") + (append ;; "billions and billions" - Sagan + (builtin_english_token_to_words token amount) + (list (item.feat token "n.name")) ;; illion + (token_money_expand type) + (list (string-append currency "s")))) + ((string-matches amount ".*\\...$") + (append ;; exactly two places after point + (builtin_english_token_to_words token (string-before amount ".")) + (token_money_expand type) + (if (or (string-matches amount "1\\..*") + (string-equal currency "yen")) + (list currency) + (list (string-append currency "s"))) + (if (not (string-matches name ".*\\.00$")) + (builtin_english_token_to_words + token (remove_leading_zeros (string-after amount "."))) + nil))) + (t + (append ;; nothing after point or lots after point + (builtin_english_token_to_words token amount) + (token_money_expand type) + (if (or (string-matches amount "1") + (string-equal currency "yen")) + (list currency) + (list (string-append currency "s")))))))) + ((and (string-matches name ".*illion.?") + (string-matches (item.feat token "p.name") + "[A-Z]*[\\$#][0-9,]+\\(\\.[0-9]+\\)?")) + nil ;; dealt with on the previous symbol + ) + ((string-matches name "[1-9][0-9]*/[1-9][0-9]*") + (let ((numerator (string-before name "/")) + (denominator (string-after name "/")) + ) + (cond + ((string-matches name "1/2") + (list "half")) + ((string-matches denominator "4") + (append + (builtin_english_token_to_words token numerator) + (list "quarter") + (if (string-equal numerator "1") + (list '((name "'s")(pos nnp))) + nil))) + (t + (append + (builtin_english_token_to_words token numerator) + (begin + (item.set_feat token 
"token_pos" "ordinal") + (builtin_english_token_to_words token denominator)) + (if (string-equal numerator "1") + nil + (list '((name "'s")(pos nnp))))))))) + ((and (string-matches name "No") + (item.next token) + (string-matches (item.feat token "n.name") + "[0-9]+")) + (list + "number")) + ((string-matches name ".*%$") + (append + (token_to_words token (string-before name "%")) + (list "percent"))) + ((string-matches name "[0-9]+s") ;; e.g. 1950s + (item.set_feat token "token_pos" "year") ;; reasonable guess + (append + (builtin_english_token_to_words token (string-before name "s")) + (list '((name "'s")(pos nnp))) ;; will get assimilated by postlexical rules + )) + ((string-matches name "[0-9]+'s") ;; e.g. 1950's + (item.set_feat token "token_pos" "year") ;; reasonable guess + (append + (builtin_english_token_to_words token (string-before name "'s")) + (list '((name "'s")(pos nnp))) ;; will get assimilated by postlexical rules + )) + ((and (string-matches name ".*s$") + (string-equal (item.feat token "punc") "'")) + ;; potential possessive or may be end of a quote + (if (token_no_starting_quote token) + (item.set_feat token "punc" "")) + (builtin_english_token_to_words token name)) + ((and (string-equal name "A") ;; letter or determiner + (or (string-matches (item.feat token "p.name") "[a-z].*") + (string-matches (item.feat token "n.name") "[A-Z].*"))) + (list (list '(name "a")(list 'pos token.letter_pos)))) + ((member_string name english_homographs) + (list (list (list 'name name) + (list 'hg_pos (item.feat token "token_pos"))))) + ((string-matches name "__*[^_][^_]*_*_") ;; _emphasis_ + (english_token_to_words + token + (remove_leadtrail_underscores name) + )) + ((string-matches name "[0-9]?[0-9][:\\.][0-9][0-9][AaPp][Mm]") ;; time + ;; must be am/pm present for . to be acceptable separator + (let (hours mins half sep (ttime (downcase name))) + (if (string-matches ttime ".*:.*") + (set! sep ":") + (set! sep ".")) + (set! 
hours (string-before ttime sep)) + (set! mins (string-after ttime sep)) + (if (string-matches ttime ".*am") + (set! sep "am") + (set! sep "pm")) + (set! mins (string-before mins sep)) + (append + (builtin_english_token_to_words token hours) + (cond + ((string-equal mins "00") + nil) + ((string-matches mins "0.") + (cons + "oh" + (builtin_english_token_to_words token (string-after mins "0")))) + (t + (builtin_english_token_to_words token mins))) + (if (string-equal sep "am") + (builtin_english_token_to_words token "A.M") + (builtin_english_token_to_words token "P.M"))))) + ((string-matches name "[0-9]?[0-9]:[0-9][0-9]") ;; time + (append + (builtin_english_token_to_words + token (remove_leading_zeros (string-before name ":"))) + (cond + ((string-equal "00" (string-after name ":")) + nil) + ((string-matches (string-after name ":") "0.") + (cons + "oh" + (builtin_english_token_to_words + token + (remove_leading_zeros (string-after name ":"))))) + (t + (builtin_english_token_to_words + token + (string-after name ":")))))) + ((string-matches name "[0-9][0-9]:[0-9][0-9]:[0-9][0-9]") ;; exact time + (append + (builtin_english_token_to_words + token (remove_leading_zeros (string-before name ":"))) + (list "hours") + (builtin_english_token_to_words + token (remove_leading_zeros + (string-before (string-after name ":") ":"))) + (list "minutes" "and") + (builtin_english_token_to_words + token (remove_leading_zeros + (string-after (string-after name ":") ":"))) + (list "seconds"))) + ((string-matches name "[0-9][0-9]?/[0-9][0-9]?/[0-9][0-9]\\([0-9][0-9]\\)?") + ;; date, say it as numbers to avoid American/British problem + (let ((num1 (string-before name "/")) + (num2 (string-before (string-after name "/") "/")) + (year (string-after (string-after name "/") "/")) + day month) + (item.set_feat token "token_pos" "cardinal") + (set! day (builtin_english_token_to_words token num1)) + (set! 
month (builtin_english_token_to_words token num2)) + (item.set_feat token "token_pos" "year") + (append + day + month + (list '((name ",")(pbreak_scale 0.9))) + (builtin_english_token_to_words token year)))) + ((string-matches name "[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]") + (item.set_feat token "token_pos" "digits") ;; canonical phone number + (append + (builtin_english_token_to_words token (string-before name "-")) + (list '((name ",")(pbreak_scale 1.0))) + (builtin_english_token_to_words token (string-after name "-")))) + ((string-matches name "[0-9]+-[0-9]+-[-0-9]+") + ;; long distance number + (let ((r '(dummy)) (remainder name)) + (item.set_feat token "token_pos" "digits") + (while (> (length remainder) 0) + (if (string-matches remainder "[0-9]+") + (set! r (append r + (builtin_english_token_to_words + token remainder))) + (set! r (append r + (builtin_english_token_to_words + token (string-before remainder "-"))))) + (set! remainder (string-after remainder "-")) + (if (> (length remainder) 0) + (set! r (append r (list '((name ",")(pbreak_scale 1.0))))))) + (cdr r)) + ) + ((and (string-matches name "[0-9][0-9][0-9]") + (string-matches (item.feat token "n.name") + "[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")) + (item.set_feat token "token_pos" "digits") + (builtin_english_token_to_words token name)) + ((string-matches name "[0-9]+-[0-9]+") + (let ((tokpos)) + (item.set_name token (string-before name "-")) + (set! 
tokpos (wagon token + (car (cdr (assoc "[0-9]+" token_pos_cart_trees))))) + (item.set_feat token "token_pos" (car tokpos)) + (append + (builtin_english_token_to_words token (string-before name "-")) + (list "to") + (builtin_english_token_to_words token (string-after name "-"))))) + ((string-matches name "\\(II?I?\\|IV\\|VI?I?I?\\|IX\\|X[VIX]*\\)") + ;; Roman numerals + (let ((tp (item.feat token "token_pos"))) + (cond + ((string-matches tp "century");; always believe this + (item.set_feat token "token_pos" "ordinal") + (if (or (string-equal "1" (tok_rex token)) + (item.feat token "p.lisp_tok_rex_names")) + (append + (list "the") + (builtin_english_token_to_words + token (tok_roman_to_numstring name))) + (builtin_english_token_to_words + token (tok_roman_to_numstring name)))) + ((string-matches name "[IVX]");; be *very* wary of this one + (if (and (string-equal + "1" (item.feat token "p.lisp_tok_section_name")) + (string-matches tp "number")) + (builtin_english_token_to_words + token (tok_roman_to_numstring name)) + (tok_string_as_letters name))) + ((string-matches tp "number") + (item.set_feat token "token_pos" "cardinal") + (builtin_english_token_to_words + token (tok_roman_to_numstring name))) + (t;; else its a letter + (tok_string_as_letters name))))) + ((and (string-matches name "pp") + (string-matches (item.feat token "n.name") + "[0-9]+-[0-9]+")) + (list "pages")) + ((and (string-matches name "ss") + (string-matches (item.feat token "n.name") + "[0-9]+-[0-9]+")) + (list "sections")) + ((string-matches name "_____+") + (list "line" "of" "underscores")) + ((string-matches name "=====+") + (list "line" "of" "equals")) + ((string-matches name "-----+") + (list "line" "of" "hyphens")) + ((string-matches name "\\*\\*\\*\\*\\*+") + (list "line" "of" "asterisks")) + ((string-matches name "--+") + (list '((name ",")(pbreak_scale 1.0)))) + ((string-matches name ".*--+.*") + (append + (builtin_english_token_to_words token (string-before name "--")) + (list '((name 
",")(pbreak_scale 1.0))) + (builtin_english_token_to_words token (string-after name "--")))) + ((string-matches name "[A-Z][A-Z]?&[A-Z][A-Z]?") + (append + (tok_string_as_letters (string-before name "&")) + (list "and") + (tok_string_as_letters (string-after name "&")))) + ((and (string-equal name "Ms") + (string-matches (item.feat token "n.name") "[A-Z][^A-Z]*")) + (list "mizz")) + ((or (string-matches name "[A-Z][A-Z]+s") + (string-matches name "[BCDEFGHJKLMNOPQRSTVWXYZ]+s")) + (append + (builtin_english_token_to_words token (string-before name "s")) + (list '((name "'s")(pos nnp))) ;; will get assimilated by postlexical rules + )) + ((string-matches name "<.*@.*>") ;; quoted e-mail + (append + (builtin_english_token_to_words + token (string-after (string-before name "@") "<")) + (list "at") + (builtin_english_token_to_words + token (string-before (string-after name "@") ">")))) + ((string-matches name ".*@.*") ;; e-mail + (append + (builtin_english_token_to_words + token (string-before name "@")) + (list "at") + (builtin_english_token_to_words + token (string-after name "@") ">"))) + ((string-matches name "\\([dD][Rr]\\|[Ss][tT]\\)") + (if (string-equal (item.feat token "token_pos") "street") + (if (string-matches name "[dD][rR]") + (list "drive") + (list "street")) + (if (string-matches name "[dD][rR]") ;; default on title side + (list "doctor") + (list "saint")))) + ((string-matches name "[Cc]alif") ;; hopelessly specific ... + (list + "california")) + (t + (builtin_english_token_to_words token name)))) + +;;; This is set as the default +(defvar token_to_words english_token_to_words) + +(defvar token.punctuation "\"'`.,:;!?(){}[]" + "token.punctuation +A string of characters which are to be treated as punctuation when +tokenizing text. Punctuation symbols will be removed from the text +of the token and made available through the \"punctuation\" feature. 
+[see Tokenizing]") +(defvar token.prepunctuation "\"'`({[" + "token.prepunctuation +A string of characters which are to be treated as preceding punctuation +when tokenizing text. Prepunctuation symbols will be removed from the text +of the token and made available through the \"prepunctuation\" feature. +[see Tokenizing]") +(defvar token.whitespace " \t\n\r" + "token.whitespace +A string of characters which are to be treated as whitespace when +tokenizing text. Whitespace is treated as a separator and removed +from the text of a token and made available through the \"whitespace\" +feature. [see Tokenizing]") +(defvar token.singlecharsymbols "" + "token.singlecharsymbols +Characters which must always be split off as separate tokens. This would be +unusual in standard text, but is useful in parsing some types of +file. [see Tokenizing]") + +(defvar token.letter_pos 'nn + "token.letter_pos +The part of speech tag (valid for your part of speech tagger) for +individual letters. When the tokenizer decides to pronounce a token +as a list of letters this tag is added to each letter in the list. +Note this should be from the part of speech set used in your tagger +which may not be the same one that appears in the actual lexical +entry (if you map them afterwards). This specifically allows \"a\" +to come out as ae rather than @.") + +(defvar token.unknown_word_name "unknown" + "token.unknown_word_name +When all else fails and a pronunciation for a word or character can't +be found this word will be said instead. If you make this \"\" then +the unknown word will simply be omitted. This will only +really be called when there is a bug in the lexicon and characters +are missing from the lexicon. 
Note this word should be in the lexicon.") + +(def_feature_docstring + 'Token.punc + "Token.punc +Succeeding punctuation symbol found after token in original +string/file.") +(def_feature_docstring + 'Token.whitespace + "Token.whitespace +Whitespace found before token in original string/file.") +(def_feature_docstring + 'Token.prepunctuation + "Token.prepunctuation +Preceding punctuation symbol found before token in original string/file.") + +(require 'tokenpos) +;;; +;;; Token pos are gross-level part of speech tags which help decide the +;;; pronunciation of tokens (in particular the expansion of Tokens into words). +;;; The most obvious example is identifying number types (ordinals, +;;; years, digits or numbers). +;;; +(defvar english_token_pos_cart_trees + '( + ;; Format is (Regex Tree) + ("[0-9]+" + ((lisp_num_digits < 3.8) + ((p.lisp_token_pos_guess is month) + ((lisp_month_range is 0) ((year)) ((ordinal))) + ((n.lisp_token_pos_guess is month) + ((lisp_month_range is 0) ((cardinal)) ((ordinal))) + ((n.lisp_token_pos_guess is numeric) + ((lisp_num_digits < 2) + ((p.lisp_token_pos_guess is numeric) + ((pp.lisp_token_pos_guess is sym) ((digits)) ((cardinal))) + ((cardinal))) + ((nn.lisp_token_pos_guess is sym) ((cardinal)) ((digits)))) + ((lisp_num_digits < 2) + ((nn.lisp_token_pos_guess is numeric) + ((n.lisp_token_pos_guess is sym) + ((lisp_month_range is 0) ((digits)) ((cardinal))) + ((cardinal))) + ((cardinal))) + ((name < 302.3) + ((p.lisp_token_pos_guess is flight) + ((digits)) + ((n.lisp_token_pos_guess is sym) + ((p.lisp_token_pos_guess is sym) ((digits)) ((cardinal))) + ((cardinal)))) + ((p.lisp_token_pos_guess is a) + ((digits)) + ((n.lisp_token_pos_guess is sym) + ((nn.lisp_token_pos_guess is sym) + ((name < 669.2) ((digits)) ((cardinal))) + ((cardinal))) + ((name < 373.2) + ((cardinal)) + ((name < 436.2) + ((name < 392.6) ((digits)) ((cardinal))) + ((name < 716.5) + ((cardinal)) + ((name < 773.6) + ((p.lisp_token_pos_guess is _other_) ((digits)) ((cardinal))) + 
((cardinal))))))))))))) + ((p.lisp_token_pos_guess is numeric) + ((pp.lisp_token_pos_guess is month) + ((year)) + ((nn.lisp_token_pos_guess is numeric) ((cardinal)) ((digits)))) + ((nn.lisp_token_pos_guess is numeric) + ((n.lisp_token_pos_guess is month) + ((cardinal)) + ((n.lisp_token_pos_guess is numeric) + ((digits)) + ((p.lisp_token_pos_guess is _other_) ((cardinal)) ((year))))) + ((p.lisp_token_pos_guess is _other_) + ((lisp_num_digits < 4.4) + ((name < 2959.6) + ((name < 1773.4) ((cardinal)) ((year))) + ((cardinal))) + ((pp.lisp_token_pos_guess is _other_) ((digits)) ((cardinal)))) + ((n.lisp_token_pos_guess is to) + ((year)) + ((p.lisp_token_pos_guess is sym) + ((pp.lisp_token_pos_guess is sym) + ((cardinal)) + ((lisp_num_digits < 4.6) ((year)) ((digits)))) + ((lisp_num_digits < 4.8) + ((name < 2880) + ((name < 1633.2) + ((name < 1306.4) ((cardinal)) ((year))) + ((year))) + ((cardinal))) + ((cardinal))))))))) + ) + ("\\(II?I?\\|IV\\|VI?I?I?\\|IX\\|X[VIX]*\\)";; Roman numerals + ((p.lisp_tok_rex_names is 0) + ((lisp_num_digits is 5) + ((number)) + ((lisp_num_digits is 4) + ((number)) + ((nn.lisp_num_digits is 13) + ((number)) + ((p.lisp_num_digits is 7) + ((number)) + ((p.lisp_tok_section_name is 0) + ((lisp_tok_rex is 0) + ((lisp_num_digits is 3) + ((p.lisp_num_digits is 4) + ((number)) + ((nn.lisp_num_digits is 4) + ((number)) + ((n.lisp_num_digits is 4) + ((number)) + ((pp.lisp_num_digits is 3) + ((number)) + ((p.lisp_num_digits is 2) + ((letter)) + ((nn.lisp_num_digits is 2) + ((letter)) + ((n.cap is 0) ((letter)) ((number))))))))) + ((nn.lisp_num_digits is 11) + ((letter)) + ((lisp_num_digits is 1) + ((pp.lisp_num_digits is 9) + ((letter)) + ((p.lisp_num_digits is 9) + ((letter)) + ((n.lisp_num_digits is 6) + ((letter)) + ((pp.lisp_num_digits is 6) + ((letter)) + ((pp.cap is 0) + ((n.cap is 0) + ((p.lisp_num_digits is 1) + ((letter)) + ((n.lisp_num_digits is 4) ((letter)) ((letter)))) + ((letter))) + ((letter))))))) + ((p.lisp_num_digits is 10) + 
((number)) + ((n.lisp_num_digits is 8) + ((number)) + ((pp.lisp_num_digits is 9) + ((number)) + ((nn.lisp_num_digits is 5) + ((number)) + ((n.lisp_num_digits is 4) ((number)) ((letter)))))))))) + ((letter))) + ((number))))))) + ((century)))) + ("\\([dD][Rr]\\|[Ss][tT]\\)" + ((n.name is 0) + ((p.cap is 1) + ((street)) + ((p.name matches "[0-9]*\\(1[sS][tT]\\|2[nN][dD]\\|3[rR][dD]\\|[0-9][tT][hH]\\)") + ((street)) + ((title)))) + ((punc matches ".*,.*") + ((street)) + ((p.punc matches ".*,.*") + ((title)) + ((n.cap is 0) + ((street)) + ((p.cap is 0) + ((p.name matches "[0-9]*\\(1[sS][tT]\\|2[nN][dD]\\|3[rR][dD]\\|[0-9][tT][hH]\\)") + ((street)) + ((title))) + ((pp.name matches "[1-9][0-9]+") + ((street)) + ((title))))))))) + ("lead" + ((p.name in (was were had been having has is are)) + ((led)) + ((liid)))) + ("read" + ((p.name in (to)) + ((riid)) + ((red)))) + )) + +(defvar english_homographs + '("lead" "read") + "english_homographs +A list of tokens that are dealt with by a homograph disambiguation tree +in english_token_pos_cart_trees.") + +(defvar token_pos_cart_trees + english_token_pos_cart_trees + "token_pos_cart_trees +This is a list of pairs of a regex and a CART tree. Tokens that match +the regex will have the CART tree applied, setting the result as +the token_pos feature on the token. The list is checked in order +and only the first match will be applied.") + +(provide 'token) diff --git a/lib/tokenpos.scm b/lib/tokenpos.scm new file mode 100644 index 0000000..9470eb4 --- /dev/null +++ b/lib/tokenpos.scm @@ -0,0 +1,265 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Functions used in identifying token types. 
+;;; + +(defvar token_most_common +'( +sym numeric month to day in the of on and writes a years from +for jst at million by is was gmt page he that than more since as when +with but after about or his i has it date no died number bst who miles +university some people an only w year have ago were are pages up days +months hours minutes through out had which least hi last now ft this +all one its there between cents until over will before past they +nearly times tim message so lbs just if age we during she billion then +other be time new her first states not you members under would many +says degrees two next fax week while bush been around including back +campaign american within publisher flight points even early later +world countries every edt can president most could their what them +former began women killed another also received long americans pounds +do dear said km made into did dead war tel still old x took total men +like f am less c well late down weeks end chapter among place house +away him election death almost students state soviet where version +summer man s nation because washington top though m id est these spent +seats gnu estimated those lost ian high each copies children acres +tons son per my found won off seconds power nations federal born +presidential much city begin p name different whose three home hello) + +"token_most_common +A list of (English) words which were found to be most common in +a text database and are used as discriminators in token analysis.") + +(define (token_pos_guess sc) +"(token_pos_guess sc) +Returns a general pos for sc's name. + numeric All digits + number float or comma'd numeric + sym Contains at least one non alphanumeric + month has month name (or abbrev) + day has day name (or abbrev) + rootname else downcased alphabetic. +Note this can be used to find token_pos but isn't used directly as +it's not discriminatory enough." 
+ (let ((name (downcase (item.name sc)))) + (cond + ((string-matches name "[0-9]+") + 'numeric) + ((or (string-matches name "[0-9]+\\.[0-9]+") + (string-matches name + "[0-9][0-9]?[0-9]?,\\([0-9][0-9][0-9],\\)*[0-9][0-9][0-9]")) + 'number) + ((string-matches name ".*[^A-Za-z0-9].*") + 'sym) + ((member_string name '(jan january feb february mar march + apr april may jun june + jul july aug august sep sept september + oct october nov november dec december)) + 'month) + ((member_string name '(sun sunday mon monday tue tues tuesday + wed wednesday thu thurs thursday + fri friday sat saturday)) + 'day) + ((member_string name token_most_common) + name) + (t + '_other_)))) + +(define (token_no_starting_quote token) + "(token_no_starting_quote TOKEN) +Check to see if a single quote (or backquote) appears as prepunctuation +in this token or any previous one in this utterance. This is used to +disambiguate ending single quote as possessive or end quote." + (cond + ((null token) + t) + ((string-matches (item.feat token "prepunctuation") "[`']") + nil) + (t + (token_no_starting_quote (item.relation.prev token "Token"))))) + +(define (token_zerostart sc) +"(zerostart sc) +Returns, 1 if first char of sc's name is 0, 0 otherwise." + (if (string-matches (item.name sc) "^0.*") + "1" + "0")) + +(define (tok_roman_to_numstring roman) + "(tok_roman_to_numstring ROMAN) +Takes a string of roman numerals and converts it to a number and +then returns the printed string of that. Only deals with numbers up to 50." + (let ((val 0) (chars (symbolexplode roman))) + (while chars + (cond + ((equal? (car chars) 'X) + (set! val (+ 10 val))) + ((equal? (car chars) 'V) + (set! val (+ 5 val))) + ((equal? (car chars) 'I) + (cond + ((equal? (car (cdr chars)) 'V) + (set! val (+ 4 val)) + (set! chars (cdr chars))) + ((equal? (car (cdr chars)) 'X) + (set! val (+ 9 val)) + (set! chars (cdr chars))) + (t + (set! val (+ 1 val)))))) + (set! 
chars (cdr chars))) + (format nil "%d" val))) + +(define (num_digits sc) +"(num_digits SC) +Returns number of digits (actually chars) in SC's name." + (string-length (format nil "%s" (item.name sc)))) + +(define (month_range sc) +"(month_range SC) +1 if SC's name is > 0 and < 32, 0 otherwise." + (let ((val (parse-number (item.name sc)))) + (if (and (> val 0) (< val 32)) + "1" + "0"))) + +(define (remove_leading_zeros name) + "(remove_leading_zeros name) +Remove leading zeros from given string." + (let ((nname name)) + (while (string-matches nname "^0..*") + (set! nname (string-after nname "0"))) + nname)) + +(define (token_money_expand type) +"(token_money_expand type) +Convert shortened form of money identifier to words if of a known type." + (cond + ((string-equal type "HK") + (list "Hong" "Kong")) + ((string-equal type "C") + (list "Canadian")) + ((string-equal type "A") + (list "Australian")) + ((< (length type) 4) + (mapcar + (lambda (letter) + (list (list 'name letter) + (list 'pos token.letter_pos))) + (symbolexplode type))) + (t + (list type)))) + +(define (find_month_from_number token string-number) + "(find_month_from_number token string-number) +Find the textual representation of the month from the given string number" + (let ((nnum (parse-number string-number))) + (cond + ((equal? 1 nnum) (list "January")) + ((equal? 2 nnum) (list "February")) + ((equal? 3 nnum) (list "March")) + ((equal? 4 nnum) (list "April")) + ((equal? 5 nnum) (list "May")) + ((equal? 6 nnum) (list "June")) + ((equal? 7 nnum) (list "July")) + ((equal? 8 nnum) (list "August")) + ((equal? 9 nnum) (list "September")) + ((equal? 10 nnum) (list "October")) + ((equal? 11 nnum) (list "November")) + ((equal? 
12 nnum) (list "December")) + (t + (cons "month" + (builtin_english_token_to_words token string-number)))))) + +(define (tok_allcaps sc) + "(tok_allcaps sc) +Returns 1 if sc's name is all capitals, 0 otherwise" + (if (string-matches (item.name sc) "[A-Z]+") + "1" + "0")) + +(define (tok_section_name sc) + "(tok_section_name sc) +Returns 1 if sc's name is in list of things that are section/chapter +like." + (if (member_string + (downcase (item.name sc)) + '(chapter section part article phrase verse scene act book + volume chap sect art vol war fortran saturn + trek)) + "1" + "0")) + +(define (tok_string_as_letters name) + "(tok_string_as_letters NAME) +Return list of letters marked as letter part of speech made +by exploding NAME." + (mapcar + (lambda (letter) + (list (list 'name letter) + (list 'pos token.letter_pos))) + (symbolexplode name))) + +(define (tok_rex sc) + "(tok_rex sc) +Returns 1 if a King-like title is two or three tokens before, or is the next token." + (let ((kings '(king queen pope duke tsar emperor shah ceasar + duchess tsarina empress baron baroness + count countess))) + (if (or (member_string + (downcase (item.feat sc "R:Token.pp.name")) + kings) + (member_string + (downcase (item.feat sc "R:Token.pp.p.name")) + kings) + (member_string + (downcase (item.feat sc "R:Token.n.name")) + kings)) + "1" + "0"))) + +(define (tok_rex_names sc) + "(tok_rex_names sc) +Returns 1 if this is a King-like name." 
+ (if (and + (member_string + (downcase (item.name sc)) + '(louis henry charles philip george edward pius william richard + ptolemy john paul peter nicholas + alexander frederick james alfonso ivan napolean leo + gregory catherine alexandria pierre elizabeth mary)) + (or (string-equal "" (item.feat sc "punc")) + (string-equal "0" (item.feat sc "punc")))) + "1" + "0")) + +(provide 'tokenpos) diff --git a/lib/tts.scm b/lib/tts.scm new file mode 100644 index 0000000..2888967 --- /dev/null +++ b/lib/tts.scm @@ -0,0 +1,304 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Various tts functions and hooks + +;;; Once the utterance is built these functions synth and play it +(defvar tts_hooks (list utt.synth utt.play) + "tts_hooks +Function or list of functions to be called during text to speech. +The function tts_file, chunks data into Utterances of type Token and +applies this hook to the utterance. This typically contains the utt.synth +function and utt.play. 
[see TTS]") + +;;; This is used to define utterance breaks in tts on files +(defvar eou_tree + '((lisp_max_num_tokens > 200) + ((1)) + ((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*");; significant break (2 nls) + ((1)) + ((name matches "--+") + ((1)) + ((punc matches ".*[\\?:!;].*") + ((1)) + ((punc matches ".*\\..*") + ((punc matches "..+");; longer punctuation string + ((punc matches "\\..*,") ;; for U.S.S.R., like tokens + ((0)) + ((1))) + ;; This is to distinguish abbreviations vs periods + ;; These are heuristics + ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)");; an abbreviation + ((n.whitespace is " ") + ((0));; if abbrev single space isn't enough for break + ((n.name matches "[A-Z].*") + ((1)) + ((0)))) + ((n.whitespace is " ");; if it doesn't look like an abbreviation + ((n.name matches "[A-Z].*");; single space and non-cap is no break + ((1)) + ((0))) + ((1))))) + ((0))))))) + "eou_tree +End of utterance tree. A decision tree used to determine if the given +token marks the end of an utterance. It may look one token ahead to +do this. [see Utterance chunking]") + +(define (max_num_tokens x) + "(max_num_tokens x) +This is probably controversial, but it's good to have a maximum number +of tokens in an utterance. You really don't want to wait on very long +utterances; some utts can be thousands of words long. These maybe +shouldn't be spoken, but we do have to deal with them." + (let ((c 1) (y x)) + (while y + (set! c (+ 1 c)) + (set! y (item.prev y))) + c)) + +;;; The program used to parse stml files +;;; Needs version 1.0 to allow -D option to work +(defvar sgml_parse_progname "nsgmls-1.0" + "sgml_parse_progname +The name of the program to use to parse SGML files. Typically this is +nsgmls-1.0 from the sp SGML package. [see XML/SGML requirements]") + +;;; When PHRASE elements are specified in an utterance in STML +;;; no other method for phrase prediction is to be used, so we +;;; use the following tree +(set! 
stml_phrase_cart_tree +'((R:Token.parent.pbreak is B) + ((B)) + ((n.name is 0) + ((B)) + ((NB))))) + +(define (xxml_synth utt) +"(xxml_synth UTT) +This applies the xxml_hooks (mode specific) and tts_hooks to the +given utterance. This function should be called from xxml element +definitions that signal an utterance boundary." + (cond + ((or (not utt) + (not (utt.relation utt 'Token))) ;; no tokens + nil) + (t + (apply_hooks xxml_hooks utt) + (apply_hooks tts_hooks utt) + (set! utt nil) ;; not enough ... + (gc) + utt)) +) + +(define (xxml_attval ATTNAME ATTLIST) +"(xxml_attval ATTNAME ATTLIST) +Returns attribute value of ATTNAME in ATTLIST or nil if it doesn't +exists." + (cond + ((not ATTLIST) + nil) + ((string-equal ATTNAME (car (car ATTLIST))) + (car (cdr (car ATTLIST)))) + (t + (xxml_attval ATTNAME (cdr ATTLIST))))) + +(defvar xxml_word_features nil + "xxml_word_features +An assoc list of features to be added to the current word when +in xxml parse mode.") + +(defvar xxml_token_hooks nil + "xxml_token_hooks +Functions to apply to each token.") + +(defvar xxml_hooks nil + "xxml_hooks + Function or list of functions to be applied to an utterance when + parsed with xxML, before tts_hooks.") + +(defvar xxml_elements nil + "xxml_elements +List of Scheme actions to perform on finding xxML tags.") + +(defvar xml_dtd_dir libdir + "xml_dtd_dir +The directory holding standard DTD form the xml parser.") + +(set! tts_fnum 1) +(define (save_tts_output utt) + (let ((fname (string-append "tts_file_" tts_fnum ".wav"))) + (format stderr "festival: saving waveform in %s\n" fname) + (utt.save.wave utt fname) + (set! tts_fnum (+ 1 tts_fnum)) + utt)) + +(define (save_waves_during_tts) + "(save_waves_during_tts) +Save each waveform in the current directory in files \"tts_file_XXX.wav\". +use (save_waves_during_tts_STOP) to stop saving waveforms" + (if (not (member save_tts_output tts_hooks)) + (set! 
tts_hooks (append tts_hooks (list save_tts_output)))) + t) + +(define (save_waves_during_tts_STOP) + "(save_waves_during_tts_STOP) +Stop saving waveforms when doing tts." + (if (member save_tts_output tts_hooks) + (set! tts_hooks (delq save_tts_output tts_hooks))) + t) + +(define (tts file mode) + "(tts FILE MODE) + Convert FILE to speech. MODE identifies any special treatment + necessary for FILE. This is simply a front end to tts_file but + puts the system in async audio mode first. [see TTS]" + (audio_mode 'async) + (if mode + (tts_file file mode) + (tts_file file (tts_find_text_mode file auto-text-mode-alist))) +;; (audio_mode 'sync) ;; Hmm this is probably bad +) + +(define (tts_text string mode) + "(tts_text STRING mode) +Apply tts on given string. That is, segment it into utterances and +apply tts_hooks to each utterance. This is naively done by saving the +string to a file and calling tts_file on that file. This differs from +SayText which constructs a single utterance for the whole given text." + (let ((tmpfile (make_tmp_filename)) + (fd)) + (set! fd (fopen tmpfile "wb")) + (format fd "%s" string) + (fclose fd) + (audio_mode 'async) + (tts_file tmpfile mode) + (delete-file tmpfile))) + +(define (save_record_wave utt) +"Saves the waveform and records its so it can be joined into a +a single waveform at the end." + (let ((fn (make_tmp_filename))) + (utt.save.wave utt fn) + (set! wavefiles (cons fn wavefiles)) + utt)) + +(define (combine_waves) + "(combine_waves) +Join all the waves together into the desired output file +and delete the intermediate ones." + (let ((wholeutt (Utterance Text ""))) + (mapcar + (lambda (d) + (utt.import.wave wholeutt d t) + (delete-file d)) + (reverse wavefiles)) + wholeutt)) + +(define (tts_textall string mode) + "(tts_textall STRING MODE) +Apply tts to STRING. This function is specifically designed for +use in server mode so a single function call may synthesize the string. 
+This function name may be added to the server safe functions." + (if (not (string-equal mode "nil")) + (begin + ;; a mode has been specified so do something different + (let ((tmpfile (make_tmp_filename)) + (fd)) + (set! fd (fopen tmpfile "wb")) + (format fd "%s" string) + (fclose fd) + (set! tts_hooks (list utt.synth save_record_wave)) + (set! wavefiles nil) + (tts_file tmpfile mode) + (delete-file tmpfile) + (utt.send.wave.client (combine_waves)) + )) + ;; Simple fundamental mode + (utt.send.wave.client + (utt.synth + (eval (list 'Utterance 'Text string)))))) + +;; Function to interface with app_festival for asterisk +;; See http://www.asterisk.org +(define (tts_textasterisk string mode) + "(tts_textasterisk STRING MODE) +Apply tts to STRING. This function is specifically designed for +use in server mode so a single function call may synthesize the string. +This function name may be added to the server safe functions." + (utt.send.wave.asterisk + (utt.synth + (eval (list 'Utterance 'Text string))))) + + + +(define (tts_return_to_client) + "(tts_return_to_client) +This function is called by clients who wish to return waveforms of +their text samples asynchronously. This replaces utt.play in tts_hooks +with utt.send.wave.client." + (if (not (member utt.send.wave.client tts_hooks)) + (set! tts_hooks + (append (delq utt.play tts_hooks) + (list utt.send.wave.client))))) + +(defvar tts_text_modes nil +"tts_text_modes +An a-list of text mode data for file type specific tts functions. +See the manual for an example. [see Text modes]") + +(define (tts_find_text_mode file alist) + "(tts_find_text_mode FILE ALIST) +Search through ALIST for one that matches FILE. Returns nil if +nothing matches." 
+ (cond + ((null alist) nil) ;; can't find a match + ((string-matches file (string-append ".*" (car (car alist)) ".*")) + (cdr (car alist))) + (t + (tts_find_text_mode file (cdr alist))))) + +(defvar auto-text-mode-alist + (list + (cons "\\.sable$" 'sable) + (cons "\\.ogi" 'ogimarkup) + (cons "\\.email" 'email) + (cons "" 'fundamental) + ) + "auto-text-mode-alist +Following Emacs' auto-mode-alist this provides a mechanism for auto +selecting a TTS text mode based on the filename being analyzed. Its +format is exactly the same as Emacs in that it consists of an alist of +dotted pairs of regular expression and text mode name.") + +(provide 'tts) diff --git a/lib/unilex_phones.scm b/lib/unilex_phones.scm new file mode 100644 index 0000000..25e905e --- /dev/null +++ b/lib/unilex_phones.scm @@ -0,0 +1,189 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 2003, 2004 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; unilex phoneset +;;; + + +(defPhoneSet + unilex + ;;; Phone Features + (;; vowel or consonant + (vc + -) + ;; vowel length: short long dipthong schwa + (vlng s l d a 0) + ;; vowel height: high mid low + (vheight 1 2 3 0) + ;; vowel frontness: front mid back + (vfront 1 2 3 0) + ;; lip rounding + (vrnd + - 0) + ;; consonant type: stop fricative affricative nasal liquid approximant + (ctype s f a n l t r 0) + ;; place of articulation: labial alveolar palatal labio-dental + ;; dental velar glottal + (cplace l a p b d v g 0) + ;; consonant voicing + (cvox + - 0) + ) + ( + (SIL - 0 0 0 0 0 0 -) ;; slience ... + (# - 0 0 0 0 0 0 -) ;; slience ... 
+ (B_10 - 0 0 0 0 0 0 -) ;; Pauses + (B_20 - 0 0 0 0 0 0 -) ;; Pauses + (B_30 - 0 0 0 0 0 0 -) ;; Pauses + (B_40 - 0 0 0 0 0 0 -) ;; Pauses + (B_50 - 0 0 0 0 0 0 -) ;; Pauses + (B_100 - 0 0 0 0 0 0 -) ;; Pauses + (B_150 - 0 0 0 0 0 0 -) ;; Pauses + (B_200 - 0 0 0 0 0 0 -) ;; Pauses + (B_250 - 0 0 0 0 0 0 -) ;; Pauses + (B_300 - 0 0 0 0 0 0 -) ;; Pauses + (B_400 - 0 0 0 0 0 0 -) ;; Pauses + (IGNORE - 0 0 0 0 0 0 -) ;; Pauses + + + ;; insert the phones here, see examples in + ;; festival/lib/*_phones.scm + + ;(name vc vling vheight vfront vrnd ctype cplace cvox) + + ;;; Rob guesed these values for Edinburgh English + ;;; Not to be taken too seriously. + + (p - 0 0 0 0 s l -) + (t - 0 0 0 0 s a -) + (? - 0 0 0 0 s g +) ;;; ??? + (t^ - 0 0 0 0 t a +) ;;; ??? + (k - 0 0 0 0 s v -) + (x - 0 0 0 0 f v -) + (b - 0 0 0 0 s l +) + (d - 0 0 0 0 s a +) + (g - 0 0 0 0 s v +) + (ch - 0 0 0 0 a p -) + (jh - 0 0 0 0 a p +) + (s - 0 0 0 0 f a -) + (z - 0 0 0 0 f a +) + (sh - 0 0 0 0 f p -) + (zh - 0 0 0 0 f p +) + (f - 0 0 0 0 f b -) + (v - 0 0 0 0 f b +) + (th - 0 0 0 0 f d -) + (dh - 0 0 0 0 f d +) + (h - 0 0 0 0 f 0 -) ;;; ??? + (m - 0 0 0 0 n l +) + (m! - 0 0 0 0 n l +) + (n - 0 0 0 0 n a +) + (n! - 0 0 0 0 n a +) + (ng - 0 0 0 0 n v +) + (l - 0 0 0 0 r a +) + (ll - 0 0 0 0 r a +) + (lw - 0 0 0 0 r a +) + (l! - 0 0 0 0 r a +) + (r - 0 0 0 0 r a +) + (y - 0 0 0 0 l p +) + (w - 0 0 0 0 l l +) + (hw - 0 0 0 0 l l +) + (e + s 2 1 - 0 0 0) + (ao + s 3 1 - 0 0 0) + (a + s 3 1 - 0 0 0) + (ah + s 3 1 - 0 0 0) + (oa + s 3 1 - 0 0 0) + (aa + s 3 1 - 0 0 0) + (ar + s 3 1 - 0 0 0) + (eh + s 3 1 - 0 0 0) ;;; ? + (oul + d 2 3 + 0 0 0) ;;; ? + (ou + d 2 3 + 0 0 0) + (ouw + d 2 3 + 0 0 0) + (oou + l 3 3 + 0 0 0) + (o + l 3 3 + 0 0 0) + (au + l 3 3 + 0 0 0) + (oo + l 3 3 + 0 0 0) + (or + l 3 3 + 0 0 0) + (our + d 2 3 + 0 0 0) + (ii + l 1 1 - 0 0 0) + (ihr + s 1 1 - 0 0 0) + (iy + l 1 1 - 0 0 0) + (i + s 1 1 - 0 0 0) + (ie + l 1 1 - 0 0 0) ;;; ? 
+ (iii + s 1 1 - 0 0 0) ;;; was ii; + (@r + a 2 2 - r a +) + (@ + a 2 2 - 0 0 0) + (uh + s 2 2 - 0 0 0) + (uhr + s 2 2 - 0 0 0) + (u + l 1 3 + 0 0 0) + (uu + l 1 3 + 0 0 0) + (iu + l 1 3 + 0 0 0) + (uuu + l 1 3 + 0 0 0) ;;; was uu; + (uw + l 1 3 + 0 0 0) ;;; ??? + (uul + l 1 3 + 0 0 0) ;;; ??? + (ei + d 2 1 - 0 0 0) + (ee + d 2 1 - 0 0 0) + (ai + d 3 2 - 0 0 0) ;;; ??? + (ae + d 3 2 - 0 0 0) ;;; ??? + (aer + d 3 2 - 0 0 0) ;;; ??? + (aai + d 3 2 - 0 0 0) ;;; ??? + (oi + d 2 3 + 0 0 0) ;;; ??? + (oir + d 2 3 + 0 0 0) ;;; ??? + (ow + d 3 2 - 0 0 0) + (owr + d 3 2 - 0 0 0) ;;; ??? + (oow + d 3 2 - 0 0 0) ;;; ??? + (i@ + l 1 1 - 0 0 0) ;;; iy + @ ? + (ir + s 1 1 - 0 0 0) + (irr + s 1 1 - 0 0 0) ;;; was ir; + (iir + s 1 1 - 0 0 0) + (@@r + a 2 2 - 0 0 0) + (er + s 2 1 - 0 0 0) + (eir + s 2 1 - 0 0 0) ;;; ??? + (ur + s 1 3 + 0 0 0) ;;; ??? + (urr + s 1 3 + 0 0 0) ;;; ??? + (iur + s 1 3 + 0 0 0) ;;; ??? + ) +) + +(PhoneSet.silences '( # SIL)) + +(define (unilex::select_phoneset) + "(unilex::select_phoneset) +Set up phone set for unilex" + (Parameter.set 'PhoneSet 'unilex) + (PhoneSet.select 'unilex) +) + +(define (unilex::reset_phoneset) + "(unilex::reset_phoneset) +Reset phone set for unilex." + t +) + +(provide 'unilex_phones) diff --git a/lib/voices.scm b/lib/voices.scm new file mode 100644 index 0000000..af3f908 --- /dev/null +++ b/lib/voices.scm @@ -0,0 +1,360 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Preapre to access voices. Searches down a path of places. +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define current-voice nil + "current-voice + The name of the current voice.") + +;; The path to search for voices is created from the load-path with +;; an extra list of directories appended. 
+ +(defvar system-voice-path '( ) + "system-voice-path + Additional directories not near the load path where voices can be + found; this can be redefined in lib/sitevars.scm if desired.") + +(defvar system-voice-path-multisyn '( ) + "system-voice-path-multisyn + Additional directories not near the load path where multisyn voices can be + found; this can be redefined in lib/sitevars.scm if desired.") + +(defvar voice-path + (remove-duplicates + (append (mapcar (lambda (d) (path-append d "voices/")) load-path) + (mapcar (lambda (d) (path-as-directory d)) system-voice-path) + )) + + "voice-path + List of places to look for voices. If not set it is initialised from + load-path by appending \"voices/\" to each directory with + system-voice-path appended.") + +(defvar voice-path-multisyn + (remove-duplicates + (append (mapcar (lambda (d) (path-append d "voices-multisyn/")) load-path) + (mapcar (lambda (d) (path-as-directory d)) system-voice-path-multisyn) + )) + + "voice-path-multisyn + List of places to look for multisyn voices. If not set it is initialised from + load-path by appending \"voices-multisyn/\" to each directory with + system-voice-path-multisyn appended.") + + +;; Declaration of voices. When we declare a voice we record the +;; directory and set up an autoload for the voice-selecting function + +(defvar voice-locations () + "voice-locations + Association list recording where voices were found.") + +(defvar voice-location-trace nil + "voice-location-trace + Set t to print voice locations as they are found") + +(define (voice-location name dir doc) + "(voice-location NAME DIR DOCSTRING) + Record the location of a voice. Called for each voice found on voice-path. + Can be called in site-init or .festivalrc for additional voices which + exist elsewhere." + (let ((func_name (intern (string-append "voice_" name))) + ) + + (set! name (intern name)) + (set! 
voice-locations (cons (cons name dir) voice-locations)) + (eval (list 'autoload func_name (path-append dir "festvox/" name) doc)) + (if voice-location-trace + (format t "Voice: %s %s\n" name dir) + ) + ) + ) + +(define (voice-location-multisyn name rootname dir doc) + "(voice-location-multisyn NAME ROOTNAME DIR DOCSTRING) + Record the location of a multisyn voice. Called for each voice found on voice-path-multisyn. + Can be called in site-init or .festivalrc for additional voices which + exist elsewhere." + (let ((func_name (intern (string-append "voice_" name))) + ) + + (set! name (intern name)) + (set! voice-locations (cons (cons name dir) voice-locations)) + (eval (list 'autoload func_name (path-append dir "festvox/" rootname) doc)) + (if voice-location-trace + (format t "Voice: %s %s\n" name dir) + ) + ) + ) + + + +(define (current_voice_reset) +"(current_voice_reset) +This function is called at the start of defining any new voice. +It is designed to allow the previous voice to reset any global +values it has messed with. If this variable value is nil then +the function won't be called.") + +(define (voice_reset) +"(voice_reset) +This resets all variables back to acceptable values that may affect +voice generation. This function should always be called at the +start of any function defining a voice. In addition to resetting +standard variables the function current_voice_reset will be called. +This should always be set by the voice definition function (even +if it does nothing). This allows voice specific changes to be reset +when a new voice is selected. Unfortunately I can't force this +to be used." + (Parameter.set 'Duration_Stretch 1.0) + (set! 
after_synth_hooks default_after_synth_hooks) + + ;; The following are reset to allow existing voices to continue + ;; to work, new voices should be setting these explicitly + (Parameter.set 'Token_Method 'Token_English) + (Parameter.set 'POS_Method Classic_POS) + (Parameter.set 'Phrasify_Method Classic_Phrasify) + (Parameter.set 'Word_Method Classic_Word) + (Parameter.set 'Pause_Method Classic_Pauses) + (Parameter.set 'PostLex_Method Classic_PostLex) + + (set! diphone_module_hooks nil) + (set! UniSyn_module_hooks nil) + + (if current_voice_reset + (current_voice_reset)) + (set! current_voice_reset nil) +) + + +(defvar Voice_descriptions nil + "Internal variable containing list of voice descriptions as +described by proclaim_voice.") + +(define (proclaim_voice name description) +"(proclaim_voice NAME DESCRIPTION) +Describe a voice to the system. NAME should be an atomic name, that +conventionally will have voice_ prepended to name the basic selection +function. DESCRIPTION is an assoc list of feature and value and must +have at least features for language, gender, dialect and +description. The first three of these are atomic, while the description +is a text string describing the voice." + (let ((voxdesc (assoc name Voice_descriptions))) + (if voxdesc + (set-car! (cdr voxdesc) description) + (set! Voice_descriptions + (cons (list name description) Voice_descriptions)))) +) + +(define (voice.description name) +"(voice.description NAME) +Output description of named voice. If the named voice is not yet loaded +it is loaded." + (let ((voxdesc (assoc name Voice_descriptions)) + (cv current-voice)) + (if (null voxdesc) + (unwind-protect + (begin + (voice.select name) + (voice.select cv) ;; switch back to current voice + (set! voxdesc (assoc name Voice_descriptions))))) + (if voxdesc + voxdesc + (begin + (format t "SIOD: unknown voice %s\n" name) + nil)))) + +(define (voice.select name) +"(voice.select NAME) +Call function to set up voice NAME. 
This is normally done by +prepending voice_ to NAME and calling it as a function." + (eval (list (intern (string-append "voice_" name))))) + +(define (voice.describe name) +"(voice.describe NAME) +Describe voice NAME by saying its description. Unfortunately although +it would be nice to say that voice's description in the voice itself +it's not going to work cross language. So this just uses the current +voice. So here we assume voices describe themselves in English +which is pretty anglo-centric, shitsurei shimasu." + (let ((voxdesc (voice.description name))) + (let ((desc (car (cdr (assoc 'description (car (cdr voxdesc))))))) + (cond + (desc (tts_text desc nil)) + (voxdesc + (SayText + (format nil "A voice called %s exists but it has no description" + name))) + (t + (SayText + (format nil "There is no voice called %s defined" name))))))) + +(define (voice.list) +"(voice.list) +List of all (potential) voices in the system. This checks the voice-locations +list of potential voices found by scanning the voice-path at start up time. +These names can be used as arguments to voice.description and +voice.describe." + (mapcar car voice-locations)) + +;; Voices are found on the voice-path if they are in directories of the form +;; DIR/LANGUAGE/NAME + +(define (search-for-voices) + "(search-for-voices) + Search down voice-path to locate voices." + + (let ((dirs voice-path) + (dir nil) + languages language + voices voicedir voice + ) + (while dirs + (set! dir (car dirs)) + (setq languages (directory-entries dir t)) + (while languages + (set! language (car languages)) + (set! voices (directory-entries (path-append dir language) t)) + (while voices + (set! voicedir (car voices)) + (set! voice (path-basename voicedir)) + (if (string-matches voicedir ".*\\..*") + nil + (voice-location + voice + (path-as-directory (path-append dir language voicedir)) + "voice found on path") + ) + (set! voices (cdr voices)) + ) + (set! languages (cdr languages)) + ) + (set! 
dirs (cdr dirs)) + ) + ) + ) + +;; A single file is allowed to define multiple multisyn voices, so this has +;; been adapted for this. Rob thinks this is just evil, but couldn't think +;; of a better way. +(define (search-for-voices-multisyn) + "(search-for-voices-multisyn) + Search down voice-path-multisyn to locate multisyn voices." + (let ((dirs voice-path-multisyn) + (dir nil) + languages language + voices voicedir voice voice-list + ) + (while dirs + (set! dir (car dirs)) + (set! languages (directory-entries dir t)) + (while languages + (set! language (car languages)) + (set! voices (directory-entries (path-append dir language) t)) + (while voices + (set! voicedir (car voices)) + (set! voice (path-basename voicedir)) + (if (string-matches voicedir ".*\\..*") + nil + (begin + ;; load the voice definition file, but don't evaluate it! + (set! voice-def-file (load (path-append dir language voicedir "festvox" + (string-append voicedir ".scm")) t)) + ;; now find the "proclaim_voice" lines and register these voices. + (mapcar + (lambda (line) + (if (string-matches (car line) "proclaim_voice") + (voice-location-multisyn (intern (cadr (cadr line))) voicedir (path-append dir language voicedir) "registered multisyn voice"))) + voice-def-file) + )) + (set! voices (cdr voices))) + (set! languages (cdr languages))) + (set! dirs (cdr dirs))))) + +(search-for-voices) +(search-for-voices-multisyn) + +;; We select the default voice from a list of possibilities. One of these +;; had better exist in every installation. + +(define (no_voice_error) + (format t "\nWARNING\n") + (format t "No default voice found in %l\n" voice-path) + (format t "either no voices unpacked or voice-path is wrong\n") + (format t "Scheme interpreter will work, but there is no voice to speak with.\n") + (format t "WARNING\n\n")) + +(defvar voice_default 'no_voice_error + "voice_default +A variable whose value is a function name that is called on start up to select +the default voice. 
[see Site initialization]") + +(defvar default-voice-priority-list + '(kal_diphone + cmu_us_bdl_arctic_hts + cmu_us_jmk_arctic_hts + cmu_us_slt_arctic_hts + cmu_us_awb_arctic_hts +; cstr_rpx_nina_multisyn ; restricted license (lexicon) +; cstr_rpx_jon_multisyn ; restricted license (lexicon) +; cstr_edi_awb_arctic_multisyn ; restricted license (lexicon) +; cstr_us_awb_arctic_multisyn + ked_diphone + don_diphone + rab_diphone + en1_mbrola + us1_mbrola + us2_mbrola + us3_mbrola + gsw_diphone ;; not publically distributed + el_diphone + ) + "default-voice-priority-list + List of voice names. The first of them available becomes the default voice.") + +(let ((voices default-voice-priority-list) + voice) + (while (and voices (eq voice_default 'no_voice_error)) + (set! voice (car voices)) + (if (assoc voice voice-locations) + (set! voice_default (intern (string-append "voice_" voice))) + ) + (set! voices (cdr voices)) + ) + ) + + +(provide 'voices) diff --git a/missing b/missing new file mode 100755 index 0000000..7789652 --- /dev/null +++ b/missing @@ -0,0 +1,190 @@ +#! /bin/sh +# Common stub for a few missing GNU programs while installing. +# Copyright (C) 1996, 1997 Free Software Foundation, Inc. +# Franc,ois Pinard , 1996. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2, or (at your option) +# any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA +# 02111-1307, USA. 
+ +if test $# -eq 0; then + echo 1>&2 "Try \`$0 --help' for more information" + exit 1 +fi + +case "$1" in + + -h|--h|--he|--hel|--help) + echo "\ +$0 [OPTION]... PROGRAM [ARGUMENT]... + +Handle \`PROGRAM [ARGUMENT]...' for when PROGRAM is missing, or return an +error status if there is no known handling for PROGRAM. + +Options: + -h, --help display this help and exit + -v, --version output version information and exit + +Supported PROGRAM values: + aclocal touch file \`aclocal.m4' + autoconf touch file \`configure' + autoheader touch file \`config.h.in' + automake touch all \`Makefile.in' files + bison create \`y.tab.[ch]', if possible, from existing .[ch] + flex create \`lex.yy.c', if possible, from existing .c + lex create \`lex.yy.c', if possible, from existing .c + makeinfo touch the output file + yacc create \`y.tab.[ch]', if possible, from existing .[ch]" + ;; + + -v|--v|--ve|--ver|--vers|--versi|--versio|--version) + echo "missing - GNU libit 0.0" + ;; + + -*) + echo 1>&2 "$0: Unknown \`$1' option" + echo 1>&2 "Try \`$0 --help' for more information" + exit 1 + ;; + + aclocal) + echo 1>&2 "\ +WARNING: \`$1' is missing on your system. You should only need it if + you modified \`acinclude.m4' or \`configure.in'. You might want + to install the \`Automake' and \`Perl' packages. Grab them from + any GNU archive site." + touch aclocal.m4 + ;; + + autoconf) + echo 1>&2 "\ +WARNING: \`$1' is missing on your system. You should only need it if + you modified \`configure.in'. You might want to install the + \`Autoconf' and \`GNU m4' packages. Grab them from any GNU + archive site." + touch configure + ;; + + autoheader) + echo 1>&2 "\ +WARNING: \`$1' is missing on your system. You should only need it if + you modified \`acconfig.h' or \`configure.in'. You might want + to install the \`Autoconf' and \`GNU m4' packages. Grab them + from any GNU archive site." 
+ files=`sed -n 's/^[ ]*A[CM]_CONFIG_HEADER(\([^)]*\)).*/\1/p' configure.in` + test -z "$files" && files="config.h" + touch_files= + for f in $files; do + case "$f" in + *:*) touch_files="$touch_files "`echo "$f" | + sed -e 's/^[^:]*://' -e 's/:.*//'`;; + *) touch_files="$touch_files $f.in";; + esac + done + touch $touch_files + ;; + + automake) + echo 1>&2 "\ +WARNING: \`$1' is missing on your system. You should only need it if + you modified \`Makefile.am', \`acinclude.m4' or \`configure.in'. + You might want to install the \`Automake' and \`Perl' packages. + Grab them from any GNU archive site." + find . -type f -name Makefile.am -print | + sed 's/\.am$/.in/' | + while read f; do touch "$f"; done + ;; + + bison|yacc) + echo 1>&2 "\ +WARNING: \`$1' is missing on your system. You should only need it if + you modified a \`.y' file. You may need the \`Bison' package + in order for those modifications to take effect. You can get + \`Bison' from any GNU archive site." + rm -f y.tab.c y.tab.h + if [ $# -ne 1 ]; then + eval LASTARG="\${$#}" + case "$LASTARG" in + *.y) + SRCFILE=`echo "$LASTARG" | sed 's/y$/c/'` + if [ -f "$SRCFILE" ]; then + cp "$SRCFILE" y.tab.c + fi + SRCFILE=`echo "$LASTARG" | sed 's/y$/h/'` + if [ -f "$SRCFILE" ]; then + cp "$SRCFILE" y.tab.h + fi + ;; + esac + fi + if [ ! -f y.tab.h ]; then + echo >y.tab.h + fi + if [ ! -f y.tab.c ]; then + echo 'main() { return 0; }' >y.tab.c + fi + ;; + + lex|flex) + echo 1>&2 "\ +WARNING: \`$1' is missing on your system. You should only need it if + you modified a \`.l' file. You may need the \`Flex' package + in order for those modifications to take effect. You can get + \`Flex' from any GNU archive site." + rm -f lex.yy.c + if [ $# -ne 1 ]; then + eval LASTARG="\${$#}" + case "$LASTARG" in + *.l) + SRCFILE=`echo "$LASTARG" | sed 's/l$/c/'` + if [ -f "$SRCFILE" ]; then + cp "$SRCFILE" lex.yy.c + fi + ;; + esac + fi + if [ ! 
-f lex.yy.c ]; then + echo 'main() { return 0; }' >lex.yy.c + fi + ;; + + makeinfo) + echo 1>&2 "\ +WARNING: \`$1' is missing on your system. You should only need it if + you modified a \`.texi' or \`.texinfo' file, or any other file + indirectly affecting the aspect of the manual. The spurious + call might also be the consequence of using a buggy \`make' (AIX, + DU, IRIX). You might want to install the \`Texinfo' package or + the \`GNU make' package. Grab either from any GNU archive site." + file=`echo "$*" | sed -n 's/.*-o \([^ ]*\).*/\1/p'` + if test -z "$file"; then + file=`echo "$*" | sed 's/.* \([^ ]*\) *$/\1/'` + file=`sed -n '/^@setfilename/ { s/.* \([^ ]*\) *$/\1/; p; q; }' $file` + fi + touch $file + ;; + + *) + echo 1>&2 "\ +WARNING: \`$1' is needed, and you do not seem to have it handy on your + system. You might have modified some files without having the + proper tools for further handling them. Check the \`README' file, + it often tells you about the needed prerequirements for installing + this package. You may also peek at any GNU archive site, in case + some other package would contain this missing \`$1' program." + exit 1 + ;; +esac + +exit 0 diff --git a/mkinstalldirs b/mkinstalldirs new file mode 100755 index 0000000..7752276 --- /dev/null +++ b/mkinstalldirs @@ -0,0 +1,40 @@ +#! /bin/sh +# mkinstalldirs --- make directory hierarchy +# Author: Noah Friedman +# Created: 1993-05-16 +# Public domain + +# $Id: mkinstalldirs,v 1.1 2001/04/04 13:12:35 awb Exp $ + +errstatus=0 + +for file +do + set fnord `echo ":$file" | sed -ne 's/^:\//#/;s/^://;s/\// /g;s/^#/\//;p'` + shift + + pathcomp= + for d + do + pathcomp="$pathcomp$d" + case "$pathcomp" in + -* ) pathcomp=./$pathcomp ;; + esac + + if test ! -d "$pathcomp"; then + echo "mkdir $pathcomp" + + mkdir "$pathcomp" || lasterr=$? + + if test ! 
-d "$pathcomp"; then + errstatus=$lasterr + fi + fi + + pathcomp="$pathcomp/" + done +done + +exit $errstatus + +# mkinstalldirs ends here diff --git a/packaging/festival-1.95-audsp.patch b/packaging/festival-1.95-audsp.patch new file mode 100644 index 0000000..106b62c --- /dev/null +++ b/packaging/festival-1.95-audsp.patch @@ -0,0 +1,11 @@ +--- src/arch/festival/audspio.cc ++++ src/arch/festival/audspio.cc +@@ -108,7 +108,7 @@ + { + audio = ft_get_param("Audio_Method"); + command = ft_get_param("Audio_Command"); +- audfds = pipe_open("audsp"); ++ audfds = pipe_open("/usr/lib/festival/audsp"); + if (audio != NIL) + audsp_send(EST_String("method ")+get_c_string(audio)); + if (command != NIL) diff --git a/packaging/festival-1.95-examples.patch b/packaging/festival-1.95-examples.patch new file mode 100644 index 0000000..1af4629 --- /dev/null +++ b/packaging/festival-1.95-examples.patch @@ -0,0 +1,12 @@ +--- festival/examples/Makefile ++++ festival/examples/Makefile +@@ -54,8 +54,7 @@ + + $(ALL) : % : %.sh + rm -f $@ +- @echo "#!/bin/sh" >$@ +- @echo "\"true\" ; exec "$(FESTIVAL_HOME)/bin/festival --script '$$0 $$*' >>$@ ++ @echo "#!/usr/bin/festival --script" >$@ + cat $< >>$@ + chmod +x $@ + diff --git a/packaging/festival-1.95-libdir.patch b/packaging/festival-1.95-libdir.patch new file mode 100644 index 0000000..dbd1476 --- /dev/null +++ b/packaging/festival-1.95-libdir.patch @@ -0,0 +1,10 @@ +--- config/project.mak ++++ config/project.mak +@@ -113,6 +113,6 @@ + DOCXX_DIRS = $(TOP)/src + MODULE_TO_DOCXX = perl $(TOP)/src/modules/utilities/extract_module_doc++.prl + +-FTLIBDIR = $(FESTIVAL_HOME)/lib ++FTLIBDIR = /usr/share/festival + + diff --git a/packaging/festival-1.96-chroot.patch b/packaging/festival-1.96-chroot.patch new file mode 100644 index 0000000..ec6554c --- /dev/null +++ b/packaging/festival-1.96-chroot.patch @@ -0,0 +1,111 @@ +--- src/main/festival_main.cc ++++ src/main/festival_main.cc +@@ -39,6 +39,10 @@ + /* */ + 
/*=======================================================================*/ + #include ++#include ++#include ++#include ++#include + + using namespace std; + +@@ -75,6 +79,9 @@ + EST_StrList files; + int real_number_of_files = 0; + int heap_size = FESTIVAL_HEAP_SIZE; ++ unsigned int uid = -1; ++ unsigned int gid = -1; ++ struct passwd *pw; + + if (festival_check_script_mode(argc,argv) == TRUE) + { // Need to check this directly as in script mode args are +@@ -106,6 +113,9 @@ + " english, spanish and welsh are available\n"+ + "--server Run in server mode waiting for clients\n"+ + " of server_port (1314)\n"+ ++ "--chroot Run server in chroot\n"+ ++ "--uid Run server as given user\n"+ ++ "--gid Run server with this group\n"+ + "--script \n"+ + " Used in #! scripts, runs in batch mode on\n"+ + " file and passes all other args to Scheme\n"+ +@@ -123,6 +133,77 @@ + exit(0); + } + ++ if( al.present( "--uid" ) ) ++ { ++ EST_String b = al.sval( "--uid" ); ++ ++ pw = getpwnam( b.str() ); ++ if( pw != NULL ) ++ { ++ uid = pw->pw_uid; ++ gid = pw->pw_gid; ++ } ++ else ++ { ++ printf("unknown user\n"); ++ festival_error(); ++ } ++ } ++ ++ if( al.present( "--gid" ) ) ++ { ++ gid = al.ival( "--gid" ); ++ if( !al.present( "--uid" ) ) ++ { ++ printf( "useless without --uid\n" ); ++ festival_error(); ++ } ++ } ++ ++ if( al.present( "--chroot" ) ) ++ { ++ if( !al.present( "--uid" ) ) ++ { ++ printf( "chroot only makes sense in combination with uid switching\n" ); ++ festival_error(); ++ } ++ ++ EST_String a = al.sval( "--chroot" ); ++ printf( "chroot to %s\n", a.str() ); ++ if( chdir( a.str() ) ) ++ { ++ festival_error(); ++ } ++ if( chroot( a.str() ) ) ++ { ++ festival_error(); ++ } ++ if( chdir( "/" ) ) ++ { ++ festival_error(); ++ } ++ } ++ ++ if( al.present( "--uid" ) ) ++ { ++ if( setgroups( 1, &gid ) < 0 ) ++ { ++ festival_error(); ++ } ++ ++ if( setgid( gid ) != 0 ) ++ { ++ printf( "can't setgid\n" ); ++ festival_error(); ++ } ++ ++ if( setuid( uid ) != 0 ) ++ { ++ printf( 
"can't setuid\n" ); ++ festival_error(); ++ } ++ } ++ + if (al.present("--libdir")) + festival_libdir = wstrdup(al.val("--libdir")); + else if (getenv("FESTLIBDIR") != 0) diff --git a/packaging/festival-no-LD_LIBRARY_PATH-extension.patch b/packaging/festival-no-LD_LIBRARY_PATH-extension.patch new file mode 100644 index 0000000..054b61e --- /dev/null +++ b/packaging/festival-no-LD_LIBRARY_PATH-extension.patch @@ -0,0 +1,69 @@ +Index: festival/src/scripts/shared_script +=================================================================== +--- festival.orig/src/scripts/shared_script ++++ festival/src/scripts/shared_script +@@ -1,24 +1,5 @@ + #!/bin/sh + +-# Festival shared script +- +-extend() { +- var="$1" +- extra="$2" +- eval "val=\$$var" +- +- if [ -n "$val" ] +- then +- val="$extra:$val" +- else +- val="$extra" +- fi +- eval "$var='$val'" +- eval "export $var" +- } +- +-extend __LDVAR__ "__EST__/lib:__LDPATH__" +- + exec __MAIN__/__PROGRAM__ "$@" + + exit 0 +Index: festival/src/scripts/shared_setup_prl +=================================================================== +--- festival.orig/src/scripts/shared_setup_prl ++++ festival/src/scripts/shared_setup_prl +@@ -1,10 +1,2 @@ + +-if (defined($ENV{LD_LIBRARY_PATH})) +- { +- $ENV{LD_LIBRARY_PATH} = "__TOP__/lib:__LDPATH__:$ENV{LD_LIBRARY_PATH}"; +- } +-else +- { +- $ENV{LD_LIBRARY_PATH} = "__TOP__/lib"; +- } + +Index: festival/src/scripts/shared_setup_sh +=================================================================== +--- festival.orig/src/scripts/shared_setup_sh ++++ festival/src/scripts/shared_setup_sh +@@ -1,20 +1,2 @@ + +-# festival shared setup +- +-extend() { +- var="$1" +- extra="$2" +- eval "val=\$$var" +- +- if [ -n "$val" ] +- then +- val="$extra:$val" +- else +- val="$extra" +- fi +- eval "$var='$val'" +- eval "export $var" +- } +- +-extend LD_LIBRARY_PATH "__EST__/lib:__LDPATH__" + diff --git a/packaging/festival-safe-temp-file.patch b/packaging/festival-safe-temp-file.patch new file mode 100644 
index 0000000..1fc899d --- /dev/null +++ b/packaging/festival-safe-temp-file.patch @@ -0,0 +1,27 @@ +Index: festival/src/scripts/festival_server.sh +=================================================================== +--- festival.orig/src/scripts/festival_server.sh ++++ festival/src/scripts/festival_server.sh +@@ -210,14 +210,19 @@ trap "handle_term" 0 + + if $show + then +- create_server_startup $port $server_log /tmp/$$ 3>/dev/null ++ tmpfile=`mktemp -q` ++ if test $? -ne 0; then ++ echo "Error while getting configuration." ++ exit 1 ++ fi ++ create_server_startup $port $server_log "$tmpfile" 3>/dev/null + fl=false + while read l + do + if $fl ; then echo $l ; fi + if [ "$l" = ";---" ] ; then fl=true ; fi +- done ++.I sound.wav ++ ++ ++.SH DESCRIPTION ++ ++This script is part of the festival text-to-speech system. ++It is a wrapper for festival's Scheme code for easy usage in TTS ++scripts. ++ ++.SH OPTIONS +--- festival/doc/text2wave.options ++++ festival/doc/text2wave.options +@@ -0,0 +1,47 @@ ++.\" ++.\".SH OPTIONS ++.\" -mode Explicit tts mode. ++.\" -o ofile File to save waveform (default is stdout). ++.\" -otype Output waveform type: ulaw, snd, aiff, riff, nist etc. ++.\" (default is riff) ++.\" -F Output frequency. ++.\" -scale Volume factor ++.\" -eval File or lisp s-expression to be evaluated before ++.\" synthesis. ++ ++.TP ++.B \-mode ++.I string ++.br ++Explicit tts mode. ++.TP ++.B \-o ++.I ofile ++.br ++File to save waveform to. ++.br ++The default is ++.B stdout. ++.TP ++.B \-otype ++.I string ++.br ++Output waveform type: ulaw, snd, aiff, riff, nist etc. ++.br ++The default is ++.B riff. ++.TP ++.B \-f ++.I integer ++.br ++Output frequency. ++.TP ++.B \-scale ++.I float ++.br ++Volume factor. ++.TP ++.B \-eval ++.I "string" ++.br ++File or lisp s-expression to be evaluated before synthesis. +--- festival/doc/text2wave.tail ++++ festival/doc/text2wave.tail +@@ -0,0 +1,26 @@ ++ ++.SH BUGS ++More than you can imagine. 
++ ++A manual with much detail (though not complete) is ++distributed as part of the system and is also accessible at ++.br ++http://www.cstr.ed.ac.uk/projects/festival/manual/ ++ ++Although we cannot guarantee the time required to fix bugs, we ++would appreciate it if they were reported to ++.br ++festival-bug@cstr.ed.ac.uk ++ ++.SH AUTHOR ++Alan W Black, Richard Caley and Paul Taylor ++.br ++(C) Centre for Speech Technology Research, 1996-1998 ++.br ++University of Edinburgh ++.br ++80 South Bridge ++.br ++Edinburgh EH1 1HN ++.br ++http://www.cstr.ed.ac.uk/projects/festival.html diff --git a/packaging/festival-use-pacat.patch b/packaging/festival-use-pacat.patch new file mode 100644 index 0000000..5c357df --- /dev/null +++ b/packaging/festival-use-pacat.patch @@ -0,0 +1,14 @@ +diff -up festival/lib/init.scm.use-pacat festival/lib/init.scm +--- festival/lib/init.scm.use-pacat 2008-10-27 21:35:08.000000000 -0400 ++++ festival/lib/init.scm 2008-10-27 21:41:08.000000000 -0400 +@@ -140,6 +140,10 @@ + (require 'token) + (require 'tts) + ++;;; Default to using pulseaudio (bug 467531) ++(Parameter.def 'Audio_Command "pacat --channels=1 --rate=$SR $FILE -n Festival --stream-name=Speech") ++(Parameter.set 'Audio_Method 'Audio_Command) ++ + ;;; + ;;; Local site initialization, if the file exists load it + ;;; diff --git a/packaging/festival.changes b/packaging/festival.changes new file mode 100644 index 0000000..551db9d --- /dev/null +++ b/packaging/festival.changes @@ -0,0 +1,3 @@ +* Fri Aug 31 21:11:28 UTC 2012 - jimmy.huang@intel.com +- Initial import to Tizen. 
+ diff --git a/packaging/festival.spec b/packaging/festival.spec new file mode 100644 index 0000000..5ce48d7 --- /dev/null +++ b/packaging/festival.spec @@ -0,0 +1,155 @@ +Name: festival +Version: 2.1 +Release: 1 +Group: System/Libraries +License: MIT and GPL+ and TCL +Url: http://www.cstr.ed.ac.uk/projects/festival/ +Summary: A free speech synthesis and text-to-speech system +Source0: http://www.cstr.ed.ac.uk/downloads/festival/2.1/festival-%{version}-release.tar.gz +Source1: http://www.cstr.ed.ac.uk/downloads/festival/2.1/speech_tools-%{version}-release.tar.gz +Source2: http://www.cstr.ed.ac.uk/downloads/festival/2.1/festlex_CMU.tar.gz +Source3: http://www.cstr.ed.ac.uk/downloads/festival/2.1/festvox_kallpc16k.tar.gz +Source4: http://www.cstr.ed.ac.uk/downloads/festival/2.1/festlex_POSLEX.tar.gz +Patch0: festival-1.95-examples.patch +Patch1: festival-text2wave-manpage.patch +Patch2: festival-1.95-libdir.patch +Patch3: festival-1.95-audsp.patch +Patch4: festival-1.96-chroot.patch +Patch5: festival-no-LD_LIBRARY_PATH-extension.patch +Patch6: festival-safe-temp-file.patch +# Use pulseaudio +Patch7: festival-use-pacat.patch +Patch101: speech_tools-undefined-operation.patch +Patch102: speech_tools-1.2.95-config.patch +Patch103: speech_tools-no-LD_LIBRARY_PATH-extension.patch +Patch104: speech_tools-gcc47.patch +BuildRequires: pkgconfig(ncurses) + +%description +Festival is a general multi-lingual speech synthesis system developed +at CSTR. It offers a full text to speech system with various APIs, as +well as an environment for development and research of speech synthesis +techniques. It is written in C++ with a Scheme-based command interpreter +for general control. + +%package devel +Summary: Development Package for Festival +License: MIT +Requires: %{name} = %{version} + +%description devel +Files needed for developing software that uses Festival. 
+ +%prep +%setup -q -b 1 -b 2 -b 3 -b 4 -q -n festival +%patch0 -p1 +%patch1 -p1 +%patch2 +%patch3 +%patch4 +%patch5 -p1 +%patch6 -p1 +%patch7 -p1 -b .use-pacat +cd ../speech_tools +%patch101 -p1 +%patch102 +%patch103 -p1 +%patch104 -p1 + +%build +# festival +./configure --prefix=%_prefix \ + --libdir=%_libdir \ + --datadir=%_datadir/festival \ + --sysconfdir=%_sysconfdir +# speech tools +cd ../speech_tools +./configure --prefix=%_prefix \ + --libdir=%_libdir \ + --datadir=%_datadir/festival \ + --sysconfdir=%_sysconfdir +make CC="gcc -fPIC $RPM_OPT_FLAGS" CXX="g++ $RPM_OPT_FLAGS -fPIC -Wno-non-template-friend -ffriend-injection -fno-strict-aliasing" +cd ../festival +make CC="gcc -fPIC $RPM_OPT_FLAGS" CXX="g++ $RPM_OPT_FLAGS -fPIC -Wno-non-template-friend -ffriend-injection -fno-strict-aliasing" +make doc + +%install +%make_install +cd ../speech_tools +%make_install +cd ../festival +# install binaries +install -D bin/text2wave $RPM_BUILD_ROOT%_bindir/text2wave +install -m 755 bin/festival* $RPM_BUILD_ROOT%_bindir/ +install -m 755 examples/saytime $RPM_BUILD_ROOT%_bindir/ +# install manpages +install -D -m 644 doc/festival.1 $RPM_BUILD_ROOT%_mandir/man1/festival.1 +install -m 644 doc/festival_client.1 $RPM_BUILD_ROOT%_mandir/man1/ +install -m 644 doc/text2wave.1 $RPM_BUILD_ROOT%_mandir/man1/ +# install configs +install -D lib/festival.scm $RPM_BUILD_ROOT%_sysconfdir/festival.scm +# install dictionaries +install -D lib/dicts/cmu/cmudict-0.4.out $RPM_BUILD_ROOT%_datadir/%name/dicts/cmu/cmudict-0.4.out +install -m 644 lib/dicts/cmu/*.scm $RPM_BUILD_ROOT%_datadir/%name/dicts/cmu/ +install -m 644 lib/dicts/wsj.wp39.poslexR $RPM_BUILD_ROOT%_datadir/%name/dicts/ +install -m 644 lib/dicts/wsj.wp39.tri.ngrambin $RPM_BUILD_ROOT%_datadir/%name/dicts/ +# install voices +mkdir -p $RPM_BUILD_ROOT/usr/share/festival/voices/english/kal_diphone/festvox +mkdir -p $RPM_BUILD_ROOT/usr/share/festival/voices/english/kal_diphone/group +cp lib/voices/english/kal_diphone/group/* 
$RPM_BUILD_ROOT/usr/share/festival/voices/english/kal_diphone/group/ +cp lib/voices/english/kal_diphone/festvox/*.scm $RPM_BUILD_ROOT/usr/share/festival/voices/english/kal_diphone/festvox +# install data +cp lib/*.scm $RPM_BUILD_ROOT/usr/share/festival/ +cp lib/*.ngrambin $RPM_BUILD_ROOT/usr/share/festival/ +cp lib/*.gram $RPM_BUILD_ROOT/usr/share/festival/ +cp lib/*.el $RPM_BUILD_ROOT/usr/share/festival/ +install -D lib/etc/unknown_Linux/audsp $RPM_BUILD_ROOT/usr/lib/festival/audsp +# install libs +install -D src/lib/libFestival.a $RPM_BUILD_ROOT/%_libdir/libFestival.a +# install includes +mkdir -p $RPM_BUILD_ROOT%_includedir/ +install -m 644 src/include/*.h $RPM_BUILD_ROOT%_includedir/ +cd ../speech_tools +# install includes +mkdir -p $RPM_BUILD_ROOT%_includedir/instantiate +mkdir -p $RPM_BUILD_ROOT%_includedir/ling_class +mkdir -p $RPM_BUILD_ROOT%_includedir/rxp +mkdir -p $RPM_BUILD_ROOT%_includedir/sigpr +mkdir -p $RPM_BUILD_ROOT%_includedir/unix +install -m 644 include/*h $RPM_BUILD_ROOT%_includedir +install -m 644 include/instantiate/*h $RPM_BUILD_ROOT%_includedir/instantiate +install -m 644 include/ling_class/*h $RPM_BUILD_ROOT%_includedir/ling_class +install -m 644 include/rxp/*h $RPM_BUILD_ROOT%_includedir/rxp +install -m 644 include/sigpr/*h $RPM_BUILD_ROOT%_includedir/sigpr +install -m 644 include/unix/*h $RPM_BUILD_ROOT%_includedir/unix +# install libs +install -m 644 lib/lib*.a $RPM_BUILD_ROOT%_libdir +# install init script +# install -m 755 -D %{S:6} $RPM_BUILD_ROOT/etc/init.d/%name +# install -d $RPM_BUILD_ROOT%_sbindir +# ln -sf ../../etc/init.d/%name $RPM_BUILD_ROOT/usr/sbin/rc%name +# install sysconfig file +#install -m 644 -D %{S:5} $RPM_BUILD_ROOT/var/adm/fillup-templates/sysconfig.%name + +%clean +rm -rf $RPM_BUILD_ROOT + +%files +%defattr(-,root,root) +%doc COPYING README INSTALL examples/*.text examples/ex1.* examples/*.scm examples/*.dtd +%_sysconfdir/festival.scm +#%_sysconfdir/init.d/%name +%_bindir/festival +%_bindir/festival_client 
+%_bindir/festival_server +%_bindir/festival_server_control +%_bindir/text2wave +%_bindir/saytime +%_prefix/lib/festival +%_datadir/festival +%_mandir/man1/* + +%files devel +%defattr(-,root,root) +%_includedir/* +%_libdir/lib*.a diff --git a/packaging/festlex_CMU.tar.gz b/packaging/festlex_CMU.tar.gz new file mode 100644 index 0000000..077035c Binary files /dev/null and b/packaging/festlex_CMU.tar.gz differ diff --git a/packaging/festlex_POSLEX.tar.gz b/packaging/festlex_POSLEX.tar.gz new file mode 100644 index 0000000..0277be2 Binary files /dev/null and b/packaging/festlex_POSLEX.tar.gz differ diff --git a/packaging/festvox_kallpc16k.tar.gz b/packaging/festvox_kallpc16k.tar.gz new file mode 100644 index 0000000..8ef11de Binary files /dev/null and b/packaging/festvox_kallpc16k.tar.gz differ diff --git a/packaging/rcfestival b/packaging/rcfestival new file mode 100644 index 0000000..c3acb11 --- /dev/null +++ b/packaging/rcfestival @@ -0,0 +1,125 @@ +#! /bin/sh +# Copyright (c) 2006 Andreas Schneider . +# All rights reserved. +# +# This file and all modifications and additions to the pristine +# package are under the same license as the package itself. +# +# /etc/init.d/festival +# and its symbolic link +# /usr/sbin/rcfestival +# +### BEGIN INIT INFO +# Provides: festival +# Required-Start: $syslog $remote_fs +# Should-Start: $time +# Required-Stop: $syslog $remote_fs +# Should-Stop: $time +# Default-Start: 3 5 +# Default-Stop: 0 1 2 6 +# Short-Description: festival daemon providing full text-to-speech system +# Description: Start festival to allow applications to access a +# text-to-speech system with various APIs, as well as an environment +# for development and research of speech synthesis techniques. It is +# written in C++ and has a Scheme-based command interpreter for +# general control. 
+### END INIT INFO + +# Check for missing binaries (stale symlinks should not happen) +# Note: Special treatment of stop for LSB conformance +FESTIVAL_PID=/var/run/festival.pid +FESTIVAL_BIN=/usr/bin/festival +test -x $FESTIVAL_BIN || { echo "$FESTIVAL_BIN not installed"; + if [ "$1" = "stop" ]; then exit 0; + else exit 5; fi; } + +# Check for existence of needed config file and read it +FESTIVAL_CONFIG=/etc/sysconfig/festival +test -r $FESTIVAL_CONFIG || { echo "$FESTIVAL_CONFIG not existing"; + if [ "$1" = "stop" ]; then exit 0; + else exit 6; fi; } + +# Read config +. $FESTIVAL_CONFIG + +# Source LSB init functions +. /etc/rc.status + +FESTIVAL_OPTIONS="" + +function prepare_chroot +{ + for configfile in /etc/festival.scm $FESTIVAL_CHROOT_FILES; do + test -d ${CHROOT_PREFIX}/${configfile%/*} || mkdir -p ${CHROOT_PREFIX}/${configfile%/*} + cp -auL ${configfile} ${CHROOT_PREFIX}/${configfile%/*} + done + FESTIVAL_OPTIONS="${FESTIVAL_OPTIONS} --chroot ${CHROOT_PREFIX}" +} + +if( test "${FESTIVAL_RUN_CHROOTED}" = "yes" ) +then + FESTIVAL_OPTIONS="$FESTIVAL_OPTIONS --libdir / --uid festival" + CHROOT_PREFIX="/usr/share/festival/" + prepare_chroot +else +# FESTIVAL_OPTIONS="$FESTIVAL_OPTIONS" + CHROOT_PREFIX="" +fi + + +# Reset status of this service +rc_reset + +case "$1" in + start) + echo -n "Starting festival " + + /sbin/startproc -p $FESTIVAL_PID $FESTIVAL_BIN $FESTIVAL_OPTIONS --server >/dev/null 2>&1 + rc_status -v + ;; + stop) + echo -n "Shutting down festival " + /sbin/killproc -TERM $FESTIVAL_BIN + rc_status -v + ;; + try-restart|condrestart) + if test "$1" = "condrestart"; then + echo "${attn} Use try-restart ${done}(LSB)${attn} rather than condrestart ${warn}(RH)${norm}" + fi + $0 status + if test $? = 0; then + $0 restart + else + rc_reset # Not running is not a failure. 
+ fi + rc_status + ;; + restart) + $0 stop + $0 start + rc_status + ;; + force-reload) + echo -n "Reload service festival " + /sbin/killproc -HUP $FESTIVAL_BIN + rc_status -v + ;; + reload) + echo -n "Reload service festival " + /sbin/killproc -HUP $FESTIVAL_BIN + rc_status -v + ;; + status) + echo -n "Checking for service festival " + /sbin/checkproc $FESTIVAL_BIN + rc_status -v + ;; + probe) + test /etc/festival.scm -nt /var/run/festival.pid && echo reload + ;; + *) + echo "Usage: $0 {start|stop|status|try-restart|restart|force-reload|reload|probe}" + exit 1 + ;; +esac +rc_exit diff --git a/packaging/speech_tools-1.2.95-config.patch b/packaging/speech_tools-1.2.95-config.patch new file mode 100644 index 0000000..85f6cdc --- /dev/null +++ b/packaging/speech_tools-1.2.95-config.patch @@ -0,0 +1,11 @@ +--- config/config.in ++++ config/config.in +@@ -57,7 +57,7 @@ + ## OPTIMISE=4 will turn off DEBUG + + OPTIMISE=3 +-WARN=1 ++# WARN=1 + # VERBOSE=1 + #DEBUG=1 + # PROFILE=gprof diff --git a/packaging/speech_tools-2.1-release.tar.gz b/packaging/speech_tools-2.1-release.tar.gz new file mode 100644 index 0000000..e8cf2b8 Binary files /dev/null and b/packaging/speech_tools-2.1-release.tar.gz differ diff --git a/packaging/speech_tools-gcc47.patch b/packaging/speech_tools-gcc47.patch new file mode 100644 index 0000000..facd3c1 --- /dev/null +++ b/packaging/speech_tools-gcc47.patch @@ -0,0 +1,89 @@ +Index: speech_tools/base_class/EST_TSimpleMatrix.cc +=================================================================== +--- speech_tools.orig/base_class/EST_TSimpleMatrix.cc ++++ speech_tools/base_class/EST_TSimpleMatrix.cc +@@ -44,6 +44,7 @@ + #include "EST_TVector.h" + #include + #include ++#include + #include "EST_cutils.h" + + template +@@ -98,7 +99,7 @@ void EST_TSimpleMatrix::resize(int ne + { + int copy_r = Lof(this->num_rows(), new_rows); + +- just_resize(new_rows, new_cols, &old_vals); ++ this->just_resize(new_rows, new_cols, &old_vals); + + for (q=0; 
q<(copy_r*new_cols*sizeof(T)); q++) /* memcpy */ + ((char *)this->p_memory)[q] = ((char *)old_vals)[q]; +@@ -127,9 +128,9 @@ void EST_TSimpleMatrix::resize(int ne + int copy_r = Lof(this->num_rows(), new_rows); + int copy_c = Lof(this->num_columns(), new_cols); + +- just_resize(new_rows, new_cols, &old_vals); ++ this->just_resize(new_rows, new_cols, &old_vals); + +- set_values(old_vals, ++ this->set_values(old_vals, + old_row_step, old_column_step, + 0, copy_r, + 0, copy_c); +Index: speech_tools/base_class/EST_TSimpleVector.cc +=================================================================== +--- speech_tools.orig/base_class/EST_TSimpleVector.cc ++++ speech_tools/base_class/EST_TSimpleVector.cc +@@ -44,6 +44,7 @@ + #include "EST_matrix_support.h" + #include + #include "EST_cutils.h" ++#include + + template void EST_TSimpleVector::copy(const EST_TSimpleVector &a) + { +@@ -70,7 +71,7 @@ template void EST_TSimpleVector + int old_offset = this->p_offset; + unsigned int q; + +- just_resize(newn, &old_vals); ++ this->just_resize(newn, &old_vals); + + if (set && old_vals) + { +Index: speech_tools/include/EST_TIterator.h +=================================================================== +--- speech_tools.orig/include/EST_TIterator.h ++++ speech_tools/include/EST_TIterator.h +@@ -209,7 +209,7 @@ public: + + /// Create an iterator ready to run over the given container. + EST_TStructIterator(const Container &over) +- { begin(over); } ++ { this->begin(over); } + + const Entry *operator ->() const + {return &this->current();} +@@ -289,7 +289,7 @@ public: + + /// Create an iterator ready to run over the given container. 
+ EST_TRwStructIterator(Container &over) +- { begin(over); } ++ { this->begin(over); } + + Entry *operator ->() const + {return &this->current();} +Index: speech_tools/include/EST_TNamedEnum.h +=================================================================== +--- speech_tools.orig/include/EST_TNamedEnum.h ++++ speech_tools/include/EST_TNamedEnum.h +@@ -130,7 +130,7 @@ public: + {this->initialise((const void *)defs); }; + EST_TNamedEnumI(EST_TValuedEnumDefinition defs[], ENUM (*conv)(const char *)) + {this->initialise((const void *)defs, conv); }; +- const char *name(ENUM tok, int n=0) const {return value(tok,n); }; ++ const char *name(ENUM tok, int n=0) const {return this->value(tok,n); }; + + }; + diff --git a/packaging/speech_tools-no-LD_LIBRARY_PATH-extension.patch b/packaging/speech_tools-no-LD_LIBRARY_PATH-extension.patch new file mode 100644 index 0000000..efe4e61 --- /dev/null +++ b/packaging/speech_tools-no-LD_LIBRARY_PATH-extension.patch @@ -0,0 +1,69 @@ +Index: speech_tools/scripts/shared_script +=================================================================== +--- speech_tools.orig/scripts/shared_script ++++ speech_tools/scripts/shared_script +@@ -1,24 +1,5 @@ + #!/bin/sh + +-# EST shared script +- +-extend() { +- var="$1" +- extra="$2" +- eval "val=\$$var" +- +- if [ -n "$val" ] +- then +- val="$extra:$val" +- else +- val="$extra" +- fi +- eval "$var='$val'" +- eval "export $var" +- } +- +-extend __LDVAR__ "__LIB__:__LDPATH__" +- + exec __MAIN__/__PROGRAM__ "$@" + + exit 0 +Index: speech_tools/scripts/shared_setup_prl +=================================================================== +--- speech_tools.orig/scripts/shared_setup_prl ++++ speech_tools/scripts/shared_setup_prl +@@ -1,10 +1,2 @@ + +-if (defined($ENV{LD_LIBRARY_PATH})) +- { +- $ENV{LD_LIBRARY_PATH} = "__TOP__/lib:__LDPATH__:$ENV{LD_LIBRARY_PATH}"; +- } +-else +- { +- $ENV{LD_LIBRARY_PATH} = "__TOP__/lib"; +- } + +Index: speech_tools/scripts/shared_setup_sh 
+=================================================================== +--- speech_tools.orig/scripts/shared_setup_sh ++++ speech_tools/scripts/shared_setup_sh +@@ -1,20 +1,2 @@ + +-# EST shared setup +- +-extend() { +- var="$1" +- extra="$2" +- eval "val=\$$var" +- +- if [ -n "$val" ] +- then +- val="$extra:$val" +- else +- val="$extra" +- fi +- eval "$var='$val'" +- eval "export $var" +- } +- +-extend LD_LIBRARY_PATH "__TOP__/lib:__LDPATH__" + diff --git a/packaging/speech_tools-undefined-operation.patch b/packaging/speech_tools-undefined-operation.patch new file mode 100644 index 0000000..34659b0 --- /dev/null +++ b/packaging/speech_tools-undefined-operation.patch @@ -0,0 +1,12 @@ +--- speech_tools/base_class/rateconv.cc ++++ speech_tools/base_class/rateconv.cc +@@ -384,7 +384,8 @@ + } + fir_stereo(inp + inoffset + inbaseidx, + coep + cycctr * firlen, firlen, +- outp + outidx++, outp + outidx++); ++ outp + outidx, outp + outidx + 1); ++ outidx += 2; + cycctr++; + if (!(cycctr %= up)) + inbaseidx += 2*down; diff --git a/packaging/sysconfig.festival b/packaging/sysconfig.festival new file mode 100644 index 0000000..5d38ac3 --- /dev/null +++ b/packaging/sysconfig.festival @@ -0,0 +1,27 @@ +## Path: Network/Sound/festival +## Description: Options for the festival server + +## Type: string +## Default "" +# +FESTIVAL_OPTIONS="" + +## Type: yesno +## Default: yes +## ServiceRestart: ntp +# +# Shall the festival server run in the chroot jail /var/lib/festival? +# +# Each time you start festival with the init script, /etc/festival.conf will be +# copied to /var/lib/festival/etc/. 
+# +FESTIVAL_RUN_CHROOTED="yes" + +## Type: string +## Default: "" +## ServiceRestart: ntp +# +# If the festival server runs in the chroot jail these files will be +# copied to /var/lib/festival/ besides the default of /etc/festival.scm +# +FESTIVAL_CHROOT_FILES="" diff --git a/src/Makefile b/src/Makefile new file mode 100644 index 0000000..71ff7bc --- /dev/null +++ b/src/Makefile @@ -0,0 +1,54 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## Makefile for main src directory ## +## ## +########################################################################### +TOP=.. +DIRNAME=src +LIB_BUILD_DIRS = arch modules +BUILD_DIRS = $(LIB_BUILD_DIRS) main +ALL_DIRS = include $(BUILD_DIRS) lib scripts + +FILES=Makefile + +ALL = .sub_directories + +include $(TOP)/config/common_make_rules + +tags: + @ rm -f $(TOP)/FileList + @ $(MAKE) --no-print-directory file-list + @ etags `grep "src/.*[ch]c*$$" $(TOP)/FileList | sed 's/src\///'` + + diff --git a/src/arch/Makefile b/src/arch/Makefile new file mode 100644 index 0000000..3720d2a --- /dev/null +++ b/src/arch/Makefile @@ -0,0 +1,49 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. 
The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## Makefile for architecture directory ## +## ## +########################################################################### +TOP=../.. +DIRNAME=src/arch +#BUILD_DIRS = festival unitdb +BUILD_DIRS = festival +ALL_DIRS = $(BUILD_DIRS) + +FILES=Makefile + +ALL = $(BUILD_DIRS) + +include $(TOP)/config/common_make_rules + + diff --git a/src/arch/festival/Makefile b/src/arch/festival/Makefile new file mode 100644 index 0000000..fe49726 --- /dev/null +++ b/src/arch/festival/Makefile @@ -0,0 +1,67 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +# # +# Makefile for festival directory # +# # +# Contains basic interface functions, initialization, and functions for # +# using various models # +# # +########################################################################### +TOP=../../.. 
+DIRNAME=src/arch/festival +H = festivalP.h +SRCS = festival.cc Phone.cc utterance.cc features.cc \ + wave.cc wagon_interp.cc linreg.cc \ + audspio.cc server.cc client.cc web.cc tcl.cc \ + wfst.cc ngram.cc viterbi.cc \ + ModuleDescription.cc $(TSRCS) + +OBJS = $(SRCS:.cc=.o) + +FILES = Makefile $(SRCS) $(H) + +#LOCAL_TEMPLATES = -pti. + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + +LOCAL_DEFINES += $(FESTIVAL_DEFINES) +LOCAL_INCLUDES += $(FESTIVAL_INCLUDES) + +festival.o: festival.cc + $(CXX_COMMAND_TEMPLATES) -DFTNAME='$(PROJECT_NAME)' -DFTLIBDIRC='$(FTLIBDIR)' -DFTVERSION='$(PROJECT_VERSION)' -DFTSTATE='$(PROJECT_STATE)' -DFTDATE='$(PROJECT_DATE)' -DFTOSTYPE=\"$(SYSTEM_TYPE)\" festival.cc + + diff --git a/src/arch/festival/ModuleDescription.cc b/src/arch/festival/ModuleDescription.cc new file mode 100644 index 0000000..f6afdd7 --- /dev/null +++ b/src/arch/festival/ModuleDescription.cc @@ -0,0 +1,153 @@ + /************************************************************************/ + /* */ + /* Centre for Speech Technology Research */ + /* University of Edinburgh, UK */ + /* Copyright (c) 1996,1997 */ + /* All Rights Reserved. */ + /* */ + /* Permission is hereby granted, free of charge, to use and distribute */ + /* this software and its documentation without restriction, including */ + /* without limitation the rights to use, copy, modify, merge, publish, */ + /* distribute, sublicense, and/or sell copies of this work, and to */ + /* permit persons to whom this work is furnished to do so, subject to */ + /* the following conditions: */ + /* 1. The code must retain the above copyright notice, this list of */ + /* conditions and the following disclaimer. */ + /* 2. Any modifications must be clearly marked as such. */ + /* 3. Original authors' names are not deleted. */ + /* 4. 
The authors' names are not used to endorse or promote products */ + /* derived from this software without specific prior written */ + /* permission. */ + /* */ + /* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ + /* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ + /* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ + /* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ + /* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ + /* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ + /* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ + /* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ + /* THIS SOFTWARE. */ + /* */ + /*************************************************************************/ + /* */ + /* Author: Richard Caley (rjc@cstr.ed.ac.uk) */ + /* Date: Tue Aug 26 1997 */ + /* -------------------------------------------------------------------- */ + /* Machine readable description of a module. 
*/ + /* */ + /*************************************************************************/ + +#include +#include "siod.h" +#include "ModuleDescription.h" + +// to make life easier +static inline EST_String S(const char *s) { return EST_String(s); } + +struct ModuleDescription *ModuleDescription::create() +{ + struct ModuleDescription *desc = new ModuleDescription; + + desc->name = ""; + desc->version = 0.0; + desc->organisation = ""; + desc->author = ""; + desc->description[0] = NULL; + desc->input_streams[0].name = NULL; + desc->optional_streams[0].name = NULL; + desc->output_streams[0].name = NULL; + desc->parameters[0].name = NULL; + + return desc; +} + +EST_String ModuleDescription::to_string(const ModuleDescription &desc) +{ + EST_String s; + char buf[10]; + int i; + + if (desc.input_streams[0].name + || desc.optional_streams[0].name + || desc.output_streams[0].name) + { + s += S("(") + desc.name + S(" UTT"); + for(i=0; i +#include "EST_unix.h" +#include "festival.h" +#include "festivalP.h" + +static void check_phoneset(void); +static void ps_add_def(const EST_String &name, PhoneSet *ps); +static LISP lisp_select_phoneset(LISP phoneset); + +static LISP phone_set_list = NULL; +static PhoneSet *current_phoneset = NULL; + +static const EST_String f_cvox("cvox"); +static const EST_String f_vc("vc"); +static const EST_String f_ctype("ctype"); + +VAL_REGISTER_CLASS(phone,Phone) +SIOD_REGISTER_CLASS(phone,Phone) +VAL_REGISTER_CLASS(phoneset,PhoneSet) +SIOD_REGISTER_CLASS(phoneset,PhoneSet) + +int Phone::match_features(Phone *foreign) +{ + // Try to match the features of the foreign phone with this one + EST_Litem *f; + + for (f=features.list.head(); f != 0; f=f->next()) + { + if ( features.list(f).v != foreign->val(features.list(f).k)) + return FALSE; + } + + return TRUE; + +} + +PhoneSet::~PhoneSet() +{ + gc_unprotect(&silences); + gc_unprotect(&map); + gc_unprotect(&feature_defs); + gc_unprotect(&phones); +} + +Phone * PhoneSet::member(const EST_String &ph) const +{ 
+ LISP p = siod_assoc_str(ph,phones); + if (p != 0) + return phone(car(cdr(p))); + else + { + cerr << "Phone \"" << ph << "\" not member of PhoneSet \"" << + psetname << "\"" << endl; + return 0; + } +} + +const char *PhoneSet::phnum(const int n) const +{ + // return the nth phone name + int i; + LISP p; + + // Should use nth for this, but I want to controll the error case + for (i=0,p=phones; p != NIL; p=cdr(p),i++) + { + if (i == n) + return get_c_string(car(car(p))); + } + + cerr << "Phone (phnum) " << n << + " too large, not that many members in PhoneSet \"" << + psetname << "\"" << endl; + festival_error(); + return NULL; +} + +int PhoneSet::add_phone(Phone *phone) +{ + // Add phone (deleting existing one with warning) + LISP lpair; + + lpair = siod_assoc_str(phone->phone_name(),phones); + + if (lpair == NIL) + { + phones = cons(make_param_lisp(phone->phone_name(), + siod(phone)), + phones); + return TRUE; + } + else + return FALSE; +} + +void PhoneSet::set_feature(const EST_String &name, LISP vals) +{ + LISP lpair; + + lpair = siod_assoc_str(name, feature_defs); + + if (lpair == NIL) + feature_defs = cons(make_param_lisp(name,vals),feature_defs); + else + { + cerr << "PhoneSet: replacing feature definition of " << + name << " PhoneSet " << psetname << endl; + CAR(cdr(lpair)) = vals; + } +} + +int PhoneSet::phnum(const char *phone) const +{ + // Return a unique number for this phone, i.e. 
position in list + int i; + LISP p; + + for (i=0,p=phones; p != NIL; p=cdr(p),i++) + { + if (streq(phone,get_c_string(car(car(p))))) + return i; + } + + cerr << "Phone \"" << phone << "\" not member of PhoneSet \"" << + psetname << "\"" << endl; + festival_error(); + + return -1; +} + +Phone *PhoneSet::find_matched_phone(Phone *foreign) +{ + // find a phone in the current set that matches the features + // in foreign (from a different phoneset) + LISP p; + + for (p=phones; p != NIL; p=cdr(p)) + { + if (phone(car(cdr(car(p))))->match_features(foreign)) + return phone(car(cdr(car(p)))); + } + + // could try harder + + cerr << "Cannot map phoneme " << *foreign << endl; + festival_error(); + + return 0; + +} + +int PhoneSet::is_silence(const EST_String &ph) const +{ + // TRUE is ph is silence + + return (siod_member_str(ph,silences) != NIL); + +} + +void PhoneSet::set_silences(LISP sils) +{ + silences=sils; +} + +void PhoneSet::set_map(LISP m) +{ + map=m; +} + +PhoneSet &PhoneSet::operator = (const PhoneSet &a) +{ + psetname = a.psetname; + silences = a.silences; + map = a.map; + feature_defs = a.feature_defs; + phones = a.phones; + return *this; +} + +LISP make_phoneset(LISP args,LISP env) +{ + // define a new phoneme set + (void)env; + PhoneSet *ps = new PhoneSet; + Phone *phone; + LISP f,p,pv; + LISP name, features, phones; + EST_String feat,val; + int num_feats; + + name = car(args); + features = car(cdr(args)); + phones = car(cdr(cdr(args))); + + ps->set_phone_set_name(get_c_string(name)); + // Define the phonetic features + num_feats = siod_llength(features); + for (f=features; f != NIL; f=cdr(f)) + ps->set_feature(get_c_string(car(car(f))),cdr(car(f))); + + // Define the phones + for (p=phones; p != NIL; p=cdr(p)) + { + if (siod_llength(cdr(car(p))) != num_feats) + { + cerr << "Wrong number of phone features for " + << get_c_string(car(car(p))) << " in " << + get_c_string(name) << endl; + festival_error(); + } + phone = new Phone; + 
phone->set_phone_name(get_c_string(car(car(p)))); + for (pv=cdr(car(p)),f=features; f != NIL; pv=cdr(pv),f=cdr(f)) + { + feat = get_c_string(car(car(f))); + val = get_c_string(car(pv)); + if (ps->feat_val(feat,val)) + phone->add_feat(feat,val); + else + { + cerr << "Phone " << phone->phone_name() << + " has invalid value " + << get_c_string(car(pv)) << " for feature " + << feat << endl; + festival_error(); + } + } + if (ps->add_phone(phone) == FALSE) + { + cerr << "Phone " << phone->phone_name() << + " multiply defined " << endl; + festival_error(); + } + } + + ps_add_def(ps->phone_set_name(),ps); + current_phoneset = ps; // selects this one as current + + return NIL; +} + +static LISP lisp_set_silence(LISP silences) +{ + // Set list of names as silences for current phoneset + + check_phoneset(); + current_phoneset->set_silences(silences); + return silences; +} + +PhoneSet *phoneset_name_to_set(const EST_String &name) +{ + LISP lpair = siod_assoc_str(name,phone_set_list); + + if (lpair == NIL) + { + cerr << "Phoneset " << name << " not defined" << endl; + festival_error(); + } + + return phoneset(car(cdr(lpair))); + +} + +static LISP lisp_select_phoneset(LISP pset) +{ + // Select named phoneset and make it current + EST_String name = get_c_string(pset); + LISP lpair; + + lpair = siod_assoc_str(name,phone_set_list); + + if (lpair == NIL) + { + cerr << "Phoneset " << name << " not defined" << endl; + festival_error(); + } + else + current_phoneset = phoneset(car(cdr(lpair))); + + return pset; +} + +static void ps_add_def(const EST_String &name, PhoneSet *ps) +{ + // Add phoneset to list of phonesets + LISP lpair; + + if (phone_set_list == NIL) + gc_protect(&phone_set_list); + + lpair = siod_assoc_str(name,phone_set_list); + + if (lpair == NIL) + { + phone_set_list = cons(cons(rintern(name), + cons(siod(ps),NIL)), + phone_set_list); + } + else + { + cwarn << "Phoneset \"" << name << "\" redefined" << endl; + setcar(cdr(lpair),siod(ps)); + } + + return; +} + +static 
void check_phoneset(void) +{ + // check if there is a phoneset defined + + if (current_phoneset == NULL) + { + cerr << "No phoneset currently selected"; + festival_error(); + } +} + +static EST_Val ff_ph_feature(EST_Item *s,const EST_String &name) +{ + // This function is called for all phone features. + // It looks at the name to find out the + // the actual name used to call this feature and removed the + // ph_ prefix and uses the remainder as the phone feature name + Phone *phone_def; + + if (!name.contains("ph_",0)) + { + cerr << "Not a phone feature function " << name << endl; + festival_error(); + } + + check_phoneset(); + + const EST_String &fname = name.after("ph_"); + phone_def = current_phoneset->member(s->name()); + if (phone_def == 0) + { + cerr << "Phone " << s->name() << " not in PhoneSet \"" << + current_phoneset->phone_set_name() << "\"" << endl; + festival_error(); + } + + const EST_String &rrr = phone_def->val(fname,EST_String::Empty); + if (rrr == EST_String::Empty) + { + cerr << "Phone " << s->name() << " does not have feature " << + fname << endl; + festival_error(); + } + + return EST_Val(rrr); +} + +static PhoneSet *find_phoneset(EST_String name) +{ + // get the phone set from the phone set list + LISP lpair; + + lpair = siod_assoc_str(name,phone_set_list); + + if (lpair == NIL) + { + cerr << "Phoneset \"" << name << "\" not defined" << endl; + festival_error(); + } + return phoneset(car(cdr(lpair))); + +} + +const EST_String &map_phone(const EST_String &fromphonename, const EST_String &fromsetname, + const EST_String &tosetname) +{ + PhoneSet *fromset, *toset; + Phone *fromphone, *tophone; + + fromset = find_phoneset(fromsetname); + toset = find_phoneset(tosetname); + + // should check specific matches in fromset first + fromphone = fromset->member(fromphonename); + if (fromphone == 0) + festival_error(); + tophone = toset->find_matched_phone(fromphone); + + return tophone->phone_name(); +} + +int ph_is_silence(const EST_String &ph) +{ + // 
TRUE if this phone is silence + + check_phoneset(); + return current_phoneset->is_silence(ph); + +} + +EST_String ph_silence(void) +{ + // return the first silence in the current_phoneset + EST_String s; + + check_phoneset(); + + if (current_phoneset->get_silences() == NIL) + { + cerr << "No silences set for PhoneSet\"" << + current_phoneset->phone_set_name() << "\"" << endl; + return "sil"; + } + else + return get_c_string(car(current_phoneset->get_silences())); + +} + +int ph_is_vowel(const EST_String &ph) +{ + // TRUE if this phone is a vowel, assumes the feature vc is used + + return (ph_feat(ph,f_vc) == "+"); +} + +int ph_is_consonant(const EST_String &ph) +{ + // TRUE if this phone is a consonant, assumes the feature vc is used + + return ((ph_feat(ph,f_vc) == "-") && + !(ph_is_silence(ph))); +} + +int ph_is_liquid(const EST_String &ph) +{ + // TRUE if this phone is a liquid + + return (ph_feat(ph,f_ctype) == "l"); +} + +int ph_is_approximant(const EST_String &ph) +{ + // TRUE if this phone is an approximant + + return (ph_feat(ph,f_ctype) == "r"); +} + +int ph_is_stop(const EST_String &ph) +{ + // TRUE if this phone is a stop + + return (ph_feat(ph,f_ctype) == "s"); +} + +int ph_is_fricative(const EST_String &ph) +{ + // TRUE if this phone is a fricative + + return (ph_feat(ph,f_ctype) == "f"); +} + +int ph_is_nasal(const EST_String &ph) +{ + // TRUE if this phone is a nasal + + return (ph_feat(ph,f_ctype) == "n"); +} + +int ph_is_obstruent(const EST_String &ph) +{ + // TRUE if this phone is an obstruent + EST_String v = ph_feat(ph,f_ctype); + + return ((v == "s") || (v == "f") || (v == "a")); +} + +int ph_is_sonorant(const EST_String &ph) +{ + // TRUE if this phone is a sonorant + + return !ph_is_obstruent(ph); +} + +int ph_is_voiced(const EST_String &ph) +{ + // TRUE if this phone is voiced + + return (ph_feat(ph,f_cvox) == "+"); +} + +int ph_is_syllabic(const EST_String &ph) +{ + // TRUE if this phone is a syllabic consonant (or vowel) + // Yes I know we
just don't have this ... + + return (ph_feat(ph,f_vc) == "+"); +} + +const EST_String &ph_feat(const EST_String &ph,const EST_String &feat) +{ + // Values for this phone -- error is phone of feat doesn't exist + Phone *phone_def; + EST_String rrr; + + check_phoneset(); + phone_def = current_phoneset->member(ph); + if (phone_def == 0) + { + cerr << "Phone " << ph << " not in phone set " << + current_phoneset->phone_set_name() << endl; + festival_error(); + } + + return phone_def->val(feat,EST_String::Empty); + +} + +int ph_sonority(const EST_String &ph) +{ + // assumes standard phone features + Phone *p; + + check_phoneset(); + p = current_phoneset->member(ph); + if (p == 0) + return 1; + + if (p->val(f_vc) == "+") + return 5; + else if (p->val(f_ctype) == "l") // || glide + return 4; + else if (p->val(f_ctype) == "n") + return 3; + else if (p->val(f_cvox) == "+") // voiced obstruents (stop fric affric) + return 2; + else + return 1; + +} + +LISP l_phoneset(LISP options) +{ + // Return Lisp form of current phone set + LISP description=NIL; + + check_phoneset(); + + if ((options == NIL) || + (siod_member_str("silences",options))) + { + description = cons(make_param_lisp("silences", + current_phoneset->get_silences()), + description); + } + if ((options == NIL) || + (siod_member_str("phones",options))) + { + LISP phones = current_phoneset->get_phones(); + LISP features = current_phoneset->get_feature_defs(); + LISP p,f,p_desc=NIL,f_desc=NIL; + + for (p=phones; p != NIL; p=cdr(p)) + { + f_desc = NIL; + for (f=reverse(features); f != NIL; f=cdr(f)) + { + f_desc = cons(rintern(ph_feat(get_c_string(car(car(p))), + get_c_string(car(car(f))))), + f_desc); + } + p_desc = cons(cons(car(car(p)),f_desc),p_desc); + } + description = cons(make_param_lisp("phones",p_desc),description); + } + if ((options == NIL) || + (siod_member_str("features",options))) + { + description = cons(make_param_lisp("features", + current_phoneset->get_feature_defs()), + description); + } + if 
((options == NIL) || + (siod_member_str("name",options))) + { + description = cons(make_param_str("name", + current_phoneset->phone_set_name()), + description); + } + + return description; +} + +LISP l_phoneset_list() +{ + LISP phonesets = NIL; + LISP l; + + for (l=phone_set_list; l != NIL; l=cdr(l)) + phonesets = cons(car(car(l)),phonesets); + + return phonesets; +} + +void festival_Phone_init(void) +{ + // define Lisp accessor functions + + init_fsubr("defPhoneSet",make_phoneset, + "(defPhoneSet PHONESETNAME FEATURES PHONEDEFS)\n\ + Define a new phoneset named PHONESETNAME. Each phone is described with a\n\ + set of features as described in FEATURES. Some of these FEATURES may\n\ + be significant in various parts of the system. Copying an existing\n\ + description is a good start. [see Phonesets]"); + init_subr_1("PhoneSet.select",lisp_select_phoneset, + "(PhoneSet.select PHONESETNAME)\n\ + Select PHONESETNAME as current phoneset. [see Phonesets]"); + init_subr_1("PhoneSet.silences",lisp_set_silence, + "(PhoneSet.silences LIST)\n\ + Declare LIST of phones as silences. The first in the list should be\n\ + the \"most\" silent. [see Phonesets]"); + init_subr_1("PhoneSet.description",l_phoneset, + "(Phoneset.description OPTIONS)\n\ + Returns a lisp for of the current phoneme set. Options is a list of\n\ + parts of the definition you require. OPTIONS may include, silences,\n\ + phones, features and/or name. If nil all are returned."); + init_subr_0("PhoneSet.list",l_phoneset_list, + "(Phoneset.list)\n\ + List the names of all currently defined Phonesets."); + // All feature functions starting with "ph_" + festival_def_ff_pref("ph_","Segment",ff_ph_feature, + "Segment.ph_*\n\ + Access phoneset features for a segment. This definition covers multiple\n\ + feature functions where ph_ may be extended with any features that\n\ + are defined in the phoneset (e.g. 
vc, vlng, cplace etc.)."); + +} + diff --git a/src/arch/festival/audspio.cc b/src/arch/festival/audspio.cc new file mode 100644 index 0000000..636e827 --- /dev/null +++ b/src/arch/festival/audspio.cc @@ -0,0 +1,261 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : September 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Interface with the audio spooler */ +/* */ +/*=======================================================================*/ +#include +#include "EST_unix.h" +#include +#include "festival.h" +#include "festivalP.h" + +#ifdef NO_SPOOLER +void audsp_play_wave(EST_Wave *w) { cerr << "no spooler available\n"; } +LISP l_audio_mode(LISP mode) { return NIL; } +#else + +#include +#include + +static int start_sub_process(int *fds,int argc,char **argv); +static char **enargen(const char *command,int *argc); +static void audsp_send(const char *c); +static int *pipe_open(const char *command); +static void pipe_close(int *fds); + +static int *audfds; +static int audsp_num=0; +static int audsp_pid = 0; + +void audsp_play_wave(EST_Wave *w) +{ + EST_String tpref = make_tmp_filename(); + char *tmpfilename = walloc(char,tpref.length()+20); + sprintf(tmpfilename,"%s_aud_%05d",(const char *)tpref,audsp_num++); + w->save(tmpfilename,"nist"); + audsp_send(EST_String("play ")+tmpfilename+EST_String(" ")+ + itoString(w->sample_rate())); + wfree(tmpfilename); +} + +static void audsp_send(const char *c) +{ + char reply[4]; + int pid; + int statusp; + + pid = waitpid((pid_t)audsp_pid,&statusp,WNOHANG); + if (pid != 0) + { + cerr << "Audio spooler has died unexpectedly" << endl; + audsp_mode = FALSE; + festival_error(); + } + + write(audfds[0],c,strlen(c)); + write(audfds[0],"\n",1); + read(audfds[1],reply,3); /* confirmation */ +} + +LISP l_audio_mode(LISP mode) +{ + // Switch audio mode + LISP audio=NIL; + LISP command=NIL; + + if (mode == NIL) + { + cerr << "audio_mode: nil is not a valid mode\n"; + festival_error(); + } + else if (streq("async",get_c_string(mode))) + { // Asynchronous mode using the audio spooler. 
+ if (audsp_mode == FALSE) + { + audio = ft_get_param("Audio_Method"); + command = ft_get_param("Audio_Command"); + audfds = pipe_open("audsp"); + if (audio != NIL) + audsp_send(EST_String("method ")+get_c_string(audio)); + if (command != NIL) + { + // command needs to be a single line so delete an newlines + EST_String flattened = get_c_string(command); + flattened.gsub("\\\n"," "); + flattened.gsub("\n"," "); + audsp_send(EST_String("command ")+flattened); + } + if ((audio = ft_get_param("Audio_Required_Rate")) != NIL) + audsp_send(EST_String("rate ")+get_c_string(audio)); + if ((audio = ft_get_param("Audio_Required_Format")) != NIL) + audsp_send(EST_String("otype ")+get_c_string(audio)); + if ((audio = ft_get_param("Audio_Device")) != NIL) + audsp_send(EST_String("device ")+get_c_string(audio)); + audsp_mode = TRUE; + } + } + else if (streq("sync",get_c_string(mode))) + { + // Synchronous mode + if (audsp_mode) + pipe_close(audfds); + audsp_mode = FALSE; + } + else if (streq("shutup",get_c_string(mode))) + { + if (audsp_mode) + audsp_send("shutup"); + else + { + cerr << "audio_mode: not in async mode, can't shutup\n"; + festival_error(); + } + } + else if (streq("close",get_c_string(mode))) + { // return only when queue is empty + if (audsp_mode) + audsp_send("close"); + } + else if (streq("query",get_c_string(mode))) + { + if (audsp_mode) + audsp_send("query"); + else + { + cerr << "audio_mode: not in async mode, can't query\n"; + festival_error(); + } + } + else + { + cerr << "audio_mode: unknown mode \"" << get_c_string(mode) << + "\"\n"; + festival_error(); + } + + return mode; +} + +static void pipe_close(int *fds) +{ + // Close down the pipes + close(fds[0]); + close(fds[1]); +} + +static int *pipe_open(const char *command) +{ + // Starts a subprocess with its stdin and stdout bounad to pipes + // the ends of which are returned in an array + int argc; + char **argv; + int *fds; + + argv = enargen(command,&argc); + fds = walloc(int,2); + + if 
(start_sub_process(fds,argc,argv) != 0) + { + cerr << "pipe_open: failed to start subprocess: \n" << endl; + cerr << "pipe_open: \"" << command << "\"\n"; + festival_error(); + } + + return fds; +} + +static int start_sub_process(int *fds, int argc, char **argv) +{ + // start sub_process with stdin and stdout bound to pipes whose ends + // are in fds[0] and fds[1] + int pid; + int in[2]; + int out[2]; + (void)argc; + + if ((pipe(in) != 0) || + (pipe(out) != 0)) + { + cerr << "pipe_open: failed to open pipes\n"; + festival_error(); + } + + switch (pid=fork()) + { + case 0: /* child */ + close(in[1]); /* close the end child isn't using */ + dup2(in[0],0); /* reassign stdin to the pipe */ + close(out[0]); + dup2(out[1],1); /* reassign stdout to the pipe */ + execvp(argv[0],argv); + cerr << "pipe_open: failed to start " << argv[0] << endl; + exit(-1); /* should only get here on failure */ + case -1: + cerr << "pipe_open: fork failed\n"; + festival_error(); + default: /* parent */ + close(in[0]); /* Close unused sides of the pipes */ + close(out[1]); + fds[0] = in[1]; + fds[1] = out[0]; + } + + audsp_pid = pid; + return 0; +} + +static char **enargen(const char *command,int *argc) +{ + EST_TokenStream ts; + char **argv; + int i; + + ts.open_string(command); + for (i=0; ts.get() != ""; i++); + ts.close(); + *argc = i; + + argv = walloc(char *,i+1); + ts.open_string(command); + for (i=0; i < *argc; i++) + argv[i] = wstrdup(ts.get().string()); + argv[i] = 0; + + return argv; +} + +#endif diff --git a/src/arch/festival/client.cc b/src/arch/festival/client.cc new file mode 100644 index 0000000..486a585 --- /dev/null +++ b/src/arch/festival/client.cc @@ -0,0 +1,100 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan Black (with Richard Tobin) */ +/* Date : December 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Low level client functions, separated from the other socket functions */ +/* so things that link with these don't need the whole system */ +/* */ +/*=======================================================================*/ +#include +#include +#include +#include +#include +#include +#include "EST_unix.h" +#include "EST_socket.h" +#include "festival.h" +#include "festivalP.h" + +static EST_Regex ipnum("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+"); + +int festival_socket_client(const char *host,int port) +{ + // Return an FD to a remote server + struct sockaddr_in serv_addr; + struct hostent *serverhost; + EST_String shost; + int fd; + + if (!socket_initialise()) + { + festival_error(); + } + fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + + if (NOT_A_SOCKET(fd)) + { + int n = socket_error(); + cerr << "socket: socket failed (" << n << ")\n"; + festival_error(); + } + memset(&serv_addr, 0, sizeof(serv_addr)); + shost = host; + if (shost.matches(ipnum)) + serv_addr.sin_addr.s_addr = inet_addr(host); + else + { + serverhost = gethostbyname(host); + if (serverhost == (struct hostent *)0) + { + cerr << "socket: gethostbyname failed" << endl; + festival_error(); + } + memmove(&serv_addr.sin_addr,serverhost->h_addr, serverhost->h_length); + } + serv_addr.sin_family = AF_INET; + serv_addr.sin_port = htons(port); + + if (connect(fd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) != 0) + { + cerr << "socket: connect failed" << endl; + festival_error(); + } + + return fd; +} + + diff --git a/src/arch/festival/features.cc b/src/arch/festival/features.cc new file mode 100644 index 0000000..ad928b7 --- /dev/null +++ b/src/arch/festival/features.cc @@ -0,0 +1,467 @@
+/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Functions for accessing parts of the utterance through "features" */ +/* features are a canonical addressing method for relatively accessing */ +/* information in an utterance from a given item in it. */ +/* */ +/* A feature name is a dot-separated string of names ended by a */ +/* final feature. The names before the final feature are navigational, */ +/* consisting of a few builtin forms (e.g. n and p) or names of */ +/* relations that are to be followed. */ +/* */ +/* For example: */ +/* "name" name of the current item */ +/* "n.name" name of the next item (in the current relation) */ +/* "n.n.name" is the name of the next next item */ +/* "R:SylStructure.parent.name" name of the parent in the */ +/* SylStructure relation */ +/* for example from an item in the Segment relation */ +/* "p.R:SylStructure.parent.syl_break" */ +/* is the syllable break level of the syllable related */ +/* to the previous segment. */ +/* "R:SylStructure.parent.R:Syllable.p.syl_break" */ +/* is the syllable break level of the Syllable item */ +/* previous to this item's syllable. */ +/* */ +/* The following features are defined for all stream items */ +/* name */ +/* Other features are defined through the C++ function festival_def_ff */ +/* Note: duration is no longer predefined */ +/* */ +/* Two extra features are defined here: ph_* (i.e. any feature prefixed by */ +/* ph_) and lisp_*. ph_ functions check the phoneme set and use the */ +/* remainder of the name as a phone feature, returning its value for the */ +/* current item (which will normally be a Segment stream item). e.g. */ +/* ph_vc or ph_vheight */ +/* The lisp_* feature will call a lisp function (the name following */ +/* the lisp_) with one argument, the stream item. 
*/ +/* this allows arbitrary new features without recompilation */ +/* */ +/*=======================================================================*/ +#include +#include +#include "EST_unix.h" +#include +#include "festival.h" +#include "festivalP.h" + +static LISP ff_pref_assoc(const char *name, LISP alist); + +static LISP ff_docstrings = NULL; +static LISP ff_pref_list = NULL; + +void festival_def_nff(const EST_String &name,const EST_String &sname, + EST_Item_featfunc func,const char *doc) +{ + // define the given feature function with documentation + + register_featfunc(name,func); + if (ff_docstrings == NIL) + gc_protect(&ff_docstrings); + EST_String id = sname + "." + name; + ff_docstrings = cons(cons(rintern(id),cstrcons(doc)),ff_docstrings); + siod_set_lval("ff_docstrings",ff_docstrings); + + return; +} + +VAL_REGISTER_FUNCPTR(pref_ffunc,FT_ff_pref_func) +SIOD_REGISTER_FUNCPTR(pref_ffunc,FT_ff_pref_func) + +void festival_def_ff_pref(const EST_String &pref,const EST_String &sname, + FT_ff_pref_func func, const char *doc) +{ + // define the given class of feature functions + // All feature functions names with this prefix will go to this func + LISP lpair; + + lpair = siod_assoc_str(pref,ff_pref_list); + + if (lpair == NIL) + { + if (ff_pref_list == NIL) + gc_protect(&ff_pref_list); + ff_pref_list = cons(cons(rintern(pref), + cons(siod(func),NIL)), + ff_pref_list); + EST_String id = sname + "." 
+ pref; + ff_docstrings = cons(cons(rintern(id),cstrcons(doc)),ff_docstrings); + siod_set_lval("ff_docstrings",ff_docstrings); + } + else + { + fprintf(stderr,"ffeature (prefix) %s duplicate definition\n", + (const char *)pref); + festival_error(); + } + + return; +} + +static LISP ff_pref_assoc(const char *name, LISP alist) +{ + // Search list of ff_pref_funcs to see if name has an appropriate + // prefix + LISP l; + const char *prefix; + + for (l=alist; CONSP(l); l=CDR(l)) + { + prefix = get_c_string(CAR(CAR(l))); + if (strstr(name,prefix) == name) + return CAR(l); + } + + // not found + return NIL; +} + +static EST_Item *parent_to(EST_Item *s,const EST_String &relname) +{ + // Follow parent relation of s until Item in relname + // *includes* testing this s + if (s == 0) + return 0; + if (s->relations().present(relname)) + return s->as_relation(relname); + else + return parent_to(parent(s),relname); +} + +static EST_Item *daughter1_to(EST_Item *s,const EST_String &relname) +{ + // Follow daughter1 relation of s until Item in relname + // *includes* testing this s + if (s == 0) + return 0; + if (s->relations().present(relname)) + return s->as_relation(relname); + else + return daughter1_to(daughter1(s),relname); +} + +static EST_Item *daughtern_to(EST_Item *s,const EST_String &relname) +{ + // Follow daughtern relation of s until Item in relname + // *includes* testing this s + if (s == 0) + return 0; + if (s->relations().present(relname)) + return s->as_relation(relname); + else + return daughtern_to(daughtern(s),relname); +} + +static EST_String Feature_Separator = "."; +static EST_String Feature_PunctuationSymbols = EST_String::Empty; +static EST_String Feature_PrePunctuationSymbols = EST_String::Empty; +static EST_Val default_feature_value(0); + +// Moving this to be static gives me an extra second but of course +// makes it thread unsafe, and makes reentrancy dubious +static EST_TokenStream ts; + +EST_Val ffeature(EST_Item *item,const EST_String &fname) +{ + // 
Select and apply feature function name to s and return result + FT_ff_pref_func pfunc; + EST_Item_featfunc func = 0; + LISP lpair; + EST_Item *s = item; + + if (item == 0) + return default_feature_value; + if (strchr(fname,'.') == 0) + { // if its a simple name do it quickly, without tokenizing + if ((func = get_featfunc(fname)) != 0) + return (func)(item); + else if ((lpair = ff_pref_assoc(fname,ff_pref_list)) != NIL) + { + pfunc = pref_ffunc(CAR(CDR(lpair))); + return (pfunc)(item,fname); + } + else // it must be a feature name for this item + return item->f(fname,default_feature_value); + } + ts.open_string(fname); + ts.set_WhiteSpaceChars(Feature_Separator); + ts.set_PunctuationSymbols(Feature_PunctuationSymbols); + ts.set_PrePunctuationSymbols(Feature_PrePunctuationSymbols); + + while (!ts.eof()) + { + const EST_String &Sname = ts.get().string(); + const char *name = Sname; + if (streq(name,"n")) + s=s->next(); + else if (streq(name,"p")) + s=s->prev(); + else if (streq(name,"nn")) + s=s->next()->next(); + else if (streq(name,"pp")) + s=s->prev()->prev(); + else if (streq(name,"up")) // up down should really be private + s=s->up(); + else if (streq(name,"down")) + s=s->down(); + else if (streq(name,"parent")) + s=parent(s); + else if (streq(name,"parent_to")) + s=parent_to(s,ts.get().string()); + else if (streq(name,"daughter1_to")) + s=daughter1_to(s,ts.get().string()); + else if (streq(name,"daughtern_to")) + s=daughtern_to(s,ts.get().string()); + else if (streq(name,"root")) + s=root(s); + else if (streq(name,"daughter1")) + s=daughter1(s); + else if (streq(name,"daughter2")) + s=daughter2(s); + else if (streq(name,"daughtern")) + s=daughtern(s); + else if (streq(name,"last")) + s=s->last(); + else if (streq(name,"first")) + s=s->first(); + else if (strncmp(name,"R:",2) == 0) // new relation structure + s = s->as_relation(&name[2]); + else if (s->f_present(Sname)) + { + EST_String p = Sname; + while (!ts.eof()) + p = 
EST_String::cat(p,Feature_Separator,ts.get().string()); + return s->f(p, default_feature_value); + } + else if ((func = get_featfunc(Sname)) != 0) + return (func)(s); + else if ((lpair = ff_pref_assoc(name,ff_pref_list)) != NIL) + { + pfunc = pref_ffunc(CAR(CDR(lpair))); + return (pfunc)(s,Sname); + } + else // unrecognized + s = 0; + + if (s==0) + return default_feature_value; + } + + cerr << "Invalid ffeature name: \"" << fname << "\"" << endl; + festival_error(); + + return default_feature_value; +} + +static LISP lisp_item_feature(LISP litem, LISP name) +{ + // return the ffeature name for this stream + EST_Item *s = item(litem); + EST_String fname = get_c_string(name); + + return lisp_val(ffeature(s,fname)); +} + +static LISP lisp_item_raw_feature(LISP litem, LISP name) +{ + // return the ffeature name for this stream + EST_Item *s = item(litem); + EST_String fname = get_c_string(name); + + EST_Val v = ffeature(s,fname); + + if (v.type() == val_type_feats) + return siod(feats(v)); + else + return lisp_val(ffeature(s,fname)); +} + +static LISP lisp_item_set_feat(LISP litem, LISP name, LISP value) +{ + // set the feature (locally) on this sitem + EST_Item *s = item(litem); + EST_String fname = get_c_string(name); + + if (fname.contains("R:")) + { + cerr << "item.set_feat: cannot set feat name containing " << + "\"R:\"" << endl; + festival_error(); + } + s->set_val(fname,val_lisp(value)); + + return value; +} + +static LISP lisp_item_set_function(LISP litem, LISP name, LISP funcname) +{ + EST_Item *s = item(litem); + + s->set_function(get_c_string(name),get_c_string(funcname)); + + return funcname; +} + +static LISP lisp_relation_feature(LISP utt, LISP relname, LISP name) +{ + // return the ffeature name for this stream + EST_Utterance *u = utterance(utt); + EST_String fname = get_c_string(name); + + return lisp_val(u->relation(get_c_string(relname))->f.val(fname)); +} + +static LISP lisp_relation_set_feat(LISP utt, LISP relname, LISP name, LISP val) +{ + // 
set the named feature on this relation + EST_Utterance *u = utterance(utt); + EST_String fname = get_c_string(name); + + u->relation(get_c_string(relname))->f.set_path(fname,val_lisp(val)); + return val; +} + +static LISP lisp_relation_remove_item_feat(LISP utt, LISP relname, LISP name) +{ + // remove the named feature from every item in this relation + EST_Utterance *u = utterance(utt); + EST_String fname = get_c_string(name); + + u->relation(get_c_string(relname))->remove_item_feature(fname); + + return NIL; +} + +static LISP lisp_relation_remove_feat(LISP utt, LISP relname, LISP name) +{ + // remove the named feature from this relation + EST_Utterance *u = utterance(utt); + EST_String fname = get_c_string(name); + + u->relation(get_c_string(relname))->f.remove(fname); + + return NIL; +} + +void value_sort(EST_Features &f, const EST_String &field); + +static LISP lisp_feature_value_sort(LISP f, LISP name) +{ + value_sort(*(feats(f)), get_c_string(name)); + return NIL; +} + + +static EST_Val ff_lisp_func(EST_Item *i,const EST_String &name) +{ + // This function is called for feature functions starting lisp_ + // It calls the Lisp function named after that prefix with i + // as its argument; the return value (which must be atomic) is + // then passed back as a Val. I'm not sure if this will be + // particularly efficient, but it will make development of + // new features quicker as they can be done in Lisp without + // changing the C++ code. 
+ EST_String lfunc_name = name.after("lisp_"); + LISP r,l; + + l = cons(rintern(lfunc_name), + cons(siod(i),NIL)); + r = leval(l,NIL); + if ((consp(r)) || (r == NIL)) + { + cerr << "FFeature Lisp function: " << lfunc_name << + " returned non-atomic value" << endl; + festival_error(); + } + else if (numberp(r)) + return EST_Val(get_c_float(r)); + + return EST_Val(get_c_string(r)); +} + +void festival_features_init(void) +{ + // declare feature specific Lisp functions + + festival_def_ff_pref("lisp_","any",ff_lisp_func, + "ANY.lisp_*\n\ + Apply Lisp function named after lisp_. The function is called with\n\ + a stream item. It must return an atomic value.\n\ + This method may be inefficient and is primarily designed to allow\n\ + quick prototyping of new feature functions."); + + init_subr_2("item.feat",lisp_item_feature, + "(item.feat ITEM FEATNAME)\n\ + Return value of FEATNAME (which may be a simple feature name or a\n\ + pathname) of ITEM."); + + init_subr_2("item.raw_feat",lisp_item_raw_feature, + "(item.raw_feat ITEM FEATNAME)\n\ + Return value of FEATNAME as native features structure \n\ + (which may be a simple feature name or a\n\ + pathname) of ITEM."); + + init_subr_2("feats.value_sort",lisp_feature_value_sort, + "(feats.value_sort FEATURES NAME)\n"); + + init_subr_3("item.set_feat",lisp_item_set_feat, + "(item.set_feat ITEM FEATNAME VALUE)\n\ + Set FEATNAME to VALUE in ITEM."); + + init_subr_3("item.set_function",lisp_item_set_function, + "(item.set_function ITEM FEATNAME FEATFUNCNAME)\n\ + Set FEATNAME to feature function name FEATFUNCNAME in ITEM."); + + init_subr_3("utt.relation.feat",lisp_relation_feature, + "(utt.relation.feat UTT RELNAME FEATNAME)\n\ + Return value of FEATNAME on relation RELNAME in UTT."); + + init_subr_3("utt.relation.remove_feat",lisp_relation_remove_feat, + "(utt.relation.remove_feat UTT RELNAME FEATNAME)\n\ + Remove FEATNAME on relation RELNAME in UTT."); + + 
init_subr_3("utt.relation.remove_item_feat",lisp_relation_remove_item_feat, + "(utt.relation.remove_item_feat UTT RELNAME FEATNAME)\n\ + Remove FEATNAME on every item in relation RELNAME in UTT."); + + init_subr_4("utt.relation.set_feat",lisp_relation_set_feat, + "(utt.relation.set_feat UTT RELNAME FEATNAME VALUE)\n\ + Set FEATNAME to VALUE on relation RELNAME in UTT."); +} + + + diff --git a/src/arch/festival/festival.cc b/src/arch/festival/festival.cc new file mode 100644 index 0000000..987ed52 --- /dev/null +++ b/src/arch/festival/festival.cc @@ -0,0 +1,588 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black and Paul Taylor */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Top level file for synthesizer */ +/* */ +/*=======================================================================*/ +#include +#include "EST_unix.h" +#include "EST_Pathname.h" +#include +#include "festival.h" +#include "festivalP.h" +#include "siod.h" +#include "ModuleDescription.h" + +void festival_lisp_funcs(void); +void festival_lisp_vars(void); +void festival_banner(void); +void festival_load_default_files(void); + +#define _S_S_S(S) #S +#define STRINGIZE(S) _S_S_S(S) + +const char *festival_version = STRINGIZE(FTVERSION) ":" STRINGIZE(FTSTATE) " " STRINGIZE(FTDATE); + +// Allow the path to be passed in without quotes because Windoze command line +// is stupid +// Extra level of indirection needed to get an extra macro expansion. Yeuch. 
+ +#ifdef FTLIBDIRC +# define FTLIBDIR STRINGIZE(FTLIBDIRC) +#endif +#ifdef FTOSTYPEC +# define FTOSTYPE STRINGIZE(FTOSTYPEC) +#endif + +#ifndef FTLIBDIR +#define FTLIBDIR "/projects/festival/lib/" +#endif +#ifndef FTOSTYPE +#define FTOSTYPE "" +#endif + +const char *festival_libdir = FTLIBDIR; +ostream *cdebug; +static int festival_server_port = 1314; +static EST_StrList sub_copyrights; + +#if 0 +LISP describe_module(LISP lname, LISP lstream); +LISP describe_all_modules(void); +#endif + +static int festival_initialized = 0; + +void festival_initialize(int load_init_files,int heap_size) +{ + // all initialisation + + if (! festival_initialized ) + { + siod_init(heap_size); + siod_est_init(); // add support for EST objects + siod_fringe_init(); // and for talking to fringe. + + siod_prog_name = "festival"; + cdebug = new ofstream("/dev/null"); // This won't work on Win/NT + stddebug = fopen("/dev/null","w"); + + festival_lisp_vars(); + festival_lisp_funcs(); + if (load_init_files) + festival_load_default_files(); + festival_initialized = TRUE; + } + else + { + cerr << "festival_initialize() called more than once" << endl; + } + + return; +} + +void festival_repl(int interactive) +{ + // top level read eval print loop + + siod_primary_prompt = "festival> "; + siod_secondary_prompt = "> "; + if (interactive) + festival_banner(); + siod_repl(interactive); + +} + +void festival_server_mode(void) +{ + // Go into server mode + LISP lport; + + lport = siod_get_lval("server_port",NULL); + + if (lport != NULL) + festival_server_port = get_c_int(lport); + + festival_start_server(festival_server_port); + +} + +void festival_init_lang(const EST_String &language) +{ + // Call the lisp function to set up for the named language + + leval(cons(rintern("select_language"), + cons(quote(rintern(language)),NIL)), + NIL); +} + +int festival_load_file(const EST_String &fname) +{ + // Load and evaluate named file + EST_String b; + b = EST_String("(load ")+quote_string(fname,"\"","\\",1)+")"; + 
// I used to do the above without the b intermediate variable + // but that caused a crash for some compilers on some machines + return festival_eval_command(b); +} + +int festival_eval_command(const EST_String &command) +{ + // Eval command, catching any errors, + // and return TRUE or FALSE depending on success + jmp_buf *old_errjmp = est_errjmp; + int old_errjmp_ok = errjmp_ok; + int rvalue = TRUE; + LISP l; + gc_protect(&l); + + errjmp_ok = 1; + est_errjmp = walloc(jmp_buf,1); + if (setjmp(*est_errjmp)) + { + rvalue = FALSE; + } + else + { + EST_String ll = command; // copy it; + l = read_from_string((char *)ll); + leval(l,NIL); + rvalue = TRUE; + } + + gc_unprotect(&l); + // Restore error handler + wfree(est_errjmp); + est_errjmp = old_errjmp; + errjmp_ok = old_errjmp_ok; + return rvalue; +} + +void festival_tidy_up(void) +{ + // tidy up before exit + + // specifically this closes any open files, which can be useful + // if finding errors + siod_tidy_up(); + +} + +void festival_banner(void) +{ + /* Print out a banner, copyright and version */ + + if (siod_get_lval("hush_startup",NULL) == NIL) + { + EST_Litem *t; + cout << "\n" << STRINGIZE(FTNAME) << " " << + festival_version << endl; + cout << "Copyright (C) University of Edinburgh, 1996-2010. " << + "All rights reserved." 
<< endl; + if (sub_copyrights.length() > 0) + { + cout << "\n"; + for (t = sub_copyrights.head(); t != 0; t = t->next()) + cout << sub_copyrights.item(t); + } + cout << "For details type `(festival_warranty)'" << endl; + } +} + +void festival_def_utt_module(const char *name,LISP (*fcn)(LISP), + const char *docstring) +{ + // Define an utterance module to the LISP system + init_subr_1(name,fcn,docstring); +} + +int festival_say_file(const EST_String &fname) +{ + /* Say this file as text */ + // should use pathname on this + return festival_eval_command(EST_String("(tts ")+ + quote_string(fname,"\"","\\",1)+ + " nil)"); +} + +int festival_say_text(const EST_String &text) +{ + /* Say this text */ + return festival_eval_command(EST_String("(SayText ")+ + quote_string(text,"\"","\\",1)+ + ")"); +} + +int festival_text_to_wave(const EST_String &text,EST_Wave &wave) +{ + /* Convert text to waveform */ + LISP lutt; + EST_Wave *w; + + if (!festival_eval_command(EST_String("(set! wave_utt (SynthText ")+ + quote_string(text,"\"","\\",1)+ + "))")) + return FALSE; + lutt = siod_get_lval("wave_utt",NULL); + if (!utterance_p(lutt)) + return FALSE; + w = get_utt_wave(utterance(lutt)); + if (w == 0) + return FALSE; + wave = *w; + return TRUE; +} + +void festival_wait_for_spooler(void) +{ + leval(cons(rintern("audio_mode"), + cons(quote(rintern("close")),NIL)),NIL); +} + +static LISP lisp_debug_output(LISP arg) +{ + // switch debug output stream + + if (cdebug != &cerr) + delete cdebug; + if (stddebug != stderr) + fclose(stddebug); + + if (arg == NIL) + { // this might be a problem on non-Unix machines + cdebug = new ofstream("/dev/null"); + stddebug = fopen("/dev/null","w"); + } + else + { + cdebug = &cerr; + stddebug = stderr; + } + + return NIL; +} + +void festival_load_default_files(void) +{ + // Load in default files, init.scm. 
Users ~/.festivalrc + // (or whatever you wish to call it) is loaded by init.scm + EST_String userinitfile, home_str, initfile; + + // Load library init first + initfile = (EST_String)EST_Pathname(festival_libdir).as_directory() + + "init.scm"; + if (access((const char *)initfile,R_OK) == 0) + vload(initfile,FALSE); + else + cerr << "Initialization file " << initfile << " not found" << endl; + +} + +void festival_lisp_vars(void) +{ + // set up specific lisp variables + EST_TokenStream ts; + int major,minor,subminor; + + siod_set_lval("libdir",strintern(festival_libdir)); + if (!streq(FTOSTYPE,"")) + siod_set_lval("*ostype*",cintern(FTOSTYPE)); + siod_set_lval("festival_version", + strcons(strlen(festival_version),festival_version)); + ts.open_string(festival_version); + ts.set_WhiteSpaceChars(". "); + major = atoi(ts.get().string()); + minor = atoi(ts.get().string()); + subminor = atoi(ts.get().string()); + ts.close(); + siod_set_lval("festival_version_number", + cons(flocons(major), + cons(flocons(minor), + cons(flocons(subminor),NIL)))); + siod_set_lval("*modules*",NIL); + siod_set_lval("*module-descriptions*",NIL); + if (nas_supported) + proclaim_module("nas"); + if (esd_supported) + proclaim_module("esd"); + if (sun16_supported) + proclaim_module("sun16audio"); + if (freebsd16_supported) + proclaim_module("freebsd16audio"); + if (linux16_supported) + proclaim_module("linux16audio"); + if (macosx_supported) + proclaim_module("macosxaudio"); + if (win32audio_supported) + proclaim_module("win32audio"); + if (mplayer_supported) + proclaim_module("mplayeraudio"); + + // Add etc-dir path and machine specific directory etc/$OSTYPE + char *etcdir = walloc(char,strlen(festival_libdir)+strlen("etc/")+ + strlen(FTOSTYPE)+3); + sprintf(etcdir,"%s/etc/%s/",festival_libdir,FTOSTYPE); + char *etcdircommon = walloc(char,strlen(festival_libdir)+strlen("etc/")+3); + sprintf(etcdircommon,"%s/etc/",festival_libdir); + + // Modify my PATH to include these directories + 
siod_set_lval("etc-path",cons(rintern(etcdir), + cons(rintern(etcdircommon),NIL))); + const char *path = getenv("PATH"); + if (path == 0) + path = ""; + char *newpath = walloc(char,1024+strlen(path)+strlen(etcdir)+ + strlen(etcdircommon)); + sprintf(newpath,"PATH=%s:%s:%s",path,etcdir,etcdircommon); + putenv(newpath); + + wfree(etcdir); + wfree(etcdircommon); + return; +} + +static LISP lmake_tmp_filename(void) +{ + EST_String tfile = make_tmp_filename(); + + return strintern(tfile); +} + +LISP l_wagon(LISP si, LISP tree); +LISP l_lr_predict(LISP si, LISP lr_model); +void festival_unitdb_init(void); +LISP Gen_Viterbi(LISP utt); + +LISP utf8_explode(LISP name) +{ + /* return a list of utf-8 characters as strings */ + const unsigned char *xxx = (const unsigned char *)get_c_string(name); + LISP chars=NIL; + int i, l=0; + char utf8char[5]; + + for (i=0; xxx[i]; i++) + { + if (xxx[i] < 0x80) /* one byte */ + { + sprintf(utf8char,"%c",xxx[i]); + l = 1; + } + else if (xxx[i] < 0xe0) /* two bytes */ + { + sprintf(utf8char,"%c%c",xxx[i],xxx[i+1]); + i++; + l = 2; + } + else if (xxx[i] < 0xf0) /* three bytes (lead byte 0xe0-0xef) */ + { + sprintf(utf8char,"%c%c%c",xxx[i],xxx[i+1],xxx[i+2]); + i++; i++; + l = 3; + } + else + { + sprintf(utf8char,"%c%c%c%c",xxx[i],xxx[i+1],xxx[i+2],xxx[i+3]); + i++; i++; i++; + l = 4; + } + chars = cons(strcons(l,utf8char),chars); + } + return reverse(chars); + +} + +void festival_lisp_funcs(void) +{ + // declare festival specific Lisp functions + + // Standard functions + festival_utterance_init(); + festival_features_init(); + festival_wave_init(); + festival_Phone_init(); + + festival_tcl_init(); // does nothing if TCL not selected + festival_wfst_init(); + festival_ngram_init(); + +// festival_unitdb_init(); + // General ones + festival_init_modules(); + + // Some other ones that aren't anywhere else + init_subr_1("parse_url", lisp_parse_url, + "(parse_url URL)\n\ + Split URL into a list (protocol host port path) suitable\n\ + for giving to fopen."); + 
init_subr_0("make_tmp_filename",lmake_tmp_filename, + "(make_tmp_filename)\n\ + Return name of temporary file."); + init_subr_1("debug_output",lisp_debug_output, + "(debug_output ARG)\n\ + If ARG is non-nil cause all future debug output to be sent to cerr,\n\ + otherwise discard it (send it to /dev/null)."); + init_subr_1("utf8explode", utf8_explode, + "(utf8explode utf8string)\n\ + Returns a list of utf-8 characters in given string."); + + init_subr_2("wagon",l_wagon, + "(wagon ITEM TREE)\n\ + Apply the CART tree TREE to ITEM. This returns the full\n\ + predicted form; you need to extract the value from the returned form\n\ + itself. [see CART trees]"); + init_subr_2("lr_predict",l_lr_predict, + "(lr_predict ITEM LRMODEL)\n\ + Apply the linear regression model LRMODEL to ITEM. This\n\ + returns a float value by summing the product of the coefficients and values\n\ + returned by the specified features in ITEM. [see Linear regression]"); + init_subr_1("Gen_Viterbi",Gen_Viterbi, + "(Gen_Viterbi UTT)\n\ + Applies the Viterbi search algorithm based on the parameters in\n\ + gen_vit_params. 
Basically allows user candidate selection function\n\ + combined with ngrams."); + +#if 0 + init_subr_2("describe_module", describe_module, + "(describe_module NAME STREAM)\n\ + Print a description of the named module."); + init_subr_0("describe_all_modules", describe_all_modules, + "(describe_all_modules)\n\ + Print descriptions of all modules."); +#endif + +} + +void proclaim_module(const EST_String &name, + const EST_String &banner_copyright, + const ModuleDescription *description) +{ + // Add this name to the variable *modules*, so people can test for + // it in the lisp world + LISP mods = siod_get_lval("*modules*",NULL); + LISP name_sym = rintern(name); + + siod_set_lval("*modules*",cons(name_sym,mods)); + + if (banner_copyright != "") + sub_copyrights.append(name + ": " + banner_copyright); + + if (description != NULL) + { + LISP module_descriptions = siod_get_lval("*module-descriptions*",NULL); + LISP scheme_desc = siod(description); + siod_set_lval("*module-descriptions*", + cons(cons(name_sym, + cons(scheme_desc, NIL)), + module_descriptions)); + } + +} + +void proclaim_module(const EST_String &name, + const ModuleDescription *description) +{ + proclaim_module(name, "", description); +} + +void init_module_subr(const char *name, LISP (*fcn)(LISP), + const ModuleDescription *description) +{ + char * desc_string = NULL; + + if (description) + { + EST_String desc(ModuleDescription::to_string(*description)); + + desc_string = wstrdup(desc); + } + + init_lsubr((char *)name, fcn, desc_string); + + // delete desc_string; +} + +LISP ft_get_param(const EST_String &pname) +{ + EST_Features &p = Param(); + + if (p.present(pname)) + return lisp_val(p.f(pname)); + else + return NIL; +#if 0 + LISP params,lpair; + + params = siod_get_lval("Parameter","no parameters"); + + lpair = siod_assoc_str(pname,params); + + if (lpair == NIL) + return NIL; + else + return car(cdr(lpair)); +#endif +} + +void print_string(EST_String s) +{ + cout << s << endl; +} + +LISP map_pos(LISP 
posmap, LISP pos) +{ + // Map specified features + LISP l; + + if (consp(pos) || (pos == NIL)) + return pos; + + for (l=posmap; l != NIL; l=cdr(l)) + if (siod_member_str(get_c_string(pos),car(car(l))) != NIL) + return car(cdr(car(l))); + return pos; +} + +EST_String map_pos(LISP posmap, const EST_String &pos) +{ + LISP l; + + for (l=posmap; l != NIL; l=cdr(l)) + if (siod_member_str(pos,car(car(l))) != NIL) + return get_c_string(car(cdr(car(l)))); + return pos; +} + diff --git a/src/arch/festival/festivalP.h b/src/arch/festival/festivalP.h new file mode 100644 index 0000000..ee5e02b --- /dev/null +++ b/src/arch/festival/festivalP.h @@ -0,0 +1,62 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Festival Private functions */ +/* */ +/*=======================================================================*/ +#ifndef __FESTIVALP_H__ +#define __FESTIVALP_H__ + +/* Set up LISP access functions for various subsystems */ +void festival_utterance_init(void); +void festival_wave_init(void); +void festival_Phone_init(void); +void festival_features_init(void); +void festival_tcl_init(void); +void festival_ngram_init(); +void festival_wfst_init(); +void festival_fringe_init(void); + +extern ostream *cslog; + +LISP l_audio_mode(LISP mode); +void audsp_play_wave(EST_Wave *w); +EST_Wave *get_utt_wave(EST_Utterance *u); + +LISP lisp_parse_url(LISP url); + + +#endif /* __FESTIVALP_H__ */ diff --git a/src/arch/festival/linreg.cc b/src/arch/festival/linreg.cc new file mode 100644 index 0000000..96bc4e2 --- /dev/null +++ b/src/arch/festival/linreg.cc @@ -0,0 +1,88 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : May 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* A simple interpreter for Linear Regression models */ +/* */ +/*=======================================================================*/ +#include +#include "EST_unix.h" +#include "festival.h" + +#define FFEATURE_NAME(X) (get_c_string(car(X))) +#define FFEATURE_WEIGHT(X) (get_c_float(car(cdr(X)))) +#define FFEATURE_MAPCLASS(X) (car(cdr(cdr(X)))) + +EST_Val lr_predict(EST_Item *s, LISP lr_model) +{ + EST_Val v = 0.0; + float answer; + LISP f; + const char *ffeature_name, *last_name=""; + + answer = FFEATURE_WEIGHT(car(lr_model)); // Intercept; + for (f=cdr(lr_model); CONSP(f); f=CDR(f)) + { + ffeature_name = FFEATURE_NAME(CAR(f)); + if (!streq(ffeature_name,last_name)) + v = ffeature(s,ffeature_name); + if (siod_llength(CAR(f)) == 3) + { // A map class is specified + if (siod_member_str(v.string(),FFEATURE_MAPCLASS(CAR(f))) != NIL) + answer += FFEATURE_WEIGHT(CAR(f)); + } + else + answer += FFEATURE_WEIGHT(CAR(f)) * (float)v; + last_name = ffeature_name; + } + + return EST_Val(answer); +} + +LISP l_lr_predict(LISP si, LISP lr_model) +{ + EST_Item *s = item(si); + EST_Val answer; + + answer = lr_predict(s,lr_model); + return flocons(answer.Float()); +} + + + + + + + diff --git a/src/arch/festival/ngram.cc b/src/arch/festival/ngram.cc new file mode 100644 index 0000000..80256ee --- /dev/null +++ b/src/arch/festival/ngram.cc @@ -0,0 +1,128 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Authors: Alan W Black */ +/* Date : December 1997 */ +/*-----------------------------------------------------------------------*/ +/* Access to the Ngrammar */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "festivalP.h" + +static LISP ngram_loaded_list = NIL; +static EST_Ngrammar *load_ngram(const EST_String &filename); +static LISP add_ngram(const EST_String &name,EST_Ngrammar *n); + +SIOD_REGISTER_CLASS(ngrammar,EST_Ngrammar) + +static LISP lisp_load_ngram(LISP name, LISP filename) +{ + EST_Ngrammar *n; + + n = load_ngram(get_c_string(filename)); + add_ngram(get_c_string(name),n); + + return name; +} + +static EST_Ngrammar *load_ngram(const EST_String &filename) +{ + EST_Ngrammar *n = new EST_Ngrammar(); + if (n->load(filename) != 0) + { + fprintf(stderr,"Ngrammar: failed to read ngrammar from \"%s\"", + (const char *)filename); + festival_error(); + } + + return n; +} + +static LISP add_ngram(const EST_String &name,EST_Ngrammar *n) +{ + LISP lpair; + + lpair = siod_assoc_str(name,ngram_loaded_list); + + if (ngram_loaded_list == NIL) + { // First time round so do a little initialization + gc_protect(&ngram_loaded_list); + } + + LISP ng = siod(n); + + if (lpair == NIL) + ngram_loaded_list = + cons(cons(strintern(name),cons(ng,NIL)),ngram_loaded_list); + else + { + cwarn << "Ngrammar: " << name << " recreated" << endl; + setcar(cdr(lpair),ng); + } + + return ng; +} + +EST_Ngrammar *get_ngram(const EST_String &name,const EST_String &filename) +{ + // Find ngram named name, returns NULL if none; + LISP lpair; + + lpair = siod_assoc_str(name,ngram_loaded_list); + + if (lpair == NIL) + { + if (filename != EST_String::Empty) + { + EST_Ngrammar *n = load_ngram(filename); + add_ngram(name,n); + return n; + } + else + { + cwarn << "Ngrammar: no ngram named \"" << name << "\"" << endl; + return 0; + } + } 
+ else + return ngrammar(car(cdr(lpair))); +} + +void festival_ngram_init() +{ + init_subr_2("ngram.load",lisp_load_ngram, + "(ngram.load NAME FILENAME)\n\ + Load an ngram from FILENAME and store it named NAME for later access."); + +} diff --git a/src/arch/festival/server.cc b/src/arch/festival/server.cc new file mode 100644 index 0000000..e35dad1 --- /dev/null +++ b/src/arch/festival/server.cc @@ -0,0 +1,298 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan Black and Richard Tobin */ +/* Date : December 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Socket support to run Festival as a server process */ +/* */ +/* The original socket code is derived from Richard Tobin's scokpipe/pipesock */ +/* examples, but has been substantially changed to be more general */ +/* for Festival */ +/* */ +/* The logging, access control, file transfer stuff are all new */ +/* */ +/*=======================================================================*/ +#include +#include +#include +#include +#include +#include +#include "EST_unix.h" +#include "EST_socket.h" +#include "festival.h" +#include "festivalP.h" + +#define DEFAULT_MAX_CLIENTS 10 + +/* The following gives a server that never forks */ +/* and only accepts one client at a time. This is good for */ +/* OSs with an expensive implementation of fork and/or waitpid (e.g.
NT) */ +#ifdef WIN32 +#define SINGLE_CLIENT 1 +#endif + + +static int client_access_check(int fd,int client); +static EST_String log_time_stamp(int client); +static void log_message(int client,const char *message); + +int ft_server_socket = -1; +ostream *cslog = NULL; + +int festival_start_server(int port) +{ + // Never exits except by signals + struct sockaddr_in serv_addr; + int fd, fd1; + int statusp; + int client_name=0; + int max_clients, num_clients, pid; + LISP lmax_clients, llog_file; + + lmax_clients = siod_get_lval("server_max_client",NULL); + if (lmax_clients != NULL) + max_clients = get_c_int(lmax_clients); + else + max_clients = DEFAULT_MAX_CLIENTS; + num_clients = 0; + llog_file = siod_get_lval("server_log_file",NULL); + if (llog_file == NIL) + cslog = cdebug; + else if (llog_file == siod_get_lval("t",NULL)) + cslog = &cout; + else + cslog = new ofstream(get_c_string(llog_file),ios::app); + + if (!socket_initialise()) + { + festival_error(); + } + + fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + + if (NOT_A_SOCKET(fd)) + { + int n = socket_error(); + cerr << "socket: socket failed (" << n << ")\n"; + + festival_error(); + } + int one = 1; + + if (setsockopt(fd, SOL_SOCKET,SO_REUSEADDR,(char *)&one,sizeof(int)) < 0) + { + cerr << "socket: SO_REUSEADDR failed" << endl; + festival_error(); + } + + memset(&serv_addr, 0, sizeof(serv_addr)); + serv_addr.sin_family = AF_INET; + serv_addr.sin_port = htons(port); + serv_addr.sin_addr.s_addr = htonl(INADDR_ANY); + + if (bind(fd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) != 0) + { + cerr << "socket: bind failed" << endl; + festival_error(); + } + + if (listen(fd, 5) != 0) + { + cerr << "socket: listen failed" << endl; + festival_error(); + } + +#if SINGLE_CLIENT + log_message(0,EST_String("Festival server (non-forking) started on port ")+ + itoString(port)); +#else + log_message(0,EST_String("Festival server started on port ")+ + itoString(port)); +#endif + + fflush(stdout); + fflush(stderr); + 
fflush(stdin); + + while(1) // never exits except by signals + { + if((fd1 = accept(fd, 0, 0)) < 0) + { + cerr << "socket: accept failed"; + festival_error(); + } + + client_name++; + if (client_access_check(fd1,client_name) == FALSE) + { + close(fd1); + continue; + } +#ifdef SINGLE_CLIENT + ft_server_socket = fd1; + repl_from_socket(fd1); + log_message(client_name,"disconnected"); +#else + num_clients++; + + // Fork new image of festival and call interpreter + if (num_clients > max_clients) + { + log_message(client_name,"failed: too many clients"); + num_clients--; + } + else if ((pid=fork()) == 0) + { + ft_server_socket = fd1; + repl_from_socket(fd1); + log_message(client_name,"disconnected"); + exit(0); + } + else if (pid < 0) + { + log_message(client_name,"failed to fork new client"); + num_clients--; + } + + while ((num_clients > 0) && (waitpid(0,&statusp,WNOHANG) != 0)) + num_clients--; +#endif + + close(fd1); + } + + return 0; +} + +static int client_access_check(int fd,int client) +{ + // Check client against various possible checks to see if they + // are allowed access to the server + LISP passwd, access_list, deny_list; + int client_access = TRUE; + struct sockaddr_in peer; + socklen_t addrlen=sizeof(peer); + struct hostent *clienthost; + const char *client_hostname; + const char *client_hostnum; + const char *reason = ""; + + getpeername(fd,(struct sockaddr *)&peer,&addrlen); + clienthost = gethostbyaddr((char *)&peer.sin_addr, + sizeof(peer.sin_addr),AF_INET); + client_hostnum = inet_ntoa(peer.sin_addr); + if (streq(client_hostnum,"0.0.0.0") || streq(client_hostnum,"127.0.0.1")) // its me ! 
+ client_hostname = "localhost"; + else if (clienthost == 0) // failed to get a name + client_hostname = client_hostnum; + else + client_hostname = clienthost->h_name; + + if (((deny_list = siod_get_lval("server_deny_list",NULL)) != NIL) && + (siod_regex_member_str(client_hostname,deny_list) != NIL)) + { + client_access = FALSE; + reason = "in deny list"; + } + else if ((access_list = siod_get_lval("server_access_list",NULL)) != NIL) + { + client_access = FALSE; // by default now + reason = "not in access list"; + if (siod_regex_member_str(client_hostname,access_list) != NIL) + { + reason = ""; + client_access = TRUE; + } + } + + passwd = siod_get_lval("server_passwd",NULL); + if ((client_access == TRUE) && (passwd != NULL)) + { + char *client_passwd = walloc(char,strlen(get_c_string(passwd))+1); + read(fd,client_passwd,strlen(get_c_string(passwd))); + client_passwd[strlen(get_c_string(passwd))] = '\0'; + if (streq(get_c_string(passwd),client_passwd)) + client_access = TRUE; + else + { + client_access = FALSE; + reason = "bad passwd"; + } + wfree(client_passwd); + } + char *message = walloc(char,20+strlen(client_hostname)+strlen(reason)); + + if (client_access == TRUE) + { + sprintf(message,"accepted from %s",client_hostname); + log_message(client,message); + } + else + { + sprintf(message,"rejected from %s %s",client_hostname,reason); + log_message(client,message); + } + + wfree(message); + + return client_access; + +} + +static void log_message(int client, const char *message) +{ + // log the message in log file + + *cslog << log_time_stamp(client) << message << endl; +} + +static EST_String log_time_stamp(int client) +{ + // returns a string with client id, time and date + char lst[1024]; + time_t thetime = time(0); + char *cthetime = ctime(&thetime); + cthetime[24] = '\0'; // get rid of \n + + if (client == 0) + sprintf(lst,"server %s : ",cthetime); + else + sprintf(lst,"client(%d) %s : ",client,cthetime); + + return lst; +} + + + diff --git 
a/src/arch/festival/tcl.cc b/src/arch/festival/tcl.cc new file mode 100644 index 0000000..c7edba7 --- /dev/null +++ b/src/arch/festival/tcl.cc @@ -0,0 +1,127 @@ +/* + * Copyright (C)1997 Jacques H. de Villiers + * Copyright (C)1997 Center for Spoken Language Understanding, + * Oregon Graduate Institute of Science & Technology + * + * The authors hereby grant permission to use, copy, modify, distribute, + * and license this software and its documentation for any purpose, provided + * that existing copyright notices are retained in all copies and that this + * notice is included verbatim in any distributions. No written agreement, + * license, or royalty fee is required for any of the authorized uses. + * Modifications to this software may be copyrighted by their authors + * and need not follow the licensing terms described here, provided that + * the new terms are clearly indicated on the first page of each file where + * they apply. + * + * IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY + * FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES + * ARISING OUT OF THE USE OF THIS SOFTWARE, ITS DOCUMENTATION, OR ANY + * DERIVATIVES THEREOF, EVEN IF THE AUTHORS HAVE BEEN ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + * + * THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES, + * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. THIS SOFTWARE + * IS PROVIDED ON AN "AS IS" BASIS, AND THE AUTHORS AND DISTRIBUTORS HAVE + * NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR + * MODIFICATIONS. + * + * GOVERNMENT USE: If you are acquiring this software on behalf of the + * U.S. government, the Government shall have only "Restricted Rights" + * in the software and related documentation as defined in the Federal + * Acquisition Regulations (FARs) in Clause 52.227.19 (c) (2). 
If you + * are acquiring the software on behalf of the Department of Defense, the + * software shall be classified as "Commercial Computer Software" and the + * Government shall have only "Restricted Rights" as defined in Clause + * 252.227-7013 (c) (1) of DFARs. Notwithstanding the foregoing, the + * authors grant the U.S. Government and others acting in its behalf + * permission to use and distribute the software in accordance with the + * terms specified in this license. + * + * + * * Sat May 17 13:43:47 BST 1997 * + * minor modifications to make it work in 1.1.4 -- awb@cstr.ed.ac.uk + * + *--------------------------------------------------------------------------- + * This module adds a Tcl interpreter to Scheme, Scheme commands to + * retrieve wave and phoneme info. A Scheme command to invoke Tcl and a + * Tcl command to invoke Scheme. We'll load CSLUsh packages into the Tcl + * interpreter to implement a client/server architecture. + *--------------------------------------------------------------------------- + */ +#include +#include +#include "festival.h" +#include "festivalP.h" + +#if SUPPORT_TCL +#include + +static int festCmd(ClientData d, Tcl_Interp *interp, int argc, char *argv []); + +/* Scheme: (cslush TCL_COMMAND_STRING) + * + * Invoke Tcl interpreter from within Festival's scheme runtime. + * Returns a LISP string, throws an error if the Tcl command fails. + * For the time being a single, global Tcl interp will suffice. 
+ */ +static Tcl_Interp *tcl_interpreter=NULL; +static LISP tcl_eval(LISP tcl_command) +{ + char *cmd=get_c_string(tcl_command); + + if (!tcl_interpreter) + { + tcl_interpreter=Tcl_CreateInterp(); + Tcl_Init(tcl_interpreter); + Tcl_CreateCommand(tcl_interpreter,"festival",festCmd,NULL,NULL); + } + if (Tcl_Eval(tcl_interpreter,cmd)!=TCL_OK) + { + cerr << tcl_interpreter->result << endl; + festival_error(); + } + return strintern(tcl_interpreter->result); +} + + +/* TCL: festival scheme_command + * + * Invoke the scheme interpreter that wraps our Tcl interp. + * Result is a Tcl string + * (The Scheme interpreter aborts the cslush command if a scheme error + * occurs. So much for it being reentrant.) + */ +static int festCmd(ClientData d, Tcl_Interp *interp, int argc, char *argv []) +{ + LISP cmd; + if(argc!=2) { + Tcl_AppendResult(interp,"wrong # args: should be \"",argv[0], + " scheme_command\"",NULL); + return TCL_ERROR; + } + d=d; /* stop compiler warning */ + cmd=strintern(argv[1]); + cmd=read_from_string(cmd); + cmd=leval(cmd,NIL); + Tcl_SetResult(interp,get_c_string(cmd),TCL_VOLATILE); + return TCL_OK; +} + + +/* register our new Scheme functions with Festival + */ +void festival_tcl_init(void) +{ + init_subr_1("tcl_eval",tcl_eval, + "(tcl_eval STRING)\n\ + Evaluate STRING as a Tcl command, return the result as a LISP string"); + + proclaim_module("tcl"); +} +#else /* no TCL */ +void festival_tcl_init(void) +{ + // nothing to do +} +#endif diff --git a/src/arch/festival/utterance.cc b/src/arch/festival/utterance.cc new file mode 100644 index 0000000..0d22dd9 --- /dev/null +++ b/src/arch/festival/utterance.cc @@ -0,0 +1,1041 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* EST_Utterance access functions (from Lisp) */ +/* */ +/*=======================================================================*/ +#include +#include "EST_unix.h" +#include "festival.h" +#include "festivalP.h" + +static LISP item_features(LISP sitem, LISP leval = NIL); +static LISP item_features(EST_Item *s, bool evaluate_ff=false); +static LISP stream_tree_to_lisp(EST_Item *s); + +LISP utt_iform(EST_Utterance &utt) +{ + return read_from_lstring(strintern(utt_iform_string(utt))); +} + +const EST_String utt_iform_string(EST_Utterance &utt) +{ + return utt.f("iform").string(); +} + +const EST_String utt_type(EST_Utterance &utt) +{ + return utt.f("type").string(); +} + + +static LISP utt_flat_repr( LISP l_utt ) +{ + EST_String flat_repr; + EST_Utterance *utt = get_c_utt( l_utt ); + + utt_2_flat_repr( *utt, flat_repr ); + + return strcons(flat_repr.length(), flat_repr.str()); +} + + +static LISP utt_feat(LISP lutt, LISP feat) +{ + EST_Utterance *u = utterance(lutt); + EST_String f = get_c_string(feat); + return lisp_val(u->f(f)); +} + +static LISP item_utt(LISP i) +{ + return siod(get_utt(item(i))); +} + +static LISP item_sub_utt(LISP i) +{ + EST_Utterance *u = new EST_Utterance; + + sub_utterance(*u,item(i)); + + return siod(u); +} + +static LISP utt_set_feat(LISP u, LISP name, LISP value) +{ + EST_String n = get_c_string(name); + + if (TYPEP(value,tc_flonum)) + utterance(u)->f.set(n,get_c_float(value)); + else if (val_p(value)) + utterance(u)->f.set_val(n,val(value)); + else + utterance(u)->f.set(n,get_c_string(value)); + + return value; +} + +static LISP utt_save(LISP utt, LISP fname, LISP ltype) +{ + EST_Utterance *u = utterance(utt); + EST_String filename = get_c_string(fname); + if (fname == NIL) + filename = "save.utt"; + EST_String type 
= get_c_string(ltype); + if (ltype == NIL) type = "est_ascii"; + + if (type == "est_ascii") + { + if (u->save(filename,type) != write_ok) + { + cerr << "utt.save: saving to \"" << filename << "\" failed" << + endl; + festival_error(); + } + } + else + { + cerr << "utt.save: unknown save format" << endl; + festival_error(); + } + + return utt; +} + +static LISP utt_save_relation(LISP utt, LISP rname, LISP fname, + LISP evaluate_ff) +{ + // Save relation in named file + EST_Utterance *u = utterance(utt); + EST_String relname = get_c_string(rname); + EST_String filename = get_c_string(fname); + bool a; + + if ((evaluate_ff == NIL) || (get_c_int(evaluate_ff) == 0)) + a = false; + else + a = true; + + if (fname == NIL) + filename = "save.utt"; + EST_Relation *r = u->relation(relname); + + if (r->save(filename, a) != write_ok) + { + cerr << "utt.save.relation: saving to \"" << filename << "\" failed" << + endl; + festival_error(); + } + return utt; +} + +static LISP utt_load(LISP utt, LISP fname) +{ + EST_Utterance *u; + if (utt == NIL) + u = new EST_Utterance; + else + u = utterance(utt); + EST_String filename = get_c_string(fname); + + if (u->load(filename) != 0) + { + cerr << "utt.load: loading from \"" << filename << "\" failed" << + endl; + festival_error(); + } + + if (utt == NIL) + return siod(u); + else + return utt; +} + +static LISP utt_relation_load(LISP utt, LISP lrelname, LISP lfilename) +{ + EST_Utterance *u; + if (utt == NIL) + u = new EST_Utterance; + else + u = utterance(utt); + EST_String filename = get_c_string(lfilename); + EST_String relname = get_c_string(lrelname); + EST_Relation *rel = u->create_relation(relname); + + if (rel->load(filename,"esps") != 0) + { + cerr << "utt.load.relation: loading from \"" << filename << + "\" failed" << endl; + festival_error(); + } + + if (utt == NIL) + return siod(u); + else + return utt; +} + +static LISP utt_evaluate_features(LISP utt) +{ + EST_Utterance *u = utterance(utt); + u->evaluate_all_features(); + 
return NIL; +} + +static LISP utt_evaluate_relation(LISP utt, LISP rname) +{ + EST_Utterance *u = utterance(utt); + EST_String relname = get_c_string(rname); + EST_Relation *r = u->relation(relname); + + r->evaluate_item_features(); + return NIL; +} + +static LISP utt_copy_relation(LISP utt, LISP l_old_name, LISP l_new_name) +{ + EST_Utterance *u = utterance(utt); + EST_String old_name = get_c_string(l_old_name); + EST_String new_name = get_c_string(l_new_name); + + u->create_relation(new_name); + + u->relation(new_name)->f = u->relation(old_name)->f; + + copy_relation(*u->relation(old_name), *u->relation(new_name)); + + return utt; +} + +static LISP utt_copy_relation_and_items(LISP utt, LISP l_old_name, + LISP l_new_name) +{ + EST_Utterance *u = utterance(utt); + EST_String old_name = get_c_string(l_old_name); + EST_String new_name = get_c_string(l_new_name); + + u->create_relation(new_name); + + u->relation(new_name)->f = u->relation(old_name)->f; + + *u->relation(new_name) = *u->relation(old_name); + + return utt; +} + +static LISP utt_relation_print(LISP utt, LISP l_name) +{ + EST_Utterance *u = utterance(utt); + EST_String name = get_c_string(l_name); + + cout << *u->relation(name); + return NIL; +} + +static LISP utt_relation_items(LISP utt, LISP rname) +{ + EST_Utterance *u = utterance(utt); + EST_String relationname = get_c_string(rname); + EST_Item *i; + LISP l = NIL; + + for (i=u->relation(relationname)->head(); i != 0; i=next_item(i)) + l = cons(siod(i),l); + + return reverse(l); +} + +// could be merged with above + +LISP utt_relation_tree(LISP utt, LISP sname) +{ + EST_Utterance *u = utterance(utt); + EST_String relname = get_c_string(sname); + + return stream_tree_to_lisp(u->relation(relname)->head()); +} + +static LISP stream_tree_to_lisp(EST_Item *s) +{ + if (s == 0) + return NIL; + else + { + LISP desc = cons(strintern(s->name()), + cons(item_features(s, false),NIL)); + return cons(cons(desc,stream_tree_to_lisp(s->down())), + 
+ stream_tree_to_lisp(s->next())); + } +} + +static LISP set_item_name(LISP litem, LISP newname) +{ + // Set a stream's name to newname + EST_Item *s = item(litem); + + if (s != 0) + s->set_name(get_c_string(newname)); + return litem; +} + +static LISP utt_relation(LISP utt, LISP relname) +{ + EST_Utterance *u = utterance(utt); + EST_String rn = get_c_string(relname); + EST_Item *r; + + r = u->relation(rn)->head(); + + return siod(r); +} + +static LISP utt_relation_create(LISP utt, LISP relname) +{ + EST_Utterance *u = utterance(utt); + EST_String rn = get_c_string(relname); + + u->create_relation(rn); + + return utt; +} + +static LISP utt_relation_delete(LISP utt, LISP relname) +{ + EST_Utterance *u = utterance(utt); + EST_String rn = get_c_string(relname); + + u->remove_relation(rn); + + return utt; +} + +static LISP utt_relationnames(LISP utt) +{ + // Return list of relation names + EST_Utterance *u = utterance(utt); + LISP relnames = NIL; + EST_Features::Entries p; + + for (p.begin(u->relations); p; p++) + relnames = cons(rintern(p->k),relnames); + + return reverse(relnames); +} + +static LISP item_relations(LISP si) +{ + // Return list of relation names + EST_Item *s = item(si); + LISP relnames = NIL; + EST_Litem *p; + + for (p = s->relations().list.head(); p; p=p->next()) + relnames = cons(rintern(s->relations().list(p).k),relnames); + + return reverse(relnames); +} + +static LISP item_relation_name(LISP si) +{ + // Return the name of this item's relation + EST_Item *s = item(si); + + return rintern(s->relation_name()); +} + +static LISP item_relation_remove(LISP li, LISP relname) +{ + EST_String rn = get_c_string(relname); + EST_Item *si = item(li); + remove_item(si,rn); + // Just in case someone tries to access this again + // we set its contents to be 0 which will be picked up by item + delete (EST_Val *)USERVAL(li); + EST_Val vv = est_val((EST_Item *)0); + USERVAL(li) = new EST_Val(vv); + return NIL; +} + +static LISP utt_relation_append(LISP utt, LISP relname, LISP li) 
+{ + EST_Utterance *u = utterance(utt); + EST_String rn = get_c_string(relname); + EST_Relation *r = u->relation(rn); + EST_Item *s=0; + + if (!r) + return NIL; + if (item_p(li)) + s = item(li); + + s = r->append(s); + + if (consp(li)) + { + s->set_name(get_c_string(car(li))); + add_item_features(s,car(cdr(li))); + } + + return siod(s); +} + +static LISP item_next(LISP li) +{ + return (li == NIL) ? NIL : siod(item(li)->next()); +} + +static LISP item_prev(LISP li) +{ + return (li == NIL) ? NIL : siod(item(li)->prev()); +} + +static LISP item_up(LISP li) +{ + return (li == NIL) ? NIL : siod(item(li)->up()); +} + +static LISP item_down(LISP li) +{ + return (li == NIL) ? NIL : siod(item(li)->down()); +} + +static LISP item_parent(LISP li) +{ + if (li == NIL) + return NIL; + else + return siod(parent(item(li))); +} + +static LISP item_daughter1(LISP li) +{ + if (li == NIL) + return NIL; + else + return siod(daughter1(item(li))); +} + +static LISP item_daughter2(LISP li) +{ + if (li == NIL) + return NIL; + else + return siod(daughter2(item(li))); +} + +static LISP item_daughtern(LISP li) +{ + if (li == NIL) + return NIL; + else + return siod(daughtern(item(li))); +} + +static LISP item_link1(LISP li) +{ + if (li == NIL) + return NIL; + else + return siod(link1(item(li))); +} + +static LISP item_link2(LISP li) +{ + if (li == NIL) + return NIL; + else + return siod(link2(item(li))); +} + +static LISP item_linkn(LISP li) +{ + if (li == NIL) + return NIL; + else + return siod(linkn(item(li))); +} + +static LISP item_next_link(LISP li) +{ + if (li == NIL) + return NIL; + else + return siod(next_link(item(li))); +} + +static LISP item_linkedfrom(LISP li) +{ + if (li == NIL) + return NIL; + else + return siod(linkedfrom(item(li))); +} + +static LISP item_next_leaf(LISP li) +{ + return (li == NIL) ? NIL : siod(next_leaf(item(li))); +} + +static LISP item_first_leaf(LISP li) +{ + return (li == NIL) ? 
NIL : siod(first_leaf(item(li))); +} + +static LISP item_last_leaf(LISP li) +{ + return (li == NIL) ? NIL : siod(last_leaf(item(li))); +} + +static LISP item_add_link(LISP lfrom, LISP lto) +{ + add_link(item(lfrom),item(lto)); + return NIL; +} + +static LISP utt_id(LISP lutt, LISP l_id) +{ + EST_Utterance *u = utterance(lutt); + + return siod(u->id(get_c_string(l_id))); +} + +static LISP item_next_item(LISP li) +{ + return (li == NIL) ? NIL : siod(next_item(item(li))); +} + +static LISP item_append_daughter(LISP li,LISP nli) +{ + EST_Item *l = item(li); + EST_Item *s = 0; + + if (item_p(nli)) + s = item(nli); + + s = l->append_daughter(s); + + if (consp(nli)) + { + s->set_name(get_c_string(car(nli))); + add_item_features(s,car(cdr(nli))); + } + + return siod(s); +} + +static LISP item_prepend_daughter(LISP li,LISP nli) +{ + EST_Item *l = item(li); + EST_Item *s = 0; + + if (item_p(nli)) + s = item(nli); + + s = l->prepend_daughter(s); + + if (consp(nli)) + { + s->set_name(get_c_string(car(nli))); + add_item_features(s,car(cdr(nli))); + } + + return siod(s); +} + +static LISP item_insert_parent(LISP li,LISP nparent) +{ + EST_Item *l = item(li); + EST_Item *s = 0; + + if (item_p(nparent)) + s = item(nparent); + + s = l->insert_parent(s); + + if (consp(nparent)) + { + s->set_name(get_c_string(car(nparent))); + add_item_features(s,car(cdr(nparent))); + } + + return siod(s); +} + +static LISP item_insert(LISP li,LISP nli,LISP direction) +{ + EST_Item *n = item(li); + EST_String dir; + EST_Item *s; + + if (item_p(nli)) + s = item(nli); + else + s = 0; + + if (direction) + dir = get_c_string(direction); + else + dir = "after"; + + if (dir == "after") + s = n->insert_after(s); + else if (dir == "before") + s = n->insert_before(s); + else if (dir == "above") + s = n->insert_above(s); + else if (dir == "below") + s = n->insert_below(s); + else + { + cerr << "item.insert: unknown direction \"" << dir << "\"" << endl; + festival_error(); + } + + if (consp(nli)) // specified 
information + { + s->set_name(get_c_string(car(nli))); + add_item_features(s,car(cdr(nli))); + } + + return siod(s); +} + +static LISP item_move_tree(LISP from,LISP to) +{ + EST_Item *f = item(from); + EST_Item *t = item(to); + + if (move_sub_tree(f,t) == TRUE) + return truth; + else + return NIL; +} + +static LISP item_merge_item(LISP from,LISP to) +{ + EST_Item *f = item(from); + EST_Item *t = item(to); + + merge_item(f,t); + return truth; +} + +static LISP item_exchange_tree(LISP from,LISP to) +{ + EST_Item *f = item(from); + EST_Item *t = item(to); + + if (exchange_sub_trees(f,t) == TRUE) + return truth; + else + return NIL; +} + +static LISP item_relation(LISP lingitem,LISP relname) +{ + EST_Item *li = item(lingitem); + EST_String rn = get_c_string(relname); + return siod(li->as_relation(rn)); +} + +void utt_cleanup(EST_Utterance &u) +{ + // Remove all relations + // This is called in the Initialization to ensure we can + // continue with a nice clean utterance + + u.relations.clear(); +} + +LISP make_utterance(LISP args,LISP env) +{ + /* Make an utterance structure from given input */ + (void)env; + EST_Utterance *u = new EST_Utterance; + EST_String t; + LISP lform; + + u->f.set("type",get_c_string(car(args))); + lform = car(cdr(args)); + u->f.set("iform",siod_sprint(lform)); + + return siod(u); +} + +static LISP item_delete(LISP litem) +{ + EST_Item *s = item(litem); + + s->unref_all(); + delete (EST_Val *)USERVAL(litem); + EST_Val vv = est_val((EST_Item *)0); + USERVAL(litem) = new EST_Val(vv); + + return NIL; +} + +static LISP item_remove_feature(LISP litem,LISP fname) +{ + EST_Item *s = item(litem); + EST_String f = get_c_string(fname); + + s->f_remove(f); + + return rintern("t"); +} + +static LISP item_features(LISP litem, LISP leval) +{ + // Return assoc list of features on this stream + return item_features(item(litem), ((leval != NIL) ? 
true : false)); +} + +static LISP item_features(EST_Item *s, bool evaluate_ff) +{ + // Can't simply use features_to_lisp as evaluation requires access + // to item. + LISP features = NIL; + EST_Val tv; + + EST_Features::Entries p; + for (p.begin(s->features()); p != 0; ++p) + { + const EST_Val &v = p->v; + LISP fpair; + + if (v.type() == val_int) + fpair = make_param_int(p->k, v.Int()); + else if (v.type() == val_float) + fpair = make_param_float(p->k, v.Float()); + else if (v.type() == val_type_feats) + fpair = make_param_lisp(p->k, + features_to_lisp(*feats(v))); + else if ((v.type() == val_type_featfunc) && evaluate_ff) + { + tv = (featfunc(v))(s); + if (tv.type() == val_int) + fpair = make_param_int(p->k, tv.Int()); + else if (tv.type() == val_float) + { + fpair = make_param_float(p->k, + tv.Float()); + } + else + fpair = make_param_lisp(p->k, + strintern(tv.string())); + } + else + fpair = make_param_lisp(p->k, + strintern(v.string())); + features = cons(fpair,features); + } + + return reverse(features); +} + +void add_item_features(EST_Item *s,LISP features) +{ + // Add LISP specified features to s; + LISP f; + + for (f=features; f != NIL; f=cdr(f)) + s->set_val(get_c_string(car(car(f))), + val_lisp(car(cdr(car(f))))); +} + +void festival_utterance_init(void) +{ + // declare utterance specific Lisp functions + + // Standard functions + init_fsubr("Utterance",make_utterance, + "(Utterance TYPE DATA)\n\ + Build an utterance of type TYPE from DATA. Different TYPEs require\n\ + different types of data. New types may be defined by defUttType.\n\ + [see Utterance types]"); + init_subr_2("utt.load",utt_load, + "(utt.load UTT FILENAME)\n\ + Loads UTT with the streams and stream items described in FILENAME.\n\ + The format is Xlabel-like as saved by utt.save. If UTT is nil a new\n\ + utterance is created, loaded and returned. 
If FILENAME is \"-\"\n\ + the data is read from stdin."); + init_subr_2("utt.feat",utt_feat, + "(utt.feat UTT FEATNAME)\n\ + Return value of feature name in UTT."); + init_subr_3("utt.set_feat",utt_set_feat, + "(utt.set_feat UTT FEATNAME VALUE)\n\ + Set feature FEATNAME with VALUE in UTT."); + init_subr_1("utt.flat_repr", utt_flat_repr, + "(utt.flat_repr UTT)\n\ + Returns a flat, string representation of the linguistic information\n\ + contained in fully formed utterance structure UTT." ); + init_subr_3("utt.relation.load",utt_relation_load, + "(utt.relation.load UTT RELATIONNAME FILENAME)\n\ + Loads (and creates) RELATIONNAME from FILENAME into UTT. FILENAME\n\ + should contain simple Xlabel format information. The label part\n\ + may contain the label proper followed by semi-colon separated\n\ + pairs of feature and value."); + init_subr_3("utt.save",utt_save, + "(utt.save UTT FILENAME TYPE)\n\ + Save UTT in FILENAME in an Xlabel-like format. If FILENAME is \"-\"\n\ + then print output to stdout. TYPE may be nil or est_ascii"); + init_subr_4("utt.save.relation",utt_save_relation, + "(utt.save UTT RELATIONNAME FILENAME EVALUATE_FEATURES)\n\ + Save relation RELATIONNAME in FILENAME in an Xlabel-like format. \n\ + If FILENAME is \"-\" then print output to stdout."); + + init_subr_3("utt.copy_relation", utt_copy_relation, + "(utt.copy_relation UTT FROM TO)\n\ + copy relation \"from\" to a new relation \"to\". 
Note that items\n\ + are NOT copied, simply linked into the new relation"); + + init_subr_3("utt.copy_relation_and_items", utt_copy_relation_and_items, + "(utt.copy_relation_and_items UTT FROM TO)\n\ + copy relation and contents of items \"from\" to a new relation \"to\""); + + init_subr_2("utt.relation.print", utt_relation_print, + "(utt.relation.print UTT NAME)\n\ + print contents of relation NAME"); + + init_subr_1("utt.evaluate", utt_evaluate_features, + "(utt.evaluate UTT)\n\ + evaluate all the features in UTT, replacing feature functions\n\ + with their evaluation."); + + init_subr_2("utt.evaluate.relation", utt_evaluate_relation, + "(utt.evaluate.relation UTT RELATION)\n\ + evaluate all the features in RELATION in UTT, replacing feature functions\n\ + with their evaluation."); + + init_subr_2("utt.relation.items",utt_relation_items, + "(utt.relation.items UTT RELATIONNAME)\n\ + Return a list of stream items in RELATIONNAME in UTT. \n\ + If this relation is a tree, the parent stream item is listed before its \n\ + daughters."); + init_subr_2("utt.relation_tree",utt_relation_tree, + "(utt.relation_tree UTT RELATIONNAME)\n\ + Return a tree of stream items in RELATIONNAME in UTT. This will give a\n\ + simple list if the relation has no ups and downs. \n\ + [see Accessing an utterance]"); + init_subr_1("item.delete",item_delete, + "(item.delete ITEM)\n\ + Remove this item from all relations it is in and delete it."); + init_subr_2("item.set_name",set_item_name, + "(item.set_name ITEM NAME)\n\ + Sets ITEM's name to NAME. [see Accessing an utterance]"); + init_subr_2("item.features",item_features, + "(item.features ITEM EVALUATE_FEATURES)\n\ + Returns all features in ITEM as an assoc list."); + init_subr_2("item.remove_feature",item_remove_feature, + "(item.remove_feature ITEM FNAME)\n\ + Remove feature named FNAME from ITEM. 
Returns t if successfully\n\ + removed, nil if not found."); + + // New Utterance architecture (Relations and items) + init_subr_2("utt.relation",utt_relation, + "(utt.relation UTT RELATIONNAME)\n\ + Return root item of relation RELATIONNAME in UTT."); + init_subr_2("utt.relation.create",utt_relation_create, + "(utt.relation.create UTT RELATIONNAME)\n\ + Create new relation called RELATIONNAME in UTT."); + init_subr_2("utt.relation.delete",utt_relation_delete, + "(utt.relation.delete UTT RELATIONNAME)\n\ + Delete relation from UTT. If the stream items are not linked elsewhere\n\ + in the utterance they will be deleted too."); + init_subr_2("item.relation.remove",item_relation_remove, + "(item.relation.remove ITEM RELATIONNAME)\n\ + Remove this item from RELATIONNAME. If it appears in no other relation it\n\ + will be deleted too. In contrast, item.delete will remove an item\n\ + from all relations, while this just removes it from this relation.\n\ + Note this will also remove all daughters of this item in this \n\ + relation from this relation."); + init_subr_1("utt.relationnames",utt_relationnames, + "(utt.relationnames UTT)\n\ + List of all relations in this utterance."); + init_subr_3("utt.relation.append",utt_relation_append, + "(utt.relation.append UTT RELATIONNAME ITEM)\n\ + Append ITEM to top of RELATIONNAME in UTT. 
ITEM may be\n\ + a LISP description of an item or an item itself."); + + init_subr_1("item.next",item_next, + "(item.next ITEM)\n\ + Return the next ITEM in the current relation, or nil if there is\n\ + no next."); + init_subr_1("item.prev",item_prev, + "(item.prev ITEM)\n\ + Return the previous ITEM in the current relation, or nil if there\n\ + is no previous."); + init_subr_1("item.up",item_up, + "(item.up ITEM)\n\ + Return the item above ITEM, or nil if there is none."); + init_subr_1("item.down",item_down, + "(item.down ITEM)\n\ + Return the item below ITEM, or nil if there is none."); + init_subr_3("item.insert",item_insert, + "(item.insert ITEM1 ITEM2 DIRECTION)\n\ + Insert ITEM2 in ITEM1's relation with respect to DIRECTION. If DIRECTION\n\ + is unspecified, after is assumed. Valid DIRECTIONs are before, after,\n\ + above and below. Use the functions item.insert_parent and\n\ + item.append_daughter for specific tree adjoining. If ITEM2 is of\n\ + type item then it is added directly, otherwise it is treated as a\n\ + description of an item and a new one is created."); + + // Relation tree access/creation functions + init_subr_1("item.parent",item_parent, + "(item.parent ITEM)\n\ + Return the parent of ITEM, or nil if there is none."); + init_subr_1("item.daughter1",item_daughter1, + "(item.daughter1 ITEM)\n\ + Return the first daughter of ITEM, or nil if there is none."); + init_subr_1("item.daughter2",item_daughter2, + "(item.daughter2 ITEM)\n\ + Return the second daughter of ITEM, or nil if there is none."); + init_subr_1("item.daughtern",item_daughtern, + "(item.daughtern ITEM)\n\ + Return the last daughter of ITEM, or nil if there is none."); + init_subr_1("item.next_leaf",item_next_leaf, + "(item.next_leaf ITEM)\n\ + Return the next leaf item (i.e. one with no daughters) in this \n\ + relation. 
Note this may traverse up and down the relation tree \n\ + significantly to find it."); + init_subr_1("item.first_leaf",item_first_leaf, + "(item.first_leaf ITEM)\n\ + Returns the leftmost leaf in the tree dominated by ITEM. This \n\ + is like calling item.daughter1 recursively until an item with no \n\ + daughters is found."); + init_subr_1("item.last_leaf",item_last_leaf, + "(item.last_leaf ITEM)\n\ + Returns the rightmost leaf in the tree dominated by ITEM. This \n\ + is like calling item.daughtern recursively until an item with no \n\ + daughters is found."); + init_subr_2("item.append_daughter",item_append_daughter, + "(item.append_daughter ITEM1 ITEM2)\n\ + Add ITEM2 as a new daughter (right-most) to ITEM1 in the relation of \n\ + ITEM1. If ITEM2 is of type item then it is added directly, otherwise\n\ + ITEM2 is treated as a description of an item and a new one is created\n\ + with that description (name features)."); + init_subr_2("item.prepend_daughter",item_prepend_daughter, + "(item.prepend_daughter ITEM1 ITEM2)\n\ + Add ITEM2 as a new daughter (left-most) to ITEM1 in the relation of ITEM1.\n\ + If ITEM2 is of type item then it is added directly, otherwise\n\ + ITEM2 is treated as a description of an item and a new one is created\n\ + with that description (name features)."); + init_subr_2("item.insert_parent",item_insert_parent, + "(item.insert_parent ITEM1 ITEM2)\n\ + Insert a new parent between ITEM1 and its parent in ITEM1's \n\ + relation. 
If ITEM2 is of type item then it is added directly, \n\ + otherwise it is treated as a description of an item and one is created\n\ + with that description (name features)."); + + + // MLS access/creation functions + init_subr_1("item.link1",item_link1, + "(item.link1 ITEM)\n\ + Return first item linked to ITEM in current relation."); + init_subr_1("item.link2",item_link2, + "(item.link2 ITEM)\n\ + Return second item linked to ITEM in current relation."); + init_subr_1("item.linkn",item_linkn, + "(item.linkn ITEM)\n\ + Return last item linked to ITEM in current relation."); + init_subr_1("item.next_link",item_next_link, + "(item.next_link ITEM)\n\ + Return next item linked to the same item ITEM is linked to."); + init_subr_1("item.linkedfrom",item_linkedfrom, + "(item.linkedfrom ITEM)\n\ + Return the item that is linked to ITEM."); + init_subr_2("item.add_link",item_add_link, + "(item.add_link ITEMFROM ITEMTO)\n\ + Add a link from ITEMFROM to ITEMTO in the relation ITEMFROM is in."); + + init_subr_1("item.next_item",item_next_item, + "(item.next_item ITEM)\n\ + Will give next item in this relation visiting every item in the \n\ + relation until the end. Traverses in pre-order, root followed by \n\ + daughters (then siblings)."); + + init_subr_2("utt.id", utt_id, + "(utt.id UTT id_number)\n\ + Return the item in UTT whose id matches id_number."); + + init_subr_2("item.relation",item_relation, + "(item.relation ITEM RELATIONNAME)\n\ + Return ITEM as viewed through the relation RELATIONNAME. 
If ITEM\n\ + is not in RELATIONNAME then nil is returned."); + + init_subr_1("item.relations",item_relations, + "(item.relations ITEM)\n\ + Return a list of names of the relations this item is in."); + init_subr_1("item.relation.name",item_relation_name, + "(item.relation.name ITEM)\n\ + Return the name of the relation this ITEM is currently being viewed\n\ + through."); + init_subr_2("item.move_tree",item_move_tree, + "(item.move_tree FROM TO)\n\ + Move contents and descendants of FROM to TO. Old daughters of TO are\n\ + deleted. FROM will be deleted too if it is being viewed in the\n\ + same relation as TO. FROM will be deleted from its current place in\n\ + TO's relation. Returns t if successful, returns nil if TO is within FROM."); + init_subr_2("item.exchange_trees",item_exchange_tree, + "(item.exchange_trees FROM TO)\n\ + Exchange contents of FROM and TO, and descendants of FROM and TO.\n\ + Returns t if successful, or nil if FROM or TO contain each other."); + init_subr_2("item.merge",item_merge_item, + "(item.merge FROM TO)\n\ + Merge FROM into TO making them the same item. All features in FROM\n\ + are merged into TO and all references to FROM are made to point to TO."); + init_subr_1("item.get_utt",item_utt, + "(item.get_utt ITEM)\n\ + Get utterance from given ITEM (if possible)."); + init_subr_1("sub_utt",item_sub_utt, + "(sub_utt ITEM)\n\ + Return a new utterance that contains a copy of this item and all its\n\ + descendants and related descendants."); + + init_subr_1("audio_mode",l_audio_mode, + "(audio_mode MODE)\n\ + Control audio specific modes. Five subcommands are supported. If\n\ + MODE is async, start the audio spooler so that Festival need not wait\n\ + for a waveform to complete playing before continuing. 
If MODE is\n\ + sync wait for the audio spooler to empty, if running, and then cause\n\ + future plays to wait for the playing to complete before continuing.\n\ + Other MODEs are: close, which waits for the audio spooler to finish\n\ + any waveforms in the queue and then closes the spooler (it will restart\n\ + on the next play); shutup, which stops the current waveform playing and\n\ + empties the queue; and query, which lists the files in the queue. The queue may\n\ + be up to five waveforms long. [see Audio output]"); + +} diff --git a/src/arch/festival/viterbi.cc b/src/arch/festival/viterbi.cc new file mode 100644 index 0000000..cc57b78 --- /dev/null +++ b/src/arch/festival/viterbi.cc @@ -0,0 +1,232 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1999 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Authors: Alan W Black */ +/* Date : February 1999 */ +/*-----------------------------------------------------------------------*/ +/* Generic Viterbi search specifications through scheme */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "lexicon.h" + +static EST_VTCandidate *gv_candlist(EST_Item *s,EST_Features &f); +static EST_VTPath *gv_npath(EST_VTPath *p,EST_VTCandidate *c,EST_Features &f); +static double gv_find_wfst_prob(EST_VTPath *p,EST_WFST *wfst, + int n,int &state); +static double gv_find_ngram_prob(EST_VTPath *p,EST_Ngrammar *ngram, + int n,int &state,EST_Features &f); + +LISP Gen_Viterbi(LISP utt) +{ + // For each syllable predict intonation events. 
+ EST_Utterance *u = utterance(utt); + LISP params = siod_get_lval("gen_vit_params","no gen_vit_params"); + EST_Features f; + EST_WFST *wfst = 0; + EST_Ngrammar *ngram = 0; + int num_states; + + // Add some defaults + f.set("gscale_s",1.0); + f.set("gscale_p",0.0); + f.set("Relation","Syllable"); + f.set("return_feat","gen_vit_val"); + lisp_to_features(params,f); + + if (f.present("ngramname")) + { + ngram = get_ngram(f.S("ngramname")); + num_states = ngram->num_states(); + } + else + { + wfst = get_wfst(f.S("wfstname")); + num_states = wfst->num_states(); + } + + EST_Viterbi_Decoder v(gv_candlist,gv_npath,num_states); + v.f = f; + + v.initialise(u->relation(f.S("Relation"))); + v.search(); + v.result("gv_id"); + if (f.present("debug")) + { + v.copy_feature("nprob"); + v.copy_feature("prob"); + v.copy_feature("score"); + v.copy_feature("total_score"); + } + + // Map internal ids back to strings + for (EST_Item *p=u->relation(f.S("Relation"))->head(); p != 0; p=p->next()) + if (wfst == 0) + p->set(f.S("return_feat"),ngram->get_vocab_word(p->I("gv_id"))); + else + p->set(f.S("return_feat"),wfst->in_symbol(p->I("gv_id"))); + + return utt; +} + +static EST_VTCandidate *gv_candlist(EST_Item *s,EST_Features &f) +{ + LISP p; + LISP l; + EST_VTCandidate *c; + EST_VTCandidate *all_c = 0; + EST_WFST *w = 0; + EST_Ngrammar *n = 0; + float prob; + + // Call user function to get candidate probabilities + p = leval(cons(rintern(f.S("cand_function")), + cons(siod(s),NIL)),NIL); + if (f.present("ngramname")) + n = get_ngram(f.S("ngramname")); + else + w = get_wfst(f.S("wfstname")); + + for (l=p; l != NIL; l=cdr(l)) + { + prob = get_c_float(car(cdr(car(l)))); + if (f.present("debug")) + s->set(EST_String("cand_")+get_c_string(car(car(l))),prob); + if (prob != 0) + { + c = new EST_VTCandidate; + if (w == 0) + c->name = n->get_vocab_word(get_c_string(car(car(l)))); + else + c->name = w->in_symbol(get_c_string(car(car(l)))); + c->score = log(prob); + c->s = s; + c->next = all_c; + 
all_c = c; + } + } + return all_c; +} + +static EST_VTPath *gv_npath(EST_VTPath *p,EST_VTCandidate *c,EST_Features &f) +{ + EST_VTPath *np = new EST_VTPath; + double prob,lprob; + EST_WFST *wfst = 0; + EST_Ngrammar *ngram = 0; + + if (f.present("ngramname")) + ngram = get_ngram(f.S("ngramname")); + else + wfst = get_wfst(f.S("wfstname")); + + np->c = c; + np->from = p; + int n = c->name.Int(); + if (wfst == 0) + prob = gv_find_ngram_prob(p,ngram,n,np->state,f); + else + prob = gv_find_wfst_prob(p,wfst,n,np->state); + + prob = f.F("gscale_p") + (prob * (1-f.F("gscale_p"))); + + if (prob == 0) + lprob = log(0.00000001); + else + lprob = log(prob); + + if (p==0) + np->score = (c->score+lprob); + else + np->score = (c->score+lprob) + p->score; + + if (f.present("debug")) + { + np->f.set("prob",prob); + np->f.set("score",c->score); + np->f.set("nprob",prob*(exp(c->score))); + np->f.set("total_score",np->score); + } + + return np; +} + +static double gv_find_wfst_prob(EST_VTPath *p,EST_WFST *wfst, + int n,int &state) +{ + float prob; + int oldstate; + + if (p == 0) + oldstate = wfst->start_state(); + else + oldstate = p->state; + state = wfst->transition(oldstate,n,n,prob); + return prob; +} + +static double gv_find_ngram_prob(EST_VTPath *p,EST_Ngrammar *ngram, + int n,int &state,EST_Features &f) +{ + int oldstate=0; + double prob; + + if (p == 0) + { + // This could be done once before the search is called + int order = ngram->order(); + int i; + EST_IVector window(order); + + if (order > 1) + window.a_no_check(order-1) = n; + if (order > 2) + window.a_no_check(order-2) = + ngram->get_vocab_word(f.S("p_word")); + for (i = order-3; i>=0; i--) + window.a_no_check(i) = + ngram->get_vocab_word(f.S("pp_word")); + oldstate = ngram->find_state_id(window); + } + else + oldstate = p->state; + state = ngram->find_next_state_id(oldstate,n); + const EST_DiscreteProbDistribution &pd = ngram->prob_dist(oldstate); + if (pd.samples() == 0) + prob = 0; + else + prob = 
(double)pd.probability(n); + + return prob; +} + diff --git a/src/arch/festival/wagon_interp.cc b/src/arch/festival/wagon_interp.cc new file mode 100644 index 0000000..737fc7b --- /dev/null +++ b/src/arch/festival/wagon_interp.cc @@ -0,0 +1,184 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : May 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* A simple interpreter for CART trees as produced by Wagon */ +/* */ +/*=======================================================================*/ +#include +#include "EST_unix.h" +#include +#include "festival.h" + +#define ques_oper_str(X) (get_c_string(car(cdr(X)))) +#define ques_operand(X) (car(cdr(cdr(X)))) +static int wagon_ask(EST_Item *s, LISP tree, + EST_TKVL<EST_String,EST_Val> *fcache); +static LISP l_wagon_predict(EST_Item *s, LISP tree, + EST_TKVL<EST_String,EST_Val> *fcache); + +/* It seems to be worth building a feature cache */ + +EST_Val wagon_predict(EST_Item *s, LISP tree) +{ + LISP answer,val; + EST_TKVL<EST_String,EST_Val> *fcache; + + fcache = new EST_TKVL<EST_String,EST_Val>; // feature cache saves some calls + answer = l_wagon_predict(s,tree,fcache); + delete fcache; + + // Decide if this is a number or a string + val = car(siod_last(answer)); + if (!FLONUMP(val)) // just in case + return EST_Val(get_c_string(val)); + else if (!CONSP(car(answer))) + return EST_Val(get_c_float(val)); + else + return EST_Val(get_c_string(val)); + +} + +LISP wagon_pd(EST_Item *s, LISP tree) +{ + // return probability distribution + LISP answer; + EST_TKVL<EST_String,EST_Val> *fcache; + + fcache = new EST_TKVL<EST_String,EST_Val>; // feature cache saves some calls + answer = l_wagon_predict(s,tree,fcache); + delete fcache; + + return answer; + +} + +static LISP l_wagon_predict(EST_Item *s, LISP tree, + EST_TKVL<EST_String,EST_Val> *fcache) +{ + // Use the tree to predict + + if (cdr(tree) == NIL) + return car(tree); + else if (wagon_ask(s,car(tree),fcache) == TRUE) + return l_wagon_predict(s,car(cdr(tree)),fcache); + else + return l_wagon_predict(s,car(cdr(cdr(tree))),fcache); +} + +static int wagon_ask(EST_Item *s, LISP question, + EST_TKVL<EST_String,EST_Val> *fcache) +{ + // Ask a question of this stream item + EST_Val answer; + const char *str_oper; + const EST_String fname = 
get_c_string(car(question)); + LISP operand; + + /* printf("wagon_ask %s\n",(const char *)siod_sprint(question)); */ + if (!fcache->present(fname)) + { + answer = ffeature(s,fname); + fcache->add_item(fname,answer,0); + } + else + answer = fcache->val(fname); + + str_oper = ques_oper_str(question); + operand = ques_operand(question); + // So that you can use LISP variables in the operand (or any LISP) + // check if we've got a , if so eval it. + if ((consp(operand)) && (!consp(car(operand))) && + (streq("+internal-comma",get_c_string(car(operand))))) + operand = leval(cdr(operand),NIL); + + if (streq("is",str_oper)) + if (answer.string() == get_c_string(operand)) + { + /* printf("wagon_ask %s is %s\n",(const char *)answer.string(),(const char *)get_c_string(operand)); */ + return TRUE; + } + else + { + /* printf("wagon_ask %s isnot %s\n",(const char *)answer.string(),(const char *)get_c_string(operand)); */ + return FALSE; + } + else if (streq("=",str_oper)) + if (answer == get_c_float(operand)) + return TRUE; + else + return FALSE; + else if (streq("<",str_oper)) + if ((float)answer < get_c_float(operand)) + return TRUE; + else + return FALSE; + else if (streq(">",str_oper)) + if ((float)answer > get_c_float(operand)) + return TRUE; + else + return FALSE; + else if (streq("matches",str_oper)) + if (answer.string().matches(make_regex(get_c_string(operand)))) + return TRUE; + else + return FALSE; + else if (streq("in",str_oper)) + if (siod_member_str(answer.string(),operand) != NIL) + return TRUE; + else + return FALSE; + else + { + cerr << "Decision tree: unknown question operator: \"" << + str_oper << "\"\n"; + festival_error(); + } + return 0; +} + +LISP l_wagon(LISP si, LISP tree) +{ + // Lisp level binding for tree prediction + EST_Item *s = item(si); + LISP answer; + EST_TKVL<EST_String,EST_Val> *fcache; + + fcache = new EST_TKVL<EST_String,EST_Val>; // feature cache saves some calls + answer = l_wagon_predict(s,tree,fcache); + delete fcache; + return answer; +} + +diff --git 
a/src/arch/festival/wave.cc b/src/arch/festival/wave.cc new file mode 100644 index 0000000..a44c5e7 --- /dev/null +++ b/src/arch/festival/wave.cc @@ -0,0 +1,704 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : October 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Interface to various low level waveform functions from Lisp */ +/* */ +/*=======================================================================*/ +#include +#include "EST_unix.h" +#include +#include "festival.h" +#include "festivalP.h" + +#ifdef WIN32 +#include "winsock2.h" +#endif + +static void utt_save_f0_from_targets(EST_Utterance *u,EST_String &filename); +static float f0_interpolate(EST_Item *ptval, EST_Item *tval, float time); + +EST_Wave *get_utt_wave(EST_Utterance *u) +{ + EST_Relation *r; + + if (((r = u->relation("Wave")) == 0) || (r->head() == 0)) + { + cerr << "no waveform in utterance" << endl; + festival_error(); + } + + return wave(r->head()->f("wave")); +} + +static LISP wave_save(LISP lwave,LISP fname,LISP ftype,LISP stype) +{ + EST_Wave *w = wave(lwave); + EST_String filename,filetype,sampletype; + + if (fname == NIL) + filename = "save.wav"; + else + filename = get_c_string(fname); + if (ftype == NIL) + { + if (ft_get_param("Wavefiletype")) + filetype = get_c_string(ft_get_param("Wavefiletype")); + else + filetype = "nist"; + } + else + filetype = get_c_string(ftype); + if (stype == NIL) + { + if (ft_get_param("Wavesampletype")) + sampletype = get_c_string(ft_get_param("Wavesampletype")); + else + sampletype = "short"; + } + else + sampletype = get_c_string(stype); + + if (w->save_file(filename,filetype,sampletype,EST_NATIVE_BO) != write_ok) + { + cerr << "utt.save.wave: failed to write wave to \"" << filename + << "\"" << endl; + festival_error(); + } + + return truth; +} + +static LISP wave_load(LISP fname,LISP ftype,LISP stype,LISP srate) +{ + EST_Wave *w = new EST_Wave; + EST_read_status r; + + if (ftype == NIL) + r = w->load(get_c_string(fname)); + else if (streq("raw",get_c_string(ftype))) + r = 
w->load_file(get_c_string(fname), + get_c_string(ftype), + get_c_int(srate), + get_c_string(stype), + EST_NATIVE_BO, + 1); + else + r = w->load(get_c_string(fname),get_c_string(ftype)); + + if (r != format_ok) + cerr << "Cannot load wavefile: " << get_c_string(fname) << endl; + + return siod(w); +} + +static LISP wave_copy(LISP w) +{ + return siod(new EST_Wave(*wave(w))); +} + +static LISP wave_append(LISP w1,LISP w2) +{ + EST_Wave *wave1 = wave(w1); + EST_Wave *wave2 = wave(w2); + + *wave1 += *wave2; + + return w1; +} + +static LISP wave_info(LISP w1) +{ + EST_Wave *w = wave(w1); + + return cons(make_param_float("num_samples", + w->num_samples()), + cons(make_param_float("sample_rate", + w->sample_rate()), + cons(make_param_float("num_channels", + w->num_channels()), + cons(make_param_str("file_type", + w->file_type()), + NIL)))); +} + +static LISP wave_set(LISP lwave,LISP lx, LISP ly, LISP lv) +{ + EST_Wave *t = wave(lwave); + + t->a(get_c_int(lx),get_c_int(ly)) = (short)get_c_float(lv); + return lv; +} + +static LISP wave_set_sample_rate(LISP lwave,LISP lsr) +{ + EST_Wave *t = wave(lwave); + + t->set_sample_rate(get_c_int(lsr)); + return lsr; +} + +static LISP wave_get(LISP lwave,LISP lx, LISP ly) +{ + EST_Wave *t = wave(lwave); + + return flocons(t->a(get_c_int(lx),get_c_int(ly))); +} + +static LISP wave_resize(LISP lwave,LISP lsamples, LISP lchannels) +{ + EST_Wave *t; + + if (lwave) + t = wave(lwave); + else + t = new EST_Wave; + + t->resize(get_c_int(lsamples),get_c_int(lchannels)); + + return siod(t); +} + +static LISP wave_resample(LISP w1,LISP newrate) +{ + EST_Wave *w = wave(w1); + + w->resample(get_c_int(newrate)); + + return w1; +} + +static LISP wave_rescale(LISP lw,LISP lgain,LISP normalize) +{ + EST_Wave *w = wave(lw); + float gain = get_c_float(lgain); + + if (normalize) + w->rescale(gain,TRUE); + else + w->rescale(gain); + + return lw; +} + +void play_wave(EST_Wave *w) +{ + EST_Option al; + LISP audio; + + if (audsp_mode) // asynchronous mode + 
audsp_play_wave(w); + else + { + if ((audio = ft_get_param("Audio_Method")) != NIL) + al.add_item("-p",get_c_string(audio)); + if ((audio = ft_get_param("Audio_Device")) != NIL) + al.add_item("-audiodevice",get_c_string(audio)); + if ((audio = ft_get_param("Audio_Command")) != NIL) + al.add_item("-command",quote_string(get_c_string(audio))); + if ((audio = ft_get_param("Audio_Required_Rate")) != NIL) + al.add_item("-rate",get_c_string(audio)); + if ((audio = ft_get_param("Audio_Required_Format")) != NIL) + al.add_item("-otype",get_c_string(audio)); + al.add_item("-quality","HIGH"); + play_wave(*w,al); + } +} + +static LISP wave_play(LISP lw) +{ + play_wave(wave(lw)); + return truth; +} + +static LISP track_save(LISP ltrack,LISP fname,LISP ftype) +{ + EST_Track *t = track(ltrack); + EST_String filename,filetype; + + filename = (fname == NIL) ? "save.track" : get_c_string(fname); + filetype = (ftype == NIL) ? "est" : get_c_string(ftype); + + if (t->save(filename, filetype) != write_ok) + { + cerr << "track.save: failed to write track to \"" << filename + << "\"" << endl; + festival_error(); + } + + return truth; +} + +static LISP track_load(LISP fname,LISP ftype,LISP ishift) +{ + EST_Track *t = new EST_Track; + EST_read_status r; + float is = 0.0; + if (ishift) + is = get_c_float(ishift); + + if (ftype == NIL) + r = t->load(get_c_string(fname),is); + else + r = t->load(get_c_string(fname), + get_c_string(ftype), + is); + + if (r != format_ok) + cerr << "Cannot load track: " << get_c_string(fname) << endl; + + return siod(t); +} + +static LISP track_index_below(LISP ltrack, LISP ltime) +{ + EST_Track *t = track(ltrack); + int index = -1; + + if(ltime) + { + index = t->index_below(get_c_float(ltime)); + return flocons(index); + } + + return NIL; +} + +static LISP track_resize(LISP ltrack,LISP lframes, LISP lchannels) +{ + EST_Track *t; + + if (ltrack) + t = track(ltrack); + else + t = new EST_Track; + + t->resize(get_c_int(lframes),get_c_int(lchannels)); + + return 
siod(t); +} + +static LISP track_set(LISP ltrack,LISP lx, LISP ly, LISP lv) +{ + EST_Track *t = track(ltrack); + + t->a(get_c_int(lx),get_c_int(ly)) = get_c_float(lv); + return lv; +} + +static LISP track_set_time(LISP ltrack,LISP lx, LISP lt) +{ + EST_Track *t = track(ltrack); + + t->t(get_c_int(lx)) = get_c_float(lt); + return lt; +} + +static LISP track_get(LISP ltrack,LISP lx, LISP ly) +{ + EST_Track *t = track(ltrack); + + return flocons(t->a(get_c_int(lx),get_c_int(ly))); +} + +static LISP track_get_time(LISP ltrack,LISP lx) +{ + EST_Track *t = track(ltrack); + + return flocons(t->t(get_c_int(lx))); +} + +static LISP track_frames(LISP ltrack) +{ + return flocons((float)track(ltrack)->num_frames()); +} + +static LISP track_channels(LISP ltrack) +{ + return flocons((float)track(ltrack)->num_channels()); +} + +static LISP track_copy(LISP t) +{ + return siod(new EST_Track(*track(t))); +} + +static LISP track_insert(LISP argv, LISP env) +{ + int i,j; + /* TRACK1 X1 TRACK2 X2 COUNT */ + EST_Track *t1 = track(leval(siod_nth(0,argv),env)); + int x1 = get_c_int(leval(siod_nth(1,argv),env)); + EST_Track *t2 = track(leval(siod_nth(2,argv),env)); + int x2 = get_c_int(leval(siod_nth(3,argv),env)); + int count = get_c_int(leval(siod_nth(4,argv),env)); + + if (t1->num_channels() != t2->num_channels()) + { + cerr << "track.insert: different number of channels" << + t1->num_channels() << " != " << t2->num_channels() << endl; + festival_error(); + } + + if (x1 + count >= t1->num_frames()) + t1->resize(x1+count,t1->num_channels()); + + for (i=0; i<count; i++) + { + for (j=0; j<t1->num_channels(); j++) + t1->a(x1+i,j) = t2->a(x2+i,j); + /* not sure this is right */ + t1->t(x1+i) = + (x1+i > 0 ? t1->t(x1+i-1) : 0) + + t2->t(x2+i) - (x2+i > 0 ? 
t2->t(x2+i-1) : 0); + } + + return siod_nth(1,argv); +} + +static LISP utt_save_f0(LISP utt, LISP fname) +{ + // Save utt's F0 in fname as an ESPS file + EST_Utterance *u = utterance(utt); + EST_String filename = get_c_string(fname); + + if ((u->relation_present("F0")) && (u->relation("F0")->head() != 0)) + { + EST_Track *f0 = track(u->relation("F0")->head()->f("f0")); + if (f0->save(filename,"esps") != write_ok) + { + cerr << "utt.save.f0: failed to write f0 to \"" << + filename << "\"" << endl; + festival_error(); + } + } + else if (u->relation("Target") != 0) + utt_save_f0_from_targets(u,filename); + else + { + cerr << "utt.save.f0: utterance doesn't contain F0 or Target stream" + << endl; + festival_error(); + } + return utt; +} + +static void utt_save_f0_from_targets(EST_Utterance *u,EST_String &filename) +{ + // Modifications by Gregor Moehler to do proper target tracing (GM) + EST_Item *s; + EST_Track f0; + float p = 0.0; + float length = u->relation("Segment")->last()->f("end"); + int i,frames = (int)(length / 0.010); + f0.resize(frames,4); + + EST_Item *ptval, *tval; + + ptval = tval = u->relation("Target")->first_leaf(); + for (i=0,s=u->relation("Segment")->first(); s != 0; s=s->next()) + { + if (i >= frames) + break; // may hit here one before end + for ( ; p < s->F("end",0); p+=0.010,i++) + { + if (tval != 0 && p > (float)ffeature(tval,"pos")) + { + ptval = tval; + tval = next_leaf(tval); + } + if (i >= frames) + break; // may hit here one before end + if ((ffeature(s,"ph_vc") == "+") || + (ffeature(s,"ph_cvox") == "+")) + { + f0(i,0) = f0_interpolate(ptval,tval,p); + f0(i,1) = 1; + } + else + { + f0(i,0) = 0; + f0(i,1) = 0.0; // unvoiced; + } + } + } + f0.set_channel_name("F0",0); + f0.set_channel_name("prob_voice",1); + f0.fill_time(0.01); + + if (f0.save(filename,"esps") != write_ok) + { + cerr << "utt.save.f0: failed to write F0 to \"" << + filename << "\"" << endl; + festival_error(); + } + + return; +} + +static float f0_interpolate(EST_Item 
*ptval, EST_Item *tval, float time) +{ + // GM: changed, to use proper targets + // Return interpolated F0 at time t + float p1,p0,d1,d0; + + d0=0; + d1=0; + + if (tval == 0) // after last target + return ffeature(ptval,"f0"); + + else if (time < (float) ffeature(ptval,"pos")) // before 1st target + return ffeature(tval,"f0"); + + else { + p0 = ffeature(ptval,"f0"); + p1 = ffeature(tval,"f0"); + d0 = ffeature(ptval,"pos"); + d1 = ffeature(tval,"pos"); + } + + if (p0 == 0.0 || d1 == d0) + return p1; + else if (p1 == 0.0) + return p0; + else + return p0 + (p1-p0)*(time-d0)/(d1-d0); +} + +static LISP utt_send_wave_client(LISP utt) +{ + // Send the waveform to a client (must be acting as server) + EST_Utterance *u = utterance(utt); + EST_Wave *w; + EST_String tmpfile = make_tmp_filename(); + LISP ltype; + EST_String type; + + w = get_utt_wave(u); + if (ft_server_socket == -1) + { + cerr << "utt_send_wave_client: not in server mode" << endl; + festival_error(); + } + + ltype = ft_get_param("Wavefiletype"); + if (ltype == NIL) + type = "nist"; + else + type = get_c_string(ltype); + w->save(tmpfile,type); +#ifdef WIN32 + send(ft_server_socket,"WV\n",3,0); +#else + write(ft_server_socket,"WV\n",3); +#endif + socket_send_file(ft_server_socket,tmpfile); + unlink(tmpfile); + + return utt; +} + +/* Asterisk support, see http://www.asterisk.org */ + +static LISP utt_send_wave_asterisk(LISP utt) +{ + // Send the waveform to a client (must be acting as server) + EST_Utterance *u = utterance(utt); + EST_Wave *w; + EST_String tmpfile = make_tmp_filename(); + LISP ltype; + EST_String type; + + w = get_utt_wave(u); + if (ft_server_socket == -1) + { + cerr << "utt_send_wave_asterisk: not in server mode" << endl; + festival_error(); + } + + ltype = ft_get_param("Wavefiletype"); + if (ltype == NIL) + type = "nist"; + else + type = get_c_string(ltype); + w->resample(8000); + w->rescale(5); + + w->save(tmpfile,type); +#ifdef WIN32 + send(ft_server_socket,"WV\n",3,0); +#else + 
write(ft_server_socket,"WV\n",3); +#endif + socket_send_file(ft_server_socket,tmpfile); + unlink(tmpfile); + + return utt; +} + + +static LISP send_sexpr_to_client(LISP l) +{ + EST_String tmpfile = make_tmp_filename(); + FILE *fd; + + fd = fopen(tmpfile,"w"); + + lprin1f(l,fd); + fprintf(fd,"\n"); + fclose(fd); +#ifdef WIN32 + send(ft_server_socket,"LP\n",3,0); +#else + write(ft_server_socket,"LP\n",3); +#endif + socket_send_file(ft_server_socket,tmpfile); + unlink(tmpfile); + + return l; +} + +void festival_wave_init(void) +{ + // declare utterance (wave) specific Lisp functions + + init_subr_4("wave.save",wave_save, + "(wave.save WAVE FILENAME FILETYPE SAMPLETYPE)\n\ + Save WAVE in FILENAME, respecting FILETYPE and SAMPLETYPE if specified;\n\ + if these last two arguments are unspecified the global parameters\n\ + Wavefiletype and Wavesampletype are used. Returns t if successful\n\ + and throws an error if not."); + init_subr_4("wave.load",wave_load, + "(wave.load FILENAME FILETYPE SAMPLETYPE SAMPLERATE)\n\ + Load and return a wave from FILENAME. Respect FILETYPE if specified;\n\ + if not specified, respect whatever header is on the file. 
SAMPLETYPE\n\ + and SAMPLERATE are only used if FILETYPE is raw."); + init_subr_1("wave.copy",wave_copy, + "(wave.copy WAVE)\n\ + Return a copy of WAVE."); + init_subr_2("wave.append",wave_append, + "(wave.append WAVE1 WAVE2)\n\ + Destructively append WAVE2 to WAVE1 and return WAVE1."); + init_subr_1("wave.info",wave_info, + "(wave.info WAVE)\n\ + Returns assoc list of info about this wave."); + init_subr_2("wave.resample",wave_resample, + "(wave.resample WAVE NEWRATE)\n\ + Resamples WAVE to NEWRATE."); + init_subr_3("wave.rescale",wave_rescale, + "(wave.rescale WAVE GAIN NORMALIZE)\n\ + If NORMALIZE is specified and non-nil, maximizes the waveform first\n\ + before applying the gain."); + init_subr_1("wave.play",wave_play, + "(wave.play WAVE)\n\ + Play WAVE using the selected audio method."); + init_subr_3("wave.resize",wave_resize, + "(wave.resize WAVE NEWSAMPLES NEWCHANNELS)\n\ + Resize WAVE to have NEWSAMPLES number of frames and NEWCHANNELS\n\ + number of channels. If WAVE is nil a new wave is made of the\n\ + requested size."); + init_subr_4("wave.set",wave_set, + "(wave.set WAVE X Y V)\n\ + Set position X Y to V in WAVE."); + init_subr_3("wave.get",wave_get, + "(wave.get WAVE X Y)\n\ + Get value of X Y in WAVE."); + init_subr_2("wave.set_sample_rate",wave_set_sample_rate, + "(wave.set_sample_rate WAVE SR)\n\ + Set sample rate of WAVE to SR."); + + + init_subr_3("track.save",track_save, + "(track.save TRACK FILENAME FILETYPE)\n\ + Save TRACK in FILENAME, in format FILETYPE, est is used if FILETYPE\n\ + is unspecified or nil."); + init_subr_3("track.load",track_load, + "(track.load FILENAME FILETYPE ISHIFT)\n\ + Load and return a track from FILENAME. 
Respect FILETYPE if specified\n\ + and ISHIFT if specified."); + init_subr_1("track.copy",track_copy, + "(track.copy TRACK)\n\ + Return a copy of TRACK."); + init_subr_2("track.index_below",track_index_below, + "(track.index_below TRACK TIME)\n\ + Returns the first frame index before this time."); + init_subr_3("track.resize",track_resize, + "(track.resize TRACK NEWFRAMES NEWCHANNELS)\n\ + Resize TRACK to have NEWFRAMES number of frames and NEWCHANNELS\n\ + number of channels. If TRACK is nil a new track is made of the\n\ + requested size."); + init_subr_1("track.num_frames",track_frames, + "(track.num_frames TRACK)\n\ + Returns number of frames in TRACK."); + init_subr_1("track.num_channels",track_channels, + "(track.num_channels TRACK)\n\ + Returns number of channels in TRACK."); + init_subr_4("track.set",track_set, + "(track.set TRACK X Y V)\n\ + Set position X Y to V in TRACK."); + init_subr_3("track.get",track_get, + "(track.get TRACK X Y)\n\ + Get value of X Y in TRACK."); + init_subr_3("track.set_time",track_set_time, + "(track.set_time TRACK X TIME)\n\ + Set time at X to TIME in TRACK."); + init_subr_2("track.get_time",track_get_time, + "(track.get_time TRACK X)\n\ + Get time of X in TRACK."); + init_fsubr("track.insert",track_insert, + "(track.insert TRACK1 X1 TRACK2 X2 COUNT)\n\ + Insert TRACK2 from X2 to X2+COUNT into TRACK1 at X1. TRACK1 is resized\n\ + as required."); + init_subr_1("utt.send.wave.client",utt_send_wave_client, + "(utt.send.wave.client UTT)\n\ + Sends wave in UTT to client. If not in server mode gives an error.\n\ + Note the client must be expecting to receive the waveform."); + init_subr_1("utt.send.wave.asterisk",utt_send_wave_asterisk, +"(utt.send.wave.asterisk UTT)\n\ + Sends wave in UTT to client. If not in server mode gives an error.\n\ + Note the client must be expecting to receive the waveform. 
The waveform\n\ + is rescaled and resampled according to what asterisk needs"); + init_subr_1("send_sexpr_to_client", send_sexpr_to_client, + "(send_sexpr_to_client SEXPR)\n\ +Sends given sexpression to currently connected client."); + init_subr_2("utt.save.f0",utt_save_f0, + "(utt.save.f0 UTT FILENAME)\n\ + Save F0 of UTT as esps track file in FILENAME."); + +} + + + diff --git a/src/arch/festival/web.cc b/src/arch/festival/web.cc new file mode 100644 index 0000000..3487c18 --- /dev/null +++ b/src/arch/festival/web.cc @@ -0,0 +1,219 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Authors: Alan W Black */ +/* Date : November 1996 */ +/*-----------------------------------------------------------------------*/ +/* Some basic functions for dealing with the web and urls */ +/* */ +/*=======================================================================*/ +#include +#include "EST_unix.h" +#include +#include "festival.h" +#include "festivalP.h" +#include "EST_String.h" + +#if 0 +static int getc_unbuffered(int fd,int *c); +#endif + +LISP parse_url(EST_String url) +{ + EST_String protocol, host, port, path; + + if (!parse_url(url, protocol, host, port, path)) + err("can't parse URL", url); + + return cons(strintern(protocol), + cons(strintern(host), + cons(strintern(port), + cons(strintern(path), NIL)))); +} + +LISP lisp_parse_url(LISP l_url) +{ + EST_String url(get_c_string(l_url)); + + return parse_url(url); +} + + +#if 0 +LISP lisp_get_url(LISP url,LISP filename) +{ + // Copy file identified by URL and copy it to filename + // Current only file:/.../... and http:/.../... 
are supported + EST_TokenStream us; + EST_String host,file,port; + char *comm = walloc(char,32+strlen(get_c_string(url))); + char *getstr = walloc(char,32+strlen(get_c_string(filename))); + int wwwserver=-1,c; + FILE *fd, *fin, *fout; + + // first parse the url + us.open_string(get_c_string(url)); + us.set_WhiteSpaceChars(""); + us.set_SingleCharSymbols(":/"); + + if (us.peek() == "http") + { + us.get(); + if ((us.get() != ":") || + (us.get() != "/") || + (us.get() != "/")) + { + cerr << "url_get: malformed url" << endl; + festival_error(); + } + host = us.get().string(); // upto next / + if (us.peek() == ":") // a port is specified + { + us.get(); + port = us.get().string(); + } + else + port = "80"; // standard port for http servers + file = us.get_upto_eoln(); + sprintf(comm,"telnet %s %s",(const char *)host,(const char *)port); + wwwserver = festival_socket_client(host,atoi(port)); + if (wwwserver < 0) + { + cerr << "get_url: can't access server\n"; + festival_error(); + } + fout = fdopen(wwwserver,"wb"); + fprintf(fout,"GET %s\n",(const char *)file); + fflush(fout); + + if ((fd=fopen(get_c_string(filename),"wb")) == NULL) + { + cerr << "get_url: can't open outputfile \"" << + get_c_string(filename) << "\"\n"; + festival_error(); + } + else + { + while (getc_unbuffered(wwwserver,&c) != EOF) + putc(c,fd); + fclose(fd); + } + + close(wwwserver); + wfree(comm); + wfree(getstr); + } + else if (us.peek() == "file") + { + us.get(); + if (us.get() != ":") + { + cerr << "url_get: malformed url" << endl; + festival_error(); + } + file = us.get_upto_eoln(); + if ((fin = fopen(file,"rb")) == NULL) + { + cerr << "get_url: unable to access file url \"" << + get_c_string(url) << "\"\n"; + festival_error(); + } + if ((fd=fopen(get_c_string(filename),"wb")) == NULL) + { + cerr << "get_url: can't open outputfile \"" << + get_c_string(filename) << "\"\n"; + fclose(fin); + festival_error(); + } + else + { + while ((c=getc(fin)) != EOF) + putc(c,fd); + fclose(fd); + fclose(fin); + 
} + } + else + { + cerr << "get_url: unrecognizable url \"" << + get_c_string(url) << "\"\n"; + festival_error(); + } + + return NIL; +} + +LISP l_open_socket(LISP host, LISP port, LISP how) +{ + // Open socket to remote server + int fd; + char *how_c; + + fd = festival_socket_client(get_c_string(host),get_c_int(port)); + if (streq(get_c_string(how),"rw")) + { // return list of r and w FILEDESCRIPTORS + return cons(siod_fdopen_c(fd, + EST_String(get_c_string(host))+ + ":"+get_c_string(port), + "rb"), + cons(siod_fdopen_c(fd, + EST_String(get_c_string(host))+ + ":"+get_c_string(port), + "wb"),NIL)); + } + else + { + if (how == NIL) + how_c = "wb"; + else + how_c = get_c_string(how); + return siod_fdopen_c(fd, + EST_String(get_c_string(host))+":"+get_c_string(port), + how_c); + } +} + +static int getc_unbuffered(int fd,int *rc) +{ + // An attempted to get rid of the buffering + char c; + int n; + + n = read(fd,&c,1); + *rc = c; + if (n == 0) + return EOF; + else + return 0; +} + +#endif diff --git a/src/arch/festival/wfst.cc b/src/arch/festival/wfst.cc new file mode 100644 index 0000000..0945d7f --- /dev/null +++ b/src/arch/festival/wfst.cc @@ -0,0 +1,150 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. 
Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Authors: Alan W Black */ +/* Date : December 1997 */ +/*-----------------------------------------------------------------------*/ +/* Access to WFST classes */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "festivalP.h" + +static LISP wfst_loaded_list = NIL; +static EST_WFST *load_wfst(const EST_String &filename); +static LISP add_wfst(const EST_String &name,EST_WFST *n); + +SIOD_REGISTER_CLASS(wfst,EST_WFST) + +static LISP lisp_load_wfst(LISP name, LISP filename) +{ + EST_WFST *n; + + n = load_wfst(get_c_string(filename)); + add_wfst(get_c_string(name),n); + + return name; +} + +static EST_WFST *load_wfst(const EST_String &filename) +{ + EST_WFST *n = new EST_WFST(); + if (n->load(filename) != 0) + { + fprintf(stderr,"WFST: failed to read wfst from \"%s\"\n", + (const char *)filename); + festival_error(); + } + + return n; +} + +static LISP add_wfst(const EST_String &name,EST_WFST *n) +{ + LISP lpair; + + lpair = siod_assoc_str(name,wfst_loaded_list); + + if (wfst_loaded_list == NIL) + 
gc_protect(&wfst_loaded_list); + + LISP lwfst = siod(n); + + if (lpair == NIL) + wfst_loaded_list = + cons(cons(strintern(name),cons(lwfst,NIL)),wfst_loaded_list); + else + { + cwarn << "WFST: " << name << " recreated" << endl; + setcar(cdr(lpair),lwfst); + } + return lwfst; +} + +EST_WFST *get_wfst(const EST_String &name,const EST_String &filename) +{ + // Find WFST named name; if none is loaded, load it from filename + // when one is given, otherwise return NULL + LISP lpair; + + lpair = siod_assoc_str(name,wfst_loaded_list); + + if (lpair == NIL) + { + if (filename != EST_String::Empty) + { + EST_WFST *n = load_wfst(filename); + add_wfst(name,n); + return n; + } + else + { + cwarn << "WFST: no wfst named \"" << name << "\" loaded" << endl; + return 0; + } + } + else + return wfst(car(cdr(lpair))); +} + +LISP lisp_wfst_transduce(LISP wfstname, LISP input) +{ + EST_WFST *wfst = get_wfst(get_c_string(wfstname)); + EST_StrList in,out; + int r; + + if (consp(input)) + siod_list_to_strlist(input,in); + else + siod_list_to_strlist(stringexplode(get_c_string(input)),in); + + r = transduce(*wfst,in,out); + + if (r == FALSE) + return rintern("FAILED"); + else + return siod_strlist_to_list(out); +} + +void festival_wfst_init() +{ + + init_subr_2("wfst.load",lisp_load_wfst, + "(wfst.load NAME FILENAME)\n\ + Load a WFST from FILENAME and store it named NAME for later access."); + init_subr_2("wfst.transduce",lisp_wfst_transduce, + "(wfst.transduce WFSTNAME INPUT)\n\ + Transduce list INPUT (or exploded INPUT if it is an atom) to a list of \n\ + outputs. The atom FAILED is returned if the transduction fails."); + + +} diff --git a/src/include/Makefile b/src/include/Makefile new file mode 100644 index 0000000..2b68e49 --- /dev/null +++ b/src/include/Makefile @@ -0,0 +1,42 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +TOP=../.. 
+DIRNAME=src/include +H = festival.h \ + Phone.h intonation.h lexicon.h \ + text.h fngram.h modules.h ModuleDescription.h \ + module_support.h +FILES=Makefile $(H) + +include $(TOP)/config/common_make_rules + diff --git a/src/include/ModuleDescription.h b/src/include/ModuleDescription.h new file mode 100644 index 0000000..ca647d9 --- /dev/null +++ b/src/include/ModuleDescription.h @@ -0,0 +1,148 @@ + /************************************************************************/ + /* */ + /* Centre for Speech Technology Research */ + /* University of Edinburgh, UK */ + /* Copyright (c) 1996,1997 */ + /* All Rights Reserved. */ + /* */ + /* Permission is hereby granted, free of charge, to use and distribute */ + /* this software and its documentation without restriction, including */ + /* without limitation the rights to use, copy, modify, merge, publish, */ + /* distribute, sublicense, and/or sell copies of this work, and to */ + /* permit persons to whom this work is furnished to do so, subject to */ + /* the following conditions: */ + /* 1. The code must retain the above copyright notice, this list of */ + /* conditions and the following disclaimer. */ + /* 2. Any modifications must be clearly marked as such. */ + /* 3. Original authors' names are not deleted. */ + /* 4. The authors' names are not used to endorse or promote products */ + /* derived from this software without specific prior written */ + /* permission. 
*/ + /* */ + /* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ + /* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ + /* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ + /* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ + /* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ + /* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ + /* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ + /* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ + /* THIS SOFTWARE. */ + /* */ + /*************************************************************************/ + +#ifndef __MODULEDESCRIPTION_H__ +#define __MODULEDESCRIPTION_H__ + +#include +#include + +using namespace std; + +#include "EST_String.h" + +/** Machine readable descriptions of modules. Useful for help messages + * and for verifying that a set of modules should work together. + * + * This is a struct rather than a class so that it can be initialised + * in the source of the module. + * @author Richard Caley + * @version $Id: ModuleDescription.h,v 1.3 2004/09/29 08:56:56 robert Exp $ + */ + +struct ModuleDescription { + + /**@name limits */ +//@{ +/// Number of lines of descriptive text. +#define MD_MAX_DESCRIPTION_LINES (10) +/// Space for input streams. +#define MD_MAX_INPUT_STREAMS (5) +/// Space for optional streams. +#define MD_MAX_OPTIONAL_STREAMS (5) +/// Space for output streams. +#define MD_MAX_OUTPUT_STREAMS (5) +/// Space for parameters. +#define MD_MAX_PARAMETERS (10) +//@} + +/**@name Parameter types + * Use these for types to avoid typoes and to allow for a cleverer system + * at a later date. 
+ */ +//@{ +/// t or nil +#define mpt_bool "BOOL" +/// Positive integer +#define mpt_natnum "NATNUM" +/// Integer +#define mpt_int "INT" +/// Floating point number +#define mpt_float "FLOAT" +/// Any string +#define mpt_string "STRING" +/// A UnitDatabase +#define mpt_unitdatabase "UNITDATABASE" +/// Anything +#define mpt_other "OTHER" +//@} + + /// name of module + const char * name; + /// version number of module + float version; + /// where it comes from + const char * organisation; + /// person(s) responsible + const char * author; + + /// general description + const char * description[MD_MAX_DESCRIPTION_LINES]; + + /// streams affected. + struct stream_parameter { + /// default stream name + const char * name; + /// what it is used for + const char * description; + }; + + /// Streams which must have values when the module is called. + struct stream_parameter input_streams[MD_MAX_INPUT_STREAMS]; + /// Streams which may or may not be defined. + struct stream_parameter optional_streams[MD_MAX_OPTIONAL_STREAMS]; + /// Streams which will be defined after the module has run. + struct stream_parameter output_streams[MD_MAX_OUTPUT_STREAMS]; + + /// Record for a parameter. + struct parameter { + /// Name of parameter + const char * name; + /// Type of value. + const char * type; + /// Default value assumed. + const char * default_val; + /// Human readable description of effect. + const char * description; + }; + /// Parameters which affect the module. + struct parameter parameters[MD_MAX_PARAMETERS]; + + /// Create human readable string from description. + static EST_String to_string(const ModuleDescription &desc); + /// Create a module description, initialising it properly. + static struct ModuleDescription *create(); + /// Print the description to the stream. + static ostream &print(ostream &s, const ModuleDescription &desc); + + static int print(FILE *s, const ModuleDescription &desc); + +}; + +/// Output operator for descriptions. 
+ostream &operator << (ostream &stream, const ModuleDescription &desc); + +VAL_REGISTER_CLASS_DCLS(moddesc,ModuleDescription) +SIOD_REGISTER_CLASS_DCLS(moddesc,ModuleDescription) + +#endif diff --git a/src/include/Phone.h b/src/include/Phone.h new file mode 100644 index 0000000..1cb29fe --- /dev/null +++ b/src/include/Phone.h @@ -0,0 +1,147 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* Phone and PhoneSet class header file */ +/* */ +/*=======================================================================*/ +#ifndef __PHONE_H__ +#define __PHONE_H__ + +class Phone{ + private: + EST_String name; + EST_StrStr_KVL features; +public: + Phone() {name = "";} + EST_String &phone_name() {return name;} + void set_phone_name(const EST_String &p) {name = p;} + void add_feat(const EST_String &f, const EST_String &v) + { features.add_item(f,v); } + const EST_String &val(const EST_String &key) const + { return features.val_def(key,"");} + const EST_String &val(const EST_String &key,const EST_String &def) + { return features.val_def(key,def); } + int match_features(Phone *foreign); + + inline friend ostream& operator<<(ostream& s, Phone &p); + + Phone & operator =(const Phone &a); +}; + +inline ostream& operator<<(ostream& s, Phone &p) +{ + s << "[PHONE " << p.phone_name() << "]"; +// s << p.features << endl; + return s; +} + +inline Phone &Phone::operator = (const Phone &a) +{ + name = a.name; + features = a.features; + return *this; +} + + +class PhoneSet{ + private: + EST_String psetname; + LISP silences; + LISP map; + LISP feature_defs; // List of features and values + LISP phones; +public: + PhoneSet() {psetname = ""; phones=feature_defs=map=silences=NIL; + gc_protect(&silences); gc_protect(&map); + gc_protect(&feature_defs); gc_protect(&phones);} + ~PhoneSet(); + const EST_String &phone_set_name() const {return psetname;} + void set_phone_set_name(const EST_String &p) {psetname = p;} + int present(const EST_String &phone) const + {return (siod_assoc_str(phone,phones) != NIL);} + int is_silence(const EST_String &ph) const; + void set_silences(LISP sils); + void set_map(LISP m); + LISP get_silences(void) {return silences;} + LISP 
get_phones(void) {return phones;} + LISP get_feature_defs(void) {return reverse(feature_defs);} + int num_phones(void) const {return siod_llength(phones);} + Phone *member(const EST_String &phone) const; + int phnum(const char *phone) const; + const char *phnum(const int n) const; + int add_phone(Phone *phone); + int feat_val(const EST_String &feat, const EST_String &val) + { return (siod_member_str(val, + car(cdr(siod_assoc_str(feat,feature_defs)))) + != NIL); } + void set_feature(const EST_String &name, LISP vals); + + inline friend ostream& operator<<(ostream& s, PhoneSet &p); + + Phone *find_matched_phone(Phone *phone); + PhoneSet & operator =(const PhoneSet &a); +}; + +inline ostream& operator<<(ostream& s, PhoneSet &p) +{ + s << p.phone_set_name(); return s; +} + +const EST_String &map_phone(const EST_String &fromphonename, + const EST_String &fromsetname, + const EST_String &tosetname); +const EST_String &ph_feat(const EST_String &ph,const EST_String &feat); +int ph_is_silence(const EST_String &ph); +int ph_is_vowel(const EST_String &ph); +int ph_is_consonant(const EST_String &ph); +int ph_is_liquid(const EST_String &ph); +int ph_is_approximant(const EST_String &ph); +int ph_is_stop(const EST_String &ph); +int ph_is_nasal(const EST_String &ph); +int ph_is_fricative(const EST_String &ph); +int ph_is_sonorant(const EST_String &ph); +int ph_is_obstruent(const EST_String &ph); +int ph_is_voiced(const EST_String &ph); +int ph_is_sonorant(const EST_String &ph); +int ph_is_syllabic(const EST_String &ph); +int ph_sonority(const EST_String &ph); +EST_String ph_silence(void); + +PhoneSet *phoneset_name_to_set(const EST_String &name); + +#endif + + + diff --git a/src/include/festival.h b/src/include/festival.h new file mode 100644 index 0000000..a8dc707 --- /dev/null +++ b/src/include/festival.h @@ -0,0 +1,167 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, 
UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Top level .h file: main public functions */ +/*=======================================================================*/ +#ifndef __FESTIVAL_H__ +#define __FESTIVAL_H__ + +#include +#include + +using namespace std; + +#include "EST.h" +#include "EST_cutils.h" +#include "siod.h" + +#include "Phone.h" + +#ifndef streq +#define streq(X,Y) (strcmp(X,Y)==0) +#endif + +struct ModuleDescription; + +/* An iostream for outputing debug messages, switchable */ +/* to /dev/null or cerr */ +extern ostream *cdebug; +#define cwarn cout +extern "C" FILE* stddebug; +extern int ft_server_socket; +extern const char *festival_version; + +/* For server/client */ +#define FESTIVAL_DEFAULT_PORT 1314 +int festival_socket_client(const char *host,int port); +int festival_start_server(int port); + +void festival_initialize(int load_init_files,int heap_size); +void festival_init_lang(const EST_String &language); +int festival_eval_command(const EST_String &expr); +int festival_load_file(const EST_String &fname); +int festival_say_file(const EST_String &fname); +int festival_say_text(const EST_String &text); +int festival_text_to_wave(const EST_String &text,EST_Wave &wave); +void festival_repl(int interactive); +void festival_server_mode(void); +void festival_wait_for_spooler(void); +void festival_tidy_up(); + +/* Never used and conflicts with some external system */ +/* typedef void (*FT_Module)(EST_Utterance &utt); */ + +/* Feature functions */ +void festival_def_nff(const EST_String &name,const EST_String &sname, + EST_Item_featfunc func,const char *doc); +typedef EST_Val (*FT_ff_pref_func)(EST_Item *s,const EST_String &name); +void festival_def_ff_pref(const EST_String &pref,const EST_String &sname, + FT_ff_pref_func func, const char *doc); +EST_Val 
ffeature(EST_Item *s, const EST_String &name); + +/* proclaim a new module + option Copyright to add to startup banner + description is a computer readable description of the + module + */ +void proclaim_module(const EST_String &name, + const EST_String &banner_copyright, + const ModuleDescription *description = NULL); + +void proclaim_module(const EST_String &name, + const ModuleDescription *description = NULL); + +void init_module_subr(const char *name, LISP (*fcn)(LISP), const ModuleDescription *description); + +/* Some basic functions for accessing structures created by */ +/* various modelling techniques */ +EST_Val wagon_predict(EST_Item *s, LISP tree); +LISP wagon_pd(EST_Item *s, LISP tree); +EST_Val lr_predict(EST_Item *s, LISP lr_model); + +/* Grammar access functions */ +EST_Ngrammar *get_ngram(const EST_String &name, + const EST_String &filename = EST_String::Empty); +EST_WFST *get_wfst(const EST_String &name, + const EST_String &filename = EST_String::Empty); +LISP lisp_wfst_transduce(LISP wfstname, LISP input); + +EST_String map_pos(LISP posmap, const EST_String &pos); +LISP map_pos(LISP posmap, LISP pos); + +/* On error do a longjmp to appropriate place */ +/* This is done as a macro so the compiler can tell its non-returnable */ +#define festival_error() (errjmp_ok ? 
longjmp(*est_errjmp,1) : festival_tidy_up(),exit(-1)) + +/* Add new (utterance) module */ +void festival_def_utt_module(const char *name, + LISP (*fcn)(LISP), + const char *docstring); + +void utt_cleanup(EST_Utterance &u); // delete all relations +const EST_String utt_iform_string(EST_Utterance &utt); +LISP utt_iform(EST_Utterance &utt); +const EST_String utt_type(EST_Utterance &utt); +void add_item_features(EST_Item *s,LISP features); + +extern const char *festival_libdir; + +// Module specific LISP/etc definitions +void festival_init_modules(void); + +// Some general functions +LISP ft_get_param(const EST_String &pname); + +// SIOD user defined types used by festival + +#define tc_festival_dummyobject tc_application_1 +#define tc_festival_unit tc_application_2 +#define tc_festival_unitdatabase tc_application_3 +#define tc_festival_unitindex tc_application_4 +#define tc_festival_join tc_application_5 +#define tc_festival_schememoduledescription tc_application_6 +#define tc_festival_unitcatalogue tc_application_7 + +// used to recognise our types +#define tc_festival_first_type tc_festival_dummyobject +#define tc_festival_last_type tc_festival_schememoduledescription +#define is_festival_type(X) ((X) >= tc_festival_first_type && (X) <= tc_festival_last_type) + +class UnitDatabase *get_c_unitdatabase(LISP x); + +#define FESTIVAL_HEAP_SIZE 1000000 + +#endif diff --git a/src/include/fngram.h b/src/include/fngram.h new file mode 100644 index 0000000..749635b --- /dev/null +++ b/src/include/fngram.h @@ -0,0 +1,46 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Authors: Alan W Black */ +/* Date : October 1996 */ +/*-----------------------------------------------------------------------*/ +/* LISP level interface to ngrams */ +/* */ +/*=======================================================================*/ + +#ifndef __FNGRAM_H__ +#define __FNGRAM_H__ + +const EST_Ngrammar &get_ngram(const EST_String &name); +const EST_WFST &get_wfst(const EST_String &name); + +#endif // __FNGRAM_H__ diff --git a/src/include/intonation.h b/src/include/intonation.h new file mode 100644 index 0000000..06fa7ea --- /dev/null +++ b/src/include/intonation.h @@ -0,0 +1,51 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Shared intonation utilities */ +/* */ +/*=======================================================================*/ +#ifndef __INTONATION_H__ +#define __INTONATION_H__ + +EST_Item *add_target(EST_Utterance *u, EST_Item *seg, + float pos,float val); +EST_Item *add_IntEvent(EST_Utterance *u, EST_Item *syl, + const EST_String &label); + +#endif /* __INTONATION_H__ */ + + + diff --git a/src/include/lexicon.h b/src/include/lexicon.h new file mode 100644 index 0000000..1bd41b6 --- /dev/null +++ b/src/include/lexicon.h @@ -0,0 +1,130 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. 
The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Shared lexicon utilities */ +/* Top level form (simply an s-expression of the input form */ +/* */ +/*=======================================================================*/ +#ifndef __LEXICON_H__ +#define __LEXICON_H__ + +#include "EST_Pathname.h" + +enum lex_type_t {lex_external, lex_internal}; +class Lexicon{ + private: + lex_type_t type; + EST_String name; + EST_String ps_name; + LISP addenda; // for personal local changes + LISP posmap; + int comp_num_entries; + EST_Pathname bl_filename; + FILE *binlexfp; + EST_String lts_method; + EST_String lts_ruleset; + int blstart; + LISP index_cache; + void binlex_init(void); + LISP lookup_addenda(const EST_String &word, LISP features); + LISP lookup_complex(const EST_String &word, LISP features); + LISP lookup_lts(const EST_String 
&word, LISP features); + LISP bl_bsearch(const EST_String &word,LISP features, + int start,int end,int depth); + LISP bl_find_next_entry(int pos); + LISP bl_find_actual_entry(int pos,const EST_String &word,LISP features); + int lex_entry_match; + LISP matched_lexical_entries; +public: + LISP pre_hooks; + LISP post_hooks; + + Lexicon(); + ~Lexicon(); + const EST_String &lex_name() const {return name;} + const EST_String &phoneset_name() const {return ps_name;} + void set_lex_name(const EST_String &p) {name = p;} + const EST_String &get_lex_name(void) const { return name; } + void set_phoneset_name(const EST_String &p) {ps_name = p;} + void set_lts_method(const EST_String &p) {lts_method = p;} + void set_lts_ruleset(const EST_String &p) {lts_ruleset = p;} + void set_pos_map(LISP p) {posmap = p;} + const EST_String &get_lts_ruleset(void) const { return lts_ruleset; } + void set_bl_filename(const EST_String &p) + {bl_filename = p; + if (binlexfp != NULL) fclose(binlexfp); + binlexfp=NULL;} + void add_addenda(LISP entry) {addenda = cons(entry,addenda);} + LISP lookup(const EST_String &word,const LISP features); + LISP lookup_all(const EST_String &word); + EST_String str_lookup(const EST_String &word,const LISP features); + int in_lexicon(const EST_String &word,LISP features); + int num_matches() { return lex_entry_match; } + void bl_lookup_cache(LISP cache, const EST_String &word, + int &start, int &end, int &depth); + void add_to_cache(LISP index_cache, + const EST_String &word, + int start,int mid, int end); + + inline friend ostream& operator<<(ostream& s, Lexicon &p); + + Lexicon & operator =(const Lexicon &a); +}; + +inline ostream& operator<<(ostream& s, Lexicon &p) +{ + s << "[LEXICON " << p.lex_name() << "]" ; + return s; +} + +inline Lexicon &Lexicon::operator = (const Lexicon &a) +{ + name = a.name; + addenda = a.addenda; + bl_filename = a.bl_filename; + binlexfp = NULL; + lts_method = a.lts_method; + return *this; +} + +LISP lex_lookup_word(const EST_String 
&word,LISP features); +EST_String lex_current_phoneset(void); +LISP lex_select_lex(LISP lexname); +LISP lex_syllabify(LISP phones); +LISP lex_syllabify_phstress(LISP phones); +int in_current_lexicon(const EST_String &word,LISP features); + +#endif /* __LEXICON_H__ */ diff --git a/src/include/module_support.h b/src/include/module_support.h new file mode 100644 index 0000000..990c995 --- /dev/null +++ b/src/include/module_support.h @@ -0,0 +1,108 @@ + /************************************************************************/ + /* */ + /* Centre for Speech Technology Research */ + /* University of Edinburgh, UK */ + /* Copyright (c) 1996,1997 */ + /* All Rights Reserved. */ + /* */ + /* Permission is hereby granted, free of charge, to use and distribute */ + /* this software and its documentation without restriction, including */ + /* without limitation the rights to use, copy, modify, merge, publish, */ + /* distribute, sublicense, and/or sell copies of this work, and to */ + /* permit persons to whom this work is furnished to do so, subject to */ + /* the following conditions: */ + /* 1. The code must retain the above copyright notice, this list of */ + /* conditions and the following disclaimer. */ + /* 2. Any modifications must be clearly marked as such. */ + /* 3. Original authors' names are not deleted. */ + /* 4. The authors' names are not used to endorse or promote products */ + /* derived from this software without specific prior written */ + /* permission. 
*/ + /* */ + /* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ + /* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ + /* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ + /* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ + /* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ + /* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ + /* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ + /* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ + /* THIS SOFTWARE. */ + /* */ + /*************************************************************************/ + /* */ + /* Author: Richard Caley (rjc@cstr.ed.ac.uk) */ + /* Date: Tue Jul 29 1997 */ + /* -------------------------------------------------------------------- */ + /* Some things which are useful in modules. */ + /* */ + /*************************************************************************/ + + +#ifndef __MODULE_SUPPORT_H__ +#define __MODULE_SUPPORT_H__ + +#include "EST.h" +#include "festival.h" +#include "ModuleDescription.h" + +// To extract arguments passed as a list + +void unpack_multiple_args(LISP args, LISP &v1, LISP &v2, LISP &v3, LISP &v4); +void unpack_multiple_args(LISP args, LISP &v1, LISP &v2, LISP &v3, LISP &v4, LISP &v5); + +// To extract arguments for a module, modules are called +// (module-function Utterance StreamName1 StreamName2 StreamName3 ...) + +// To tell the unpacking functions what is expected of the stream. 
+enum RelArgType { + sat_existing, // must exist + sat_new, // must be new + sat_replace, // erase if there, then create + sat_as_is // take what we find +}; + + +void unpack_relation_arg(EST_Utterance *utt, + LISP lrelation_name, + EST_String &relation_name, EST_Relation *&relation, RelArgType type); + +void unpack_module_args(LISP args, + EST_Utterance *&utt); +void unpack_module_args(LISP args, + EST_Utterance *&utt, + EST_String &relation1_name, EST_Relation *&relation1, RelArgType type1); +void unpack_module_args(LISP args, + EST_Utterance *&utt, + EST_String &relation1_name, EST_Relation *&relation1, RelArgType type1, + EST_String &relation2_name, EST_Relation *&relation2, RelArgType type2 + ); +void unpack_module_args(LISP args, + EST_Utterance *&utt, + EST_String &relation1_name, EST_Relation *&relation1, RelArgType type1, + EST_String &relation2_name, EST_Relation *&relation2, RelArgType type2, + EST_String &relation3_name, EST_Relation *&relation3, RelArgType type3 + ); +void unpack_module_args(LISP args, + EST_Utterance *&utt, + EST_String &relation1_name, EST_Relation *&relation1, RelArgType type1, + EST_String &relation2_name, EST_Relation *&relation2, RelArgType type2, + EST_String &relation3_name, EST_Relation *&relation3, RelArgType type3, + EST_String &relation4_name, EST_Relation *&relation4, RelArgType type4 + ); +void unpack_module_args(LISP args, + EST_Utterance *&utt, + EST_String &relation1_name, EST_Relation *&relation1, RelArgType type1, + EST_String &relation2_name, EST_Relation *&relation2, RelArgType type2, + EST_String &relation3_name, EST_Relation *&relation3, RelArgType type3, + EST_String &relation4_name, EST_Relation *&relation4, RelArgType type4, + EST_String &relation5_name, EST_Relation *&relation5, RelArgType type5 + ); + +LISP lisp_parameter_get(const EST_String parameter); +int int_parameter_get(const EST_String parameter, int def=0); +float float_parameter_get(const EST_String parameter, float def=0.0); +bool 
bool_parameter_get(const EST_String parameter); +EST_String string_parameter_get(const EST_String parameter, EST_String def=""); + +#endif diff --git a/src/include/modules.h b/src/include/modules.h new file mode 100644 index 0000000..a06dcd4 --- /dev/null +++ b/src/include/modules.h @@ -0,0 +1,57 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : August 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* General functions shared between various simple modules */ +/* e.g. POS, Word, Phrasify etc */ +/* */ +/*=======================================================================*/ +#ifndef __MODULES_H__ +#define __MODULES_H__ + +EST_Item *add_word(EST_Utterance *u, const EST_String &name); +EST_Item *add_word(EST_Utterance *u, LISP word); +EST_Item *add_segment(EST_Utterance *u, const EST_String &s); +EST_Item *add_phrase(EST_Utterance *u); + +void create_phraseinput(EST_Utterance *u); + +#endif /* __MODULES_H__ */ + + + + + + diff --git a/src/include/text.h b/src/include/text.h new file mode 100644 index 0000000..a682447 --- /dev/null +++ b/src/include/text.h @@ -0,0 +1,68 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Shared text utilities */ +/* */ +/*=======================================================================*/ +#ifndef __TEXT_H__ +#define __TEXT_H__ + +EST_Item *add_token(EST_Utterance *u,EST_Token &t); +void festival_token_init(void); +LISP extract_tokens(LISP file, LISP tokens,LISP ofile); +LISP new_token_utt(void); +void tts_file_xxml(LISP filename); +void tts_file_raw(LISP filename); + +LISP xxml_call_element_function(const EST_String &element, + LISP atts, LISP elements, LISP utt); +LISP xxml_get_tokens(const EST_String &line,LISP feats,LISP utt); + +typedef void (*TTS_app_tok)(EST_Item *token); +typedef void (*TTS_app_utt)(LISP utt); + +LISP tts_chunk_stream(EST_TokenStream &ts, + TTS_app_tok app_tok, + TTS_app_utt app_utt, + LISP eou_tree, + LISP utt); + +void tts_file_user_mode(LISP filename, LISP params); + +#endif /* __TEXT_H__ */ + + + diff --git a/src/lib/Makefile b/src/lib/Makefile new file mode 100644 index 0000000..d49e8c7 --- /dev/null +++ b/src/lib/Makefile @@ -0,0 +1,40 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## 
+## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +TOP=../.. 
+DIRNAME=src/lib +ALL_LIBS = +FILES=Makefile +LOCAL_CLEAN = libFestival.a + +include $(TOP)/config/common_make_rules + diff --git a/src/main/Makefile b/src/main/Makefile new file mode 100644 index 0000000..3db23ab --- /dev/null +++ b/src/main/Makefile @@ -0,0 +1,72 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +TOP=../.. 
+DIRNAME=src/main + +SRCS = festival_main.cc audsp.cc festival_client.cc +OBJS = $(SRCS:.cc=.o) +FILES=Makefile $(SRCS) +LOCAL_CLEAN = $(ETCDIR)/audsp $(ETCDIR)/.made + +ETCDIR=$(TOP)/lib/etc/$(SYSTEM_TYPE) + +ALL_EXECS = festival festival_client + +ALL = $(ALL_EXECS) make_audiosp + +include $(TOP)/config/common_make_rules +include $(EST)/config/rules/bin_process.mak + +ETCDIR=$(TOP)/lib/etc/$(SYSTEM_TYPE) + +festival: festival_main.o $(LIBDEPS) + $(LINK_COMMAND) -o festival festival_main.o $(LIBS) + +festival_client: festival_client.o $(REQUIRED_LIBDEPS) + $(LINK_COMMAND) -o festival_client festival_client.o $(LIBS) + +$(ETCDIR)/audsp: $(ETCDIR)/.made audsp.o $(LIBDEPS) + $(LINK_COMMAND) -o $(ETCDIR)/audsp audsp.o $(LIBS) + +# Can't just rely on the dir as it gets updated with new files +# check for the data of a file created in etcdir + +make_audiosp: $(ETCDIR)/audsp + @$(DO_NOTHING) + +$(ETCDIR)/.made: + @ if [ ! -d $(ETCDIR) ] ; \ + then mkdir -p $(ETCDIR); fi + @ if [ ! -f $(ETCDIR)/.made ] ; \ + then touch $(ETCDIR)/.made ; fi + diff --git a/src/main/audsp.cc b/src/main/audsp.cc new file mode 100644 index 0000000..c55773a --- /dev/null +++ b/src/main/audsp.cc @@ -0,0 +1,472 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. 
Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : September 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* An audio file spooler, like lpd. Reads in commands about files to */ +/* to play, and queues them until any previous requests are finished. */ +/* This allows the synthesizer to get on with synthesizing the next */ +/* utterance. 
*/ +/* */ +/* Actually this doesn't use anything in Festival, only the speech_tools */ +/*=======================================================================*/ +#include +#include +#include +#include + +using namespace std; + +#include "EST.h" +#include "EST_unix.h" + +#ifdef NO_SPOOLER + +int main(int argc, char **argv) +{ + + printf("Audio spooler not supported\n"); + return 0; +} + +#else + +class Command { + private: + EST_String p_file; + int p_rate; + public: + Command(const EST_String &f, int rate) { p_file=f; p_rate=rate; } + int rate(void) const { return p_rate; } + const EST_String &file(void) const { return p_file; } +}; + +class CQueue_Item { + public: + Command *c; + CQueue_Item *next; + CQueue_Item(Command *com) { c=com; next=0; } + ~CQueue_Item() { delete c; if (next != 0) delete next; } +}; + +class CQueue { + private: + CQueue_Item *head; + CQueue_Item *tail; + public: + CQueue() { head = tail = 0; } + ~CQueue() { delete head; } + void push(Command *c); + Command *pop(void); + void display(void) const; + int length(void) const; + void clear(void); +}; + +static void auspl_main(int argc, char **argv); +static void check_new_input(void); +static char *read_a_line(void); +static void process_command(char *line); +static void check_new_output(void); +static int execute_command(Command *c); +static void load_play_file(Command *c); +static int sp_terminate(void); +static void tidy_up(void); + +void CQueue::push(Command *c) +{ + // Put this item on tail + CQueue_Item *n = new CQueue_Item(c); + + if (head == 0) + { // first one + head = n; + tail = n; + } + else + { + tail->next = n; + tail = n; + } +} + +Command *CQueue::pop(void) +{ + // Pop top from the queue + + if (head == 0) + return 0; + else + { + Command *c = head->c; + CQueue_Item *h; + h = head; + h->c = 0; + head = head->next; + h->next = 0; + delete h; + return c; + } +} + +void CQueue::display(void) const +{ + CQueue_Item *t; + int i; + + cerr << "Command_queue: " << length() << endl; + 
for (i=0,t=head; t != 0; t=t->next,i++) + cerr << " " << i << ": " << t->c->file() << endl; +} + +int CQueue::length(void) const +{ + // length of queue + CQueue_Item *t; + int i; + + for (i=0,t=head; t != 0; t=t->next) + i++; + + return i; +} + +void CQueue::clear(void) +{ + // Remove all members in the queue + CQueue_Item *t; + + // Somebody has to do it ... + for (t=head; t != 0; t=t->next) + unlink(t->c->file()); + + delete head; + head = 0; + tail = 0; +} + +static int no_more_input = FALSE; +static CQueue command_queue; +static int child_pid = 0; +static EST_String current_file; +static EST_Option play_wave_options; +static int maxqueue = 5; +static int pending_close = FALSE; +static int kids = 0; + +int main(int argc, char **argv) +{ + + auspl_main(argc,argv); + + return 0; +} + +static void auspl_main(int argc, char **argv) +{ + EST_Option al; + EST_StrList files; + + parse_command_line(argc, argv, + EST_String("Usage: audio spooler \n")+ + "auspl ...\n"+ + "--method audio play method\n"+ + "--command Unix command to play file, used when\n"+ + " method is audio_command\n"+ + "--maxqueue {5} Maximum number of files in queue\n", + files, al); + + if (al.present("--method")) + play_wave_options.add_item("-p",al.val("--method")); + if (al.present("--command")) + play_wave_options.add_item("-command",al.val("--command")); + play_wave_options.add_item("-quality","HIGH"); + + if (al.present("--maxqueue")) + maxqueue = al.ival("--maxqueue"); + + while (!sp_terminate()) + { + check_new_input(); + check_new_output(); + } + + tidy_up(); +} + +static int sp_terminate(void) +{ + // I'm never very sure of all the conditions necessary to terminate + + if (no_more_input && (command_queue.length() == 0)) + return TRUE; + else + return FALSE; +} + +static void tidy_up(void) +{ + // should wait for any remaining children if I've been + // requested to. 
+ int pid; + int statusp; + + if (pending_close == TRUE) + { + while (kids > 0) + { + pid = waitpid(0,&statusp,0); + kids--; + } + fprintf(stdout,"OK\n"); // give an acknowledgement + fflush(stdout); + } + + return; +} + +static void check_new_input(void) +{ + // Do a select on stdin to find out if there are any new + // commands to process + fd_set inset; + fd_set outset; + fd_set exset; + struct timeval t; + int sv; + + t.tv_sec = 0; + t.tv_usec = 1000; // 1000 microseconds (1 ms) + + FD_ZERO(&inset); + FD_ZERO(&outset); + FD_ZERO(&exset); + + if ((command_queue.length() >= maxqueue) || + no_more_input) + { + // wait a bit for the queue to go down a bit + // note we're selecting on no fds at all, just for the delay + sv = select(0,&inset,&outset,&exset,&t); + return; + } + + FD_SET(0,&inset); + + sv = select(1,&inset,&outset,&exset,&t); + + if (sv == 1) + process_command(read_a_line()); + else if (sv == -1) + no_more_input = TRUE; +} + +static int getc_unbuffered(int fd) +{ + // An attempt to get rid of the buffering + char c; + int n; + + n = read(fd,&c,1); + + if (n == 0) + return EOF; + else + return c; +} + +static char *read_a_line(void) +{ + // read up to \n on stdin -- wonder if I should read instead + int maxsize = 1024; + char *line = walloc(char,maxsize+2); + int i,c; + + for (i=0; + (((c=getc_unbuffered(0)) != '\n') && + (c != EOF)); + i++) + { + if (i == maxsize) + { + char *nline = walloc(char,maxsize*2); + memcpy(nline,line,maxsize); + maxsize = maxsize*2; + wfree(line); + line = nline; + } + line[i] = c; + } + + line[i] = '\n'; + line[i+1] = '\0'; + if (c == EOF) + no_more_input = TRUE; + + if (strncmp(line,"close",5) != 0) + { + fprintf(stdout,"OK\n"); // give an acknowledgement + fflush(stdout); + } + + return line; +} + +static void process_command(char *line) +{ + // Process command, some are immediate + EST_TokenStream ts; + ts.open_string(line); + EST_String comm = ts.get().string(); + + if ((comm == "quit") || (comm == "")) + { + no_more_input = TRUE; + } 
+ else if (comm == "play") + { + EST_String file = ts.get().string(); + int rate = atoi(ts.get().string()); + Command *c = new Command(file,rate); + command_queue.push(c); + } + else if (comm == "method") + { + play_wave_options.add_item("-p",ts.get().string()); + } + else if (comm == "command") + { + play_wave_options.add_item("-command",ts.get_upto_eoln().string()); + } + else if (comm == "rate") + { + play_wave_options.add_item("-rate",ts.get().string()); + } + else if (comm == "otype") + { + play_wave_options.add_item("-otype",ts.get().string()); + } + else if (comm == "device") + { + play_wave_options.add_item("-audiodevice",ts.get().string()); + } + else if (comm == "close") + { + pending_close = TRUE; + no_more_input = TRUE; + } + else if (comm == "shutup") + { + // clear queue and kill any child currently playing + command_queue.clear(); + if (child_pid != 0) + { + kill(child_pid,SIGKILL); + unlink(current_file); + } + } + else if (comm == "query") + command_queue.display(); + else if (comm != "") + { + cerr << "audsp: unknown command \"" << comm << "\"\n"; + } + + ts.close(); + wfree(line); +} + +static void check_new_output(void) +{ + // If we are not waiting on any children launch the next command + int pid; + int statusp; + + if (kids > 0) + { + pid = waitpid(0,&statusp,WNOHANG); + if (pid != 0) + { + kids--; + child_pid = 0; + } + } + else if (command_queue.length() != 0) + { + Command *c = command_queue.pop(); + if (execute_command(c) == 0) + kids++; + delete c; + } + + // else do nothing +} + +static int execute_command(Command *c) +{ + // Execute the command as a child process + int pid; + + current_file = c->file(); + + if ((pid=fork()) == 0) + { // child process + load_play_file(c); + _exit(0); // don't close any files on exit + return 0; // can't get here + } + else if (pid > 0) + { // parent process + child_pid = pid; + return 0; + } + else + { + cerr << "audsp: fork failed, \"" << c->file() << "\"\n"; + return -1; + } +} + +static void 
load_play_file(Command *c) +{ + // Load in wave file and play it + + EST_Wave w; + + w.load(c->file()); + play_wave(w,play_wave_options); + unlink(c->file()); // delete it afterwards +} + +#endif diff --git a/src/main/festival_client.cc b/src/main/festival_client.cc new file mode 100644 index 0000000..086f338 --- /dev/null +++ b/src/main/festival_client.cc @@ -0,0 +1,482 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : December 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Client program used to send commands/data to a festival server */ +/* */ +/*=======================================================================*/ + +#include + +using namespace std; + +#include "EST_unix.h" +#include "festival.h" + +#ifdef WIN32 +typedef HANDLE SERVER_FD; +#else +typedef FILE *SERVER_FD; +#endif + +static void festival_client_main(int argc, char **argv); +static void copy_to_server(FILE *fdin,SERVER_FD serverfd); +static void ttw_file(SERVER_FD serverfd, const EST_String &file); +static void client_accept_waveform(SERVER_FD fd); +static void client_accept_s_expr(SERVER_FD fd); +static void new_state(int c, int &state, int &bdepth); + +static EST_String output_filename = "-"; +static EST_String output_type = "riff"; +static EST_String tts_mode = "nil"; +static EST_String prolog = ""; +static int withlisp = FALSE; +static EST_String aucommand = ""; +static int async_mode = FALSE; + +// So that festival_error works (and I don't need the whole of libFestival.a) +void festival_tidy_up() { return; } + +int main(int argc, char **argv) +{ + + festival_client_main(argc,argv); + + return 0; +} + +static void festival_client_main(int argc, char **argv) +{ + EST_Option al; + EST_StrList files; + EST_String server; + int port; + FILE *infd; + + parse_command_line(argc, argv, + EST_String("Usage:\n")+ + "festival_client ...\n"+ + "Access to festival server process\n"+ + "--server hostname (or IP number) of server\n"+ + "--port {1314} port number of server process (1314)\n"+ + "--output file to save output waveform to\n"+ + "--otype {riff}\n" + " output type for waveform\n"+ + "--passwd server passwd in plain text (optional)\n"+ + "--prolog filename containing commands to be sent\n"+ + " to the server before standard commands\n"+ + 
" (useful when using --ttw)\n"+ + "--async Asynchronous mode, server may send back\n"+ + " multiple waveforms per text file\n"+ + "--ttw Text to waveform: take text from first\n"+ + " arg or stdin get server to return\n"+ + " waveform(s) stored in output or operated\n"+ + " on by aucommand.\n"+ + "--withlisp Output lisp replies from server.\n"+ + "--tts_mode TTS mode for file (default is fundamental).\n"+ + "--aucommand \n"+ + " command to be applied to each\n"+ + " waveform retruned from server. Use $FILE\n"+ + " in string to refer to waveform file\n", + files, al); + + if (al.present("--server")) + server = al.val("--server"); + else + server = "localhost"; + + if (al.present("--port")) + port = al.ival("--port"); + else + port = FESTIVAL_DEFAULT_PORT; + + if (al.present("--tts_mode")) + tts_mode = al.val("--tts_mode"); + + if (al.present("--output")) + output_filename = al.val("--output"); + if (al.present("--otype")) + { + output_type = al.val("--otype"); + if (!output_type.matches(RXalphanum)) + { + cerr << "festival_client: invalid output type \"" + << output_type << "\"" << endl; + exit(-1); + } + } + + // Specify what to do with received waveform + if (al.present("--aucommand")) + aucommand = al.val("--aucommand"); + + if (al.present("--withlisp")) + withlisp = TRUE; + + if (al.present("--async")) + async_mode = TRUE; + else + async_mode = FALSE; + + int fd = festival_socket_client(server,port); +#ifdef WIN32 + HANDLE serverfd = (HANDLE)fd; +#else + FILE *serverfd = fdopen(fd,"wb"); +#endif + + if (al.present("--passwd")) +#ifdef WIN32 + { + DWORD bytes_written; + DWORD passwdlen = (DWORD)strlen(al.val("--passwd"))+1; + char *buffer = new char[passwdlen]; + sprintf(buffer,"%s\n",al.val("--passwd")); + bytes_written = send((SOCKET)serverfd,buffer,passwdlen,0); + if (SOCKET_ERROR == bytes_written || bytes_written < passwdlen) + { + GetLastError(); // at least get the error + cerr << "festival_client: can't send password to server\n"; + } + delete [] buffer; + 
} +#else + fprintf(serverfd,"%s\n",(const char *)al.val("--passwd")); +#endif + + if (al.present("--prolog")) + { + FILE *pfd = fopen(al.val("--prolog"),"rb"); + if (pfd == NULL) + { + cerr << "festival_client: can't open prolog file \"" + << al.val("--prolog") << "\"" << endl; + exit(-1); + } + copy_to_server(pfd,serverfd); + fclose(pfd); + } + + if (al.present("--ttw")) + ttw_file(serverfd,files.nth(0)); + else + { + if ((files.length() == 0) || (files.nth(0) == "-")) + copy_to_server(stdin,serverfd); + else + { + if ((infd=fopen(files.nth(0),"rb")) == NULL) + { + cerr << "festival_client: can't open \"" << + files.nth(0) << "\"\n"; + exit(-1); + } + copy_to_server(infd,serverfd); + } + } + + return; +} + +static void ttw_file(SERVER_FD serverfd, const EST_String &file) +{ + // text to waveform file. This includes the tts wraparounds for + // the text in file and outputs a waveform in output_filename + // This is done as *one* waveform. This is designed for short + // dialog type examples. If you need spooling this isn't the + // way to do it + EST_String tmpfile = make_tmp_filename(); + FILE *fd, *tfd; + int c; + + if ((fd=fopen(tmpfile,"wb")) == NULL) + { + cerr << "festival_client: can't open tmpfile \"" << + tmpfile << "\"\n"; + exit(-1); + } + // Here we ask for NIST because it's a byte order aware headered format; + // the eventual desired format might be unheadered and if we asked the + // server for that we wouldn't know if it required byte swap or + // not. The returned wave data from the server is actually saved + // to a file and read in by EST_Wave::load so NIST is a safe option. + // Of course when the wave is saved by the client the requested + // format is respected. 
+ fprintf(fd,"(Parameter.set 'Wavefiletype 'nist)\n"); + if (async_mode) + { // In async mode we need to set up tts_hooks to send back the waves + fprintf(fd,"(tts_return_to_client)\n"); + fprintf(fd,"(tts_text \"\n"); + } + else // do it in one go + fprintf(fd,"(tts_textall \"\n"); + if (file == "-") + tfd = stdin; + else if ((tfd=fopen(file,"rb")) == NULL) + { + cerr << "festival_client: can't open text file \"" << + file << "\"\n"; + exit(-1); + } + + while ((c=getc(tfd)) != EOF) + { + if ((c == '"') || (c == '\\')) + putc('\\',fd); + putc(c,fd); + } + if (file != "-") + fclose(tfd); + + fprintf(fd,"\" \"%s\")\n",(const char *)tts_mode); + + fclose(fd); + + // Now send the file to the server + if ((fd=fopen(tmpfile,"rb")) == NULL) + { + cerr << "festival_client: tmpfile \"" << + tmpfile << "\" mysteriously disappeared\n"; + exit(-1); + } + copy_to_server(fd,serverfd); + fclose(fd); + unlink(tmpfile); +} + +static void copy_to_server(FILE *fdin,SERVER_FD serverfd) +{ + // Open a connection and copy everything from stdin to + // server + int c,n; + int state=0; + int bdepth=0; + char ack[4]; + + while((c=getc(fdin)) != EOF) + { +#ifdef WIN32 + DWORD bytes_pending; + n = send((SOCKET)serverfd,(const char *)&c,1,0); + if (SOCKET_ERROR == n || 0 == n) + { + if (SOCKET_ERROR == n) + GetLastError(); + cerr << "festival_client: couldn't copy to server\n"; + } +#else + putc(c,serverfd); +#endif + new_state(c,state,bdepth); + + if (state == 1) + { + state = 0; +#ifndef WIN32 + fflush(serverfd); +#endif + do { +#ifdef WIN32 + { + for (n=0; n < 3; ) + { + int bytes_read = recv((SOCKET)serverfd,ack+n,3-n,0); + if (SOCKET_ERROR == bytes_read) + { + GetLastError(); + cerr << "festival_client: error reading from server\n"; + } + else n+= bytes_read; + } + } +#else + for (n=0; n < 3; ) + n += read(fileno(serverfd),ack+n,3-n); +#endif + ack[3] = '\0'; + if (streq(ack,"WV\n")) // I've been sent a waveform + client_accept_waveform(serverfd); + else if (streq(ack,"LP\n")) // I've 
been sent an s-expr + { + client_accept_s_expr(serverfd); + } + else if (streq(ack,"ER\n")) + { + cerr << "festival server error: reset to top level\n"; + break; + } + } while (!streq(ack,"OK\n")); + } + } +} + +static void new_state(int c, int &state, int &bdepth) +{ + // FSM (plus depth) to detect end of s-expr + + if (state == 0) + { + if ((c == ' ') || (c == '\t') || (c == '\n') || (c == '\r')) + state = 0; + else if (c == '\\') // escaped character + state = 2; + else if (c == ';') + state = 3; // comment + else if (c == '"') + state = 4; // quoted string + else if (c == '(') + { + bdepth++; + state = 5; + } + else + state = 5; // in s-expr + } + else if (state == 2) + state = 5; // escaped character + else if (state == 3) + { + if (c == '\n') + state = 5; + else + state = 3; + } + else if (state == 4) + { + if (c == '\\') + state = 6; + else if (c == '"') + state = 5; + else + state = 4; + } + else if (state == 6) + state = 4; + else if (state == 5) + { + if ((c == ' ') || (c == '\t') || (c == '\n') || (c == '\r')) + { + if (bdepth == 0) + state = 1; + else + state = 5; + } + else if (c == '\\') // escaped character + state = 2; + else if (c == ';') + state = 3; // comment + else if (c == '"') + state = 4; // quoted string + else if (c == '(') + { + bdepth++; + state = 5; + } + else if (c == ')') + { + bdepth--; + state = 5; + } + else + state = 5; // in s-expr + } + else // shouldn't get here + state = 5; +} + +static void client_accept_waveform(SERVER_FD fd) +{ + // Read a waveform from fd. The waveform will be passed + // using + EST_String tmpfile = make_tmp_filename(); + EST_Wave sig; + + // Have to copy this to a temporary file, then load it. 
+#ifdef WIN32 + socket_receive_file((SOCKET)fd,tmpfile); +#else + socket_receive_file(fileno(fd),tmpfile); +#endif + sig.load(tmpfile); + if (aucommand != "") + { + // apply the command to this file + EST_String tmpfile2 = make_tmp_filename(); + sig.save(tmpfile2,output_type); + char *command = walloc(char,1024+tmpfile2.length()+aucommand.length()); + sprintf(command,"FILE=\"%s\"; %s",(const char *)tmpfile2, + (const char *)aucommand); + system(command); + unlink(tmpfile2); + } + else if (output_filename == "") + cerr << "festival_client: ignoring received waveform, no output file" + << endl; + else + sig.save(output_filename,output_type); + unlink(tmpfile); +} + +static void client_accept_s_expr(SERVER_FD fd) +{ + // Read an s-expression. Inefficiently into a file + EST_String tmpfile = make_tmp_filename(); + FILE *tf; + int c; + + // Have to copy this to a temporary file, then load it. +#ifdef WIN32 + socket_receive_file((SOCKET)fd,tmpfile); +#else + socket_receive_file(fileno(fd),tmpfile); +#endif + + if (withlisp) + { + if (( tf = fopen(tmpfile,"rb")) == NULL) + { + cerr << "festival_client: lost an s_expr tmp file" << endl; + } + else + { + while ((c=getc(tf)) != EOF) + putc(c,stdout); + fclose(tf); + } + } + + unlink(tmpfile); +} + diff --git a/src/main/festival_main.cc b/src/main/festival_main.cc new file mode 100644 index 0000000..fc9bdff --- /dev/null +++ b/src/main/festival_main.cc @@ -0,0 +1,259 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996-1998 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black, Paul Taylor, Richard Caley */ +/* and others */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Top level file for synthesizer */ +/* */ +/*=======================================================================*/ +#include + +using namespace std; + +#include "festival.h" + +static void festival_main(int argc, char **argv); +static int festival_check_script_mode(int argc, char **argv); +static void festival_script_mode(int argc, char **argv); + +void awb_free_diph_index(); + +extern "C" { + void mtrace(); + void muntrace(); +} +int main(int argc, char **argv) +{ + +/* putenv("MALLOC_TRACE=mallfile"); + mtrace(); */ + festival_main(argc,argv); + +/* awb_free_diph_index(); + +muntrace(); */ + return 0; +} + +static void festival_main(int argc, char **argv) +{ + EST_Option al; + int stdin_input,interactive; + EST_Litem *p; + EST_StrList files; + int real_number_of_files = 0; + int heap_size = FESTIVAL_HEAP_SIZE; + + if (festival_check_script_mode(argc,argv) == TRUE) + { // Need to check this directly as in script mode args are + // passed for analysis in the script itself + return; + } + + parse_command_line(argc, argv, + EST_String("Usage:\n")+ + "festival ...\n"+ + "In evaluation mode \"filenames\" starting with ( are evaluated inline\n"+ + "Festival Speech Synthesis System: "+ festival_version +"\n"+ + "-q Load no default setup files\n"+ + "--libdir \n"+ + " Set library directory pathname\n"+ + "-b Run in batch mode (no interaction)\n"+ + "--batch Run in batch mode (no interaction)\n"+ + "--tts Synthesize text in files as speech\n"+ + " no files means read from stdin\n"+ + " (implies no interaction by default)\n"+ + "-i Run in interactive mode (default)\n"+ + "--interactive\n"+ + " Run in interactive mode (default)\n"+ + "--pipe Run in pipe mode, reading commands from\n"+ + " stdin, but no 
prompt or return values\n"+ + " are printed (default if stdin not a tty)\n"+ + "--language \n"+ + " Run in named language, default is\n"+ + " english, spanish and welsh are available\n"+ + "--server Run in server mode waiting for clients\n"+ + " of server_port (1314)\n"+ + "--script \n"+ + " Used in #! scripts, runs in batch mode on\n"+ + " file and passes all other args to Scheme\n"+ + "--heap {1000000}\n"+ + " Set size of Lisp heap, should not normally need\n"+ + " to be changed from its default\n"+ + "-v Display version number and exit\n"+ + "--version Display version number and exit\n", + files, al); + + if ((al.present("-v")) || (al.present("--version"))) + { + printf("%s: Festival Speech Synthesis System: %s\n", + argv[0],festival_version); + exit(0); + } + + if (al.present("--libdir")) + festival_libdir = wstrdup(al.val("--libdir")); + else if (getenv("FESTLIBDIR") != 0) + festival_libdir = getenv("FESTLIBDIR"); + if (al.present("--heap")) + heap_size = al.ival("--heap"); + + festival_initialize(!al.present("-q"),heap_size); + + if (al.present("--language")) + festival_init_lang(al.val("--language")); + + // File processing + for (p=files.head(); p != 0; p=p->next()) + { + if (files(p) == "-") // paul thinks I want the "-" -- I don't + continue; + real_number_of_files++; + if (al.present("--tts")) + { + if (!festival_say_file(files(p))) + festival_error(); + } + else if (files(p).matches(make_regex("^(.*"))) + { + if (!festival_eval_command(files(p))) + festival_error(); // fail if it fails + } + else if (!festival_load_file(files(p))) + festival_error(); + } + + // What to do about standard input and producing prompts etc. 
+ if ((al.present("-i")) || (al.present("--interactive"))) + { + interactive = TRUE; + stdin_input = TRUE; + } + else if ((al.present("--pipe"))) + { + interactive=FALSE; + stdin_input = TRUE; + } + else if ((al.present("-b")) || (al.present("--batch")) || + (al.present("--tts"))) + { + interactive=FALSE; + stdin_input=FALSE; + } + else if (isatty(0)) // if stdin is a terminal assume interactive + { + interactive = TRUE; + stdin_input = TRUE; + } + else // else assume pipe mode + { + interactive = FALSE; + stdin_input = TRUE; + } + + if (al.present("--server")) + festival_server_mode(); // server mode + else if ((al.present("--tts")) && (real_number_of_files == 0)) + festival_say_file("-"); // text to speech from stdin + else if (stdin_input) + festival_repl(interactive); // expect input from stdin + + if (al.present("--tts")) + festival_wait_for_spooler(); // wait for end of audio output + + return; +} + +static int festival_check_script_mode(int argc, char **argv) +{ + // Checks if we are in script mode, i.e. if --script exists + // which may possibly be preceded by --libdir (and heap ?) 
+ // + + if (argc == 0) + return FALSE; + else if ((argc > 2) && (streq("--script",argv[1]))) + { + if (getenv("FESTLIBDIR") != 0) + festival_libdir = getenv("FESTLIBDIR"); + festival_script_mode(argc,argv); + return TRUE; + } + else if ((argc > 4) && (streq("--script",argv[3])) + && (streq("--libdir",argv[1]))) + { + festival_libdir = wstrdup(argv[2]); + festival_script_mode(argc,argv); + return TRUE; + } + else + return FALSE; +} + +static void festival_script_mode(int argc, char **argv) +{ + // In script mode the first file arg after -script is interpreted and + // the remainder are set in the variable argv so the script + // itself may do what ever it wants + LISP args; + const char *siodheapsize; + int i; + + if (argc < 2) + { + cerr << "festival: script_mode has no file to interpret" << endl; + return; + } + + // initialize without loading init files + siodheapsize = getenv("SIODHEAPSIZE"); + if (siodheapsize) + festival_initialize(FALSE,atoi(siodheapsize)); + else + festival_initialize(FALSE,FESTIVAL_HEAP_SIZE); + + for (args=NIL,i=3; i < argc; i++) + args = cons(rintern(argv[i]),args); + + siod_set_lval("argv",reverse(args)); + siod_set_lval("argc",flocons(argc)); + + festival_load_file(argv[2]); + + return; +} + + + diff --git a/src/modules/Duration/Klatt.cc b/src/modules/Duration/Klatt.cc new file mode 100644 index 0000000..26bd7fe --- /dev/null +++ b/src/modules/Duration/Klatt.cc @@ -0,0 +1,428 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Paul Taylor */ +/* Date : July 1995 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Klatt Duration Rules */ +/* */ +/*=======================================================================*/ + +/* +This is an implementation of the Klatt rule system as described in +chapter 9 of "From text to speech: The MITalk system", Allen, +Hunnicutt and Klatt. + +The function klatt_seg_dur() calculates a duration for each +segment in the input. 
It does this by calling a number +of rules (named 1 to 11) as defined in the MITalk book. Most +rules return a number which modifies the inherent duration of +each segment. The original rules are set up so as to return +a percentage; here the system returns a floating point value, +which I think is neater. +*/ + +#include +#include "festival.h" +#include "durationP.h" + +static void klatt_dur_debug(EST_Item *s); + +static float rule2(EST_Item *seg); +static float rule3(EST_Item *seg); +static float rule4(EST_Item *seg); +static float rule5(EST_Item *seg); +static float rule6(EST_Item *seg); +static float rule7(EST_Item *seg); +static float rule8(EST_Item *seg); +static float rule9(EST_Item *seg); +static float rule10(EST_Item *seg); +static float rule9a(EST_Item *seg); +static float sub_rule9a(const EST_String &ph); + +static int klatt_seg_dur(EST_Item *seg); +static float min_dur(EST_Item *s_seg); +static float inher_dur(EST_Item *s_seg); + +int onset(EST_Item *seg); + +static LISP klatt_params = NIL; +static int debug = 0; + +LISP FT_Duration_Klatt_Utt(LISP utt) +{ + // Predict Klatt rule-based duration on segments + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + + *cdebug << "Duration Klatt module\n"; + + klatt_params = siod_get_lval("duration_klatt_params", + "no klatt duration params"); + + for (s=u->relation("Segment")->first(); s != 0; s = s->next()) + klatt_seg_dur(s); + + return utt; +} + +static int klatt_seg_dur(EST_Item *seg) +{ + float min; + float fact = 1.0; + float start, dur; + float duration_speed = dur_get_stretch_at_seg(seg); + + start = ffeature(seg,"segment_start"); + + if (ph_is_silence(seg->name())) + dur = 0.250 * duration_speed; + else + { + if (debug) klatt_dur_debug(seg); + fact *= rule2(seg) * rule3(seg) * rule4(seg) * rule5(seg) + * rule6(seg) * rule7(seg) * rule8(seg) * + rule9(seg) * rule10(seg); + + min = (rule7(seg) != 1.0) ? 
min_dur(seg)/2: min_dur(seg); + + dur = ((((inher_dur(seg) - min) * fact) + min) / 1000.0) + * duration_speed; + } + + seg->set("end",start + dur); + + return 0; +} + +static float min_dur(EST_Item *seg) +{ + LISP p = siod_assoc_str(seg->name(),klatt_params); + + if (p == NIL) + { + cerr << "Klatt_Duration: no minimum duration for \"" << seg->name() + << "\"\n"; + festival_error(); + } + + return get_c_float(car(cdr(cdr(p)))); +} + +static float inher_dur(EST_Item *seg) +{ + LISP p = siod_assoc_str(seg->name(),klatt_params); + + if (p == NIL) + { + cerr << "Klatt_Duration: no inherent duration for \"" << seg->name() + << "\"\n"; + festival_error(); + } + + return get_c_float(car(cdr(p))); +} + +static int word_final(EST_Item *seg) +{ + // True if this segment is the last in a word + EST_Item *nn = seg->as_relation("SylStructure"); + + if (nn->next() || (parent(nn)->next())) + return FALSE; + else + return TRUE; +} + +static int syl_final(EST_Item *seg) +{ + // True if this segment is the last in a syllable + EST_Item *nn = seg->as_relation("SylStructure"); + + if (nn->next()) + return FALSE; + else + return TRUE; +} + +static int word_initial(EST_Item *seg) +{ + // True if this segment is the first in a word + EST_Item *nn = seg->as_relation("SylStructure"); + + if (nn->prev() || parent(nn)->prev()) + return FALSE; + else + return TRUE; +} + +static int phrase_initial(EST_Item *seg) +{ + // True if this segment is the first in a phrase + + if (word_initial(seg)) + { + EST_Item *nn = parent(parent(seg,"SylStructure")); + if (as(nn,"Phrase")->prev()) + return FALSE; + else + return TRUE; + } + return + FALSE; +} + +int onset(EST_Item *seg) +{ + if (ffeature(seg,"onsetcoda") == "onset") + return 1; + else + return 0; +} + +int coda(EST_Item *seg) +{ + if (ffeature(seg,"onsetcoda") == "coda") + return 1; + else + return 0; +} + +static float rule2(EST_Item *seg) +{ // clause final lengthening + + if (coda(seg)) + { + int b = 
ffeature(seg,"R:SylStructure.parent.syl_break"); + if ((b > 1) && (b < 4)) + return 1.4; + } + return 1.0; + +} + +static float rule3(EST_Item *seg) +{ // Non-phrase-final shortening + // syllabic segments are shortened by 60 if not in a phrase-final syllable + int b = ffeature(seg,"R:SylStructure.parent.syl_break"); + + if ((b < 2) && ph_is_syllabic(seg->name())) + return 0.6; + + // A phrase-final postvocalic liquid or nasal is lengthened by 140 + if ((b == 4) && (ph_is_liquid(seg->name()) || ph_is_nasal(seg->name()))) + return(1.4); + + return 1.0; +} + +static float rule4(EST_Item *seg) +{ // Non-word-final shortening + int b = ffeature(seg,"R:SylStructure.parent.syl_break"); + + // Syllabic segments are shortened by 85 if not in a word-final syllable + if ((b == 0) && ph_is_syllabic(seg->name())) + return(0.85); + + return 1.0; +} + +static float rule5(EST_Item *seg) +{ // Polysyllabic Shortening + int num_syls = ffeature(seg,"R:SylStructure.parent.parent.num_syls"); + + // Syllabic segments in a polysyllabic word are shortened by 80. + if ((num_syls > 1) && ph_is_syllabic(seg->name())) + return 0.8; + + return 1.0; +} + +static float rule6(EST_Item *seg) +{ // Non-initial-consonant shortening + + if (!word_initial(seg) && (ph_is_consonant(seg->name()))) + return 0.85; + + return 1.0; +} + +static float rule7(EST_Item *seg) +{ // Unstressed shortening + + if (ffeature(seg,"R:SylStructure.parent.stress") == 1) + return 1.0; + + if (ph_is_syllabic(seg->name())) + { + if (word_initial(seg) || word_final(seg)) + return 0.7; + else + return 0.5; + } + + if (onset(seg) && ph_is_liquid(seg->name())) // or glide... 
+ return 0.1; + + return 0.7; +} + +// Lengthening for emphasis +static float rule8(EST_Item *seg) +{ + + if (!ph_is_vowel(seg->name())) + return 1.0; + + if (ffeature(seg,"R:SylStructure.parent.accented") == 1) + return 1.4; + + return 1.0; +} + +// this is really rule 9b, but it's easier to make it call rule 9a + +static float rule9(EST_Item *seg) +{ // Postvocalic context of vowels + int b = ffeature(seg,"R:SylStructure.parent.syl_break"); + + if (b > 1) + return (0.7 + (0.3 * rule9a(seg))); + else + return rule9a(seg); +} + + +static float rule9a(EST_Item *seg) +{ // Postvocalic context of vowels + EST_Item *s_next,*s_next_next; + + if (ph_is_vowel(seg->name())) + { + if (syl_final(seg)) + return 1.2; + s_next = seg->next(); + if ((s_next) && (syl_final(s_next))) + return sub_rule9a(s_next->name()); + s_next_next = s_next->next(); + if ((ph_is_sonorant(s_next->name())) && + (s_next_next) && + (ph_is_obstruent(s_next_next->name()))) + return sub_rule9a(s_next_next->name()); + } + else if (onset(seg)) + return 1.0; + else if (ph_is_sonorant(seg->name())) + { + if (syl_final(seg)) + return 1.2; + s_next = seg->next(); + if (ph_is_obstruent(s_next->name())) + return sub_rule9a(s_next->name()); + } + + return 1.0; +} + +// sub rule, independent of seg position +static float sub_rule9a(const EST_String &ph) +{ + if (ph_is_voiced(ph)) + { + if (ph_is_fricative(ph)) + return 1.6; + else if (ph_is_stop(ph)) + return 1.2; + else if (ph_is_nasal(ph)) + return 0.85; + else + return 1.0; + } + else if (ph_is_stop(ph)) + return 0.7; + else + return 1.0; +} + +// Shortening in clusters + +static float rule10(EST_Item *seg) +{ + int b = ffeature(seg,"R:SylStructure.parent.syl_break"); + + if (syl_final(seg) && (b > 1)) + return 1.0; + else + { + if (ph_is_vowel(seg->name())) + { + if (ph_is_vowel(seg->next()->name())) + return 1.20; + else if ((!phrase_initial(seg)) && + (ph_is_vowel(seg->prev()->name()))) + return 0.70; + else + return 1.0; + } + else if 
(ph_is_consonant(seg->next()->name())) + if (!phrase_initial(seg) && + (ph_is_consonant(seg->prev()->name()))) + return 0.5; + else + return 0.7; + else if (!phrase_initial(seg) && + (ph_is_consonant(seg->prev()->name()))) + return 0.7; + } + + return 1.0; +} + + +static void klatt_dur_debug(EST_Item *seg) +{ + float f; + if ((f = rule2(seg))!= 1.0) cout << "Fired rule 2 " << f << endl; + if ((f = rule3(seg))!= 1.0) cout << "Fired rule 3 " << f << endl; + if ((f = rule4(seg))!= 1.0) cout << "Fired rule 4 " << f << endl; + if ((f = rule5(seg))!= 1.0) cout << "Fired rule 5 " << f << endl; + if ((f = rule6(seg))!= 1.0) cout << "Fired rule 6 " << f << endl; + if ((f = rule7(seg))!= 1.0) cout << "Fired rule 7 " << f << endl; + if ((f = rule8(seg))!= 1.0) cout << "Fired rule 8 " << f << endl; + if ((f = rule9(seg))!= 1.0) cout << "Fired rule 9 " << f << endl; + if ((f = rule10(seg))!= 1.0) cout << "Fired rule 10 " << f << endl; + + return; +} + + diff --git a/src/modules/Duration/Makefile b/src/modules/Duration/Makefile new file mode 100644 index 0000000..6f92849 --- /dev/null +++ b/src/modules/Duration/Makefile @@ -0,0 +1,53 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. 
## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +# Contains implementation of various duration prediction # +# modules # +# # +########################################################################### +TOP=../../.. +DIRNAME=src/modules/Duration +H = durationP.h +SRCS = dur_aux.cc duration.cc Klatt.cc +OBJS = $(SRCS:.cc=.o) + +FILES = Makefile $(SRCS) $(H) + +LOCAL_INCLUDES = -I../include + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/Duration/dur_aux.cc b/src/modules/Duration/dur_aux.cc new file mode 100644 index 0000000..a506f5c --- /dev/null +++ b/src/modules/Duration/dur_aux.cc @@ -0,0 +1,130 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Basic duration utilities common between different methods */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "durationP.h" + +float dur_get_stretch(void) +{ + LISP lstretch = ft_get_param("Duration_Stretch"); + float stretch; + + if (lstretch == NIL) + stretch = 1.0; + else + stretch = get_c_float(lstretch); + if (stretch < 0.1) + { + cerr << "Duration_Stretch: is too small (" << stretch << + ") ignoring it\n"; + stretch = 1.0; + } + + return stretch; +} + +float dur_get_stretch_at_seg(EST_Item *s) +{ + float global_stretch = dur_get_stretch(); + EST_Item *nn = parent(parent(parent(s,"SylStructure")),"Token"); + EST_Item *syl = parent(s,"SylStructure"); + float local_stretch = 0.0; + float syl_stretch = 0.0; + float seg_stretch = 0.0; + float stretch = 1.0; + + if (nn) + local_stretch = ffeature(nn,"dur_stretch").Float(); + if (syl) + syl_stretch = ffeature(syl,"dur_stretch").Float(); + seg_stretch = ffeature(s,"dur_stretch").Float(); + if (local_stretch != 0.0) + stretch *= local_stretch; + if (syl_stretch != 0.0) + stretch *= syl_stretch; + if (seg_stretch != 0.0) + stretch *= seg_stretch; + + return stretch*global_stretch; + +} + +void festival_Duration_init(void) +{ + festival_def_utt_module("Duration_Averages",FT_Duration_Ave_Utt, + "(Duration_Averages UTT)\n\ + Label all segments with their average duration found from the assoc\n\ + list of phone names to averages in phoneme_durations. This module is\n\ + called through the module Duration when the Parameter Duration_Method\n\ + is set to Averages. 
[see Average durations]"); + festival_def_utt_module("Duration_Default",FT_Duration_Def_Utt, + "(Duration_Default UTT)\n\ + Label all segments with a fixed duration of 100ms. This module is\n\ + called through the module Duration when the Parameter Duration_Method\n\ + is unset or set to Default. [see Default durations]"); + festival_def_utt_module("Duration_Tree_ZScores", + FT_Duration_Tree_ZScores_Utt, + "(Duration_Tree_ZScores UTT)\n\ + Uses the CART tree in duration_cart_tree to predict z scores duration\n\ + values for each segment in UTT. The z scores are converted back to\n\ + absolute values by the assoc list of phones to means and standard\n\ + deviations in the variable duration_ph_info. This module is called\n\ + through the module Duration when the Parameter Duration_Method is set\n\ + to Tree_ZScores. This method modifies its predicted durations by the\n\ + factor set in the Parameter Duration_Stretch (if set).\n\ + [see CART durations]"); + festival_def_utt_module("Duration_Tree",FT_Duration_Tree_Utt, + "(Duration_Tree UTT)\n\ + Uses the CART tree in duration_cart_tree to predict absolute durations\n\ + for each segment in UTT. This module is called through the module\n\ + Duration when the Parameter Duration_Method is set to Tree. This\n\ + method modifies its predicted durations by the factor set in the\n\ + Parameter Duration_Stretch (if set). [see CART durations]"); + festival_def_utt_module("Duration_Klatt",FT_Duration_Klatt_Utt, + "(Duration_Klatt UTT)\n\ + This uses an implementation of the Klatt Duration rules to predict\n\ + durations for each segment in UTT. It uses the information in\n\ + duration_klatt_params for mean and lower bound for each phone. This\n\ + module is called through the module Duration when the Parameter\n\ + Duration_Method is set to Klatt. 
This method modifies its predicted \n\ + durations by the factor set in the Parameter Duration_Stretch (if set).\n\ + [see Klatt durations]"); + +} diff --git a/src/modules/Duration/duration.cc b/src/modules/Duration/duration.cc new file mode 100644 index 0000000..82de805 --- /dev/null +++ b/src/modules/Duration/duration.cc @@ -0,0 +1,179 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Duration averages and default and tree */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "durationP.h" + +LISP FT_Duration_Ave_Utt(LISP utt) +{ + // Predict average duration on segments + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + float end=0.0, dur; + LISP ph_durs,ldur; + float stretch; + + *cdebug << "Duration Average module\n"; + + ph_durs = siod_get_lval("phoneme_durations","no phoneme durations"); + + for (s=u->relation("Segment")->first(); s != 0; s = s->next()) + { + ldur = siod_assoc_str(s->name(),ph_durs); + stretch = dur_get_stretch_at_seg(s); + if (ldur == NIL) + { + cerr << "Phoneme: " << s->name() << " has no default duration " + << endl; + dur = 0.100; + } + else + dur = get_c_float(car(cdr(ldur))); + end += (dur*stretch); + s->set("end",end); + } + + return utt; +} + +LISP FT_Duration_Def_Utt(LISP utt) +{ + // Predict fixed duration on segments + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + float end=0.0; + float stretch; + + *cdebug << "Duration Default module\n"; + + for (s=u->relation("Segment")->first(); s != 0; s = s->next()) + { + stretch = dur_get_stretch_at_seg(s); + end += 0.100*stretch; + s->set("end",end); + } + + return utt; +} + +LISP FT_Duration_Tree_Utt(LISP utt) +{ + // Predict duration on segments using CART tree + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + float end=0.0, dur,stretch; + LISP tree; + EST_Val pdur; + + *cdebug << "Duration Tree module\n"; + + tree = siod_get_lval("duration_cart_tree","no duration cart tree"); + + for (s=u->relation("Segment")->first(); s != 0; s = s->next()) + { + pdur = wagon_predict(s,tree); + stretch = dur_get_stretch_at_seg(s); + if (pdur == 0.0) + { + cerr << "Phoneme: 
" << s->name() << " tree predicted 0.0 changing it" + << endl; + dur = 0.050; + } + else + dur = (float)pdur; + dur *= stretch; + end += dur; + s->set("end",end); + } + + return utt; +} + +#define PH_AVE(X) (get_c_float(car(cdr(X)))) +#define PH_STD(X) (get_c_float(car(cdr(cdr(X))))) + +LISP FT_Duration_Tree_ZScores_Utt(LISP utt) +{ + // Predict duration on segments using CART tree + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + float end=0.0, dur,stretch; + LISP tree,dur_info,ph_info; + float pdur; + float ave, std; + + *cdebug << "Duration Tree ZScores module\n"; + + tree = siod_get_lval("duration_cart_tree","no duration cart tree"); + dur_info = siod_get_lval("duration_ph_info","no duration phone info"); + + for (s=u->relation("Segment")->first(); s != 0; s = s->next()) + { + pdur = wagon_predict(s,tree); + ph_info = siod_assoc_str(s->name(),dur_info); + stretch = dur_get_stretch_at_seg(s); + if (ph_info == NIL) + { + cerr << "Phoneme: " << s->name() << " has no duration info\n"; + ave = 0.080; + std = 0.020; + } + else + { + ave = PH_AVE(ph_info); + std = PH_STD(ph_info); + } + if ((pdur > 3) || (pdur < -3)) + { + // cerr << "Duration tree extreme for " << s->name() << + // " " << pdur << endl; + pdur = ((pdur < 0) ? -3 : 3); + } + s->set("dur_factor",pdur); + dur = ave + (pdur*std); + dur *= stretch; + if (dur < 0.010) + dur = 0.010; // just in case it goes wrong + end += dur; + s->set("end",end); + } + + return utt; +} diff --git a/src/modules/Duration/durationP.h b/src/modules/Duration/durationP.h new file mode 100644 index 0000000..d66a261 --- /dev/null +++ b/src/modules/Duration/durationP.h @@ -0,0 +1,55 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Shared duration utilities */ +/* */ +/*=======================================================================*/ +#ifndef __DURATION_H__ +#define __DURATION_H__ + +LISP FT_Duration_Ave_Utt(LISP args); +LISP FT_Duration_Def_Utt(LISP args); +LISP FT_Duration_Tree_Utt(LISP args); +LISP FT_Duration_Tree_ZScores_Utt(LISP args); +LISP FT_Duration_Klatt_Utt(LISP args); + +float dur_get_stretch(void); +float dur_get_stretch_at_seg(EST_Item *s); + +#endif /* __DURATION_H__ */ + + + diff --git a/src/modules/Intonation/Makefile b/src/modules/Intonation/Makefile new file mode 100644 index 0000000..4d6b586 --- /dev/null +++ b/src/modules/Intonation/Makefile @@ -0,0 +1,53 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +# # +# Contains implementations of various intonation theories # +# # +########################################################################### +TOP=../../.. +DIRNAME=src/modules/Intonation +H = +SRCS = int_aux.cc duffint.cc simple.cc gen_int.cc int_tree.cc +OBJS = $(SRCS:.cc=.o) + +FILES = Makefile $(SRCS) $(H) + +LOCAL_INCLUDES = -I../include + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/Intonation/duffint.cc b/src/modules/Intonation/duffint.cc new file mode 100644 index 0000000..805747f --- /dev/null +++ b/src/modules/Intonation/duffint.cc @@ -0,0 +1,120 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. 
The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Duff intonation */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "intonation.h" + +LISP FT_Intonation_Default_Utt(LISP utt) +{ + return utt; +} + +LISP FT_Int_Targets_Default_Utt(LISP utt) +{ + // Predict intonation from labels etc (producing F0 contour) + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + EST_Relation *seg; + LISP params; + float start,end; + + *cdebug << "Intonation duff module\n"; + + // should create some random targets + params = siod_get_lval("duffint_params",NULL); + start = get_param_float("start",params,130.0); + end = get_param_float("end",params,110.0); + u->create_relation("Target"); + + seg = u->relation("Segment"); + + if (seg->length() == 0) + return utt; + + 
add_target(u,seg->first(),0,start); + s = seg->last(); + add_target(u,s,(float)ffeature(s,"segment_end"),end); + + return utt; +} + +LISP FT_Int_Targets_Relation_Utt(LISP utt, LISP relname) +{ + // Predict intonation from labels etc (producing F0 contour) + EST_Utterance *u = get_c_utt(utt); + EST_Track *pm = 0; + LISP params; + float start,end; + int n_frames; + + *cdebug << "Intonation duff module\n"; + + // should create some random targets + params = siod_get_lval("duffint_params",NULL); + start = get_param_float("start",params,130.0); + end = get_param_float("end",params,110.0); + + pm = track(u->relation(get_c_string(relname))->head()->f("coefs")); + + float pp = 1.0/start; + // float end_time = ((float)pm->num_frames()) * pp; + float end_time = pm->end(); + + n_frames = (int)(ceil)(end_time/pp); + cout << "n_frames: " << n_frames << endl; + cout << "end_time: " << end_time << endl; + + EST_Track *f0 = new EST_Track; + f0->resize(n_frames, 1); + f0->fill_time(0.01); + + float m = (end-start) /end_time; + float c = start; + + for (int i = 0; i < n_frames; ++ i) + f0->a(i) = (m * ((float) i) * 0.01) + c; + + u->create_relation("f0"); + EST_Item *item = u->relation("f0")->append(); + item->set_val("f0", est_val(f0)); + + return utt; +} + diff --git a/src/modules/Intonation/gen_int.cc b/src/modules/Intonation/gen_int.cc new file mode 100644 index 0000000..6a111c3 --- /dev/null +++ b/src/modules/Intonation/gen_int.cc @@ -0,0 +1,131 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* A general intonation method for implementing various simple rule */ +/* intonation systems. It allows a list of targets to be predicted */ +/* in a way fully specified by the user without changing the C/C++ code */ +/* This was specifically designed to replace the simple intonation mode */ +/* monotone mode, and implemented generic ToBI type labels. 
*/ +/* */ +/* This was to help Gregor Moehler do German ToBI as well as get a */ +/* we can use for a rule-based English ToBI for comparison with trained */ +/* versions */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "intonation.h" + +static void check_targs(EST_Utterance *u); +static EST_Item *find_nearest_seg(EST_Utterance *u,float pos); + +LISP FT_Int_Targets_General_Utt(LISP utt) +{ + // Predict F0 targets + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + EST_Item *seg; + EST_Relation *targrel; + LISP gen_params, targets, t; + LISP tfunc; // a lisp function that returns list of targets and values + + // Create some down step accents + gen_params = siod_get_lval("int_general_params", + "no general intonation simple params"); + tfunc = get_param_lisp("targ_func",gen_params,NIL); + if (tfunc == NIL) + { + cerr << "Int Target General: no target function specified" << endl; + festival_error(); + } + + targrel = u->create_relation("Target"); + + for (s=u->relation("Syllable")->first(); s != 0 ; s=s->next()) + { + targets = + leval(cons(tfunc,cons(utt,cons(siod(s),NIL))),NIL); + // Add the given targets + for (t=targets; t != NIL; t=cdr(t)) + { + seg = find_nearest_seg(u,get_c_float(car(car(t)))); + add_target(u,seg,get_c_float(car(car(t))), + get_c_float(car(cdr(car(t))))); + } + } + + check_targs(u); + + return utt; +} + +static EST_Item *find_nearest_seg(EST_Utterance *u,float pos) +{ + // Find the segment that this target falls within. 
+ // This naively searches from the start of the segments, + // which is not very efficient + EST_Item *seg; + + for (seg=u->relation("Segment")->first(); seg != 0;seg=seg->next()) + { + if (seg->F("end") >= pos) + return seg; + } + + cerr << "Int Target General: target past end of segments at " << + pos << endl; + festival_error(); + return NULL; +} + +static void check_targs(EST_Utterance *u) +{ + // Check targets are in order + EST_Item *t; + float l = 0.0; + + for (t=u->relation("Target")->first_leaf(); t != 0;t=next_leaf(t)) + { + if (t->F("pos") < l) + { + cerr << "Int Target General: targets out of order" << endl; + festival_error(); + } + l = t->F("pos"); + } +} + + diff --git a/src/modules/Intonation/int_aux.cc b/src/modules/Intonation/int_aux.cc new file mode 100644 index 0000000..9d6618d --- /dev/null +++ b/src/modules/Intonation/int_aux.cc @@ -0,0 +1,232 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission.
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Basic intonation utilities common between different */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "intonation.h" +#include "modules.h" +#include "lexicon.h" + +static EST_String IntEventname("IntEvent"); +static EST_String Targetname("Target"); + +EST_Item *add_target(EST_Utterance *u,EST_Item *seg, + float pos, float val) +{ + + // Check time is NOT the same as the last target, as this causes problems... + + float last_time; + EST_Item* last_item = u->relation(Targetname)->last_leaf(); + if (last_item) + last_time = last_item->f("pos"); + else + last_time = -1.0; // no last time. 
+ + if(last_time == pos) + { + pos += 0.001; + *cdebug << "Repeated f0 target time, fix your generation function!\n"; + } + + if (seg->as_relation(Targetname) == 0) + u->relation(Targetname)->append(seg); + EST_Item *item = append_daughter(seg,Targetname); + + item->set("f0",val); + item->set("pos",pos); + + return item; + +} + + +EST_Item *add_IntEvent(EST_Utterance *u,EST_Item *syl, + const EST_String &label) +{ + if (syl->as_relation("Intonation") == 0) + u->relation("Intonation")->append(syl); + EST_Item *item = u->relation(IntEventname)->append(); + item->set_name(label); + append_daughter(syl,"Intonation",item); + return item; +} + +void targets_to_f0(EST_Relation &targ, EST_Track &f0, const float shift) +{ + float prev_f0=0.0; + float prev_pos=0, m; + EST_Item *s; + int i; + + f0.resize(int(ceil(targ.last_leaf()->F("pos",0) / shift)), 1); + f0.fill_time(shift); + + s = targ.first_leaf(); + + // fill with zeros until first target; + for (i = 0; i < f0.num_frames(); ++i) + { + if (f0.t(i) > s->F("pos",0)) + break; + f0.a(i) = 0.0; + } + + prev_pos = s->F("pos",0); + prev_f0 = s->F("f0",0); + + s = next_leaf(s); + + for (m=0.0,i = 0; i < f0.num_frames(); ++i) + { + if (s && f0.t(i) > s->F("pos")) + { + prev_pos = s->F("pos"); + prev_f0 = s->F("f0"); + s = next_leaf(s); + if (s == 0) + break; + m = (s->F("f0") - prev_f0)/ (s->F("pos") - prev_pos); + } + f0.a(i) = (m * (f0.t(i) - prev_pos)) + prev_f0; + } + + for ( ; i < f0.num_frames(); ++i) + f0.a(i) = 0.0; + +} + +LISP FT_us_targets_to_f0(LISP lutt) +{ + EST_Utterance *utt = get_c_utt(lutt); + EST_Track *f0 = new EST_Track; + + utt->create_relation("f0"); + EST_Item *f = utt->relation("f0")->append(); + + f->set("name", "f0"); + f->set_val("f0", est_val(f0)); + + targets_to_f0(*utt->relation("Target"), *f0, 0.01); + + return lutt; +} + + +LISP FT_Intonation_Default_Utt(LISP args); +LISP FT_Int_Targets_Default_Utt(LISP args); +LISP FT_Intonation_Simple_Utt(LISP args); +LISP FT_Int_Targets_Simple_Utt(LISP args); 
+LISP FT_Intonation_Tree_Utt(LISP args); +LISP FT_Int_Targets_LR_Utt(LISP args); +LISP FT_Int_Targets_LR_5_Utt(LISP args); +LISP FT_Int_Targets_General_Utt(LISP utt); +LISP FT_Int_Targets_Relation_Utt(LISP utt, LISP relname); + +void festival_Intonation_init(void) +{ + + festival_def_utt_module("Intonation_Default",FT_Intonation_Default_Utt, + "(Intonation_Default UTT)\n\ + This method is such a bad intonation module that it does nothing at all.\n\ + This utterance module is called when the Parameter Int_Method is not\n\ + set or set to Default. This module is called through the Intonation\n\ + module. [see Default intonation]"); + + init_subr_2("Int_Targets_Relation", FT_Int_Targets_Relation_Utt, + "(Int_Targets_Relation UTT RELNAME)"); + + init_subr_1("targets_to_f0", FT_us_targets_to_f0, + "(targets_to_f0 UTT)\n\ + Make f0 relation, and place an f0 contour in it, using F0 targets\n\ + from the Target Relation\n"); + + festival_def_utt_module("Int_Targets_Default",FT_Int_Targets_Default_Utt, + "(Int_Targets_Default UTT)\n\ + This module creates two Targets causing a simple downward continuous\n\ + F0 through the whole utterance. The code is in an appropriately named file\n\ + called duffint. This module is called when the Parameter\n\ + Int_Method is not set or set to Default. This module is called through\n\ + the Int_Targets module. Optional parameters for a start value (default\n\ + 130) and end value (default 110) may be set in the variable\n\ + duffint_params. This can be used to generate a monotone intonation\n\ + with a setting like (set! duffint_params '((start 100) (end 100))).\n\ + [see Default intonation]"); + festival_def_utt_module("Intonation_Simple",FT_Intonation_Simple_Utt, + "(Intonation_Simple UTT)\n\ + Assign accents to each content word, creating an IntEvent stream. This \n\ + utterance module is called when the Parameter Int_Method is set to \n\ + Simple.
This module is called through the Intonation module.\n\ + [see Simple intonation]"); + festival_def_utt_module("Int_Targets_Simple",FT_Int_Targets_Simple_Utt, + "(Int_Targets_Simple UTT)\n\ + Naively add targets for hat-shaped accents for each accent in the \n\ + IntEvent stream. This module is called when the Parameter Int_Method is\n\ + set to Simple. This module is called through the Int_Targets module.\n\ + [see Simple intonation]"); + festival_def_utt_module("Int_Targets_General",FT_Int_Targets_General_Utt, + "(Int_Targets_General UTT)\n\ + Add targets based on the functions defined in int_general_params. This\n\ + method allows quite detailed control over the generation of targets per\n\ + syllable, see manual for details and examples. This module is called\n\ + when the Parameter Int_Method is set to General. This module is called\n\ + through the Int_Targets module. [see General intonation]"); + festival_def_utt_module("Intonation_Tree",FT_Intonation_Tree_Utt, + "(Intonation_Tree UTT)\n\ + Use the CART trees in int_tone_cart_tree and int_accent_cart_tree to\n\ + create an IntEvent stream of tones and accents related to syllables.\n\ + This module is called through the Intonation module and is selected\n\ + when the Parameter Int_Method is ToBI. [see Tree intonation]"); + festival_def_utt_module("Int_Targets_LR",FT_Int_Targets_LR_Utt, + "(Int_Targets_LR UTT)\n\ + Predict Target F0 points using linear regression from factors such as\n\ + accent, tone, stress, position in phrase etc. This utterance module is\n\ + called through the module Int_Targets when the Parameter Int_Method is\n\ + set to ToBI, even though this technique is not restricted to the ToBI\n\ + labelling system. [see Tree intonation]"); + festival_def_utt_module("Int_Targets_5_LR",FT_Int_Targets_LR_5_Utt, + "(Int_Targets_5_LR UTT)\n\ + Predict Target F0 points using linear regression from factors such as\n\ + accent, tone, stress, position in phrase etc.
This utterance module is\n\ + called through the module Int_Targets when the Parameter Int_Method is\n\ + set to ToBI, even though this technique is not restricted to the ToBI\n\ + labelling system. [see Tree intonation]"); + +} + diff --git a/src/modules/Intonation/int_tree.cc b/src/modules/Intonation/int_tree.cc new file mode 100644 index 0000000..70b5af3 --- /dev/null +++ b/src/modules/Intonation/int_tree.cc @@ -0,0 +1,420 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : May 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Tree-based prediction of intonation. Uses accent and end */ +/* tone prediction trees, could be ToBI could be something */ +/* else, its up to the trees to decide ... */ +/* */ +/* Accents and boundaries are predicted by CART tree while */ +/* the F0 targets are predicted by linear regression (as */ +/* described in Black and Hunt ICSLP96) */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "intonation.h" + +enum lr_tpos {tp_start, tp_left, tp_mid, tp_right, tp_end}; + +static EST_String accent_specified(EST_Item *s); +static EST_String tone_specified(EST_Item *s); +static int after_pause(EST_Item *s); +static int before_pause(EST_Item *s); +static EST_Item *vowel_seg(EST_Item *syl); +static void init_int_lr_params(void); +static void add_target_at(EST_Utterance *u, EST_Item *seg, + float val,lr_tpos pos); +static float apply_lr_model(LISP model, EST_FVector &feats); +static void find_feat_values(EST_Item *s, LISP model,EST_FVector &feats); + +static LISP Intonation_Endtone_Tree_Utt(LISP utt); // ... 
mh 99-08-06 +static LISP Intonation_Accent_Tree_Utt(LISP utt); + +static float target_f0_mean = 0.0; +static float target_f0_std = 1.0; +static float model_f0_mean = 0.0; +static float model_f0_std = 1.0; + +#define MZSCORE(X) (((X)-model_f0_mean)/model_f0_std) +#define UNTZSCORE(X) (((X)*target_f0_std)+target_f0_mean) +#define MAP_F0(X) (UNTZSCORE(MZSCORE(X))) + +LISP FT_Intonation_Tree_Utt(LISP utt) +{ + // For each syllable predict intonation events. Potentially + // two forms, accents and end tones + EST_Utterance *u = get_c_utt(utt); + + u->create_relation("IntEvent"); + u->create_relation("Intonation"); + + utt = Intonation_Endtone_Tree_Utt(utt); + utt = Intonation_Accent_Tree_Utt(utt); + + return utt; +} + +LISP Intonation_Accent_Tree_Utt(LISP utt) +{ + // For each syllable predict intonation events. + // here only accents + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + EST_String paccent; + LISP accent_tree; + + accent_tree = siod_get_lval("int_accent_cart_tree","no accent tree"); + + for (s=u->relation("Syllable")->first(); s != 0; s=s->next()) + { + if ((paccent = accent_specified(s)) == "0") // check if pre-specified + paccent = (EST_String)wagon_predict(s,accent_tree); + if (paccent != "NONE") + add_IntEvent(u,s,paccent); + } + return utt; +} + +LISP Intonation_Endtone_Tree_Utt(LISP utt) +{ + // For each syllable predict intonation events.
+ // here only endtones + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + EST_String ptone; + LISP endtone_tree; + + endtone_tree = siod_get_lval("int_tone_cart_tree","no tone cart tree"); + + for (s=u->relation("Syllable")->first(); s != 0; s=s->next()) + { + if ((ptone = tone_specified(s)) == "0") + ptone = (EST_String)wagon_predict(s,endtone_tree); + if (ptone != "NONE") + add_IntEvent(u,s,ptone); + } + return utt; +} + +static EST_String accent_specified(EST_Item *s) +{ + // Check for an explicit accent specified on the related token or word. + // If there is one, check whether this syllable is stressed or a singleton + EST_Item *word = parent(s,"SylStructure"); + if (!word) return "0"; + EST_Item *token = parent(word,"Token"); + EST_String paccent("0"); + if (token) + paccent = (EST_String)ffeature(token,"accent"); + + if (paccent == "0") + { + paccent = (EST_String)ffeature(word,"accent"); + if (paccent == "0") + return paccent; + } + if (ffeature(s,"stress") == "1") + { // only goes on first stressed syllable + EST_Item *p; + for (p=as(s,"SylStructure")->prev(); p != 0; p=p->prev()) + if (ffeature(p,"stress") == "1") // test the earlier syllable, not s + return "NONE"; // specified but not on this syllable + return paccent; // first stressed syl in word + } + else if (daughter1(word)->length() == 1) + return paccent; + else + return "NONE"; // pre-specified but inappropriate syllable in word +} + +static EST_String tone_specified(EST_Item *s) +{ + // Check for an explicit tone specified on the related token or word. + // If there is one, it goes on the final syllable of the word + EST_Item *ss = s->as_relation("SylStructure"); + EST_Item *word = parent(ss); + if (!word) return "0"; + EST_Item *token = parent(word,"Token"); + EST_String ptone("0"); + if (token) + ptone = (EST_String)ffeature(token,"tone"); + + if (ptone == "0") + { + ptone = (EST_String)ffeature(word,"tone"); + if (ptone == "0") + return ptone; + } + if (ss->next() == 0) // final syllable in word + return ptone; + else + return
"NONE"; // pre-specified but inappropriate syllable in word +} + +LISP FT_Int_Targets_LR_Utt(LISP utt) +{ + // Predict F0 targets using Linear regression + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + float pstart, pmid, pend; + LISP start_lr, mid_lr, end_lr; + + init_int_lr_params(); + // Note the models must *all* be the same size + start_lr = siod_get_lval("f0_lr_start","no f0 start lr model"); + mid_lr = siod_get_lval("f0_lr_mid","no f0 mid lr model"); + end_lr = siod_get_lval("f0_lr_end","no f0 end lr model"); + + u->create_relation("Target"); + pend = 0; + EST_FVector feats; + feats.resize(siod_llength(start_lr)); + + for (s=u->relation("Syllable")->first(); s != 0; s=s->next()) + { + find_feat_values(s,start_lr,feats); + pstart = apply_lr_model(start_lr,feats); + pstart = MAP_F0(pstart); + if (after_pause(s)) + add_target_at(u,daughter1(s,"SylStructure"),pstart,tp_start); + else + add_target_at(u,daughter1(s,"SylStructure"), + (pstart+pend)/2.0,tp_start); + + pmid = apply_lr_model(mid_lr,feats); + pmid = MAP_F0(pmid); + add_target_at(u,vowel_seg(s),pmid,tp_mid); + + pend = apply_lr_model(end_lr,feats); + pend = MAP_F0(pend); + if (before_pause(s)) + add_target_at(u,daughtern(s,"SylStructure"),pend,tp_end); + } + + return utt; + +} + +LISP FT_Int_Targets_LR_5_Utt(LISP utt) +{ + // Predict F0 targets using Linear regression + // This version uses 5 points rather than 3. 
+ EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + float pstart, pleft, pmid, pright, pend; + LISP start_lr, left_lr, mid_lr, right_lr, end_lr; + + init_int_lr_params(); + // Note the models must *all* be the same size + start_lr = siod_get_lval("f0_lr_start","no f0 start lr model"); + left_lr = siod_get_lval("f0_lr_left","no f0 left lr model"); + mid_lr = siod_get_lval("f0_lr_mid","no f0 mid lr model"); + right_lr = siod_get_lval("f0_lr_right","no f0 right lr model"); + end_lr = siod_get_lval("f0_lr_end","no f0 end lr model"); + + u->create_relation("Target"); + pend = 0; + EST_FVector feats; + feats.resize(siod_llength(start_lr)); + + for (s=u->relation("Syllable")->first(); s != 0; s=s->next()) + { + find_feat_values(s,start_lr,feats); + pstart = apply_lr_model(start_lr,feats); + pstart = MAP_F0(pstart); + if (after_pause(s)) + add_target_at(u,daughter1(s,"SylStructure"),pstart,tp_start); + else + add_target_at(u,daughter1(s,"SylStructure"), + (pstart+pend)/2.0,tp_start); + + pleft = apply_lr_model(left_lr,feats); + pleft = MAP_F0(pleft); + add_target_at(u,vowel_seg(s),pleft,tp_left); + pmid = apply_lr_model(mid_lr,feats); + pmid = MAP_F0(pmid); + add_target_at(u,vowel_seg(s),pmid,tp_mid); + pright = apply_lr_model(right_lr,feats); + pright = MAP_F0(pright); + add_target_at(u,vowel_seg(s),pright,tp_right); + + pend = apply_lr_model(end_lr,feats); + pend = MAP_F0(pend); + if (before_pause(s)) + add_target_at(u,daughtern(s,"SylStructure"),pend,tp_end); + } + + return utt; + +} + + +#define FFEATURE_NAME(X) (get_c_string(car(X))) +#define FFEATURE_WEIGHT(X) (get_c_float(car(cdr(X)))) +#define FFEATURE_MAPCLASS(X) (car(cdr(cdr(X)))) + +static void find_feat_values(EST_Item *s, LISP model,EST_FVector &feats) +{ + EST_Val v = 0.0; + int i; + LISP f; + const char *ffeature_name, *last_name=""; + + feats[0] = 1; + for (i=1,f=cdr(model); CONSP(f); f=CDR(f),i++) + { + ffeature_name = FFEATURE_NAME(CAR(f)); + if (!streq(ffeature_name,last_name)) + v = 
ffeature(s,ffeature_name); + if (siod_llength(CAR(f)) == 3) + { // A map class is specified + if (siod_member_str(v.string(),FFEATURE_MAPCLASS(CAR(f))) != NIL) + feats[i] = 1; + else + feats[i] = 0; + } + else + feats[i] = (float)v; + last_name = ffeature_name; + } +} + +static float apply_lr_model(LISP model, EST_FVector &feats) +{ + float answer = FFEATURE_WEIGHT(car(model)); + int i; + LISP f; + + for(i=1,f=cdr(model); iF("end")), + val); + else if (pos == tp_end) + add_target(u,seg,seg->F("end"),val); + else + { + cerr << "add_target_at: unknown position type\n"; + festival_error(); + } +} + +static int after_pause(EST_Item *s) +{ + // TRUE if segment immediately previous to this is a silence + EST_Item *p; + if (s->prev() == 0) + return TRUE; + EST_Item *ss = s->as_relation("SylStructure"); + if (s->prev() == ss->prev()) + return FALSE; + + p = daughter1(ss)->as_relation("Segment")->prev(); + if (p == 0) + return TRUE; + else if (ph_is_silence(p->name())) + return TRUE; + else + return FALSE; +} + +static int before_pause(EST_Item *s) +{ + // TRUE is segment immediately after this is a silence + if (s->next() == 0) + return TRUE; + EST_Item *ss = s->as_relation("SylStructure"); + EST_Item *n = daughtern(ss)->as_relation("Segment")->next(); + if (ph_is_silence(n->name())) + return TRUE; + else + return FALSE; +} + +static EST_Item *vowel_seg(EST_Item *syl) +{ + // return related to vowel segment + EST_Item *p; + + for (p=daughter1(syl,"SylStructure"); p != 0; p=p->next()) + if (ph_is_vowel(p->name())) + return p; + + // No vowel found, so return first daughter. 
+ return daughter1(syl,"SylStructure"); +} + + diff --git a/src/modules/Intonation/simple.cc b/src/modules/Intonation/simple.cc new file mode 100644 index 0000000..2905b7a --- /dev/null +++ b/src/modules/Intonation/simple.cc @@ -0,0 +1,144 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black (and Paul Taylor) */ +/* Date : April 199[4|6] */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Simple intonation prediction: a hat shape on each content word */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "intonation.h" + +static void add_targets(EST_Utterance *u,EST_Item *syl, + float baseline,float peak); + +LISP FT_Intonation_Simple_Utt(LISP utt) +{ + // Predict some accents + EST_Utterance *u = get_c_utt(utt); + EST_Item *s; + LISP accent_tree; + EST_Val paccent; + + *cdebug << "Simple intonation module" << endl; + + accent_tree = siod_get_lval("int_accent_cart_tree","no accent tree"); + + u->create_relation("IntEvent"); + u->create_relation("Intonation"); + + for (s=u->relation("Syllable")->first(); s != 0; s = s->next()) + { + paccent = wagon_predict(s,accent_tree); + if (paccent != "NONE") + add_IntEvent(u,s,paccent.string()); + } + + return utt; +} + +LISP FT_Int_Targets_Simple_Utt(LISP utt) +{ + // Predict F0 targets + EST_Utterance *u = get_c_utt(utt); + EST_Item *s, *p, start_word, end_word; + float start,end,duration; + float baseline, decline; + EST_Item *start_syl, *end_syl; + LISP simple_params; + float f0_mean, f0_std; + + *cdebug << "Simple int targets module" << endl; + + // Create some down step accents + simple_params = siod_get_lval("int_simple_params","no simple params"); + f0_mean = get_param_int("f0_mean",simple_params,110); + f0_std = get_param_int("f0_std",simple_params,25); + + u->create_relation("Target"); + + for (p=u->relation("Phrase")->first(); p != 0 ; p=p->next()) + { + baseline = f0_mean + (f0_std * 0.6); + start = ffeature(p,"R:Phrase.daughter1.word_start"); + end = ffeature(p,"R:Phrase.daughtern.word_end"); + duration = end - start; + decline = f0_std / duration; + start_syl = 
daughter1(daughter1(p),"SylStructure"); + end_syl = daughtern(daughtern(p),"SylStructure"); + + if (start_syl) + add_target(u,daughter1(start_syl,"SylStructure"), + ffeature(start_syl,"R:SylStructure.daughter1.segment_start"), + baseline); + for (s=start_syl->as_relation("Syllable"); s != end_syl->next(); + s = s->next()) + { + if (ffeature(s,"accented") == 1) + add_targets(u,s,baseline,f0_std); + baseline -= decline*(ffeature(s,"syllable_duration").Float()); + } + + if (end_syl) + add_target(u,daughtern(end_syl,"SylStructure"), + ffeature(end_syl,"R:SylStructure.daughtern.segment_end"), + f0_mean-f0_std); + } + + return utt; +} + +static void add_targets(EST_Utterance *u,EST_Item *syl, + float baseline,float peak) +{ + // Add a down stepped accent at this point + EST_Item *first_seg = daughter1(syl,"SylStructure"); + EST_Item *end_seg = daughtern(syl,"SylStructure"); // last segment, so the hat closes at the syllable end + EST_Item *t=0,*vowel_seg; + + add_target(u,first_seg,ffeature(first_seg,"segment_start"),baseline); + + vowel_seg = end_seg; // by default + for (t = first_seg; t != 0; t = t->next()) + if (ph_is_vowel(t->name())) + { + vowel_seg = t; + break; + } + add_target(u,vowel_seg,ffeature(vowel_seg,"segment_mid"),baseline+peak); + add_target(u,end_seg,ffeature(end_seg,"segment_end"),baseline); +} + + diff --git a/src/modules/Lexicon/Makefile b/src/modules/Lexicon/Makefile new file mode 100644 index 0000000..979fcad --- /dev/null +++ b/src/modules/Lexicon/Makefile @@ -0,0 +1,53 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved.
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +# # +# Contains implementation of a general lexicon system # +# # +########################################################################### +TOP=../../.. 
+DIRNAME=src/modules/Lexicon +H = lts.h lexiconP.h +SRCS = lex_aux.cc lexicon.cc lts.cc lts_rules.cc complex.cc lex_ff.cc +OBJS = $(SRCS:.cc=.o) + +FILES=Makefile $(SRCS) $(H) + +LOCAL_INCLUDES = -I../include + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/Lexicon/complex.cc b/src/modules/Lexicon/complex.cc new file mode 100644 index 0000000..48d1972 --- /dev/null +++ b/src/modules/Lexicon/complex.cc @@ -0,0 +1,221 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : June 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Compile a lexicon from a set of entries */ +/* */ +/*=======================================================================*/ +#include +#include +#include "festival.h" +#include "lexicon.h" +#include "lexiconP.h" +#include "lts.h" + +static LISP check_and_fix(LISP entry); +static void check_sylphones(const char *name,LISP syls); + +struct LIST_ent_struct { + EST_String word; + char *pos; + char *entry; + LIST_ent_struct *next; +}; +typedef LIST_ent_struct *LIST_ent; + +static LISP lex_lts_set = NIL; +static LISP lex_syllabification = NIL; + +int entry_compare(const void *e1, const void *e2) +{ + LIST_ent le1 = *((LIST_ent *)e1); + LIST_ent le2 = *((LIST_ent *)e2); + int rcode; + + if ((rcode = fcompare(le1->word,le2->word)) != 0) + return rcode; + else if ((rcode = strcmp(le1->pos,le2->pos)) != 0) + return rcode; + else if ((rcode = strcmp(le1->word,le2->word)) != 0) + return rcode; + else + { // it's a homograph, but ensure it's in the same order + // no matter which machine we're on + return strcmp(le1->entry,le2->entry); + } +} + +LISP lexicon_compile(LISP finname, LISP foutname) +{ + // Take a file of entries and process them checking phones + // 
syllabifying if necessary. Sorts them and writes them to + // fout + FILE *fin, *fout; + LISP entry; + LIST_ent entries = NULL,e; + LIST_ent *ent_list; + int num_entries=0,i; + EST_String tmpname; + + if ((fin=fopen(get_c_string(finname),"rb")) == NULL) + { + cerr << "Lexicon compile: unable to open " << get_c_string(finname) + << " for reading\n"; + festival_error(); + } + + lex_lts_set = siod_get_lval("lex_lts_set",NULL); + lex_syllabification = siod_get_lval("lex_syllabification",NULL); + + while (!siod_eof((entry = lreadf(fin)))) + { + e = new LIST_ent_struct; + *cdebug << "Processing entry " << get_c_string(car(entry)) << + endl; + entry = check_and_fix(entry); + e->word = get_c_string(car(entry)); + e->pos = wstrdup(siod_sprint(car(cdr(entry)))); + e->entry = wstrdup(siod_sprint(entry)); + e->next = entries; + entries = e; + num_entries++; + } + fclose(fin); + + // Make it into an array for sorting + ent_list = new LIST_ent[num_entries]; + for (i=0,e=entries; i < num_entries; i++,e=e->next) + ent_list[i] = e; + qsort(ent_list,num_entries,sizeof(LIST_ent),entry_compare); + + if ((fout=fopen(get_c_string(foutname),"wb")) == NULL) + { + cerr << "Lexicon compile: unable to open " << get_c_string(foutname) + << " for writing\n"; + // fin was already closed above, so it must not be closed again + festival_error(); + } + fprintf(fout,"MNCL\n"); + for (i=0; i < num_entries; i++) + { + fprintf(fout,"%s\n",ent_list[i]->entry); + wfree(ent_list[i]->pos); + wfree(ent_list[i]->entry); + delete ent_list[i]; + } + delete [] ent_list; + fclose(fout); + + cwarn << "Compiled lexicon \"" << get_c_string(finname) << + "\" into \"" << get_c_string(foutname) << "\" " << + num_entries << " entries\n"; + + return NIL; +} + +static LISP check_and_fix(LISP entry) +{ + // Check shape and that phones are in current phone set + // Syllabify entry if required + LISP syls; + + if (siod_llength(entry) < 2) + { + cerr << "Lexicon compile: entry: "; + lprint(entry); + cerr << "has too few fields\n"; + festival_error(); + } + else if 
(CONSP(car(entry))) + { + cerr << "Lexicon compile: entry: "; + lprint(entry); + cerr << "has non-atomic head word\n"; + festival_error(); + } + // else if (CONSP(car(cdr(entry)))) // The lookup code allows for this so why not allow it here. + // { + // cerr << "Lexicon compile: entry: "; + // lprint(entry); + // cerr << "has non-atomic pos field\n"; + // festival_error(); + // } + + if ((lex_syllabification == NIL) && + (siod_atomic_list(car(cdr(cdr(entry)))))) + { + // syllabify them (in an old special way) + LISP phones = car(cdr(cdr(entry))); + if (lex_lts_set != NIL) + phones = lts_apply_ruleset(phones,lex_lts_set); + syls = lex_syllabify_phstress(phones); + check_sylphones(get_c_string(car(entry)),syls); + } + else if ((lex_syllabification != NIL) && + (atomp(lex_syllabification)) && + (streq(get_c_string(lex_syllabification),"NONE"))) + syls = car(cdr(cdr(entry))); + else + syls = apply_hooks(lex_syllabification,car(cdr(cdr(entry)))); + + return cons(car(entry),cons(car(cdr(entry)), + cons(syls,cdr(cdr(cdr(entry)))))); +} + +static void check_sylphones(const char *name,LISP syls) +{ + // check shape of syllables, and that phones are valid + LISP s,p; + + for (s=syls; s != NIL; s=cdr(s)) + { + if (siod_llength(car(s)) != 2) + { + cerr << "Malformed lexical entry: \"" << name << + "\" syllable malformed\n"; + festival_error(); + } + if (!siod_atomic_list(car(car(s)))) + { + cerr << "Malformed lexical entry: \"" << name << + "\" syllable phone list malformed\n"; + festival_error(); + } + for (p=car(car(s)); p != NIL; p=cdr(p)) + ; + } +} + + diff --git a/src/modules/Lexicon/lex_aux.cc b/src/modules/Lexicon/lex_aux.cc new file mode 100644 index 0000000..91fb8e8 --- /dev/null +++ b/src/modules/Lexicon/lex_aux.cc @@ -0,0 +1,193 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Basic lexicon utilities */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "lexicon.h" +#include "lexiconP.h" + +static void split_stress(LISP phones, LISP &phs, LISP &stresses); +static char *v_stress(const char *ph,int &stress); +static int syl_contains_vowel(LISP phones); +static int syl_breakable(LISP syl, LISP rest); + +LISP lex_syllabify(LISP phones) +{ + /* Given a simple list of phones, syllabify them and add stress */ + LISP syl,syls,p; + int stress = 1; + + for (syl=NIL,syls=NIL,p=phones; p != NIL; p=cdr(p)) + { + syl = cons(car(p),syl); + if (syl_breakable(syl,cdr(p))) + { + syls = cons(cons(reverse(syl),cons(flocons(stress),NIL)),syls); + stress = 0; + syl = NIL; + } + } + + return reverse(syls); +} + +LISP lex_syllabify_phstress(LISP phones) +{ + /* Given a list of phones where vowels may have stress numeral, */ + /* as found in BEEP and CMU syllabify them */ + LISP syl,syls,p,phs,stresses,s; + int stress = 0; + const char *ph; + + split_stress(phones,phs,stresses); + + for (syl=NIL,syls=NIL,p=phs,s=stresses; + p != NIL; + p=cdr(p),s=cdr(s)) + { + ph = get_c_string(car(p)); + if (!streq(ph,ph_silence())) + syl = cons(car(p),syl); + if (car(s) && (!streq(get_c_string(car(s)),"0"))) + stress = 1; // should worry about 2 stress too + if (streq(ph,ph_silence()) || syl_breakable(syl,cdr(p))) + { + syls = cons(cons(reverse(syl),cons(flocons(stress),NIL)),syls); + stress = 0; + syl = NIL; + } + } + + return reverse(syls); +} + +static void split_stress(LISP phones, LISP &phs, LISP &stresses) +{ + // unpack the list of phones. When they come from certain types + // of lexical entries (CMU, BEEP) vowels may have a 1 or 2 at their + // end to denote stress. 
+ // This returns two lists of equal length, one with the phones and + // one with nils (for each phone) except when there is an explicit + // stress number + LISP p,np,ns; + char *nph; + int stress; + + for (p=phones,np=ns=NIL; p != NIL; p=cdr(p)) + { + stress = 0; + nph = v_stress(get_c_string(car(p)),stress); + if (streq(nph,"-")) // a break of some sort + np = cons(rintern(ph_silence()),np); + else + np = cons(rintern(nph),np); + wfree(nph); + if (stress != 0) + ns = cons(flocons(stress),ns); + else + ns = cons(NIL,ns); + } + + phs = reverse(np); + stresses = reverse(ns); +} + +static char *v_stress(const char *ph,int &stress) +{ + // Checks to see if final character is a numeral, if so treats + // it as the stress value. + char *nph; + + if ((ph[strlen(ph)-1] == '1') || + (ph[strlen(ph)-1] == '2') || + (ph[strlen(ph)-1] == '0')) + { + stress = ph[strlen(ph)-1]-'0'; + nph = wstrdup(ph); + nph[strlen(ph)-1] = '\0'; + return nph; + } + else + return wstrdup(ph); + +} + +static int syl_breakable(LISP syl, LISP rest) +{ + if (rest == NIL) + return TRUE; + else if (!syl_contains_vowel(rest)) + return FALSE; // must be a vowel remaining in rest + else if (syl_contains_vowel(syl)) + { + if (ph_is_vowel(get_c_string(car(rest)))) + return TRUE; + else if (cdr(rest) == NIL) + return FALSE; + int p = ph_sonority(get_c_string(car(syl))); + int n = ph_sonority(get_c_string(car(rest))); + int nn = ph_sonority(get_c_string(car(cdr(rest)))); + + if ((p <= n) && (n <= nn)) + return TRUE; + else + return FALSE; + } + else + return FALSE; +} + +static int syl_contains_vowel(LISP phones) +{ + // So we can support "vowels" like ah2, oy2 (i.e. vowels with + // stress markings) we need to make this a hack. 
Vowels are + // assumed to start with one of aiueo + LISP p; + + for (p=phones; p !=NIL; p=cdr(p)) + if (strchr("aiueoAIUEO",get_c_string(car(p))[0]) != NULL) + return TRUE; + else if (ph_is_vowel(get_c_string(car(p)))) + return TRUE; + else if (ph_is_silence(get_c_string(car(p)))) + return FALSE; + + return FALSE; +} + diff --git a/src/modules/Lexicon/lex_ff.cc b/src/modules/Lexicon/lex_ff.cc new file mode 100644 index 0000000..ba03275 --- /dev/null +++ b/src/modules/Lexicon/lex_ff.cc @@ -0,0 +1,306 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : May 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Word based ffeature functions */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "lexiconP.h" + +static EST_String Phrase("Phrase"); +static EST_Val f_content("content"); +static EST_Val f_string0("0"); +static EST_Val f_string1("1"); + +static EST_Val ff_word_gpos(EST_Item *s) +{ + /* Part of speech by guessing, returns, prep, det, aux, content */ + /* from simple lookup list */ + EST_String word; + LISP l; + LISP guess_pos; + + word = downcase(s->name()); + + guess_pos = siod_get_lval("guess_pos","no guess_pos set"); + + for (l=guess_pos; l != NIL; l=cdr(l)) + if (siod_member_str(word,cdr(car(l)))) + return EST_Val(get_c_string(car(car(l)))); + + return f_content; +} + +EST_Val ff_word_contentp(EST_Item *s) +{ + /* 1 if this is a content word, 0 otherwise */ + + if (ff_word_gpos(s) == "content") + return f_string1; + else + return f_string0; +} + +static EST_Val ff_word_n_content(EST_Item *s) +{ + // returns the next content word after s + EST_Item *p; + + for (p=s->as_relation("Word")->next(); p != 0; p = p->next()) + { + if (ff_word_gpos(p) == "content") + return 
EST_Val(p->name()); + } + + return f_string0; +} + +static EST_Val ff_word_nn_content(EST_Item *s) +{ + // returns the next next content word after s + int count = 0; + EST_Item *p; + + for (p=s->as_relation("Word")->next(); p != 0; p = p->next()) + { + if (ff_word_gpos(p) == "content") + { + count ++; + if (count == 2) + return EST_Val(p->name()); + } + } + + return f_string0; +} + +static EST_Val ff_word_p_content(EST_Item *s) +{ + // returns the previous content word before s + EST_Item *p; + + for (p=s->as_relation("Word")->prev(); p != 0; p = p->prev()) + if (ff_word_gpos(p) == "content") + return EST_Val(p->name()); + + return f_string0; +} + +static EST_Val ff_word_pp_content(EST_Item *s) +{ + // returns the previous previous content word before s + int count = 0; + EST_Item *p; + + for (p=s->as_relation("Word")->prev(); p != 0; p = p->prev()) + { + if (ff_word_gpos(p) == "content") + { + count ++; + if (count == 2) + return EST_Val(p->name()); + } + } + + return f_string0; +} + +static EST_Val ff_content_words_out(EST_Item *s) +{ + EST_Item *nn = s->as_relation(Phrase); + EST_Item *p; + int pos=0; + + for (p=nn->next(); p; p=p->next()) + { + if (ff_word_gpos(p) == "content") + pos++; + } + // number of content words after s in this phrase + return EST_Val(pos); +} + +static EST_Val ff_content_words_in(EST_Item *s) +{ + EST_Item *nn = s->as_relation(Phrase); + EST_Item *p; + int pos=0; + + for (p=nn->prev(); p; p=p->prev()) + { + if (ff_word_gpos(p) == "content") + pos++; + } + // number of content words before s in this phrase + return EST_Val(pos); +} + +static EST_Val ff_word_cap(EST_Item *s) +{ + // "1" if the word starts with a capital letter + const char *word = s->name(); + + if ((word[0] >= 'A') && (word[0] <='Z')) + return f_string1; + else + return f_string0; +} + +static EST_Val ff_syl_onset_type(EST_Item *s) +{ + // Return van Santen's classification of onset type into one + // of three forms: + // -V contains only voiceless consonants + // +V-S contains voiced obstruents but no 
sonorants + // +S contains just sonorants + EST_Item *nn = s->as_relation("SylStructure"); + EST_Item *p; + int vox=FALSE; + int sonorant=FALSE; + + for (p=daughter1(nn); p->next() != 0; p=p->next()) + { + if (ph_is_vowel(p->name())) + break; + if (ph_is_voiced(p->name())) + vox = TRUE; + if (ph_is_sonorant(p->name())) + sonorant = TRUE; + } + + if (p==daughter1(nn)) // null-onset case + return EST_Val("+V-S"); + else if (sonorant) + return EST_Val("+S"); + else if (vox) + return EST_Val("+V-S"); + else + return EST_Val("-V"); +} + +static EST_Val ff_syl_coda_type(EST_Item *s) +{ + // Return van Santen's classification of coda type into one + // of three forms: + // -V contains only voiceless consonants + // +V-S contains voiced obstruents but no sonorants + // +S contains just sonorants + EST_Item *nn = s->as_relation("SylStructure"); + EST_Item *p; + int vox=FALSE; + int sonorant=FALSE; + + for (p=daughter1(nn); p->next() != 0; p=p->next()) + { + if (ph_is_vowel(p->name())) + break; + } + + if (p->next() == 0) // empty coda + return EST_Val("+S"); + + for (p=p->next(); p != 0; p=p->next()) + { + if (ph_is_voiced(p->name())) + vox = TRUE; + if (ph_is_sonorant(p->name())) + sonorant = TRUE; + } + + if (sonorant) + return EST_Val("+S"); + else if (vox) + return EST_Val("+V-S"); + else + return EST_Val("-V"); +} + +void festival_lex_ff_init(void) +{ + + festival_def_nff("gpos","Word",ff_word_gpos, + "Word.gpos\n\ + Returns a guess at the part of speech of this word. The lisp a-list\n\ + guess_pos is used to load up this word. If no part of speech is\n\ + found in there \"content\" is returned. 
This allows a quick efficient\n\ + method for part of speech tagging into closed class and content words."); + festival_def_nff("contentp","Word",ff_word_contentp, + "Word.contentp\n\ + Returns 1 if this word is a content word as defined by gpos, 0 otherwise."); + festival_def_nff("cap","Word",ff_word_cap, + "Word.cap\n\ + Returns 1 if this word starts with a capital letter, 0 otherwise."); + festival_def_nff("n_content","Word",ff_word_n_content, + "Word.n_content\n\ + Next content word. Note this doesn't use the standard n. notation as\n\ + it may have to search a number of words forward before finding a\n\ + non-function word. Uses gpos to define content/function word distinction.\n\ + This also works for Tokens."); + festival_def_nff("nn_content","Word",ff_word_nn_content, + "Word.nn_content\n\ + Next next content word. Note this doesn't use the standard n.n. notation\n\ + as it may have to search a number of words forward before finding the \n\ + second non-function word. Uses gpos to define content/function word\n\ + distinction. This also works for Tokens."); + festival_def_nff("p_content","Word",ff_word_p_content, + "Word.p_content\n\ + Previous content word. Note this doesn't use the standard p. notation\n\ + as it may have to search a number of words backward before finding the \n\ + first non-function word. Uses gpos to define content/function word\n\ + distinction. This also works for Tokens."); + festival_def_nff("pp_content","Word",ff_word_pp_content, + "Word.pp_content\n\ + Previous previous content word. Note this doesn't use the standard p.p.\n\ + notation as it may have to search a number of words backward before\n\ + finding the first non-function word. Uses gpos to define \n\ + content/function word distinction. 
This also works for Tokens."); + festival_def_nff("content_words_out","Word",ff_content_words_out, + "Word.content_words_out\n\ + Number of content words to end of this phrase."); + festival_def_nff("content_words_in","Word",ff_content_words_in, + "Word.content_words_in\n\ + Number of content words from start this phrase."); + festival_def_nff("syl_onset_type","Syllable",ff_syl_onset_type, + "Syllable.syl_onset_type\n\ + Return the van Santen and Hirschberg classification. -V for unvoiced,\n\ + +V-S for voiced but no sonorants, and +S for sonorants."); + festival_def_nff("syl_coda_type","Syllable",ff_syl_coda_type, + "Syllable.syl_coda_type\n\ + Return the van Santen and Hirschberg classification. -V for unvoiced,\n\ + +V-S for voiced but no sonorants, and +S for sonorants."); + +} diff --git a/src/modules/Lexicon/lexicon.cc b/src/modules/Lexicon/lexicon.cc new file mode 100644 index 0000000..fbb58ef --- /dev/null +++ b/src/modules/Lexicon/lexicon.cc @@ -0,0 +1,797 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* lexicons: lookup */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "lexicon.h" +#include "lexiconP.h" +#include "lts.h" + +static int bl_match_entry(LISP entry,const EST_String &word); +static int match_features(LISP req_feats, LISP act_feats); + +// doesn't need to be exact -- used it jumping back chunks when +// searching for entries with same head word +#define DEFAULT_LEX_ENTRY_SIZE 40 + +// Depth to build the index cache to +#define CACHE_DEPTH 8 + +static LISP lexicon_list = NIL; +static Lexicon *current_lex = NULL; + +VAL_REGISTER_CLASS(lexicon,Lexicon) +SIOD_REGISTER_CLASS(lexicon,Lexicon) + +Lexicon::Lexicon() +{ + type = lex_external; + name = ""; + binlexfp = NULL; + posmap = NIL; + gc_protect(&posmap); + addenda = NIL; + gc_protect(&addenda); + index_cache = NIL; + gc_protect(&index_cache); + matched_lexical_entries = NIL; + gc_protect(&matched_lexical_entries); + pre_hooks = NIL; + gc_protect(&pre_hooks); + post_hooks = NIL; + gc_protect(&post_hooks); + bl_filename = ""; + lts_method=""; +} + + +Lexicon::~Lexicon() +{ + if (binlexfp != NULL) + fclose(binlexfp); 
+ gc_unprotect(&addenda); + gc_unprotect(&index_cache); + gc_unprotect(&posmap); + gc_unprotect(&matched_lexical_entries); + gc_unprotect(&pre_hooks); + gc_unprotect(&post_hooks); + +} + +LISP Lexicon::lookup(const EST_String &word, const LISP features) +{ + LISP entry,mapped,hooked, entry2; + EST_String sword; + + if (pre_hooks != NIL) + { // We could just call it and it won't do anything if NIL + // but this check may save some extra consing + hooked = apply_hooks_right(pre_hooks, + cons(strintern(word),cons(features,NIL))); + sword = get_c_string(car(hooked)); + mapped = map_pos(posmap,car(cdr(hooked))); + } + else + { + sword = word; + mapped = map_pos(posmap,features); + } + + // Check addenda, then complex, then lts + if ((entry = lookup_addenda(sword,mapped)) == NIL) + { if ((entry = lookup_complex(sword,mapped)) == NIL) + entry = lookup_lts(sword,mapped); + } // After adding the city Nice to the addendum, FT said "nice" wrong, there- + else if(mapped != NIL) // fore we try to find an entry with matching pos. + if(car(cdr(entry)) != NIL && // addendum pos is not NIL + mapped != car(cdr(entry))) // and does not match + if((entry2 = lookup_complex(sword,mapped)) != NIL) + if(mapped == car(cdr(entry2))) // comp. lex. pos does match + entry = entry2; + + + if (post_hooks != NIL) + { + return apply_hooks_right(post_hooks,cons(entry,NIL)); + } + else + return entry; +} + +LISP Lexicon::lookup_all(const EST_String &word) +{ + // Find all entries that match word. + LISP entries = NIL; + LISP l; + + for (l=addenda; l != 0; l=cdr(l)) + if (bl_match_entry(car(l),word) == 0) + entries = cons(car(l),entries); + + lookup_complex(word,flocons(-1)); + + return reverse(append(matched_lexical_entries,entries)); +} + +int Lexicon::in_lexicon(const EST_String &word,LISP features) +{ + // Checks to see if this word is in the lexicons (addenda or + // compiled lexicon). Ignores any letter to sound method. 
+ // This is used to determine if further analysis on this word might + // be worthwhile + + if ((lookup_addenda(word,features) != NIL) || + (lookup_complex(word,features) != NIL)) + return TRUE; + else + return FALSE; +} + +EST_String Lexicon::str_lookup(const EST_String &word, const LISP features) +{ + LISP entry = lookup(word,features); + return siod_sprint(entry); +} + +LISP Lexicon::lookup_addenda(const EST_String &word,LISP features) +{ + // Lookup addenda (for first match) + LISP l,potential=NIL; + + for (l=addenda; l != 0; l=cdr(l)) + if (bl_match_entry(car(l),word) == 0) + { + if (potential == NIL) // if nothing matches features this'll do + potential = car(l); + if (match_features(features,(car(cdr(car(l)))))) + return car(l); + } + + // If there isn't a complete match, a match of name is sufficient + return potential; +} + +LISP Lexicon::lookup_complex(const EST_String &word,LISP features) +{ + // Lookup the word in a compile lexicon file + int start,end; + + if (bl_filename == "") + return NIL; // there isn't a compiled lexicon + + binlex_init(); + int depth = 0; + matched_lexical_entries = NIL; + lex_entry_match = 0; + + bl_lookup_cache(index_cache,word,start,end,depth); + + return bl_bsearch(word,features,start,end,depth); +} + +void Lexicon::bl_lookup_cache(LISP cache, const EST_String &word, + int &start, int &end, int &depth) +{ + // Look up word in the index cache to get a better idea where + // to look in the real file + int fc; + + if (cdr(cache) == NIL) // hit bottom + { + start = get_c_int(car(car(cache))); + end = get_c_int(cdr(car(cache))); + } + else if ((fc=fcompare(word,get_c_string(car(cdr(cache))),NULL)) < 0) + bl_lookup_cache(siod_nth(2,cache),word,start,end,++depth); + else if (fc == 0) + { + start = get_c_int(car(car(cache))); + end = get_c_int(cdr(car(cache))); + } + else + bl_lookup_cache(siod_nth(3,cache),word,start,end,++depth); +} + +void Lexicon::add_to_cache(LISP cache, + const EST_String &word, + int start,int mid, int end) 
+{ + int fc; + + if (cdr(cache) == NIL) // hit bottom + { + LISP a,b; + + a = cons(cons(flocons(start),flocons(mid)),NIL); + b = cons(cons(flocons(mid),flocons(end)),NIL); + + setcdr(cache,cons(strintern(word),cons(a,cons(b,NIL)))); + } + else if ((fc = fcompare(word,get_c_string(car(cdr(cache))),NULL)) < 0) + add_to_cache(siod_nth(2,cache),word,start,mid,end); + else if (fc == 0) + return; /* already in cache */ + else + add_to_cache(siod_nth(3,cache),word,start,mid,end); +} + +void Lexicon::binlex_init(void) +{ + // Open the compiled lexicon file if not already open + char magic_number[20]; + int end; + + if (binlexfp == NULL) + { + if (bl_filename == "") + { + cerr << "Lexicon: no compile file given" << endl; + festival_error(); + } + binlexfp = fopen(bl_filename,"rb"); + if (binlexfp == NULL) + { + cerr << "Lexicon: compile file \"" << bl_filename << + "\" not found or unreadable " << endl; + festival_error(); + } + fread(magic_number,sizeof(char),4,binlexfp); + magic_number[4] = '\0'; + if ((EST_String)"MNCM" == (EST_String)magic_number) + { // A compiled lexicon plus features + // Also different entry format (pos distributions) + LISP features = lreadf(binlexfp); + comp_num_entries = get_param_int("num_entries",features,-1); + } + else if ((EST_String)"MNCL" == (EST_String)magic_number) + { + comp_num_entries = -1; + } + else + { + cerr << "Lexicon: compile file \"" << bl_filename << + "\" not a compiled lexicon " << endl; + festival_error(); + } + blstart = ftell(binlexfp); + fseek(binlexfp,0L, SEEK_END); + end = ftell(binlexfp); + index_cache = cons(cons(flocons(blstart),flocons(end)),NIL); + } +} + +LISP Lexicon::lookup_lts(const EST_String &word,LISP features) +{ + // Look up using letter to sound rules + + if ((lts_method == "") || + (lts_method == "Error")) + { + cerr << "LEXICON: Word " << word << " (plus features) not found in lexicon " + << endl; + festival_error(); + } + else if (lts_method == "lts_rules") + return lts(word,features,lts_ruleset); + 
else if (lts_method == "none") + return cons(strintern(word),cons(NIL,cons(NIL,NIL))); + else if (lts_method == "function") + return leval(cons(rintern("lex_user_unknown_word"), + cons(quote(strintern(word)), + cons(quote(features),NIL))), + NIL); + else + return leval(cons(rintern(lts_method), + cons(quote(strintern(word)), + cons(quote(features),NIL))), + NIL); + + return NIL; +} + +LISP Lexicon::bl_bsearch(const EST_String &word,LISP features, + int start,int end, int depth) +{ + // Do a binary search for word in file + LISP closest_entry; + int mid, compare; + + if (start==end) // only happens if first item has been tested (and failed) + return NIL; + else if ((end-start) < 10) + { + if (start == blstart) + { + mid = start; + end = start; + } + else + return NIL; // failed + } + else + mid = start + (end-start)/2; + + closest_entry = bl_find_next_entry(mid); + if ((depth < CACHE_DEPTH) && + (end-start > 256)) + { + add_to_cache(index_cache,get_c_string(car(closest_entry)), + start,mid,end); + } + + compare = bl_match_entry(closest_entry,word); + +/* printf("%s %s %d %d %d\n", + (const char *)word, + get_c_string(car(closest_entry)), + start,mid,end); */ + + if (compare == 0) + return bl_find_actual_entry(mid,word,features); + else if (compare < 0) // too far + return bl_bsearch(word,features,start,mid,++depth); + else // not far enough + return bl_bsearch(word,features,mid,end,++depth); +} + +LISP Lexicon::bl_find_next_entry(int pos) +{ + // Read the next full entry after this point + int c; + + fseek(binlexfp,(long)pos,SEEK_SET); + + while ((c = getc(binlexfp)) != '\n') + if (c == EOF) return NIL; + + return lreadf(binlexfp); +} + +static int bl_match_entry(LISP entry,const EST_String &word) +{ + return fcompare(word,get_c_string(car(entry)),NULL); +} + +LISP Lexicon::bl_find_actual_entry(int pos,const EST_String &word,LISP features) +{ + // Well there may be a number of entries with the same head word + // we want to find the one that matches (with features) 
or the + // the first one. To do this we must go back until we find one that + // doesn't match and then search through each for a real match + // If no features have a match the first headword match is returned + // as that pronunciation is probably still better than letter to + // sound rules + LISP n_entry; + LISP f_entry = NIL; + + do + { + pos -= DEFAULT_LEX_ENTRY_SIZE; + if (pos < blstart) + { + pos = blstart; fseek(binlexfp,pos,SEEK_SET); + break; + } + n_entry = bl_find_next_entry(pos); + } + while (bl_match_entry(n_entry,word) == 0); + + // now pos is definitely before the first entry whose head is word + n_entry = lreadf(binlexfp); + lex_entry_match = 0; + matched_lexical_entries = NIL; + while (bl_match_entry(n_entry,word) >= 0) + { + if (bl_match_entry(n_entry,word) == 0) + { + if (f_entry == NIL) f_entry = n_entry; + matched_lexical_entries = cons(n_entry,matched_lexical_entries); + lex_entry_match++; + if (match_features(features,(car(cdr(n_entry))))) + return n_entry; + } + n_entry = lreadf(binlexfp); + if (siod_eof(n_entry)) + return NIL; + } + + return f_entry; +} + +static int match_features(LISP req_feats, LISP act_feats) +{ + // Match required features with actualy features + // required should be less specific than actual + LISP l,m; + + if ((req_feats == NIL) || + (eql(req_feats,act_feats))) + return TRUE; + else if (consp(req_feats) && consp(act_feats)) + { + for (l=req_feats; l != 0; l=cdr(l)) + { + for (m=act_feats; m != 0; m=cdr(m)) + if (eql(car(l),car(m))) + break; + if (m == NIL) + return FALSE; + } + return TRUE; + } + else + return FALSE; +} + +static void check_current_lex(void) +{ + // check there is a current lexicon + + if (current_lex == NULL) + { + cerr << "No lexicon" << endl; + festival_error(); + } +} + +static void lex_add_lexicon(const EST_String &name, Lexicon *l) +{ + // Add lexicon to list of lexicons + LISP lpair; + + lpair = siod_assoc_str(name,lexicon_list); + + if (lexicon_list == NIL) + gc_protect(&lexicon_list); + 
+ if (lpair == NIL) + { + lexicon_list = cons(cons(strintern(name), + cons(siod(l),NIL)), + lexicon_list); + } + else + { + cwarn << "lexicon " << name << " recreated" << endl; + setcar(cdr(lpair),siod(l)); + } + + return; +} + +// +// This functions provide the LISP access to the lexicons including +// access and selection +// +static LISP lex_set_compile_file(LISP fname) +{ + EST_String filename = get_c_string(fname); + + check_current_lex(); + + current_lex->set_bl_filename(filename); + + return fname; +} + +static LISP lex_add_entry(LISP entry) +{ + + check_current_lex(); + + current_lex->add_addenda(entry); + + return NIL; +} + +static LISP lex_set_lts_method(LISP method) +{ + EST_String smethod; + + check_current_lex(); + + if (method == NIL) + smethod = "none"; + else + smethod = get_c_string(method); + + current_lex->set_lts_method(smethod); + + return method; +} + +static LISP lex_set_lts_ruleset(LISP rulesetname) +{ + EST_String sruleset; + + check_current_lex(); + + if (rulesetname == NIL) + { + cerr << "LEXICON: no ruleset name given\n"; + festival_error(); + } + else + sruleset = get_c_string(rulesetname); + + current_lex->set_lts_ruleset(sruleset); + + return rulesetname; +} + +static LISP lex_set_pos_map(LISP posmap) +{ + check_current_lex(); + + current_lex->set_pos_map(posmap); + + return posmap; +} + +static LISP lex_set_phoneset(LISP psname) +{ + EST_String phoneset = get_c_string(psname); + + check_current_lex(); + current_lex->set_phoneset_name(phoneset); + + return psname; +} + +static LISP lex_set_pre_hooks(LISP hooks) +{ + LISP last_hooks; + check_current_lex(); + + last_hooks = current_lex->pre_hooks; + current_lex->pre_hooks = hooks; + + return last_hooks; +} + +static LISP lex_set_post_hooks(LISP hooks) +{ + LISP last_hooks; + check_current_lex(); + + last_hooks = current_lex->post_hooks; + current_lex->post_hooks = hooks; + + return last_hooks; +} + +EST_String lex_current_phoneset(void) +{ + check_current_lex(); + return 
current_lex->phoneset_name(); +} + +static LISP lex_create_lex(LISP lexname) +{ + // Make a new lexicon current (and select it) + Lexicon *l = new Lexicon; + EST_String name = get_c_string(lexname); + + l->set_lex_name(name); + + lex_add_lexicon(name,l); + + current_lex = l; + + return lexname; +} + +LISP lex_select_lex(LISP lexname) +{ + // Select named lexicon and make it current, return name of previous + EST_String name = get_c_string(lexname); + LISP lpair, lastname; + + lpair = siod_assoc_str(name,lexicon_list); + if (current_lex == NULL) + { + cerr << "lexicon: no current lexicon -- shouldn't happen\n"; + festival_error(); + } + else + lastname = rintern((const char *)current_lex->get_lex_name()); + + if (lpair == NIL) + { + cerr << "lexicon " << name << " not defined" << endl; + festival_error(); + } + else + current_lex = lexicon(car(cdr(lpair))); + + return lastname; +} + +LISP lex_list(void) +{ + // List names of all current defined lexicons + LISP lexica = NIL; + LISP l; + + for (l=lexicon_list; l != NIL; l=cdr(l)) + lexica = cons(car(car(l)),lexica); + + return lexica; +} + +static LISP lex_lookup_lisp(LISP lword, LISP features) +{ + return lex_lookup_word(get_c_string(lword),features); +} + +static LISP lex_lookup_all(LISP lword) +{ + check_current_lex(); + + return current_lex->lookup_all(get_c_string(lword)); +} + +static LISP lex_entrycount(LISP lword) +{ + check_current_lex(); + + // assumes -1 is never in the pos field + current_lex->lookup(get_c_string(lword),flocons(-1)); + + return flocons(current_lex->num_matches()); +} + +LISP lex_lookup_word(const EST_String &word, LISP features) +{ + check_current_lex(); + + return current_lex->lookup(word,features); +} + +int in_current_lexicon(const EST_String &word, LISP features) +{ + check_current_lex(); + return current_lex->in_lexicon(word,features); +} + +void festival_lex_ff_init(void); + +void festival_Lexicon_init(void) +{ + // define lexicon related functions + + festival_lex_ff_init(); + + 
init_subr_1("lex.set.compile.file",lex_set_compile_file, + "(lex.set.compile.file COMPFILENAME)\n\ + Set the current lexicon's compile file to COMPFILENAME. COMPFILENAME\n\ + is a compiled lexicon file created by lex.compile.\n\ + [see Defining lexicons]"); + init_subr_0("lex.list",lex_list, + "(lex.list)\n\ + List names of all currently defined lexicons."); + init_subr_1("lex.set.lts.method",lex_set_lts_method, + "(lex.set.lts.method METHOD)\n\ + Set the current lexicon's letter-to-sound method to METHOD. METHOD\n\ + can take any of the following values: Error (the default) signal a\n\ + festival error if a word is not found in the lexicon; lts_rules use the\n\ + letter to sound rule set named by lts_ruleset; none return\n\ + simply nil in the pronunciation field; function use call the two argument\n\ + function lex_user_unknown_word (as set by the user) with the word and\n\ + features to provide an entry. [see Letter to sound rules]"); + init_subr_1("lex.set.lts.ruleset",lex_set_lts_ruleset, + "(lex.set.lts.ruleset RULESETNAME)\n\ + Set the current lexicon's letter-to-sound ruleset to RULESETNAME.\n\ + A ruleset of that name must already be defined. This is used if\n\ + lts.method is set to lts_rules. [see Letter to sound rules]"); + init_subr_1("lex.set.pos.map",lex_set_pos_map, + "(lex.set.pos.map POSMAP)\n\ + A reverse assoc-list mapping part of speech tags to the lexical\n\ + part of speech tag set. [see Lexical entries]"); + init_subr_1("lex.set.pre_hooks",lex_set_pre_hooks, + "(lex.set.pre_hooks HOOKS)\n\ + Set a function or list of functions that are to be applied to the entry\n\ + before lookup. Returns previous value [see Lexical entries]"); + init_subr_1("lex.set.post_hooks",lex_set_post_hooks, + "(lex.set.post_hooks HOOKS)\n\ + Set a function or list of functions that are to be applied to the entry\n\ + after lookup. 
Returns previous value [see Lexical entries]"); + init_subr_1("lex.set.phoneset",lex_set_phoneset, + "(lex.set.phoneset PHONESETNAME)\n\ + Set current lexicon's phone set to PHONESETNAME. PHONESETNAME must be\n\ + a currently defined (and, of course, loaded) phone set.\n\ + [see Defining lexicons]"); + init_subr_1("lex.add.entry",lex_add_entry, + "(lex.add.entry ENTRY)\n\ + Add ENTRY to the addenda of the current lexicon. As the addenda is\n\ + checked before the compiled lexicon or letter to sound rules, this will\n\ + cause ENTRY to be found before all others. If a word already in the\n\ + addenda is added again the most recent addition will be found (part of\n\ + speech tags are respected in the look up). [see Lexical entries]"); + init_subr_1("lex.select",lex_select_lex, + "(lex.select LEXNAME)\n\ + Select LEXNAME as current lexicon. The name of the previously selected\n\ + lexicon is returned."); + init_subr_1("lex.create",lex_create_lex, + "(lex.create LEXNAME)\n\ + Create a new lexicon of name LEXNAME. If it already exists, the old one\n\ + is deleted first. [see Defining lexicons]"); + init_subr_2("lex.lookup",lex_lookup_lisp, + "(lex.lookup WORD FEATURES)\n\ + Lookup word in current lexicon. The addenda is checked first, if WORD\n\ + with matching FEATURES (so far this is only the part of speech tag) is\n\ + not found the compiled lexicon is checked. Only if the word is still not\n\ + found the letter to sound rules (or whatever method specified by the\n\ + current lexicon's lts.method is used). [see Lookup process]"); + init_subr_1("lex.lookup_all",lex_lookup_all, + "(lex.lookup_all WORD)\n\ + Return list of all entries in the addenda and compiled lexicon that\n\ + match this word. The letter to sound rules and user defined unknown\n\ + word function is ignored."); + init_subr_1("lex.entrycount",lex_entrycount, + "(lex.entrycount WORD)\n\ + Return the number of entries in the compiled lexicon that match this\n\ + word. 
This is used in detecting homographs."); + init_subr_1("lex.syllabify.phstress",lex_syllabify_phstress, + "(lex.syllabify.phstress PHONELIST)\n\ + Syllabify the given phone list (if current phone set). Vowels may have\n\ + the numerals 0, 1, or 2 as suffixes, if so these are taken to be stress\n\ + for the syllable they are in. This format is similar to the entry format\n\ + in the CMU and BEEP lexicons. [see Defining lexicons]"); + init_subr_2("lex.compile",lexicon_compile, + "(lex.compile ENTRYFILE COMPILEFILE)\n\ + Compile the list of lexical entries in ENTRYFILE into a compiled file in\n\ + COMPILEFILE. [see Defining lexicons]"); + init_fsubr("lts.ruleset",lts_def_ruleset, + "(lts.ruleset NAME RULES SETS)\n\ + Define a new set of letter to sound rules. [see Letter to sound rules]"); + init_subr_2("lts.apply",lts_apply_ruleset, + "(lts.apply WORD RULESETNAME)\n\ + Apply lts ruleset RULESETNAME to word returning result. \n\ + [see Letter to sound rules]"); + init_subr_2("lts.in.alphabet",lts_in_alphabet, + "(lts.in.alphabet WORD RULESETNAME)\n\ + Returns t is all characters in symbol word (or items in list WORD)\n\ + are in the alphabet of letter to sound ruleset name RULESETNAME. nil\n\ + otherwise. [see Letter to sound rules]"); + init_subr_0("lts.list",lts_list, + "(lts.list)\n\ + Return list of all current defined LTS rulesets."); + +} diff --git a/src/modules/Lexicon/lexiconP.h b/src/modules/Lexicon/lexiconP.h new file mode 100644 index 0000000..ce0f7eb --- /dev/null +++ b/src/modules/Lexicon/lexiconP.h @@ -0,0 +1,48 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : February 1997 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Private functions for lexicon directory */ +/* */ +/*=======================================================================*/ +#ifndef __LEXICONP_H__ +#define __LEXICONP_H__ + +LISP lexicon_compile(LISP finname, LISP foutname); + +#endif /* __LEXICONP_H__ */ + + + diff --git a/src/modules/Lexicon/lts.cc b/src/modules/Lexicon/lts.cc new file mode 100644 index 0000000..3c79957 --- /dev/null +++ b/src/modules/Lexicon/lts.cc @@ -0,0 +1,105 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* A front end to the letter to sound rule system(s) */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "lexicon.h" +#include "lts.h" + +static LISP lts_create_entry(const EST_String &word, + LISP features,LISP syllables); +static LISP map_phones(LISP phones); + +LISP lts(const EST_String &word,LISP features,const EST_String &rulesetname) +{ + /* Return lexical entry for given word in best possible way */ + LISP phones; + EST_String dword = downcase(word); + LISP lword = strintern(dword); + LISP lrulesetname = rintern(rulesetname); + + if (lts_in_alphabet(lword,lrulesetname) != NIL) + { // this check doesn't guarantee success + phones = lts_apply_ruleset(lword,lrulesetname); + } + else + phones = NIL; // otherwise can't do anything + + return lts_create_entry(word, features, + lex_syllabify(map_phones(phones))); + +} + +static LISP lts_create_entry(const EST_String &word,LISP features,LISP syllables) +{ + /* build an entry from information */ + + return cons(strcons(strlen(word),word), + cons(features,cons(syllables,NIL))); +} + +static LISP map_phones(LISP phones) +{ + // map 
list of phones to lexical list of phones + + // Users responsibility to get right phone set + return phones; + + // If lts rulesets get their own phonesets then the following should + // be used (with appropriate changes) +#if 0 + LISP mapped = NIL,p; + EST_String lexset,mappedph; + + lexset = lex_current_phoneset(); + + if (lexset != "nrl") + { + for (p=phones; p != NIL; p=cdr(p)) + { + mappedph = map_phone(get_c_string(car(p)),"nrl",lexset); + mapped = cons(rintern(mappedph),mapped); + } + return reverse(mapped); + } + else + return phones; +#endif +} + diff --git a/src/modules/Lexicon/lts.h b/src/modules/Lexicon/lts.h new file mode 100644 index 0000000..9822cc8 --- /dev/null +++ b/src/modules/Lexicon/lts.h @@ -0,0 +1,52 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* access to letter to sound rule system(s) */ +/* */ +/*=======================================================================*/ +#ifndef __LTS_H__ +#define __LTS_H__ + +LISP lts(const EST_String &word,LISP features,const EST_String &rulesetname); +LISP lts_apply_ruleset(LISP word, LISP rulesetname); +LISP lts_in_alphabet(LISP word, LISP rulesetname); +LISP lts_def_ruleset(LISP args, LISP penv); +LISP lts_list(); + +#endif /* __LTS_H__ */ + + + diff --git a/src/modules/Lexicon/lts_rules.cc b/src/modules/Lexicon/lts_rules.cc new file mode 100644 index 0000000..cc383f4 --- /dev/null +++ b/src/modules/Lexicon/lts_rules.cc @@ -0,0 +1,438 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : September 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* A letter to sound rule system that allows rules to be specified */ +/* externally. This is specified desined to use the existing Welsh */ +/* letter to rules developed by Briony Williams for the Welsh */ +/* synthesizer. This form came from a program by Greg Lee (Univ of */ +/* Hawaii), but this is not using his code. 
*/ +/* */ +/* Multiple rule sets are supported alloing multiple applications of */ +/* varying rule sets */ +/* */ +/* A set of rules consists of set definitions, and rules */ +/* Each rule consists of a left hand side and a write hand side */ +/* The LHS consist of a left context [ change chars ] right context */ +/* The RHS consist of the new symbols */ +/* */ +/* The rules are interpreted in order, the first match rule gives */ +/* the replacement, and the search start again with the pointer */ +/* incremented */ +/* */ +/*=======================================================================*/ +#include +#include +#include "festival.h" +#include "lts.h" + +class LTS_Ruleset{ + private: + EST_String p_name; + int num_rules; + LISP p_rules; + LISP p_alphabet; + LISP p_sets; // short hand for sets + LISP normalize(LISP rules); + int item_match(LISP actual_item, LISP rule_item); + int context_match(LISP actual_context, LISP rule_context); + int match_rule(LISP lc, LISP remainder, LISP rule, LISP *rest); + LISP rewrite(LISP lc, LISP remainder, LISP rules, LISP *rest); + LISP this_match(LISP remainder, LISP rule_this); + void update_alphabet(LISP newitems); + public: + LTS_Ruleset(LISP name, LISP rules, LISP sets); + ~LTS_Ruleset(void); + const EST_String &name(void) const {return p_name;} + LISP apply(LISP word); + LISP check_alpha(LISP word); +}; + +static LISP fix_postfix_ops(LISP l); + +static LISP lts_rules_list = NIL; + +#define LTS_LC(R) (car(R)) +#define LTS_THIS(R) (car(cdr(R))) +#define LTS_RC(R) (car(cdr(cdr(R)))) +#define LTS_RHS(R) (car(cdr(cdr(cdr(R))))) + +VAL_REGISTER_CLASS(ltsruleset,LTS_Ruleset) +SIOD_REGISTER_CLASS(ltsruleset,LTS_Ruleset) + +LTS_Ruleset::LTS_Ruleset(LISP name, LISP rules, LISP sets) +{ + p_alphabet = NIL; + gc_protect(&p_alphabet); + p_name = get_c_string(name); + p_sets = sets; + gc_protect(&p_sets); + p_rules = normalize(rules); + gc_protect(&p_rules); +} + +LTS_Ruleset::~LTS_Ruleset(void) +{ + gc_unprotect(&p_sets); + 
gc_unprotect(&p_rules); + gc_unprotect(&p_alphabet); +} + +LISP LTS_Ruleset::normalize(LISP rules) +{ + // Change the rule format to I can access it faster + LISP r, rc, t, lc, rhs, c; + LISP nrs = NIL; + int state; + + for (r=rules; r != NIL; r=cdr(r)) + { + lc = t = rc = rhs = NIL; + state = 0; + for (c=car(r); c != NIL; c = cdr(c)) + { + if (state == 0) + { + if (streq("[",get_c_string(car(c)))) + state = 1; + else + lc = cons(car(c),lc); + } + else if (state == 1) + { + if (streq("]",get_c_string(car(c)))) + state = 2; + else + t = cons(car(c),t); + } + else if (state == 2) + { + if (streq("=",get_c_string(car(c)))) + { + state = 3; + rhs = cdr(c); + break; + } + else + rc = cons(car(c),rc); + } + else + { + cerr << "LTS_Rules:: misparsed a rule\n"; + cerr << "LTS_Rules:: "; + pprint(car(r)); + festival_error(); + } + } + update_alphabet(t); + if ((state != 3) || + (t == NIL)) + { + cerr << "LTS_Rules:: misparsed a rule\n"; + cerr << "LTS_Rules:: "; + pprint(car(r)); + festival_error(); + } + nrs = cons(cons(fix_postfix_ops(lc),cons(reverse(t),cons(reverse(rc), + cons(rhs,NIL)))),nrs); + } + + return reverse(nrs); +} + +void LTS_Ruleset::update_alphabet(LISP newitems) +{ + // Add new items to alphabet is not already there + LISP p; + + for (p=newitems; p != NIL; p=cdr(p)) + if (siod_member_str(get_c_string(car(p)),p_alphabet) == NIL) + p_alphabet = cons(car(p),p_alphabet); +} + +static LISP fix_postfix_ops(LISP l) +{ + // This list have been built in reverse so the postfix operators * and + + // are wrong. 
Destrictively fix them + LISP p,q; + + for (p=l; p != NIL; p=cdr(p)) + if ((streq("*",get_c_string(car(p)))) || + (streq("+",get_c_string(car(p))))) + { + if (cdr(p) == NIL) + { + cerr << "LTS_Rules:: malformed left context\n"; + pprint(reverse(l)); + } + q = car(p); + CAR(p) = car(cdr(p)); + CAR(cdr(p)) = q; + p = cdr(p); + } + + return l; +} + +LISP LTS_Ruleset::apply(LISP word) +{ + // Apply rules to word + LISP lc,remainder,result,newremainder,r,l; + int i; + + lc = cons(rintern("#"),NIL); + remainder = append(word,lc); // add # at end of right context + result = NIL; + + for (; + !streq("#",get_c_string(car(remainder))); + ) + { + r = rewrite(lc,remainder,p_rules,&newremainder); + result = append(reverse(r),result); + for (i=0,l=remainder; + i < siod_llength(remainder)-siod_llength(newremainder); + i++,l=cdr(l)) + lc=cons(car(l),lc); + remainder = newremainder; + } + return reverse(result); +} + +LISP LTS_Ruleset::check_alpha(LISP word) +{ + // Check all characters in word can (possibly) be mapped by this ruleset + LISP word_chars,p; + + if (consp(word)) + word_chars = word; + else + word_chars = symbolexplode(word); + + for (p=word_chars; p != NIL; p=cdr(p)) + if (siod_member_str(get_c_string(car(p)),p_alphabet) == NIL) + return NIL; + + return rintern("t"); +} + +LISP LTS_Ruleset::rewrite(LISP lc, LISP remainder, LISP rules, LISP *rest) +{ + // Find a rule to match this context + LISP r,t; + + for (r=rules; r != NIL; r=cdr(r)) + if (match_rule(lc,remainder,car(r),rest) == TRUE) + return LTS_RHS(car(r)); + + cerr << "LTS_Ruleset " << p_name << ": no rule matches: \n"; + cerr << "LTS_Ruleset: "; + for (t=reverse(lc); t != NIL; t = cdr(t)) + cerr << get_c_string(car(t)) << " "; + cerr << "*here* "; + for (t=remainder; t != NIL; t = cdr(t)) + cerr << get_c_string(car(t)) << " "; + cerr << endl; + festival_error(); + return NIL; +} + +int LTS_Ruleset::match_rule(LISP lc, LISP remainder, LISP rule, LISP *rest) +{ + // Match this rule + + *rest = 
this_match(remainder,LTS_THIS(rule)); + + return ((*rest != NIL) && + (context_match(*rest,LTS_RC(rule))) && + (context_match(lc,LTS_LC(rule)))); +} + +int LTS_Ruleset::context_match(LISP acontext, LISP rcontext) +{ + // TRUE if rule context is initial sub list of actual context + + if (rcontext == NIL) + return TRUE; + else if ((cdr(rcontext) != NIL) && + (streq("*",get_c_string(car(cdr(rcontext)))))) + return ((context_match(acontext,cdr(cdr(rcontext)))) || + (context_match(acontext,cons(car(rcontext), + cdr(cdr(rcontext))))) || + ((item_match(car(acontext),car(rcontext))) && + context_match(cdr(acontext),rcontext))); + else if ((cdr(rcontext) != NIL) && + (streq("+",get_c_string(car(cdr(rcontext)))))) + return ((item_match(car(acontext),car(rcontext))) && + (context_match(cdr(acontext), + cons(car(rcontext), + cons(rintern("*"), + cdr(cdr(rcontext))))))); + else if (item_match(car(acontext),car(rcontext))) + return context_match(cdr(acontext),cdr(rcontext)); + else + return FALSE; + +#if 0 + for (a=actual_context,r=rule_context; r != NIL; a=cdr(a),r=cdr(r)) + { + if (item_match(car(a),car(r))) + return FALSE; + } + + return TRUE; +#endif +} + +LISP LTS_Ruleset::this_match(LISP remainder, LISP rule_this) +{ + // Match the centre of the rule to the remainder. Returning + // the new remainder if the match is successful + LISP a,r; + + for (a=remainder,r=rule_this; r != NIL; a=cdr(a),r=cdr(r)) + if (!item_match(car(a),car(r))) + return NIL; + + return a; +} + +int LTS_Ruleset::item_match(LISP actual_item, LISP rule_item) +{ + // Checks for a match, possible rule_item is a set name + // returns remainder if match, NIL otherwise. 
+ + if (streq(get_c_string(actual_item),get_c_string(rule_item))) + return TRUE; + else + { + LISP lpair = assq(rule_item,p_sets); + if (lpair == NIL) + return FALSE; + else if (siod_member_str(get_c_string(actual_item),cdr(lpair)) != NIL) + return TRUE; + else + return FALSE; + } +} + +LISP lts_def_ruleset(LISP args, LISP penv) +{ + // Define a new rule set + (void)penv; + LTS_Ruleset *rs = new LTS_Ruleset(car(args), + car(cdr(cdr(args))), + car(cdr(args))); + LISP name = car(args); + LISP lpair; + + if (lts_rules_list == NIL) + gc_protect(&lts_rules_list); + + lpair = siod_assoc_str(get_c_string(name),lts_rules_list); + + if (lpair == NIL) + { + lts_rules_list = cons(cons(name, + cons(siod(rs),NIL)), + lts_rules_list); + } + else + { + cwarn << "LTS_Rules: " << get_c_string(name) << " recreated" << endl; + setcar(cdr(lpair),siod(rs)); + } + + return name; +} + +LISP lts_in_alphabet(LISP word, LISP rulesetname) +{ + // check if this word is in the input alphabet for ruleset + LTS_Ruleset *rs; + LISP lpair; + + lpair = siod_assoc_str(get_c_string(rulesetname),lts_rules_list); + if (lpair == NIL) + { + cerr << "LTS_Rules: no rule set named \"" << + get_c_string(rulesetname) << "\"\n"; + festival_error(); + } + else + { + rs = ltsruleset(car(cdr(lpair))); + return rs->check_alpha(word); + } + + return NIL; +} + +LISP lts_list() +{ + // List all currently defined rule sets + LISP r,rulesets = NIL; + + for (r=lts_rules_list; r != NIL; r=cdr(r)) + rulesets = cons(car(car(r)),rulesets); + + return rulesets; +} + +LISP lts_apply_ruleset(LISP word, LISP rulesetname) +{ + // Apply the rule set to word (an atom or list of atoms) + LTS_Ruleset *rs; + LISP lpair; + + lpair = siod_assoc_str(get_c_string(rulesetname),lts_rules_list); + if (lpair == NIL) + { + cerr << "LTS_Rule: no rule set named \"" << + get_c_string(rulesetname) << "\"\n"; + festival_error(); + } + else + { + rs = ltsruleset(car(cdr(lpair))); + if (consp(word)) + return rs->apply(word); + else + return
rs->apply(symbolexplode(word)); + } + return NIL; +} + + diff --git a/src/modules/Makefile b/src/modules/Makefile new file mode 100644 index 0000000..555cb75 --- /dev/null +++ b/src/modules/Makefile @@ -0,0 +1,107 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. 
## +## ## +########################################################################### +## ## +## Automatically constructs init_modules.cc with each modules own ## +## initialization routines ## +## ## +########################################################################### +TOP=../.. +DIRNAME=src/modules +CPPSRCS = +BASE_DIRS = Lexicon base Duration Intonation Text \ + UniSyn donovan parser UniSyn_diphone +# these last four are potentially optional + +LIB_BUILD_DIRS = $(BASE_DIRS) +BUILD_DIRS = $(LIB_BUILD_DIRS) +OPTIONAL = diphone clunits clustergen java rxp UniSyn_phonology MultiSyn hts_engine + +ALL_DIRS = $(BASE_DIRS) $(OPTIONAL) + +OBJS = init_modules.o + +LOCAL_CLEAN = init_modules.cc + +LOCAL_INCLUDES = -I./Database + +FILES = Makefile $(CPPSRCS) VCLocalRules + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + +# Builds a call to the int function in each sub-directory +# +#init_modules.cc: utilities Makefile $(TOP)/config/modincludes.inc +# @ echo "Making init_modules.cc" +# LD_LIBRARY_PATH='$(EST)/lib:$(LD_LIBRARY_PATH):$(SYSTEM_LD_LIBRARY_PATH)' utilities/find_inits $(LIB_BUILD_DIRS) $(EXTRA_LIB_BUILD_DIRS) > init_modules.cc + + +init_modules.cc: Makefile $(TOP)/config/modincludes.inc + @ echo "Making init_modules.cc" + @ rm -f init_modules.cc + @ echo >init_modules.cc + @ echo "/* This file is autogenerated by src/modules/Makefile */" >>init_modules.cc + @ echo >>init_modules.cc + @ echo "#include \"EST_unix.h\"" >>init_modules.cc + @ echo "#include \"stdio.h\"" >>init_modules.cc + @ echo "#include \"festival.h\"" >>init_modules.cc + @ echo >>init_modules.cc + @ for i in $(LIB_BUILD_DIRS) $(EXTRA_LIB_BUILD_DIRS) ; \ + do \ + if [ ! 
-f "$$i/NoInit" ] ; then \ + echo "void festival_"$$i"_init(void);" ; \ + fi \ + done >> init_modules.cc + @ echo >>init_modules.cc + @ echo "void festival_init_modules(void)" >>init_modules.cc + @ echo "{" >>init_modules.cc + @ for i in $(LIB_BUILD_DIRS) $(EXTRA_LIB_BUILD_DIRS) ; \ + do \ + if [ ! -f "$$i/NoInit" ] ; then \ + echo " festival_"$$i"_init();" ; \ + fi \ + done >> init_modules.cc + @ echo "}" >>init_modules.cc + @ echo >>init_modules.cc + + + + + + + + + diff --git a/src/modules/MultiSyn/DiphoneBackoff.cc b/src/modules/MultiSyn/DiphoneBackoff.cc new file mode 100644 index 0000000..8ab7488 --- /dev/null +++ b/src/modules/MultiSyn/DiphoneBackoff.cc @@ -0,0 +1,256 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 2004 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Rob Clark */ +/* Date: Jan 2004 */ +/* --------------------------------------------------------------------- */ +/* Diphone backing off procedure */ +/* */ +/* Substitute a target phone for another (or multiple others). */ +/* */ +/* The class holds a list of rules. Each rule is represented by a list */ +/* The rule is interpreted as "The head can be replaced by the tail" */ +/* */ +/* */ +/*************************************************************************/ + + +#include "DiphoneBackoff.h" + + +const EST_String DiphoneBackoff::default_match = "_"; + + +DiphoneBackoff::DiphoneBackoff(LISP l_backofflist) +{ + + EST_StrList list; + LISP l; + + + for ( l = l_backofflist ; l != NIL ; l = cdr(l)) + { + siod_list_to_strlist(car(l),list); + + if(list.length() < 2) + EST_warning("BackoffList: ignoring invalid entry %s\n" , (const char*)list.first()); + else + backofflist.append(list); + } +} + +// The function EST_TList::append copies data, so this is not necessary. +// +// DiphoneBackoff::~DiphoneBackoff() +// { +// EST_Litem *p; +// +// for (p = backofflist.head(); p != 0; p = p->next()) +// delete backofflist(p); +// } + + +/* + * This version of backoff just takes two phone names as input and + * returns a diphone name as output. 
It is entirely up to the caller + * what to do with this information. + * Note that it assumes substitutions are 1 phone for 1 phone. + */ + +EST_String DiphoneBackoff::backoff(EST_String left, EST_String right) +{ + + EST_Litem *p; + EST_String head,sub,result,rl,rr; + + rl = left; + rr = right; + p = backofflist.head(); + while( p!= 0 ) + { + int i = 0; + head = backofflist(p).nth(i++); + sub = backofflist(p).nth(i++); + + if ( head == left || ( head == default_match && left != sub) ) + { + rl = sub; + p = 0; + } + else if ( head == right || ( head == default_match && right != sub) ) + { + rr = sub; + p = 0; + } + else + p = p->next(); + } + + if ( left != rl || right != rr ) + result = EST_String::cat(rl,"_",rr); + else + result = EST_String::Empty; + + return result; +} + + +/* + * This version of backoff takes a pointer to a target diphone in the + * Segment as input. It changes the target phone sequence by + * substituting a phone for one OR MORE phone as the rules + * dictate. If the left phone of the diphone is altered a diphone + * mismatch will occur in the between sucessive candidate lists. If + * the right phone is substituted this will not happen, as the + * candidate list for the next diphone will be constrcted from the + * corrected list. + */ + +int DiphoneBackoff::backoff(EST_Item *p1) +{ + EST_Item *p2, *pp, *pps; + EST_String n1,n2,head,sub,full_sub,bo; + + bool done = false; + EST_Litem *p; + + + if(! p1) + EST_error("Backoff received null item."); + if ( ! (p2 = p1->next()) ) + EST_error("Backoff didn't get passed a diphone."); + + n1=p1->S("name"); + n2=p2->S("name"); + + p = backofflist.head(); + // for each rule. + while( p!= 0 && !done ) + { + + int i = 0 ; + head = backofflist(p).nth(i++); + + pp = 0; + + // Match head of rule to left phone, or if head of the rule is the defualt substitution + // do it if it hasn't already been done. 
+ if( (head == n1) || ( (head == default_match) && !is_defaultbackoff(p1) ) ) + pp = p1; + // if this fails, try the right phone. + else if( (head == n2) || ( (head == default_match) && !is_defaultbackoff(p2) ) ) + pp = p2; + + if(pp) + { + bo = pp->S("name"); + sub = backofflist(p).nth(i++); + full_sub = sub; + + pp->set("name",sub); + set_backoff(pp); + if(head.matches(default_match)) + set_defaultbackoff(pp); + while (i < backofflist(p).length()) + { + sub = backofflist(p).nth(i++); + full_sub = EST_String::cat(full_sub," ",sub); + pp->insert_after(); + pps = pp->as_relation("SylStructure"); + pp = pp->next(); + // insert in SylStructure as well. + pps->insert_after(pp); + + pp->set("name",sub); + set_backoff(pp); + if(head.matches(default_match)) + set_defaultbackoff(pp); + } + EST_warning("Missing diphone: %s_%s. Changing %s to %s.\n", + (const char *)n1, + (const char *)n2, + (const char *)bo, + (const char *)full_sub); + done = true; + } + p = p->next(); + } + + if (done) + return 0; + else + return 1; +} + + +ostream& DiphoneBackoff::print(ostream &st) const +{ + EST_Litem *p; + + for (p = backofflist.head(); p != 0; p = p->next()) + st << backofflist(p); + return st; +} + +ostream& operator << (ostream &st, const DiphoneBackoff dbo) +{ + dbo.print(st); + return st; +} + +void DiphoneBackoff::set_defaultbackoff(EST_Item *it) const +{ + it->set("defaultbackoff",1); +} + +void DiphoneBackoff::set_backoff(EST_Item *it) const +{ + if(it->f_present("backoff")) + it->set("backoff",it->I("backoff")+1); + else + it->set("backoff",1); +} + +int DiphoneBackoff::is_defaultbackoff(const EST_Item *it) const +{ + if(it->f_present("defaultbackoff")) + return 1; + else return 0; +} + +int DiphoneBackoff::is_backoff(const EST_Item *it) const +{ + if(it->f_present("backoff")) + return 1; + else return 0; +} diff --git a/src/modules/MultiSyn/DiphoneBackoff.h b/src/modules/MultiSyn/DiphoneBackoff.h new file mode 100644 index 0000000..dc7dc92 --- /dev/null +++ 
b/src/modules/MultiSyn/DiphoneBackoff.h @@ -0,0 +1,74 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 2004 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Rob Clark */ +/* Date: Jan 2004 */ +/* --------------------------------------------------------------------- */ +/* Diphone backing off procedure */ +/*************************************************************************/ + + +#ifndef __DIPHONEBACKOFF_H__ +#define __DIPHONEBACKOFF_H__ + +#include "siod.h" +#include "EST_types.h" +#include "ling_class/EST_Utterance.h" + +class DiphoneBackoff { + + private: + static const EST_String default_match; + EST_StrListList backofflist; + + + public: + DiphoneBackoff(LISP l_backofflist); + //~DiphoneBackoff(); + + EST_String backoff(EST_String left, EST_String right); + int backoff(EST_Item *left_phone); + + ostream& print(ostream &st = cout) const; + friend ostream& operator << (ostream &st, const DiphoneBackoff &dbo); + + private: + int is_defaultbackoff(const EST_Item *it) const; + void set_defaultbackoff(EST_Item *it) const; + int is_backoff(const EST_Item *it) const; + void set_backoff(EST_Item *it) const; + +}; + +#endif // __DIPHONEBACKOFF_H__ + diff --git a/src/modules/MultiSyn/DiphoneUnitVoice.cc b/src/modules/MultiSyn/DiphoneUnitVoice.cc new file mode 100644 index 0000000..2a1e5c8 --- /dev/null +++ b/src/modules/MultiSyn/DiphoneUnitVoice.cc @@ -0,0 +1,900 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: Aug 2002 */ +/* --------------------------------------------------------------------- */ +/* first stab at a diphone unit selection "voice" - using a list of */ +/* utterance objects */ +/*************************************************************************/ + +#include "festival.h" +#include "DiphoneUnitVoice.h" +#include "DiphoneVoiceModule.h" +#include "EST_DiphoneCoverage.h" +#include "EST_rw_status.h" +#include "EST_viterbi.h" +#include "EST_Track.h" +#include "EST_track_aux.h" +#include "EST_Wave.h" +#include "EST_THash.h" +#include "EST_TList.h" +#include "EST_types.h" +#include "ling_class/EST_Utterance.h" +#include "siod.h" +#include "siod_est.h" +#include "safety.h" +#include + +#include "EST_TargetCost.h" +#include "TargetCostRescoring.h" +#include "EST_JoinCost.h" +#include "EST_JoinCostCache.h" + +#include "EST_Val.h" + +SIOD_REGISTER_TYPE(itemlist,ItemList) +VAL_REGISTER_TYPE(itemlist,ItemList) + +// from src/modules/UniSyn_diphone/us_diphone.h +// this won't be staying here long... +void parse_diphone_times(EST_Relation &diphone_stream, + EST_Relation &source_lab); + +SIOD_REGISTER_CLASS(du_voice,DiphoneUnitVoice) +VAL_REGISTER_CLASS(du_voice,DiphoneUnitVoice) + +static void my_parse_diphone_times(EST_Relation &diphone_stream, + EST_Relation &source_lab) +{ + EST_Item *s, *u; + float dur1, dur_u, p_time=0.0; + + // NOTE: because of the extendLeft/extendRight phone join hack for missing diphones, + // the unit linked list *may be* shorter that the segment list. 
+ //(admittedly could cause confusion) + + for( s=source_lab.head(), u=diphone_stream.head(); (u!=0)&&(s!=0); u=u->next(), s=s->next()){ + EST_Track *pm = track(u->f("coefs")); + + int end_frame = pm->num_frames() - 1; + int mid_frame = u->I("middle_frame"); + + dur1 = pm->t(mid_frame); + dur_u = pm->t(end_frame); + + s->set("end", (p_time+dur1) ); + + p_time += dur_u; + u->set("end", p_time); + + if( u->f_present("extendRight") ){//because diphone squeezed out (see above) + s = s->next(); + s->set("end", p_time ); + } + } + + if(s) + s->set("end", (p_time)); +} + +// temporary hack necessary because decoder can only take a +// function pointer (would be better to relax this restriction in +// the EST_Viterbi_Decoder class, or in a replacement class, rather +// than using this hack) +static DiphoneUnitVoice *globalTempVoicePtr = 0; + +DiphoneUnitVoice::DiphoneUnitVoice( const EST_StrList& basenames, + const EST_String& uttDir, + const EST_String& wavDir, + const EST_String& pmDir, + const EST_String& coefDir, + unsigned int sr, + const EST_String& uttExt, + const EST_String& wavExt, + const EST_String& pmExt, + const EST_String& coefExt ) + : pruning_beam( -1 ), + ob_pruning_beam( -1 ), + tc_rescoring_beam( -1 ), + tc_rescoring_weight( 0.0 ), + tc_weight( 1.0 ), + jc_weight( 1.0 ), + jc_f0_weight( 1.0 ), + jc_power_weight( 1.0 ), + jc_spectral_weight( 1.0 ), + prosodic_modification( 0 ), + wav_srate( sr ), + jc( 0 ), + jc_delete( false ), + tc( 0 ), + tc_delete( false ), + tcdh( 0 ) + +{ + // make the default voice module with the supplied parameters + addVoiceModule( basenames, uttDir, wavDir, pmDir, coefDir, + wav_srate, + uttExt, wavExt, pmExt, coefExt ); + + diphone_backoff_rules = 0; +} + +void DiphoneUnitVoice::initialise( bool ignore_bad_tag ) +{ + if( jc == 0 ) + EST_error( "Need to set join cost calculator for voice" ); + + if( tc == 0 ) + EST_error( "Need to set target cost calculator for voice" ); + + EST_TList::Entries it; + + for( 
it.begin(voiceModules); it; it++ ) + (*it)->initialise( tc, ignore_bad_tag ); +} + +bool DiphoneUnitVoice::addVoiceModule( const EST_StrList& basenames, + const EST_String& uttDir, + const EST_String& wavDir, + const EST_String& pmDir, + const EST_String& coefDir, + unsigned int srate, + const EST_String& uttExt, + const EST_String& wavExt, + const EST_String& pmExt, + const EST_String& coefExt ) + +{ + DiphoneVoiceModule *vm; + + if( srate != wav_srate ) + EST_error( "Voice samplerate: %d\nmodule samplerate: %d", + wav_srate, srate ); + + vm = new DiphoneVoiceModule( basenames, uttDir, wavDir, pmDir, coefDir, + srate, + uttExt, wavExt, pmExt, coefExt ); + CHECK_PTR(vm); + + registerVoiceModule( vm ); + + return true; +} + + +void DiphoneUnitVoice::registerVoiceModule( DiphoneVoiceModule *vm ) +{ + voiceModules.append( vm ); +} + + +void DiphoneUnitVoice::setJoinCost( EST_JoinCost *jcost, bool del ) +{ + if( jc_delete == true ) + if( jc != 0 ) + delete jc; + + jc = jcost; + jc_delete = del; +} + +void DiphoneUnitVoice::setTargetCost( EST_TargetCost *tcost, bool del ) +{ + if( tc_delete == true ) + if( tc != 0 ) + delete tc; + + tc = tcost; + tc_delete = del; +} + + +DiphoneUnitVoice::~DiphoneUnitVoice() +{ + EST_TList::Entries it; + + for( it.begin(voiceModules); it; it++ ) + delete( *it ); + + if(diphone_backoff_rules) + delete diphone_backoff_rules; + + if( jc_delete == true ) + if( jc != 0 ) + delete jc; + + if( tc_delete == true ) + if( tc != 0 ) + delete tc; + + if(tcdh) + delete tcdh; + +} + + +void DiphoneUnitVoice::addToCatalogue( const EST_Utterance *utt ) +{ + // needed? +} + + +void DiphoneUnitVoice::getDiphone( const EST_VTCandidate *cand, + EST_Track* coef, EST_Wave* sig, int *midframe, + bool extendLeft, bool extendRight ) +{ + // The need for this function in this class is a bit messy, it would be far + // nicer just to be able to ask the Candidate itself to hand over the relevant + // synthesis parameters. 
In future, it will work that way ;) + + // put there by DiphoneVoiceModule::getCandidateList + const DiphoneCandidate *diphcand = diphonecandidate( cand->name ); + + const DiphoneVoiceModule* parentModule = diphcand->dvm; + EST_Item *firstPhoneInDiphone = cand->s; + + // need to call right getDiphone to do the actual work + parentModule->getDiphone( firstPhoneInDiphone, coef, sig, midframe, extendLeft, extendRight ); +} + +// REQUIREMENT: the unit relation must have previously been used to initialise the +// Viterbi decoder from which the path was produced. +void DiphoneUnitVoice::fillUnitRelation( EST_Relation *units, const EST_VTPath *path ) +{ + EST_Item *it=units->tail(); + + for ( ; path != 0 && it != 0; path=path->from, it=it->prev() ){ + EST_Track *coefs = new EST_Track; + CHECK_PTR(coefs); + EST_Wave *sig = new EST_Wave; + CHECK_PTR(sig); + int midf; + + getDiphone( path->c, coefs, sig, &midf, + it->f_present("extendLeft"), it->f_present("extendRight")); + + EST_Item *firstPhoneInDiphone = path->c->s; + it->set_val( "sig", est_val( sig ) ); + it->set_val( "coefs", est_val( coefs ) ); + it->set( "middle_frame", midf ); + it->set( "source_utt", firstPhoneInDiphone->relation()->utt()->f.S("fileid")); + it->set_val( "source_ph1", est_val( firstPhoneInDiphone )); + it->set( "source_end", firstPhoneInDiphone->F("end")); + it->set( "target_cost", path->c->score ); + + //have to recalculate join cost as it's not currently saved anywhere + if( path->from == 0 ) + it->set( "join_cost", 0.0); + else{ + // join cost between right edge of left diphone and vice versa + const DiphoneCandidate *l_diph = diphonecandidate(path->from->c->name); + const DiphoneCandidate *r_diph = diphonecandidate(path->c->name); + + it->set( "join_cost", (*jc)( l_diph, r_diph ) ); + } + } +} + +// The use of the globalFunctionPtr in this function is a really just a temporary hack +// necessary because the decoder as it stands at present can only take a function pointer +// (would be better to 
relax this restriction in the EST_Viterbi_Decoder class, or in a +// replacement class, rather than using this hack) +// static EST_VTPath* extendPath( EST_VTPath *p, EST_VTCandidate *c, +// EST_Features&) +// { +// EST_VTPath *np = new EST_VTPath; +// CHECK_PTR(np); + +// if( globalTempVoicePtr ==0 ) +// EST_error( "globalTempVoicePtr is not set, can't continue" ); + +// const EST_JoinCost &jcost = globalTempVoicePtr->getJoinCostCalculator(); + +// np->c = c; +// np->from = p; +// np->state = c->pos; + +// if ((p == 0) || (p->c == 0)) +// np->score = c->score; +// else{ +// // join cost between right edge of left diphone and vice versa +// np->score = p->score + c->score + jcost( p->c->s->next(), c->s ); +// } +// return np; +// } +static EST_VTPath* extendPath( EST_VTPath *p, EST_VTCandidate *c, + EST_Features&) +{ + EST_VTPath *np = new EST_VTPath; + CHECK_PTR(np); + + if( globalTempVoicePtr ==0 ) + EST_error( "globalTempVoicePtr is not set, can't continue" ); + + const EST_JoinCost &jcost = globalTempVoicePtr->getJoinCostCalculator(); + + np->c = c; + np->from = p; + np->state = c->pos; + + if ((p == 0) || (p->c == 0)) + np->score = c->score; + else{ + const DiphoneCandidate *l_diph = diphonecandidate(p->c->name); + const DiphoneCandidate *r_diph = diphonecandidate(c->name); + + // join cost between right edge of left diphone and vice versa + np->score = p->score + c->score + jcost( l_diph, r_diph ); + } + return np; +} + +// This function is a really just a temporary hack necessary because the decoder +// as it stands at present can only take a function pointer (would be better to relax +// this restriction in the EST_Viterbi_Decoder class, or in a replacement class, rather +// than using this hack) +static EST_VTCandidate* getCandidatesFunction( EST_Item *s, + EST_Features &f) +{ + DiphoneUnitVoice *duv = globalTempVoicePtr; + if( duv==0 ) + EST_error( "Candidate source voice is unset" ); + + return duv->getCandidates( s, f ); +} + +// Function which, given 
an item from the timeline relation that +// was originally used to initialise the EST_Viterbi_Decoder +// returns a pointer to a linked list of EST_VTCandidates +// (this is provided to the viterbi decoder upon its construction +// and (in)directly called by it as part of the decoding process...) +EST_VTCandidate* DiphoneUnitVoice::getCandidates( EST_Item *s, + EST_Features &f) const +{ + EST_VTCandidate *c = 0; + EST_VTCandidate *moduleListHead = 0; + EST_VTCandidate *moduleListTail = 0; + + // these objects [c/sh]ould be a parameter visible in the user's script + // land, and will be in future... + + // tc now a member + // EST_DefaultTargetCost default_target_cost; + // EST_TargetCost *tc = &default_target_cost; + // or + // EST_SchemeTargetCost scheme_target_cost(rintern( "targetcost")); + // EST_TargetCost *tc = &scheme_target_cost; + + EST_TList::Entries module_iter; + int nfound, total=0; + + //////////////////////////////////////////////////////////////// + // join linked list of candidates from each module into one list + for( module_iter.begin(voiceModules); module_iter; module_iter++ ){ + nfound = (*module_iter)->getCandidateList( *s, + tc, + tcdh, + tc_weight, + &moduleListHead, + &moduleListTail ); + if( nfound>0 ){ + moduleListTail->next = c; + c = moduleListHead; + total += nfound; + } + } + + if( total==0 ) + EST_error( "Couldn't find diphone %s", (const char*)s->S("name") ); + + if( verbosity() > 0 ) + printf( "Number of candidates found for target \"%s\": %d\n", + (const char*)s->S("name"), total ); + + if( ! 
((tc_rescoring_beam == -1.0) || (tc_rescoring_weight <= 0.0)) ) + rescoreCandidates( c, tc_rescoring_beam, tc_rescoring_weight ); + + return c; +} + +void DiphoneUnitVoice::diphoneCoverage(const EST_String filename) const +{ + + EST_DiphoneCoverage dc; + EST_TList<DiphoneVoiceModule*>::Entries module_iter; + + // for each module + for( module_iter.begin(voiceModules); module_iter; module_iter++ ) + (*module_iter)->getDiphoneCoverageStats(&dc); + + dc.print_stats(filename); + +} + + + +bool DiphoneUnitVoice::synthesiseWave( EST_Utterance *utt ) +{ + getUnitSequence( utt ); + + return true; +} + + + +void DiphoneUnitVoice::getUnitSequence( EST_Utterance *utt ) +{ + EST_Relation *segs = utt->relation( "Segment" ); + EST_Relation *units = utt->create_relation( "Unit" ); + + if(!tcdh) + tcdh = new TCDataHash(20); + else + tcdh->clear(); + + // Initialise the Unit relation time index for decoder + EST_String diphone_name; + EST_StrList missing_diphones; + + EST_Item *it=segs->head(); + if( it == 0 ) + EST_error( "Segment relation is empty" ); + + bool extendLeftFlag = false; + for( ; it->next(); it=it->next() ) + { + EST_String l = it->S("name"); + EST_String r = it->next()->S("name"); + + EST_String diphone_name = EST_String::cat(l,"_",r); + EST_String orig = diphone_name; + + if(tc->is_flatpack()) + tcdh->add_item( it , ((EST_FlatTargetCost *)tc)->flatpack(it) ); + + + // First attempt back off: + // If missing diphone is an interword diphone, insert a silence! + // Perceptual results say this is preferred.
+ + if ( diphone_name != EST_String::Empty && + !this->unitAvailable(diphone_name) ) + { + EST_Item *s1,*s2; + EST_Item *w1=0,*w2=0; + + cout << "Missing diphone: "<< diphone_name << endl; + + if((s1 = parent(it,"SylStructure"))) + w1= parent(s1,"SylStructure"); + if( (s2 = parent(it->next(),"SylStructure"))) + w2= parent(s2,"SylStructure"); + + if( w1 && w2 && (w1 != w2) ) + { + EST_Item *sil; + + cout << " Interword so inseting silence.\n"; + + sil = it->insert_after(); + sil->set("name",ph_silence()); + + r = it->next()->S("name"); + diphone_name = EST_String::cat(l,"_",r); + + } + } + + + // Simple back off. + // Change diphone name for one we actually have. + + while(diphone_name != EST_String::Empty && + !this->unitAvailable(diphone_name) && + diphone_backoff_rules) + { + + cout << " diphone still missing, backing off: " << diphone_name << endl; + + diphone_name = diphone_backoff_rules->backoff(l,r); + l = diphone_name.before("_"); + r = diphone_name.after("_"); + + cout << " backed off: " << orig << " -> " << diphone_name << endl; + + if( verbosity() > 0 ){ + EST_warning("Backing off requested diphone %s to %s", + orig.str(), + diphone_name.str() ); + } + } + + + //// Complex backoff. Changes the segment stream to the right, + //// may still leave a discontinuity to the left. This could be + //// fixed, but it would requires a better search. Rob's thoughts + //// are that the simple method works better, unless it resorts to + //// a bad default rule. 
+ + + // while(!this->unitAvailable(diphone_name) && + // diphone_backoff_rules && + // !diphone_backoff_rules->backoff(it)) + // diphone_name = EST_String::cat(it->S("name"),"_",it->next()->S("name")); + + if( !this->unitAvailable( diphone_name ) ){ + missing_diphones.append( diphone_name ); + if(units->tail()) + units->tail()->set( "extendRight", 1 ); + extendLeftFlag = true; // trigger for next unit to make up second half of missing diphone + } + else{ + EST_Item *t = units->append(); + t->set( "name", diphone_name ); + if(orig != diphone_name) + t->set( "missing_diphone",orig); + t->set_val( "ph1", est_val(it) ); + if( extendLeftFlag == true ){ + t->set( "extendLeft", 1 ); + extendLeftFlag = false; + } + } + } + + // stop if necessary units are still missing. + if( missing_diphones.length() > 0 ){ + for( EST_Litem *it=missing_diphones.head(); it!=0 ; it=it->next() ) + printf( "requested diphone missing: %s\n", missing_diphones(it).str() ); + + EST_warning("Making phone joins to compensate..."); + // EST_error("Unable to synthesise utterance due to missing diphones"); + } + + // Make the decoder do its thing + // -1 means number of states at each time point not fixed + EST_Viterbi_Decoder v( getCandidatesFunction, extendPath, -1 ); + + // turn on pruning if necessary + if( (pruning_beam>0) || (ob_pruning_beam>0) ) + v.set_pruning_parameters( pruning_beam, ob_pruning_beam ); + + // temporary hack necessary because decoder can only take a + // function pointer (would be better to relax this restriction in + // the EST_Viterbi_Decoder class, or in a replacement class, rather + // than using this hack) + globalTempVoicePtr = this; + + v.set_big_is_good(false); + + if( verbosity() > 0 ) + v.turn_on_trace(); + + v.initialise( units ); + v.search(); + + // take hold of the best path (end thereof) + EST_VTPath *bestp=0; + if( !v.result( &bestp ) ) + EST_error( "No best candidate sequence found" ); + + // fill in the best path features in the Unit Relation + 
fillUnitRelation( units, bestp ); + + my_parse_diphone_times( *units, *segs ); +} + + +///////////////////////////////////////////////////////////////////////////////////// +// Canned example experimental code (proof of concept rather than intelligently done) + +static inline bool itemListContainsItem( const ItemList* il, const EST_Item *item ) +{ + ItemList::Entries it; + + for( it.begin( *il ); it; it++ ) + if( (*it) == item ) + return true; + + return false; +} + + +static EST_VTCandidate* getCandidatesWithOmissionsFunction( EST_Item *s, EST_Features &f ) +{ + DiphoneUnitVoice *duv = globalTempVoicePtr; + if( duv==0 ) + EST_error( "Candidate source voice is unset" ); + + //get candidate list as usual + EST_VTCandidate *candlist = duv->getCandidates( s, f ); + + //filter out candidates on basis of omission list (yes, this is quite dumb) + if( s->f_present( "omitlist" ) ){ + + EST_warning( "omitlist found in unit %s", s->S("name").str() ); + + ItemList *omitlist = itemlist( s->f("omitlist") ); + + //until one candidate remains as head (to keep hold of list head) + while( candlist != 0 && itemListContainsItem( omitlist, candlist->s ) ){ + EST_VTCandidate *del_cand = candlist; + candlist = candlist->next; + del_cand->next = 0; //so deletion doesn't trigger total list deletion + delete del_cand; + } + + //then continue down list + EST_VTCandidate *prev = candlist; + EST_VTCandidate *cand = candlist->next; + while( cand!=0 ){ + if( itemListContainsItem( omitlist, cand->s ) ){ //delete cand on true + prev->next = cand->next; + cand->next = 0; //so deletion doesn't trigger total list deletion + delete cand; + cand = prev; + } + cand = cand->next; + } + + if( candlist == 0 ) + EST_error( "zero candidates remain after filtering" ); + + } + + return candlist; +} + +// For when the utterance already has the unit sequence, with certain candidates +// flagged as to be avoided, or mandatory and so on... 
+void DiphoneUnitVoice::regetUnitSequence( EST_Utterance *utt ) +{ + // Unit relation should already be in existence for decoder + EST_Relation *units = utt->relation( "Unit" ); + EST_Item *it=units->head(); + if( it == 0 ) + EST_error( "Unit relation is empty" ); + + // Make the decoder do its thing (again) + // -1 means number of states at each time point not fixed + EST_Viterbi_Decoder v( getCandidatesWithOmissionsFunction, extendPath, -1 ); + + // turn on pruning if necessary + if( (pruning_beam>0) || (ob_pruning_beam>0) ) + v.set_pruning_parameters( pruning_beam, ob_pruning_beam ); + + // temporary hack necessary because decoder can only take a + // function pointer (would be better to relax this restriction in + // the EST_Viterbi_Decoder class, or in a replacement class, rather + // than using this hack) + globalTempVoicePtr = this; + + v.set_big_is_good(false); + + if( verbosity() > 0 ) + v.turn_on_trace(); + + v.initialise( units ); + v.search(); + + // take hold of the best path (end thereof) + EST_VTPath *bestp=0; + if( !v.result( &bestp ) ) + EST_error( "No best candidate sequence found" ); + + // fill in the best path features in the Unit Relation + fillUnitRelation( units, bestp ); + + EST_Relation *segs = utt->relation("Segment"); + my_parse_diphone_times( *units, *segs ); +} + +// End canned example experimental code /////////////////////////////////////////// +/////////////////////////////////////////////////////////////////////////////////// + + +bool DiphoneUnitVoice::unitAvailable( const EST_String &diphone ) const +{ + EST_TList<DiphoneVoiceModule*>::Entries it; + + for( it.begin(voiceModules); it; it++ ) + if( (*it)->numAvailableCandidates(diphone) > 0 ) + return true; + + return false; +} + +unsigned int DiphoneUnitVoice::numAvailableCandidates( const EST_String &diphone ) const +{ + unsigned int number = 0; + EST_TList<DiphoneVoiceModule*>::Entries it; + + for( it.begin(voiceModules); it; it++ ) + number += (*it)->numAvailableCandidates(diphone); + + return number; +} + +
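Note on the omit-list filtering in getCandidatesWithOmissionsFunction above: the second loop reads candlist->next before the "zero candidates remain" check, so if every candidate is on the omit list the null head is dereferenced before EST_error can fire. A minimal standalone sketch of the same filtering, written so that an empty result is handled safely, might look like the following. It assumes a hypothetical `Candidate` node in place of EST_VTCandidate and an integer `item_id` in place of the candidate's source item; none of these names come from Festival or the Edinburgh Speech Tools.

```cpp
#include <cstddef>
#include <initializer_list>
#include <set>

// Hypothetical stand-in for EST_VTCandidate: a singly linked candidate node.
// `item_id` plays the role of the candidate's source item (cand->s).
struct Candidate {
    int item_id;
    Candidate *next;
};

// Build a list from ids, preserving order (illustration helper only).
Candidate *make_list(std::initializer_list<int> ids)
{
    Candidate *head = nullptr;
    Candidate **tail = &head;
    for (int id : ids) {
        *tail = new Candidate{id, nullptr};
        tail = &(*tail)->next;
    }
    return head;
}

// Unlink and delete every candidate whose item_id is in `omit`.
// Returns the new head, which is nullptr when nothing survives.
Candidate *filter_candidates(Candidate *head, const std::set<int> &omit)
{
    Candidate **link = &head;           // the link currently being examined
    while (*link != nullptr) {
        Candidate *cur = *link;
        if (omit.count(cur->item_id) > 0) {
            *link = cur->next;          // bypass the node, then free it
            delete cur;
        } else {
            link = &cur->next;          // keep the node and advance
        }
    }
    return head;
}

// Count the nodes in a list.
std::size_t count_candidates(const Candidate *head)
{
    std::size_t n = 0;
    for (; head != nullptr; head = head->next)
        ++n;
    return n;
}

// Free a whole list.
void free_list(Candidate *head)
{
    while (head != nullptr) {
        Candidate *next = head->next;
        delete head;
        head = next;
    }
}
```

Walking a pointer to the link rather than a prev/cur pair lets one loop cover both the head and the interior nodes, and the caller can test the returned head for null before touching it.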
+//////////////////////////////////////////////////////////////////////// +//////////////////////////////////////////////////////////////////////// +// special case of the above for utterance structures that are +// actually in the voice database, which doesn't do any search +// This is useful for doing copy synthesis of utterances (eg. +// to test out resynthesis, prosodic modification and so on) +void DiphoneUnitVoice::getCopyUnitUtterance( const EST_String &utt_fname, + EST_Utterance **utt_out ) const +{ + // need to find which, if any, voice module has this utterance + // in its list + EST_TList<DiphoneVoiceModule*>::Entries module_iter; + EST_Utterance *db_utt=0; + for( module_iter.begin(voiceModules); module_iter; module_iter++ ) + if( (*module_iter)->getUtterance(&db_utt, "fileid", utt_fname) == true ) + break; + + if( db_utt == 0 ) + EST_error( "Could not find Utterance %s in any voice module", + utt_fname.str() ); + else{ + // deep copy database utterance and fill in Unit relation + *utt_out = new EST_Utterance( *db_utt ); + CHECK_PTR(utt_out); + + EST_Utterance myUtt( *db_utt ); + + cerr << myUtt.relation_present( "Segment" ) << " " + << myUtt.num_relations() << endl; + + cerr << db_utt->relation_present( "Segment" ) << " " + << (*utt_out)->relation_present( "Segment" ) << " " + << (*utt_out)->num_relations() << endl; + + EST_Relation *segs = (*utt_out)->relation( "Segment" ); + EST_Relation *units = (*utt_out)->create_relation( "Unit" ); + + // Initialise the Unit relation + fill in necessary/suitable + // synthesis parameters + EST_String ph1, ph2; + EST_Item *it = segs->tail(); + EST_Item *db_utt_seg_it = db_utt->relation( "Segment" )->tail(); + if( it == 0 ) + EST_error( "Segment relation is empty" ); + else{ + ph2 = it->S("name"); + while( ((it=it->prev())!=0) && + ((db_utt_seg_it=db_utt_seg_it->prev())!=0) ){ + EST_Track *coefs = new EST_Track; + CHECK_PTR(coefs); + EST_Wave *sig = new EST_Wave; + CHECK_PTR(sig); + int midf; + + (*module_iter)->getDiphone( db_utt_seg_it, coefs, sig, &midf ); + + ph1 = it->S("name"); + EST_Item *t = 
units->prepend(); + t->set( "name", EST_String::cat(ph1,"_",ph2) ); + t->set_val( "ph1", est_val(it) ); + t->set_val( "sig", est_val( sig ) ); + t->set_val( "coefs", est_val( coefs ) ); + t->set( "middle_frame", midf ); + t->set( "source_utt", db_utt->f.S("fileid")); + t->set_val( "source_ph1", est_val( db_utt_seg_it )); + t->set( "source_end", db_utt_seg_it->F("end")); + t->set( "target_cost", 0.0 ); + t->set( "join_cost", 0.0); + + ph2 = ph1; + } + } + my_parse_diphone_times( *units, *segs ); + + // this is for copy synthesis, so copy actual timings + //for( EST_Item *seg = segs->head(); it!=0; it=it->next() ) + //seg->set( "end", seg->F("source_end") ); + } +} + +//////////////////////////////////////////////////////////////////////// +//////////////////////////////////////////////////////////////////////// + + + +unsigned int DiphoneUnitVoice::numUnitTypes() const +{ + //necessary? + return 0; +} + +unsigned int DiphoneUnitVoice::numDatabaseUnits() const +{ + unsigned int sum=0; + + EST_TList<DiphoneVoiceModule*>::Entries it; + + for( it.begin( voiceModules ); it; it++ ) + sum += (*it)->numModuleUnits(); + + return sum; +} + + +////////////////////////////////////////////////////////////////////////// + +void DiphoneUnitVoice::set_diphone_backoff(DiphoneBackoff *dbo) +{ + if (diphone_backoff_rules) + delete diphone_backoff_rules; + diphone_backoff_rules = dbo; +} + + +int DiphoneUnitVoice::getPhoneList( const EST_String &phone, ItemList &list ) +{ + unsigned int n=0; + + EST_TList<DiphoneVoiceModule*>::Entries it; + for( it.begin( voiceModules ); it; it++ ) + n += (*it)->getPhoneList( phone, list ); + + return n; +} + + + +void DiphoneUnitVoice::precomputeJoinCosts( const EST_StrList &phones, bool verbose ) +{ + EST_StrList::Entries it; + for( it.begin( phones ); it; it++ ){ + ItemList *l = new ItemList; + CHECK_PTR(l); + + unsigned int n = getPhoneList( (*it), *l ); + + if( verbose==true ) + cerr << "phone " << (*it) << " " << n << " instances\n"; + + if( n>0 ){ + jc->computeAndCache( *l, true ); 
//verbose=true + } + else + EST_warning( "Phone %s not listed in voice", (*it).str() ); + + delete l; + } +} diff --git a/src/modules/MultiSyn/DiphoneUnitVoice.h b/src/modules/MultiSyn/DiphoneUnitVoice.h new file mode 100644 index 0000000..d4d9e89 --- /dev/null +++ b/src/modules/MultiSyn/DiphoneUnitVoice.h @@ -0,0 +1,226 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: Aug 2002 */ +/* --------------------------------------------------------------------- */ +/* first stab at a diphone unit selection "voice" - using a list of */ +/* utterance objects */ +/*************************************************************************/ + + +#ifndef __DIPHONEUNITVOICE_H__ +#define __DIPHONEUNITVOICE_H__ + +#include "VoiceBase.h" +#include "DiphoneBackoff.h" +#include "siod_defs.h" +#include "EST_Val_defs.h" +#include "EST_String.h" +#include "EST_FlatTargetCost.h" + +#include "EST_types.h" // for EST_StrList + +class EST_Utterance; +class EST_Relation; +class EST_VTCandidate; +class EST_VTPath; +class EST_Features; +class EST_Track; +class EST_Wave; +class EST_Item; +class DiphoneVoiceModule; +class EST_JoinCost; + + +#include "EST_THash.h" +template<class T> class EST_TList; +typedef EST_TList<EST_Item*> ItemList; + +SIOD_REGISTER_TYPE_DCLS(itemlist, ItemList) +VAL_REGISTER_TYPE_DCLS(itemlist, ItemList) + +SIOD_REGISTER_CLASS_DCLS(du_voice,DiphoneUnitVoice) +VAL_REGISTER_CLASS_DCLS(du_voice,DiphoneUnitVoice) + + +class DiphoneUnitVoice : public VoiceBase { +public: + DiphoneUnitVoice( const EST_StrList& basenames, + const EST_String& uttDir, + const EST_String& wavDir, + const EST_String& pmDir, + const EST_String& coefDir, + unsigned int srate = 16000, + const EST_String& uttExt = ".utt", + const EST_String& wavExt = ".wav", + const EST_String& pmExt = ".pm", + const EST_String& coefExt = ".coef" ); + + virtual ~DiphoneUnitVoice(); + + virtual void initialise( bool ignore_bad_tag=false ); + virtual unsigned int numDatabaseUnits() const; + virtual unsigned int numUnitTypes() const; + + virtual bool synthesiseWave( EST_Utterance *utt ); + + virtual void getUnitSequence( EST_Utterance *utt ); + + void regetUnitSequence( EST_Utterance *utt ); + + void getCopyUnitUtterance( const EST_String &utt_fname, + EST_Utterance **utt_out ) 
const; + + EST_VTCandidate* getCandidates( EST_Item *s, EST_Features &f ) const; + void diphoneCoverage(const EST_String filename) const; + + virtual bool unitAvailable( const EST_String &diphone ) const; + virtual unsigned int numAvailableCandidates( const EST_String &unit ) const; + + unsigned int numModules() const { return voiceModules.length(); } + + bool addVoiceModule( const EST_StrList& basenames, + const EST_String& uttDir, + const EST_String& wavDir, + const EST_String& pmDir, + const EST_String& coefDir, + unsigned int srate = 16000, + const EST_String& uttExt = ".utt", + const EST_String& wavExt = ".wav", + const EST_String& pmExt = ".pm", + const EST_String& coefExt = ".coef" ); + + + // assume responsibility to delete vm when done with it + void registerVoiceModule( DiphoneVoiceModule *vm ); + + // del=true means it's ok to delete the join cost when we're done + // with it + void setJoinCost( EST_JoinCost *jcost, bool del=false ); + const EST_JoinCost& getJoinCostCalculator( ) const { return *jc; } + + void setTargetCost( EST_TargetCost *tcost, bool del=false ); + const EST_TargetCost& getTargetCostCalculator( ) const { return *tc; } + + void set_pruning_beam( float width ) { pruning_beam=width; } + float get_pruning_beam( ) const { return pruning_beam; } + + void set_ob_pruning_beam( float width ){ ob_pruning_beam=width; } + float get_ob_pruning_beam( ) const { return ob_pruning_beam; } + + void set_jc_f0_weight( float val ) { jc_f0_weight=val; } + float get_jc_f0_weight() { return jc_f0_weight; } + EST_JoinCost * get_jc() { return jc; } + + void set_jc_power_weight( float val ) { jc_power_weight=val; } + float get_jc_power_weight() { return jc_power_weight; } + + void set_jc_spectral_weight( float val ) { jc_spectral_weight=val; } + float get_jc_spectral_weight() { return jc_spectral_weight; } + + void set_tc_rescoring_beam( float width ){ tc_rescoring_beam = width; } + float get_tc_rescoring_beam( ) const { return tc_rescoring_beam; } + + void 
set_tc_rescoring_weight( float weight ){ tc_rescoring_weight = weight; } + float get_tc_rescoring_weight( ) const { return tc_rescoring_weight; } + + void set_target_cost_weight( float w ){ tc_weight=w; } + float get_target_cost_weight() const { return tc_weight; } + + void set_join_cost_weight( float w ){ jc_weight=w; } + float get_join_cost_weight() const { return jc_weight; } + + void set_prosodic_modification( int m ){ prosodic_modification=m; } + int get_prosodic_modification() const { return prosodic_modification; } + + void set_wav_samplerate( unsigned int sr ) { wav_srate = sr; } + unsigned int get_wav_samplerate( ) const { return wav_srate; } + + void precomputeJoinCosts( const EST_StrList &phones, bool verbose=true ); + +private: + // don't allow copying of Voices (for now?) + DiphoneUnitVoice( const DiphoneUnitVoice& ); + DiphoneUnitVoice& operator=( const DiphoneUnitVoice& ); + + void addToCatalogue( const EST_Utterance *utt ); + + void getDiphone( const EST_VTCandidate *cand, + EST_Track* coef, EST_Wave* sig, int *midframe, + bool extendLeft=0, bool extendRight=0 ); + + int getPhoneList( const EST_String &phone, ItemList &list ); + + void fillUnitRelation( EST_Relation *units, const EST_VTPath *path ); + +private: + EST_TList<DiphoneVoiceModule*> voiceModules; + float pruning_beam; // beam pruning + float ob_pruning_beam; // observation beam pruning + + float tc_rescoring_beam; + float tc_rescoring_weight; + + float tc_weight; + float jc_weight; + + float jc_f0_weight; // join cost f0 weight + float jc_power_weight; // join cost power weight + float jc_spectral_weight; // join cost spectral weight + + int prosodic_modification; + + unsigned int wav_srate; + + EST_JoinCost *jc; + bool jc_delete; + + EST_TargetCost *tc; + bool tc_delete; + + TCDataHash *tcdh; + +private: + DiphoneBackoff *diphone_backoff_rules; // diphone backoff rules + +public: + void set_diphone_backoff(DiphoneBackoff *dbo); + +}; + + +#endif // __DIPHONEUNITVOICE_H__ + +diff --git 
a/src/modules/MultiSyn/DiphoneVoiceModule.cc b/src/modules/MultiSyn/DiphoneVoiceModule.cc new file mode 100644 index 0000000..c228eb9 --- /dev/null +++ b/src/modules/MultiSyn/DiphoneVoiceModule.cc @@ -0,0 +1,640 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: Aug 2002 */ +/* --------------------------------------------------------------------- */ +/* A diphone unit selection "voice module" */ +/* (implemented using a list of utterance objects) */ +/*************************************************************************/ + +#include "DiphoneVoiceModule.h" +#include "EST_TargetCost.h" +#include "EST_viterbi.h" +#include "EST_rw_status.h" +#include "EST_Track.h" +#include "EST_track_aux.h" +#include "EST_Wave.h" +#include "EST_THash.h" +#include "EST_TList.h" +#include "EST_types.h" +#include "ling_class/EST_Utterance.h" +#include "siod.h" +#include "siod_est.h" +#include "safety.h" +#include + +#include "EST_Val.h" + +// from src/modules/UniSyn_diphone/us_diphone.h +// this won't be staying here long... +void parse_diphone_times(EST_Relation &diphone_stream, + EST_Relation &source_lab); + +SIOD_REGISTER_CLASS(du_voicemodule,DiphoneVoiceModule) +VAL_REGISTER_CLASS(du_voicemodule,DiphoneVoiceModule) + +VAL_REGISTER_CLASS(diphonecandidate,DiphoneCandidate) + +// defined in a single place to avoid inconsistency. +// Given a phone segment item, return the standard cut point +// time, calculated in the standard way. 
+float getJoinTime( const EST_Item *seg ) +{ + float midt=0.0; + + // hack to avoid overhead of string creation and deletion + // (EST feature access should really be changed to take + // const char* instead of const EST_String& ) + static const EST_String cl_end_str( "cl_end" ); + static const EST_String dipth_str( "dipth" ); + static const EST_String start_str( "start" ); + + // work out boundary for diphone join + if( seg->f_present(cl_end_str) ) // join at cl_end point for stops + midt = seg->features().val("cl_end").Float(); + else if( seg->f_present(dipth_str) ) // join at 25% through a diphthong + midt = 0.75*seg->F(start_str) + + 0.25*seg->features().val("end").Float(); + else + midt = ( seg->F(start_str) + + seg->features().val("end").Float() ) / 2.0; + + return midt; +} + +DiphoneVoiceModule::DiphoneVoiceModule( const EST_StrList& basenames, + const EST_String& uttDir, + const EST_String& wavDir, + const EST_String& pmDir, + const EST_String& coefDir, + unsigned int sr, + const EST_String& uttExt, + const EST_String& wavExt, + const EST_String& pmExt, + const EST_String& coefExt ) + + : fileList( basenames ), + utt_dir ( uttDir ), + utt_ext ( uttExt ), + pm_dir( pmDir ), + pm_ext( pmExt ), + coef_dir( coefDir ), + coef_ext( coefExt ), + wave_dir( wavDir ), + wave_ext( wavExt ), + wav_srate( sr ), + tcdatahash ( 0 ), + utt_dbase( 0 ), + catalogue( 0 ) +{ + +} + +void DiphoneVoiceModule::addCoefficients( EST_Relation *segs, const EST_Track& coefs ) +{ + float startt, midt, endt; + EST_FVector *startf, *midf, *endf; + const int num_coefs = coefs.num_channels(); + + // hack to avoid overhead of string creation and deletion + // (EST feature access should really be changed to take + // const char* instead of const EST_String& ) + static const EST_String startcoef_str("startcoef"); + static const EST_String midcoef_str("midcoef"); + static const EST_String endcoef_str("endcoef"); + static const EST_String start_str("start"); + + EST_Item *seg=segs->head(); + 
startt = seg->F(start_str); + + startf = new EST_FVector(num_coefs); + CHECK_PTR(startf); + coefs.copy_frame_out(coefs.index(startt), *startf); //this one not shared + + for( ; seg!=0; seg=seg->next() ){ + + // work out boundary for diphone join + midt = getJoinTime( seg ); + + // copy frames out and set as features + seg->features().set_val( startcoef_str, est_val(startf) ); + + midf = new EST_FVector(num_coefs); + CHECK_PTR(midf); + coefs.copy_frame_out(coefs.index(midt), *midf); + seg->features().set_val( midcoef_str, est_val(midf) ); + + endt = seg->features().val("end").Float(); + endf = new EST_FVector(num_coefs); + CHECK_PTR(endf); + coefs.copy_frame_out(coefs.index(endt), *endf); + seg->features().set_val( endcoef_str, est_val(endf) ); + + startf = endf; // phones share frame at phone boundary (reference counted in EST_Val) + } +} + +void DiphoneVoiceModule::flatPack( EST_Relation *segs, + const EST_TargetCost *tc ) const +{ + + const EST_FlatTargetCost *ftc = (EST_FlatTargetCost *)tc; + + for( EST_Item *seg=segs->head(); seg->next() !=0; seg=seg->next() ) + tcdatahash->add_item(seg, ftc->flatpack(seg)); + +} + +void DiphoneVoiceModule::initialise( const EST_TargetCost *tc, bool ignore_bad_tag ) +{ + EST_Utterance *u=0; + EST_Relation *segs=0; + + tcdatahash = new TCDataHash(500); + + utt_dbase = new EST_TList<EST_Utterance*>; + CHECK_PTR(utt_dbase); + + catalogue = new EST_TStringHash<ItemList*>( 2500 ); + CHECK_PTR(catalogue); + + int numIgnoredPhones=0; + + if(ignore_bad_tag) + EST_warning( "Looking for bad flags"); + else + EST_warning( "Ignoring bad flags"); + + + for( EST_Litem *it=fileList.head(); it!=0 ; it=it->next() ){ + u = new EST_Utterance; + CHECK_PTR(u); + + if( (u->load(utt_dir+fileList(it)+utt_ext)) != read_ok ) + EST_error( "Couldn't load utterance %s\n", + (const char*)fileList(it) ); + + segs = u->relation( "Segment" ); + + // add join cost coefficients (at middle of phones) + EST_Track coefs; + if( (coefs.load((coef_dir+fileList(it)+coef_ext))) != read_ok ) + 
EST_error( "Couldn't load data file %s", + (const char*) (coef_dir+fileList(it)+coef_ext) ); + + addCoefficients( segs, coefs ); + + if (tc->is_flatpack()) + { + flatPack(segs,tc); + u->remove_relation("Token"); + u->remove_relation("Word"); + u->remove_relation("Phrase"); + u->remove_relation("Syllable"); + u->remove_relation("SylStructure"); + u->remove_relation("IntEvent"); + u->remove_relation("Intonation"); + } + + addToCatalogue( u, &numIgnoredPhones, ignore_bad_tag ); + utt_dbase->append( u ); + } + + if(ignore_bad_tag) + EST_warning( "Ignored %d phones with bad flag set\n", numIgnoredPhones ); +} + +DiphoneVoiceModule::~DiphoneVoiceModule() +{ + if( utt_dbase != 0 ){ + EST_Litem *it = utt_dbase->head(); + for( ; it!=0 ; it=it->next() ) + delete (*utt_dbase)(it); + delete utt_dbase; + } + + delete catalogue; + + if(tcdatahash) + delete tcdatahash; + +} + +void DiphoneVoiceModule::addToCatalogue( const EST_Utterance *utt, int *num_ignored, bool ignore_bad ) +{ + EST_Item *item, *next_item; + ItemList *diphoneList; + const EST_String *ph1, *ph2; + int found=0; + + static const EST_String bad_str( "bad" ); + + item = (utt->relation( "Segment" ))->tail(); + if( item!=0 ){ + ph2 = &(item->features().val("name").String()); + + while( (item=item->prev()) != 0 ){ + + next_item = item->next(); + + // You'd think we need to check both item->f_present(bad_str) and + // next_item->f_present(bad_str) like this: + //if((item->f_present(bad_str) || next_item->f_present(bad_str)) && ignore_bad == true){ + // But experiment showed that then each time one diphone too many would be + // ignored. This was partly compensated by a bug present up to r1.14 + // (an iteration within "if(item=item->prev()!=0)" just before the "continue") + // which caused the leftmost bad phone in a row of bad phones NOT to be ignored + // when the length of the row was even (or when it was odd and ended in the + // utterance-final phone, which is never checked for badness). 
+ if(item->f_present(bad_str) && ignore_bad == true){ + + (*num_ignored)++; + + EST_warning( "Ignoring diphone \"%s_%s\" (LEFT %s in %s at %fs, bad flag \"%s\")", + item->S("name").str(), + next_item->S("name").str(), + item->S("name").str(), + utt->f.S("fileid").str(), + item->F("end"), + item->S("bad").str() ); + + if(item->prev() != 0){ + continue; + } + else + break; //already at start of list, so finish up + } + + ph1 = &(item->features().val("name").String()); + +// EST_warning( "Adding phone \"%s\" (%s, %f) to diphoneList %s_%s", +// item->S("name").str(), +// utt->f.S("fileid").str(), +// item->F("end"), +// item->S("name").str(), +// next_item->S("name").str()); + + diphoneList = catalogue->val(EST_String::cat(*ph1,"_",*ph2), found); + + if( !found ){ + diphoneList = new ItemList; + CHECK_PTR(diphoneList); + catalogue->add_item(EST_String::cat(*ph1,"_",*ph2), diphoneList, 1); // no_search=1 + } + + diphoneList->append( item ); + + ph2 = ph1; + } + } +} + +void DiphoneVoiceModule::getDiphone( const EST_Item *phone1, + EST_Track* coef, EST_Wave* sig, int *midframe, + bool extendLeft, bool extendRight ) const +{ + EST_Item *phone2 = phone1->next(); + + // load the relevant parts + const EST_String &fname = phone1->relation()->utt()->f.val("fileid").String(); + + static const EST_String start_str( "start" ); + + float startt,midt,endt; + + if( extendLeft==true ) + startt = phone1->F(start_str); + else + startt = getJoinTime( phone1 ); + + midt = phone1->features().val("end").Float(); + + if( extendRight==true ) + endt = phone2->features().val("end").Float(); + else + endt = getJoinTime( phone2 ); + + // get pitchmarks for pitch synchronous synthesis + EST_Track *tempcoef = new EST_Track; + CHECK_PTR(tempcoef); + if( (tempcoef->load((pm_dir+fname+pm_ext))) != read_ok ) + EST_error( "Couldn't load data file %s", + (const char*) (pm_dir+fname+pm_ext) ); + + // following few lines effectively moves segment boundaries to + // line up with pitch periods. 
+ int copy_start = tempcoef->index( startt ); + int copy_end = tempcoef->index( endt ); + //copy_end -= 1; //so that adjacent units don't start and end with same frame + + int copy_len = copy_end - copy_start; + //int copy_len = copy_end - copy_start + 1; + + startt = tempcoef->t( copy_start ); + endt = tempcoef->t( copy_end ); + + if( copy_len == 0 ){ + EST_warning( "%s(%f->%f): %s_%s diphone length means 1 pitchmark will be duplicated", + fname.str(), startt, endt, phone1->S("name").str(), phone2->S("name").str() ); + copy_len=1; + } + else if( copy_len < 0 ){ + EST_error( "%s(%f->%f): %s_%s diphone length renders %d pitchmark", + fname.str(), startt, endt, phone1->S("name").str(), phone2->S("name").str(), copy_len ); + } + + tempcoef->copy_sub_track( *coef, copy_start, copy_len ); + + *midframe = coef->index( midt ); + + // adjust timing, which Festival synthesis code makes assumptions about + // SPECIFICALLY, the unisyn module wants all units to start from + // the first value above 0.0 (as the first pitch mark) + float t_off = (copy_start!=0) ? 
tempcoef->t(copy_start-1) : 0.0; + int nframes = coef->num_frames(); + for( int i=0; i<nframes; ++i ) + coef->t(i) -= t_off; + + //start waveform at previous pitchmark (this is period approximation used) + int st_sample = (int)rint( t_off * (float) wav_srate ); + + //preferably end waveform at following pitchmark (follows convention in UniSyn module) + int end_sample; + if( copy_end < tempcoef->num_frames() ) + end_sample = (int) rint( tempcoef->t(copy_end) * (float) wav_srate ); + //if( copy_end+1 < tempcoef->num_frames() ) + // end_sample = (int) rint( tempcoef->t(copy_end+1) * (float) wav_srate ); + else{ + // estimate from previous pitch mark shift + int pp_centre_sample = (int) rint( endt * (float) wav_srate ); + int pp_first_sample = (int) rint( tempcoef->t(copy_end) * (float) wav_srate ); + //int pp_first_sample = (int) rint( tempcoef->t(copy_end-1) * (float) wav_srate ); + end_sample = (2*pp_centre_sample)-pp_first_sample; + } + + // (obviously, we would want to load and cache any files // + // which haven't been loaded yet, rather than just load // + // the parts each and every time) // + if( sig->load( wave_dir+fname+wave_ext, // + st_sample, end_sample-st_sample+1) != read_ok ) // + EST_error( "Couldn't load data file %s", // + (const char*) (wave_dir+fname+wave_ext) ); // + + delete tempcoef; +} + + +inline EST_VTCandidate* makeCandidate( const EST_Item *target_ph1, + const EST_Item *cand_ph1, + const EST_TargetCost *tc, + const TCData *tcd, + const TCDataHash *tcdatahash, + float tc_weight, + const DiphoneVoiceModule *dvm_p ) +{ + // hack to avoid overhead of string creation and deletion + // (EST feature access should really be changed to take + // const char* instead of const EST_String& ) + static const EST_String extendLeft_str("extendLeft"); + static const EST_String extendRight_str("extendRight"); + static const EST_String jccid_str("jccid"); + + EST_VTCandidate *c = new EST_VTCandidate; + CHECK_PTR(c); + + EST_Item *cand_ph2 = cand_ph1->next(); + + // set up all the 
members we can here + c->s = const_cast<EST_Item*>(cand_ph1); + + EST_FVector *left, *right; + if( target_ph1->f_present( extendLeft_str ) ) + left = fvector( cand_ph1->features().val( "startcoef" ) ); + else + left = fvector( cand_ph1->features().val( "midcoef" ) ); + + if( target_ph1->next()->f_present( extendRight_str ) ) + right = fvector( cand_ph2->features().val( "endcoef" ) ); + else + right = fvector( cand_ph2->features().val( "midcoef" ) ); + + // an abuse of the "name" EST_Val member to store data we want instead + // of what is intended to go there + // (will become unnecessary with a more general candidate class) + DiphoneCandidate *cand = new DiphoneCandidate( cand_ph1, dvm_p, left, right ); + CHECK_PTR(cand); + c->name = est_val( cand ); //to get synthesis parameters (deleted by EST_Val c->name) + + if( cand_ph1->f_present( jccid_str ) ){ + cand->ph1_jccid = cand_ph1->features().val( "jccid" ).Int(); + cand->ph1_jccindex = cand_ph1->features().val( "jccindex" ).Int(); + } + + if( cand_ph2->f_present( jccid_str ) ){ + cand->ph2_jccid = cand_ph2->features().val( "jccid" ).Int(); + cand->ph2_jccindex = cand_ph2->features().val( "jccindex" ).Int(); + } + + if(tc->is_flatpack()) + c->score = tc_weight* + ((const EST_FlatTargetCost *)tc) + ->operator()( tcd, + tcdatahash->val( const_cast<EST_Item*>(cand_ph1) ) ); + else + c->score = tc_weight*tc->operator()( target_ph1, cand_ph1 ); + + + return c; +} + +inline void itemListToCandidateList( ItemList::Entries &it, + EST_VTCandidate **head, + EST_VTCandidate **tail, + const EST_Item *target_ph1, + const EST_TargetCost *tc, + const TCDataHash *tcdh, + const TCDataHash *tcdatahash, + float tc_weight, + int count, + const DiphoneVoiceModule *dvm_p ) + +{ + int i=0; + + if( count > 0 ){ + TCData *tcd = tcdh->val( const_cast<EST_Item*>(target_ph1) ); + EST_VTCandidate *nextc = 0; + + // make last one first + EST_VTCandidate *c = makeCandidate( target_ph1, (*it), tc, tcd, tcdatahash, tc_weight, dvm_p ); + c->next = nextc; + *tail = c; + + // then 
iterate back prepending to linked list + // (order reversed because using c->next) + nextc = c; + it++; i++; + for( ; (it && i<count); it++, i++ ){ + c = makeCandidate( target_ph1, (*it), tc, tcd, tcdatahash, tc_weight, dvm_p ); + c->next = nextc; + nextc = c; + } + + *head = c; // keep hold of last one set up + } + + return; +} + +int DiphoneVoiceModule::getCandidateList( const EST_Item& target, + const EST_TargetCost* tc, + const TCDataHash *tcdh, + float tc_weight, + EST_VTCandidate **head, + EST_VTCandidate **tail ) const +{ + int nfound = 0; + const EST_Item *target_ph1 = item(target.f("ph1")); + + int found = 0; + const ItemList *candidateItemList = catalogue->val( target.S("name"), found ); + if( found != 0 ){ + nfound = candidateItemList->length(); + + ItemList::Entries it = ItemList::Entries(*candidateItemList); + + itemListToCandidateList( it, + head, tail, + target_ph1, + tc, tcdh, tcdatahash, tc_weight, + nfound, this ); + } + + return nfound; +} + + +int DiphoneVoiceModule::getPhoneList( const EST_String &phone, ItemList &list ) +{ + unsigned int n=0; + + if( utt_dbase != 0 ){ + for( EST_Litem *it=utt_dbase->head(); it!=0 ; it=it->next() ){ + EST_Item *ph=(*utt_dbase)(it)->relation("Segment")->head(); + for( ; ph!=0; ph=ph->next() ){ + if( ph->S("name") == phone ){ + list.append( ph ); + n++; + } + } + } + } + + return n; +} + +bool DiphoneVoiceModule::getUtterance( EST_Utterance** utt, int n ) const +{ + if( n<0 || n>(utt_dbase->length()-1) ) + EST_error( "Utterance index out of bounds" ); + + if( utt == 0 ) + EST_error( "Invalid utterance" ); + + // deep copy the utterance in question + *utt = new EST_Utterance( *(utt_dbase->nth(n)) ); + CHECK_PTR(utt); + + return true; +} + + +bool DiphoneVoiceModule::getUtterance( EST_Utterance **utt, + const EST_String &feat_name, + const EST_Val &value ) const +{ + //search down list of utterance structures, comparing + // fileid feature. If a match is found, return pointer to that + // utterance. 
+ for( EST_Litem *it=utt_dbase->head(); it!=0 ; it=it->next() ) + if( (*utt_dbase)(it)->f.val(feat_name) == value ){ + *utt = (*utt_dbase)(it); + return true; + } + + return false; +} + +void DiphoneVoiceModule::getDiphoneCoverageStats(EST_DiphoneCoverage *dc) const +{ + for( EST_Litem *it=utt_dbase->head(); it!=0 ; it=it->next() ) + dc->add_stats((*utt_dbase)(it)); +} + + + +unsigned int DiphoneVoiceModule::numUnitTypes() const +{ + return catalogue ? catalogue->num_entries() : 0; +} + +unsigned int DiphoneVoiceModule::numModuleUnits() const +{ + unsigned int sum=0; + + if( catalogue != 0 ){ + EST_TStringHash<ItemList*>::Entries it; + + for( it.begin( *catalogue ); it; it++ ) + sum += it->v->length(); //EST_UList.length() counts the entries :( + } + + return sum; +} + + +unsigned int DiphoneVoiceModule::numAvailableCandidates( const EST_String &unit ) const +{ + int number=0; + + int found=0; + const ItemList *candidateItemList = catalogue->val( unit, found ); + + if( found > 0 ) + number = candidateItemList->length(); + + return number; +} diff --git a/src/modules/MultiSyn/DiphoneVoiceModule.h b/src/modules/MultiSyn/DiphoneVoiceModule.h new file mode 100644 index 0000000..324c742 --- /dev/null +++ b/src/modules/MultiSyn/DiphoneVoiceModule.h @@ -0,0 +1,196 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. 
The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: Aug 2002 */ +/* --------------------------------------------------------------------- */ +/* A diphone unit selection "voice module" */ +/* (implemented using a list of utterance objects) */ +/*************************************************************************/ + +#ifndef __DIPHONEVOICEMODULE_H__ +#define __DIPHONEVOICEMODULE_H__ + +#include "VoiceModuleBase.h" +#include "EST_DiphoneCoverage.h" +#include "siod_defs.h" +#include "EST_Val_defs.h" +#include "EST_String.h" + +#include "EST_viterbi.h" + +#include "EST_types.h" // for EST_StrList + +#include "EST_FlatTargetCost.h" + +class EST_Utterance; +class EST_Relation; +class EST_VTCandidate; +class EST_VTPath; +class EST_Features; +class EST_Track; +class EST_Wave; +class EST_Item; + +// return standard join point time for this segment +// (one half of a diphone) +float getJoinTime( const EST_Item *seg ); + +//template 
class EST_TStringHash; +#include "EST_THash.h" +template class EST_TList; +typedef EST_TList ItemList; + + +SIOD_REGISTER_CLASS_DCLS(du_voicemodule,DiphoneVoiceModule) +VAL_REGISTER_CLASS_DCLS(du_voicemodule,DiphoneVoiceModule) + +// following is necessary to make some of a candidate's information +// available in a faster way that EST_Item feature lookups (critically +// the join cost coefficients EST_FVectors for example) +// i.e. yet another temporary hack... (would be better if EST_viterbi +// code allowed other things apart from just EST_Item* in order to +// perform the search) +class DiphoneCandidate { +public: + DiphoneCandidate( const EST_Item *phone1, + const DiphoneVoiceModule *p, + const EST_FVector *left, + const EST_FVector *right ) + : ph1(phone1), dvm( p ), l_coef(left), r_coef(right), + ph1_jccid(-1), ph1_jccindex(-1), ph2_jccid(-1), ph2_jccindex(-1){}; + + const EST_Item *ph1; + const DiphoneVoiceModule *dvm; + const EST_FVector *l_coef; + const EST_FVector *r_coef; + int ph1_jccid, ph1_jccindex; + int ph2_jccid, ph2_jccindex; +}; + +VAL_REGISTER_CLASS_DCLS(diphonecandidate,DiphoneCandidate) + +class DiphoneVoiceModule : public VoiceModuleBase { +public: + DiphoneVoiceModule( const EST_StrList& basenames, + const EST_String& uttDir, + const EST_String& wavDir, + const EST_String& pmDir, + const EST_String& coefDir, + unsigned int srate = 16000, + const EST_String& uttExt = ".utt", + const EST_String& wavExt = ".wav", + const EST_String& pmExt = ".pm", + const EST_String& coefExt = ".coef" ); + + virtual ~DiphoneVoiceModule(); + + virtual void initialise(const EST_TargetCost *tc, bool ignore_bad_tag=false ); + virtual unsigned int numModuleUnits() const; + virtual unsigned int numUnitTypes() const; + virtual unsigned int numAvailableCandidates( const EST_String &unit ) const; + + + ///// Some "debugging" functions - deliberately don't mind doing + // slow things like returning copies of things. 
Such functions are + // not intended to do important things, but just to make it easier + // to work out what's "in" the voice database object. + + // return copy of utterance number + bool getUtterance( EST_Utterance **utt, int n ) const; + + + // return pointer to utterance which has feature "feat_name" + // set to value "value" + bool getUtterance( EST_Utterance **utt, + const EST_String &feat_name, + const EST_Val &value ) const; + void getDiphoneCoverageStats(EST_DiphoneCoverage *dc) const; + +// int DiphoneVoiceModule::getCandidateList( const EST_Item& target, +// const EST_TargetCost& tc, +// EST_VTCandidate *head, +// EST_VTCandidate *tail ) const; + + + + int getCandidateList( const EST_Item& target, + const EST_TargetCost *tc, + const TCDataHash *tcdh, + const float tc_weight, + EST_VTCandidate **head, + EST_VTCandidate **tail ) const; + + // append all instances of a certain phone present in the utterances + // in this voice. Returns the number added + int getPhoneList( const EST_String &phone, ItemList &list ); + +private: + // don't allow copying of Voices (for now?) 
+ DiphoneVoiceModule( const DiphoneVoiceModule& ); + DiphoneVoiceModule& operator=( const DiphoneVoiceModule& ); + + // Flatpack + void flatPack( EST_Relation *segs, const EST_TargetCost *tc) const; + + void addCoefficients( EST_Relation *segs, const EST_Track& coefs ); + void addToCatalogue( const EST_Utterance *utt, int *num_ignored, bool ignore_bad=false ); + void getDiphone( const EST_Item *phone1, + EST_Track* coef, EST_Wave* sig, int* midframe, + bool extendLeft=0, bool extendRight=0 ) const; + + friend class DiphoneUnitVoice; + +private: + EST_StrList fileList; + EST_String utt_dir; // utterance files + EST_String utt_ext; + EST_String pm_dir; // pitch marks + EST_String pm_ext; + EST_String coef_dir; // for coefficients that aren't pitch syncronous + EST_String coef_ext; + EST_String wave_dir; // waveform (or residual) + EST_String wave_ext; + + unsigned int wav_srate; //sample rate of voice waveform data + + TCDataHash *tcdatahash; + + EST_TList *utt_dbase; + EST_TStringHash *catalogue; +}; + +#endif // __DIPHONEVOICEMODULE_H__ + diff --git a/src/modules/MultiSyn/EST_DiphoneCoverage.cc b/src/modules/MultiSyn/EST_DiphoneCoverage.cc new file mode 100644 index 0000000..9247d0e --- /dev/null +++ b/src/modules/MultiSyn/EST_DiphoneCoverage.cc @@ -0,0 +1,170 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) */ +/* */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. 
The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Rob Clark */ +/* Date: April 2005 */ +/* --------------------------------------------------------------------- */ +/* */ +/* */ +/* */ +/* */ +/* */ +/* */ +/*************************************************************************/ +#include +#include "EST_rw_status.h" +#include "festival.h" +#include "ling_class/EST_Item.h" +#include "EST_TargetCost.h" +#include "EST_DiphoneCoverage.h" + +/* + * This file contains some functions which may be useful for diphone + * selection. Not quite sure exactly how yet... + * Target cost should probably be rewritten in terms of these functions. 
+ */ + + + +static EST_String pos_map[4] = + { "INI", "MED", "FIN", "INT" } ; + + +static EST_String stress_map[4] = + { "UU" , "US" , "SU" , "SS"} ; + +static EST_String get_diphone_name(const EST_Item *seg1); +static int get_stress_index(const EST_Item *seg1); +static EST_String get_stress_name(const int index); +static int get_syl_pos_index(const EST_Item *seg1); +static EST_String get_syl_pos_name(const int index); + + +void EST_DiphoneCoverage::add_stats(const EST_Utterance *utt) +{ + EST_Relation *segs = utt->relation("Segment"); + EST_Item *it=segs->head(); + + for( ; it->next(); it=it->next() ) + if(it->next()) + { + EST_String key = + EST_String::cat(get_diphone_name(it),"-", + get_stress_name(get_stress_index(it)),"-", + get_syl_pos_name(get_syl_pos_index(it))); + int val = 0; + if (strhash.present(key)) + { + val = strhash.val(key); + strhash.remove_item(key); + } + ++val; + strhash.add_item(key,val); + } +} + +void EST_DiphoneCoverage::print_stats(const EST_String filename) +{ + ostream *outf; + + if (filename == "-") + outf = &cout; + else + outf = new ofstream(filename); + + EST_THash::Entries them; + + for(them.begin(strhash); them; them++) + *outf << them->k << " " << them->v << "\n"; + + if (outf != &cout) + delete outf; +} + + + + + +static EST_String get_diphone_name(const EST_Item *seg1) +{ + return EST_String::cat(seg1->S("name"),"_",seg1->next()->S("name")); +} + + +static int get_stress_index(const EST_Item *seg1) +{ + int i1 = 0, i2=0; + + if( ph_is_vowel(seg1->S("name")) && + !ph_is_silence(seg1->S("name")) ) + i1 = (parent(seg1,"SylStructure")->I("stress") > 0) ? 1 : 0; + + if( ph_is_vowel(seg1->next()->S("name")) && + !ph_is_silence(seg1->next()->S("name")) ) + i2 = (parent(seg1->next(),"SylStructure")->I("stress") > 0) ? 
1 : 0; + + return i2+2*i1; +} + +static EST_String get_stress_name(const int index) +{ + return stress_map[index] ; +} + +static int get_syl_pos_index(const EST_Item *seg1) +{ + int pos = TCPOS_MEDIAL; + + const EST_Item *syl = parent(seg1,"SylStructure"); + const EST_Item *next_syl = parent(seg1->next(),"SylStructure"); + const EST_Item *next_next_syl = parent(seg1->next()->next(),"SylStructure"); + const EST_Item *prev_syl = parent(seg1->prev(),"SylStructure"); + + if( syl != next_syl ) + pos = TCPOS_INTER; + else if( syl != prev_syl) + pos = TCPOS_INITIAL; + else if( next_syl != next_next_syl) + pos = TCPOS_FINAL; + + return pos; +} + +static EST_String get_syl_pos_name(const int index) +{ + return pos_map[index] ; +} + + diff --git a/src/modules/MultiSyn/EST_DiphoneCoverage.h b/src/modules/MultiSyn/EST_DiphoneCoverage.h new file mode 100644 index 0000000..2c49c8f --- /dev/null +++ b/src/modules/MultiSyn/EST_DiphoneCoverage.h @@ -0,0 +1,68 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) */ +/* */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Rob Clark */ +/* Date: April 2005 */ +/* --------------------------------------------------------------------- */ +/* */ +/* */ +/* */ +/* */ +/* Diphone Coverage stats class */ +/* */ +/*************************************************************************/ + +#ifndef __EST_DIPHONECOVERAGE_H__ +#define __EST_DIPHONECOVERAGE_H__ + +#include "EST_THash.h" + + +class EST_DiphoneCoverage { + + public: + EST_DiphoneCoverage() : strhash(100) {}; + + private: + EST_TStringHash strhash; + + public: + void add_stats(const EST_Utterance *utt); + void print_stats(const EST_String filename="-"); + + +}; + +#endif // __EST_DIPHONECOVERAGE_H__ diff --git a/src/modules/MultiSyn/EST_FlatTargetCost.cc b/src/modules/MultiSyn/EST_FlatTargetCost.cc new file mode 100644 index 0000000..d855869 --- /dev/null +++ b/src/modules/MultiSyn/EST_FlatTargetCost.cc @@ -0,0 +1,559 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Rob Clark */ +/* Copyright (c) 2006 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Rob Clark */ +/* Date: June 2006 */ +/* --------------------------------------------------------------------- */ +/* */ +/* */ +/* */ +/* */ +/* */ +/* */ +/*************************************************************************/ + +#include +#include "festival.h" +#include "ling_class/EST_Item.h" +#include "EST_FlatTargetCost.h" +#include "siod.h" + +static const int simple_phone(const EST_String&); +static const int simple_id(const EST_String&); +static const int simple_pos(const EST_String &s); +static const int simple_punc(const EST_String &s); +static const int get_bad_f0(const EST_Item *seg); +static const EST_Item* tc_get_syl(const EST_Item *seg); +static const EST_Item* tc_get_word(const EST_Item *seg); + +static EST_TStringHash phonehash(10); +static int phone_count=0; + +//VAL_REGISTER_TYPE(tcdata,TCData) + +/* + * BASE CLASS: EST_TargetCost + */ + + +/* Individual cost functions */ + + + + +TCData *EST_FlatTargetCost::flatpack(EST_Item *seg) const +{ + + const EST_Item *syl, *nsyl, *nnsyl, *word; + + TCData *f =new TCData(TCHI_LAST); + + syl=tc_get_syl(seg); + nsyl=tc_get_syl(seg->next()); + if(seg->next()->next()) + nnsyl=tc_get_syl(seg->next()->next()); + else nnsyl = 0; + + // This segment features + + //cout << "SEG: " << seg->S("name") << " is vowel: " + // << ph_is_vowel(seg->S("name")) << endl; + + if(ph_is_vowel(seg->S("name"))) + (*f)[VOWEL]=1; + else + (*f)[VOWEL]=0; + + //cout << "SEG: " << seg->S("name") << " is sil: " + // << ph_is_silence(seg->S("name")) << endl; + + if(ph_is_silence(seg->S("name"))) + (*f)[SIL]=1; + else + (*f)[SIL]=0; + + if(seg->f_present("bad_dur")) + (*f)[BAD_DUR]=1; + else + (*f)[BAD_DUR]=0; + + if(seg->next()->f_present("bad_dur")) + (*f)[NBAD_DUR]=1; + else + (*f)[NBAD_DUR]=0; + + if(seg->f_present("bad_lex")) + (*f)[BAD_OOL]=1; + else + (*f)[BAD_OOL]=0; + + if(seg->next()->f_present("bad_lex")) + 
(*f)[NBAD_OOL]=1; + else + (*f)[NBAD_OOL]=0; + + + (*f)[BAD_F0]=get_bad_f0(seg); + + + // This segments syl features + + if(syl) + { + (*f)[SYL]=simple_id(syl->S("id")); + (*f)[SYL_STRESS]=syl->I("stress"); + //cout << "syl id: " << simple_id(syl->S("id")) + //<< " stress: " << syl->I("stress") << endl; + } + else + { + (*f)[SYL]=0; + (*f)[SYL_STRESS]=0; + //cout << "no syl present " << endl; + + } + + + // Next segment features + + //cout << "NSEG: " << seg->next()->S("name") << " is sil: " + // << ph_is_silence(seg->next()->S("name")) << endl; + + if(ph_is_silence(seg->next()->S("name"))) + (*f)[N_SIL]=1; + else + (*f)[N_SIL]=0; + + //cout << "NSEG: " << seg->next()->S("name") << " is vowel: " + // << ph_is_vowel(seg->next()->S("name")) << endl; + + if(ph_is_vowel(seg->next()->S("name"))) + (*f)[N_VOWEL]=1; + else + (*f)[N_VOWEL]=0; + + // Next seg syl features + if(nsyl) + { + (*f)[NSYL]=simple_id(nsyl->S("id")); + (*f)[NSYL_STRESS]=nsyl->I("stress"); + //cout << "nsyl stress: " << nsyl->I("stress") << endl; + } + else + { + (*f)[NSYL]=0; + (*f)[NSYL_STRESS]=0; + //cout << "no nsyl: " << endl; + } + + if(seg->next()->next()) + { + //cout << "RC: " << seg->next()->next()->S("name") + //<< " " << simple_phone(seg->next()->next()->S("name")) + // << endl; + (*f)[RC]=simple_phone(seg->next()->next()->S("name")); + (*f)[NNBAD_DUR]=seg->next()->next()->f_present("bad_dur"); + } + else + { + //cout << "NO RC\n"; + (*f)[RC]=0; + (*f)[NNBAD_DUR]=0; + } + + // Next next seg syl features. 
+ if(nnsyl) + { + (*f)[NNSYL]=simple_id(nnsyl->S("id")); + } + else + (*f)[NNSYL]=0; + + // Prev seg syl feature + if(seg->prev()) + { + (*f)[LC]=simple_phone(seg->prev()->S("name")); + (*f)[PBAD_DUR]=seg->prev()->f_present("bad_dur"); + } + else + { + (*f)[LC]=0; + (*f)[PBAD_DUR]=0; + } + + if(seg->prev() && (syl=tc_get_syl(seg->prev()))) + (*f)[PSYL]=simple_id(syl->S("id")); + else + (*f)[PSYL]=0; + + // seg word feature + if((word=tc_get_word(seg))) + (*f)[WQRD]=simple_id(word->S("id")); + else + (*f)[WQRD]=0; + + + // Next seg word features + if((word=tc_get_word(seg->next()))) + (*f)[NWQRD]=simple_id(word->S("id")); + else + (*f)[NWQRD]=0; + + // next next seg word feature + if(seg->next()->next() && (word=tc_get_word(seg->next()->next()))) + (*f)[NNWQRD]=simple_id(word->S("id")); + else + (*f)[NNWQRD]=0; + + // Prev seg word feature + if(seg->prev() && (word=tc_get_word(seg->prev()))) + (*f)[PWQRD]=simple_id(word->S("id")); + else + (*f)[PWQRD]=0; + + + // segs sylpos + (*f)[SYLPOS]=0; // medial + if( f->a_no_check(SYL)!= f->a_no_check(NSYL) ) + (*f)[SYLPOS]=1; // inter + else if( f->a_no_check(SYL)!= f->a_no_check(PSYL) ) + (*f)[SYLPOS]=2; // initial + else if( f->a_no_check(NSYL) != f->a_no_check(NNSYL) ) + (*f)[SYLPOS]=3; // final + + // segs wordpos + (*f)[WQRDPOS]=0; // medial + if( f->a_no_check(WQRD)!= f->a_no_check(NWQRD) ) + (*f)[WQRDPOS]=1; // inter + else if( f->a_no_check(WQRD)!= f->a_no_check(PWQRD) ) + (*f)[WQRDPOS]=2; // initial + else if( f->a_no_check(NWQRD) != f->a_no_check(NNWQRD) ) + (*f)[WQRDPOS]=3; // final + + // pbreak + if ((word=tc_get_word(seg))) + { + if ( word->S("pbreak") == "NB" ) + (*f)[PBREAK]=0; + else if ( word->S("pbreak") == "B" ) + (*f)[PBREAK]=1; + else + (*f)[PBREAK]=2; + } + else + (*f)[PBREAK]=-1; + + // seg punc and pos + if((word=tc_get_word(seg))) + { + (*f)[POS]=simple_pos(word->S("pos")); + (*f)[PUNC]=simple_punc(parent(word,"Token")->S("punc","NONE")); + } + else + { + (*f)[POS]=-1; + (*f)[PUNC]=-1; + } + + // 
next seg punc and pos + if ((word=tc_get_word(seg->next()))) + { + (*f)[NPOS]=simple_pos(word->S("pos")); + (*f)[NPUNC]=simple_punc(parent(word,"Token")->S("punc","NONE")); + } + else + { + (*f)[NPOS]=-1; + (*f)[NPUNC]=-1; + } + + return f; + //seg->set_val("tcdata",est_val(f)); // copied? + +} + + +float EST_FlatTargetCost::stress_cost() const +{ + + if( t->a_no_check(VOWEL) && ! t->a_no_check(SIL)) + { + // Can't assume candidate and target identities are the same + // (because of backoff to a silence for example) + if( c->a_no_check(SYL) == 0 || c->a_no_check(NSYL) == 0 ) + return 1.0; + + if ( t->a_no_check(SYL_STRESS) != c->a_no_check(SYL_STRESS) ) + return 1.0; + + if ( t->a_no_check(NSYL_STRESS) != c->a_no_check(NSYL_STRESS) ) + return 1.0; + + } + + return 0.0; + +} + + +float EST_FlatTargetCost::position_in_phrase_cost() const +{ + + if ( !t->a_no_check(WQRD) && !c->a_no_check(WQRD) ) + return 0; + if ( !t->a_no_check(WQRD) || !c->a_no_check(WQRD) ) + return 1; + + return ( t->a_no_check(PBREAK) == c->a_no_check(PBREAK) ) ? 
0 : 1; +} + +float EST_FlatTargetCost::punctuation_cost() const +{ + + float score = 0.0; + + if ( (t->a_no_check(WQRD) && !c->a_no_check(WQRD)) + || (!t->a_no_check(WQRD) && c->a_no_check(WQRD)) ) + score += 0.5; + else + if (t->a_no_check(WQRD) && c->a_no_check(WQRD)) + if ( t->a_no_check(PUNC) != c->a_no_check(PUNC) ) + score += 0.5; + + if ( (t->a_no_check(NWQRD) && !c->a_no_check(NWQRD)) + || (!t->a_no_check(NWQRD) && c->a_no_check(NWQRD)) ) + score += 0.5; + else + if(t->a_no_check(NWQRD) && c->a_no_check(NWQRD)) + if ( t->a_no_check(NPUNC) != c->a_no_check(NPUNC) ) + score += 0.5; + + return score; + +} + + +float EST_FlatTargetCost::partofspeech_cost() const +{ + // Compare left phone half of diphone + if(!t->a_no_check(WQRD) && !c->a_no_check(WQRD)) + return 0; + if(!t->a_no_check(WQRD) || !c->a_no_check(WQRD)) + return 1; + if( t->a_no_check(POS) != c->a_no_check(POS) ) + return 1; + + // Compare right phone half of diphone + if(!t->a_no_check(NWQRD) && !c->a_no_check(NWQRD)) + return 0; + if(!t->a_no_check(NWQRD) || !c->a_no_check(NWQRD)) + return 1; + if( t->a_no_check(NPOS) != c->a_no_check(NPOS) ) + return 1; + + return 0; +} + + +float EST_FlatTargetCost::out_of_lex_cost() const +{ + // bad_dur may at some stage be set on a target for resynthesis purposes. + if( c->a_no_check(BAD_OOL) != t->a_no_check(BAD_OOL) ) + return 1.0; + + if( c->a_no_check(NBAD_OOL) != t->a_no_check(NBAD_OOL) ) + return 1.0; + + return 0.0; +} + +float EST_FlatTargetCost::bad_duration_cost() const +{ + // bad_dur may at some stage be set on a target for resynthesis purposes. + if( c->a_no_check(BAD_DUR) != t->a_no_check(BAD_DUR) ) + return 1.0; + + if( c->a_no_check(NBAD_DUR) != t->a_no_check(NBAD_DUR) ) + return 1.0; + + // If the segments next to these segments are bad, then these ones are probably wrong too! 
+ if( c->a_no_check(PBAD_DUR) != t->a_no_check(PBAD_DUR) ) + return 1.0; + if( c->a_no_check(NNBAD_DUR) != t->a_no_check(NNBAD_DUR) ) + return 1.0; + + return 0.0; +} + + +/* + * DERIVED CLASS: EST_FlatTargetCost + * + * This is CSTR's proposed default target cost flat packed. Nothing + * special, if you think you can do better derive your own class. + */ + +float EST_FlatTargetCost::operator()(const TCData* targ, const TCData* cand) const +{ + set_t_and_c(targ,cand); + score = 0.0; + weight_sum = 0.0; + + score += add_weight(10.0)*stress_cost(); + score += add_weight(5.0)*position_in_syllable_cost(); + score += add_weight(5.0)*position_in_word_cost(); + score += add_weight(6.0)*partofspeech_cost(); + score += add_weight(15.0)*position_in_phrase_cost(); + score += add_weight(4.0)*left_context_cost(); + score += add_weight(3.0)*right_context_cost(); + + score /= weight_sum; + + // These are considered really bad, and will result in a score > 1. + score += 10.0*bad_duration_cost(); // see also join cost. 
+ score += 10.0*bad_f0_cost(); + score += 10.0*punctuation_cost(); + + + return score ; +} + + + +/* + * Auxiliary target cost functions + */ + +static const int simple_phone(const EST_String &phone) +{ + if(phonehash.present(phone)) + return phonehash.val(phone); + + phonehash.add_item(phone,++phone_count); + return phone_count; + +} + +static const int simple_id(const EST_String &id) +{ + return id.after("_").Int(); +} + + +static const int simple_punc(const EST_String &punc) +{ + if ( punc == "NONE") + return 0; + else if ( punc == "," || punc == ":" || punc == ";" ) + return 1; + else if ( punc == "\"" || punc == "'" || punc == "-" ) + return 1; + else if ( punc == "(" || punc == ")" ) + return 1; + else if ( punc == ".") + return 2; + else if ( punc == "?") + return 3; + else + return 0; +} + +static const int simple_pos(const EST_String &s) +{ + if( s == "nn" || s == "nnp" || s == "nns" || s == "nnps" || s == "fw" || s == "sym" || s == "ls") + return 1; + if( s == "vbd" || s == "vb" || s == "vbn" || s == "vbz" || s == "vbp" || s == "vbg") + return 2; + if( s == "jj" || s == "jjr" || s == "jjs" || s == "1" || s == "2" || s == "rb" || + s == "rp" || s == "rbr" || s == "rbs") + return 3; + return 0; + } + +static const int get_bad_f0(const EST_Item *seg) +{ + // by default, the last element of join cost coef vector is + // the f0 (i.e. 
fv->a_no_check( fv->n()-1 ) ) + + EST_String left(seg->S("name")); + EST_String right(seg->next()->S("name")); + + EST_FVector *fv = 0; + int penalty = 0; + + if( seg->f_present("midcoef") && + ( ph_is_vowel( left ) + || ph_is_approximant( left ) + || ph_is_liquid( left ) + || ph_is_nasal( left ) )){ + fv = fvector( seg->f("midcoef") ); + if( fv->a_no_check(fv->n()-1) == -1.0 ) // means unvoiced + penalty += 1; + } + + if( seg->next()->f_present("midcoef") && + ( ph_is_vowel( right ) + || ph_is_approximant( right ) + || ph_is_liquid( right ) + || ph_is_nasal( right ) ) ){ + fv = fvector( seg->next()->f("midcoef") ); + if( fv->a_no_check(fv->n()-1) == -1.0 ) // means unvoiced + penalty += 1; + } + + return penalty/2; +} + +static const EST_Item *tc_get_syl(const EST_Item *seg) +{ + // if(!seg) + // return 0; + + return parent(seg,"SylStructure"); +} + +static const EST_Item *tc_get_word(const EST_Item *seg) + { + // if(!seg) + // return 0; + const EST_Item *syl = tc_get_syl(seg); + + if(syl) + return parent(syl,"SylStructure"); + else + return 0; + } + + diff --git a/src/modules/MultiSyn/EST_FlatTargetCost.h b/src/modules/MultiSyn/EST_FlatTargetCost.h new file mode 100644 index 0000000..1bb7855 --- /dev/null +++ b/src/modules/MultiSyn/EST_FlatTargetCost.h @@ -0,0 +1,131 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Rob Clark */ +/* Copyright (c) 2006 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. 
The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Rob Clark */ +/* Date: June 2006 */ +/* --------------------------------------------------------------------- */ +/* */ +/* */ +/* */ +/* */ +/*************************************************************************/ + + +#ifndef __EST_TARGETCOST_FLAT_H__ +#define __EST_TARGETCOST_FLAT_H__ + +#include "EST_THash.h" +#include "EST_TargetCost.h" + +/* we use WQRD in place of WORD below to keep VC++ happy */ +enum tcdata_t +{ + VOWEL, SIL, BAD_DUR, NBAD_DUR, BAD_OOL, NBAD_OOL, BAD_F0, + SYL, SYL_STRESS, N_SIL, N_VOWEL, + NSYL, NSYL_STRESS, + RC, NNBAD_DUR, NNSYL, LC, PBAD_DUR, + PSYL, WQRD, NWQRD, NNWQRD, PWQRD, + SYLPOS, WQRDPOS, PBREAK, + POS, PUNC, NPOS, NPUNC, + TCHI_LAST +} ; + + +typedef EST_IVector TCData; +VAL_REGISTER_TYPE_DCLS(ivector,TCData) + +typedef EST_THash TCDataHash; + + +/* + * DERIVED CLASS: EST_FlatTargetCost + */ +class EST_FlatTargetCost : public EST_TargetCost { + + public: + 
EST_FlatTargetCost() : li(0){}; + + + private: + mutable const TCData *t; + mutable const TCData *c; + mutable const EST_Item *li; + + inline void set_t_and_c(const TCData* targ, const TCData* cand) const + { + t = targ; + c = cand; + } + + float stress_cost() const; + inline float position_in_syllable_cost() const + { return ( t->a_no_check(SYLPOS) == c->a_no_check(SYLPOS) ) ? 0 : 1; } + inline float position_in_word_cost() const + { return ( t->a_no_check(WQRDPOS) == c->a_no_check(WQRDPOS) ) ? 0 : 1; } + float position_in_phrase_cost() const; + float punctuation_cost() const; + float partofspeech_cost() const; + inline float left_context_cost() const + { return ( t->a_no_check(LC) == c->a_no_check(LC) ) ? 0 : 1; } + inline float right_context_cost() const + { return ( t->a_no_check(RC) == c->a_no_check(RC) ) ? 0 : 1; } + float bad_duration_cost() const; + float out_of_lex_cost() const; + inline float bad_f0_cost() const + { return float(c->a_no_check(BAD_F0)) / 2.0; } + + + + public: + float operator()(const EST_Item* targ, const EST_Item* cand) const + { EST_error("EST_FlatTargetCost operator() called with EST_Items\n"); + return 1; } + float operator()(const TCData *targ, const TCData *cand) const; + const bool is_flatpack() const { return true; } + TCData *flatpack(EST_Item *seg) const; + + +}; + + + + + +#endif // __EST_TARGETCOST_H__ + + + + + diff --git a/src/modules/MultiSyn/EST_JoinCost.cc b/src/modules/MultiSyn/EST_JoinCost.cc new file mode 100644 index 0000000..6f579da --- /dev/null +++ b/src/modules/MultiSyn/EST_JoinCost.cc @@ -0,0 +1,80 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: October 2002 */ +/* --------------------------------------------------------------------- */ +/* name speaks for itself */ +/* */ +/* */ +/* */ +/* */ +/* */ +/*************************************************************************/ + +#include "ling_class/EST_Relation.h" +#include "ling_class/EST_Utterance.h" + +#include "EST_JoinCost.h" +#include "EST_JoinCostCache.h" +#include "safety.h" +#include "ling_class/EST_Item.h" +#include "EST_FMatrix.h" + +EST_JoinCost::~EST_JoinCost() +{ + int cclen = costCaches.length(); + for( int i=0; i<cclen; ++i ) + delete costCaches[i]; +} + +bool EST_JoinCost::computeAndCache( const EST_TList<EST_Item*> &list, bool verbose ) +{ + // for later speed use the costCache's index as its id + unsigned int jccid = costCaches.length(); + costCaches.resize( jccid+1 ); //EST_TVector doesn't have push_back yet + + EST_JoinCostCache *jcc = new EST_JoinCostCache( jccid, list.length() ); + CHECK_PTR(jcc); + + costCaches[jccid] = jcc; + + // this function adds the relevant indices + return jcc->computeAndCache( list, *this, verbose ); +} + + + + diff --git a/src/modules/MultiSyn/EST_JoinCost.h b/src/modules/MultiSyn/EST_JoinCost.h new file mode 100644 index 0000000..6813004 --- /dev/null +++ b/src/modules/MultiSyn/EST_JoinCost.h @@ -0,0 +1,277 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. 
The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: October 2002 */ +/* --------------------------------------------------------------------- */ +/* Interface for family of join cost function objects which */ +/* calculate a join score for given candidates */ +/* */ +/* */ +/* */ +/* */ +/*************************************************************************/ + + +#ifndef __EST_JOINCOST_H__ +#define __EST_JOINCOST_H__ + +class EST_Item; +class EST_JoinCostCache; + +/////////////////////////////// +//because of the inline +#include "EST_JoinCostCache.h" +#include "DiphoneVoiceModule.h" +#include "safety.h" +#include "ling_class/EST_Item.h" +#include "EST_FMatrix.h" +/////////////////////////////// + +/**@name Interface for Join Cost function object +*/ + +//@{ + +/** Object oriented approach for better or for worse... 
+*/ + +class EST_JoinCost { + public: + + EST_JoinCost() + : defCost(1), + f0_weight (1.0), + power_weight(1.0), + spectral_weight(1.0) + {}; + + ~EST_JoinCost(); + + bool computeAndCache( const EST_TList<EST_Item*> &list, bool verbose=true ); + + // join cost which avoids information lookup (or faster lookup), to + // be used in Viterbi search + inline float operator()( const DiphoneCandidate* left, const DiphoneCandidate* right ) const; + + // original join cost, retained because it's still used by cost calculations for + // caching (and for posterity, and comparison) + inline float operator()( const EST_Item *left, const EST_Item *right ) const; + + inline void set_f0_weight(float val) { f0_weight = val; } + inline void set_power_weight(float val) { power_weight = val; } + inline void set_spectral_weight(float val) { spectral_weight = val; } + + private: + // member data all used by the older implementation (taking EST_Item* not + // DiphoneCandidate*) + float defCost; + mutable const EST_Item *cachedItem; + mutable const EST_FVector *cachedItemVector; + mutable unsigned int cached_jccid; + mutable unsigned int cached_jccindex; + mutable bool costIsCached; + mutable bool diphoneJoin; //would definitely be better somewhere else... + float f0_weight; + float power_weight; + float spectral_weight; + + + EST_TSimpleVector<EST_JoinCostCache *> costCaches; + + // function which computes the actual cost between two vectors + // of join cost coefficients + inline float calcDistance( const EST_FVector *l, const EST_FVector *r ) const; +}; + + +////////////////////////////////////////////////////////////////////////////////// +// experiment to see if this is sensible or not +// // for now, the left and right edges of a join are represented by EST_Items, +// // which is particular to the implementation of the DiphoneUnitVoice.
+// // It would be desirable in the future though to abstract the interface to +// // something like concatUnit::right_edge() and concatUnit::left_edge() since +// // we would like a given join cost to be able to be generally applied to units of +// // any length or form. +inline float EST_JoinCost::operator()( const EST_Item* left, const EST_Item* right ) const +{ + float d_overall; + + //default zero cost if units contiguous in database + // (i.e. this is the cost between a phone and *itself* + if( left == right->prev() ) + return 0.0; + + // An "infinite" join cost for bad units. The idea here is that if + // you use a unit marked as bad (either for duration or + // pitchmarks, then the units either side of it must also be used. + // if( left->f_present("bad") || right->f_present("bad") + // || left->f_present("bad_dur") || right->f_present("bad_dur") ) + // return 100.0; + + // else... + + // since the Viterbi class takes each path at time t and tries to + // extend with all candidates at time t+1, it probably makes sense + // to cache the feature vector for the "left" item to avoid looking + // it up every time this function is called... 
+ if( cachedItem != left ){ + cachedItem = left; + + if( left->f_present( "jccid" ) ){ + costIsCached = true; + cached_jccid = left->features().val( "jccid" ).Int(); + cached_jccindex = left->features().val( "jccindex" ).Int(); + } + else{ + costIsCached = false; + + if( left->f_present( "extendRight") ){ + diphoneJoin = false; + cachedItemVector = fvector( left->features().val( "endcoef" ) ); + } + else{ + diphoneJoin = true; + cachedItemVector = fvector( left->features().val( "midcoef" ) ); + } + + } + } + + if( costIsCached && right->f_present( "jccid" ) ){ + unsigned int right_jccid = right->features().val( "jccid" ).Int(); + unsigned int right_jccindex = right->features().val( "jccindex" ).Int(); + + if( cached_jccid == right_jccid ) + d_overall = (float)(costCaches(cached_jccid)->val(cached_jccindex, right_jccindex))/255; + else{ + EST_warning( "JoinCost: inconsistent cache ids, setting max join cost" ); + d_overall = 1.0; + } + } + else{ + const EST_FVector *l = cachedItemVector; + const EST_FVector *r; + if( diphoneJoin ) + r = fvector( right->features().val( "midcoef" ) ); + else + r = fvector( right->features().val( "startcoef" ) ); + + d_overall = calcDistance( l, r ); + } + return d_overall; +} + +inline float EST_JoinCost::operator()( const DiphoneCandidate* left, const DiphoneCandidate* right ) const +{ + float dist; + + //default zero cost if units contiguous in database + // (i.e. 
this is the cost between a phone and *itself* + if( left->ph1->next() == right->ph1 ) + return 0.0; + + //use cached costs in preference to calculating + if( left->ph2_jccid >= 0 ) + if( left->ph2_jccid == right->ph1_jccid ) + dist = (float)(costCaches(left->ph2_jccid)->val(left->ph2_jccindex, right->ph1_jccindex))/255; + else{ + EST_warning( "JoinCost: inconsistent cache ids, setting max join cost" ); + dist = 1.0; + } + else{ + dist = calcDistance( left->r_coef, right->l_coef ); + } + return dist; +} + +inline float EST_JoinCost::calcDistance( const EST_FVector *l, const EST_FVector *r ) const +{ + float d_spectral, d_f0, d_power, d_overall; + + int l_length = l->length(); + if (l_length != r->length()) + EST_error("Can't compare vectors of differing length\n"); + + //////////////////////////////////////////////////////////////////////////// + // f0 distance + // + // because unvoiced is represented as -1.0, we need to take + // special measures when calculating f0 distances, to avoid + // situation where something low in the speaker's range is closer + // to 0.0 than other higher voiced speech. 
(this could especially + // be problematic where bad pitchmarking or labelling occurs) + float l_f0 = l->a_no_check( l_length-1 ); + float r_f0 = r->a_no_check( l_length-1 ); + + if( l_f0 != -1.0 ){ + if( r_f0 != -1.0 ){ + d_f0 = pow((float)( l_f0 - r_f0 ), (float)2.0); + d_f0 = sqrt( d_f0 ); + } + else + d_f0 = 1.0; + } + else if( r_f0 != -1.0 ) + d_f0 = 1.0; + else + d_f0 = 0.0; + + //////////////////////////////////////////////////////////////////////////// + // power distance + d_power = pow((float)((l->a_no_check(l_length-2) - r->a_no_check(l_length-2))), (float)2.0); + d_power = sqrt( d_power ); + + //////////////////////////////////////////////////////////////////////////// + // spectral distance + float d = 0.0; + l_length -= 2; // don't include f0 and power + for(int i = 0; i < l_length; i++) + d += pow((float)(l->a_no_check(i) - r->a_no_check(i)), (float)2.0); + d_spectral = sqrt( d ); + + // equal weighting by default + d_overall = (d_f0*f0_weight + d_power*power_weight + d_spectral*spectral_weight) / 3; + + + return d_overall; +} + +#endif // __EST_JOINCOST_H__ + + + + + diff --git a/src/modules/MultiSyn/EST_JoinCostCache.cc b/src/modules/MultiSyn/EST_JoinCostCache.cc new file mode 100644 index 0000000..34b8e0e --- /dev/null +++ b/src/modules/MultiSyn/EST_JoinCostCache.cc @@ -0,0 +1,168 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2004 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. 
The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: January 2004 */ +/* */ +/*************************************************************************/ + +#include "EST_JoinCostCache.h" +#include "EST_JoinCost.h" +#include "EST_error.h" +#include "safety.h" +#include +#include + +EST_JoinCostCache::EST_JoinCostCache( unsigned int id ) + : numInstances(0), + _id(id), + cache(0), + cachelen(0) + +{ + +} + +EST_JoinCostCache::EST_JoinCostCache( unsigned int id, unsigned int n ) + : numInstances(n), + _id(id), + cachelen((n*n)/2-n), + deleteMemoryOnDeath(true) +{ + cache = new unsigned char [cachelen]; + CHECK_PTR( cache ); +} + +EST_JoinCostCache::EST_JoinCostCache( unsigned int id, unsigned char *mem, unsigned int n, bool del ) + : numInstances(n), + _id(id), + cache(mem), + cachelen((n*n)/2-n), + deleteMemoryOnDeath(del) +{ + +} + +EST_JoinCostCache::~EST_JoinCostCache() +{ + if( cache != 0 ) + if( deleteMemoryOnDeath ) + 
delete [] cache; +} + +unsigned char EST_JoinCostCache::val( unsigned int a, unsigned int b ) const +{ + if( a>numInstances || b>numInstances ) + EST_error( "Requested index greater than cache size" ); + + if( a == b ) + return minVal; + else if( b > a ) + return cache[(b*(b-1)>>1)+a]; + else + return cache[(a*(a-1)>>1)+b]; + + return defVal; +} + + +bool EST_JoinCostCache::setval( unsigned int a, unsigned int b, unsigned char v ) +{ + if( a>numInstances || b>numInstances ) + EST_error( "Requested index greater than cache size" ); + + if( a == b ){ + return true; + } + else if( b > a ){ + cache[(b*(b-1)>>1)+a] = v; + return true; + } + else{ + cache[(a*(a-1)>>1)+b] = v; + return true; + } + + return false; +} + + +ostream& EST_JoinCostCache::write( ostream &os ) const +{ + os << cachelen; + // os.write( cache, cachelen ); + return os; +} + +bool EST_JoinCostCache::computeAndCache( const EST_TList<EST_Item*> &list, + const EST_JoinCost &jc, + bool verbose ) +{ + unsigned char qcost; // quantized cost + + unsigned int qleveln = maxVal-minVal; + + float ulimit = 1.0-1/(float)(qleveln); + float llimit = 0.0+1/(float)(qleveln); + + unsigned int i=0; + EST_warning("EST_JoinCostCache::computeAndCache"); + for( EST_Litem *it=list.head(); it; it=it->next(), ++i ){ + + unsigned int j=i+1; + for( EST_Litem *jt=it->next(); jt; jt=jt->next(), ++j ){ + float cost = jc( list(it), list(jt) ); + + if( cost >= ulimit ) + qcost = maxVal; + else if( cost <= llimit ) + qcost = minVal; + else + qcost = static_cast<unsigned char>(rint(cost*(float)qleveln)); + + setval( i, j, qcost ); + } + + // yet to be convinced this is the best place for this...
+ list(it)->set( "jccid", (int)this->id() ); + list(it)->set( "jccindex", (int) i ); + } + + return true; +} + + + + diff --git a/src/modules/MultiSyn/EST_JoinCostCache.h b/src/modules/MultiSyn/EST_JoinCostCache.h new file mode 100644 index 0000000..05e236b --- /dev/null +++ b/src/modules/MultiSyn/EST_JoinCostCache.h @@ -0,0 +1,100 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2004 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: January 2004 */ +/* --------------------------------------------------------------------- */ +/* Data type intended to handle a table of n^2 measures relating to */ +/* n distinct entities (e.g. join costs for a phone type). We assume */ +/* the measure is symmetric and hence we don't actually store n^2 values */ +/* */ +/*************************************************************************/ + + +#ifndef __EST_JOINCOSTCACHE_H__ +#define __EST_JOINCOSTCACHE_H__ + +/**@name Caching of join cost computations (need for speed) + We assume the distance measure is symmetric and hence we don't actually + store n^2 values for the n instances of any given phone. +*/ + +//@{ + +/** Object oriented approach for better and for worse... +*/ + +#include "EST_TList.h" +#include "ling_class/EST_Item.h" +#include + +using namespace std; + +class EST_JoinCost; + +class EST_JoinCostCache { +public: + + EST_JoinCostCache( unsigned int id ); + EST_JoinCostCache( unsigned int id, unsigned int n ); + EST_JoinCostCache( unsigned int id, unsigned char *mem, unsigned int n, bool del=false ); + ~EST_JoinCostCache(); + + unsigned int id() const {return _id;} + + ostream& write( ostream &os ) const; + unsigned char val( unsigned int a, unsigned int b ) const; + bool setval( unsigned int a, unsigned int b, unsigned char v ); + + bool computeAndCache( const EST_TList &list, + const EST_JoinCost &jc, + bool verbose=false ); + +private: + EST_JoinCostCache( EST_JoinCostCache &); + EST_JoinCostCache& operator=( EST_JoinCostCache &); + +private: + unsigned int numInstances; + unsigned int _id; + unsigned char* cache; + static const unsigned char minVal = 0x0; + static const unsigned char maxVal = 0xff; + static const unsigned char defVal = 0xff; + unsigned int cachelen; + bool deleteMemoryOnDeath; +}; + +#endif // __EST_JOINCOSTCACHE_H__ diff --git 
a/src/modules/MultiSyn/EST_TargetCost.cc b/src/modules/MultiSyn/EST_TargetCost.cc new file mode 100644 index 0000000..4fc10ab --- /dev/null +++ b/src/modules/MultiSyn/EST_TargetCost.cc @@ -0,0 +1,762 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: October 2002 */ +/* --------------------------------------------------------------------- */ +/* */ +/* */ +/* */ +/* */ +/* */ +/* */ +/*************************************************************************/ + +#include +#include "festival.h" +#include "ling_class/EST_Item.h" +#include "EST_TargetCost.h" +#include "siod.h" + +static const EST_String simple_pos(const EST_String &s); +static const EST_Utterance *tc_get_utt(const EST_Item *seg); +static const EST_Item* tc_get_syl(const EST_Item *seg); +static const EST_Item* tc_get_word(const EST_Item *seg); +static EST_String ff_tobi_accent(const EST_Item *s); +static EST_String ff_tobi_endtone(const EST_Item *s); +static bool threshold_equal(float a, float b, float threshold); + +/* + * BASE CLASS: EST_TargetCost + */ + + +/* Individual cost functions */ + + +// This is really designed only for apml! +float EST_TargetCost::apml_accent_cost() const +{ + // Check if target is an apml utterance. If not return 0 as we don't + // trust its accent specification. + + if( !tc_get_utt(targ)->relation_present("SemStructure")) + return 0.0; + + // Check if candidate is an apml utterance. If not return 1 + // (as we want to use apml if available) + + if( !tc_get_utt(cand)->relation_present("SemStructure")) + return 1.0; + + // As they are both apml match accents. 
+ + const EST_Item *tsyl, *csyl; + EST_String targ_accent, cand_accent, targ_boundary, cand_boundary; + + + if( ph_is_vowel(targ->features().val("name").String()) && + !ph_is_silence(targ->features().val("name").String()) ) + { + tsyl = tc_get_syl(targ); + csyl = tc_get_syl(cand); + + // Can't assume candidate and target identities are the same + // (because of backoff to a silence for example) + if( csyl == 0 ) + return 1.0; + + targ_accent = ff_tobi_accent(tsyl); + cand_accent = ff_tobi_accent(csyl); + targ_boundary = ff_tobi_endtone(tsyl); + cand_boundary = ff_tobi_endtone(csyl); + + if( (cand_accent != targ_accent) || (cand_boundary != targ_boundary) ) + return 1.0; + } + + if( ph_is_vowel(targ->next()->features().val("name").String()) && + !ph_is_silence(targ->next()->features().val("name").String()) ) + { + tsyl = tc_get_syl(targ->next()); + csyl = tc_get_syl(cand->next()); + + // Can't assume candidate and target identities are the same + // (because of backoff to a silence for example) + if( csyl == 0 ) + return 1.0; + + targ_accent = ff_tobi_accent(tsyl); + cand_accent = ff_tobi_accent(csyl); + targ_boundary = ff_tobi_endtone(tsyl); + cand_boundary = ff_tobi_endtone(csyl); + + if( (cand_accent != targ_accent) || (cand_boundary != targ_boundary) ) + return 1.0; + } + + return 0.0; + +} + + +float EST_TargetCost::stress_cost() const +{ + int cand_stress; + int targ_stress; + const EST_Item *tsyl, *csyl; + + if( ph_is_vowel(targ->features().val("name").String()) && + !ph_is_silence(targ->features().val("name").String()) ) + { + tsyl = tc_get_syl(targ); + csyl = tc_get_syl(cand); + + // Can't assume candidate and target identities are the same + // (because of backoff to a silence for example) + if( csyl == 0 ) + { + //cout << "SC: 1 returning 1\n"; + return 1; + } + + targ_stress = (tsyl->I("stress") > 0) ? 1 : 0; + cand_stress = (csyl->I("stress") > 0) ? 
1 : 0; + + if( cand_stress != targ_stress) + { + //cout << "SC: 2 returning 1\n"; + return 1; + } + } + + if( ph_is_vowel(targ->next()->features().val("name").String()) && + !ph_is_silence(targ->next()->features().val("name").String()) ) + { + tsyl = tc_get_syl(targ->next()); + csyl = tc_get_syl(cand->next()); + + // Can't assume candidate and target identities are the same + // (because of backoff to a silence for example) + if( csyl == 0 ) + { + //cout << "SC: 3 returning 1\n"; + return 1; + } + + targ_stress = (tsyl->I("stress") > 0) ? 1 : 0; + cand_stress = (csyl->I("stress") > 0) ? 1 : 0; + if( cand_stress != targ_stress) + { + //cout << "SC: 4 returning 1\n"; + return 1; + } + } + + //cout << "SC: 5 returning 0\n"; + return 0; +} + +float EST_TargetCost::position_in_syllable_cost() const +{ + tcpos_t targ_pos = TCPOS_MEDIAL; + tcpos_t cand_pos = TCPOS_MEDIAL; + + const EST_Item *targ_syl = tc_get_syl(targ); + const EST_Item *targ_next_syl = tc_get_syl(targ->next()); + const EST_Item *targ_next_next_syl = tc_get_syl(targ->next()->next()); + const EST_Item *targ_prev_syl = tc_get_syl(targ->prev()); + const EST_Item *cand_syl = tc_get_syl(cand); + const EST_Item *cand_next_syl = tc_get_syl(cand->next()); + const EST_Item *cand_next_next_syl = tc_get_syl(cand->next()->next()); + const EST_Item *cand_prev_syl = tc_get_syl(cand->prev()); + + if( targ_syl != targ_next_syl ) + targ_pos = TCPOS_INTER; + else if( targ_syl != targ_prev_syl) + targ_pos = TCPOS_INITIAL; + else if( targ_next_syl != targ_next_next_syl) + targ_pos = TCPOS_FINAL; + + if( cand_syl != cand_next_syl ) + cand_pos = TCPOS_INTER; + else if( cand_syl != cand_prev_syl) + cand_pos = TCPOS_INITIAL; + else if( cand_next_syl != cand_next_next_syl) + cand_pos = TCPOS_FINAL; + + return (targ_pos == cand_pos) ? 
0 : 1; +} + +float EST_TargetCost::position_in_word_cost() const +{ + tcpos_t targ_pos = TCPOS_MEDIAL; + tcpos_t cand_pos = TCPOS_MEDIAL; + + const EST_Item *targ_word = tc_get_word(targ); + const EST_Item *targ_next_word = tc_get_word(targ->next()); + const EST_Item *targ_next_next_word = tc_get_word(targ->next()->next()); + const EST_Item *targ_prev_word = tc_get_word(targ->prev()); + const EST_Item *cand_word = tc_get_word(cand); + const EST_Item *cand_next_word = tc_get_word(cand->next()); + const EST_Item *cand_next_next_word = tc_get_word(cand->next()->next()); + const EST_Item *cand_prev_word = tc_get_word(cand->prev()); + + if( targ_word != targ_next_word ) + targ_pos = TCPOS_INTER; + else if( targ_word != targ_prev_word) + targ_pos = TCPOS_INITIAL; + else if( targ_next_word != targ_next_next_word) + targ_pos = TCPOS_FINAL; + + if( cand_word != cand_next_word ) + cand_pos = TCPOS_INTER; + else if( cand_word != cand_prev_word) + cand_pos = TCPOS_INITIAL; + else if( cand_next_word != cand_next_next_word) + cand_pos = TCPOS_FINAL; + + return (targ_pos == cand_pos) ? 0 : 1; +} + + +float EST_TargetCost::position_in_phrase_cost() const +{ + + const EST_Item *targ_word = tc_get_word(targ); + const EST_Item *cand_word = tc_get_word(cand); + + if (!targ_word && !cand_word) + return 0; + if (!targ_word || !cand_word) + return 1; + + return (targ_word->features().val("pbreak").String() == cand_word->features().val("pbreak").String()) ? 
0 : 1; +} + +float EST_TargetCost::punctuation_cost() const +{ + + const EST_Item *targ_word = tc_get_word(targ); + const EST_Item *cand_word = tc_get_word(cand); + const EST_Item *next_targ_word = tc_get_word(targ->next()); + const EST_Item *next_cand_word = tc_get_word(cand->next()); + + float score = 0.0; + + if ( (targ_word && !cand_word) || (!targ_word && cand_word) ) + score += 0.5; + else + if (targ_word && cand_word) + if ( parent(targ_word,"Token")->features().val("punc","NONE").String() + != parent(cand_word,"Token")->features().val("punc","NONE").String() ) + score += 0.5; + + + if ( (next_targ_word && !next_cand_word) || (!next_targ_word && next_cand_word) ) + score += 0.5; + else + if(next_targ_word && next_cand_word) + if ( parent(next_targ_word,"Token")->features().val("punc","NONE").String() + != parent(next_cand_word,"Token")->features().val("punc","NONE").String() ) + score += 0.5; + + + return score; + +} + + +float EST_TargetCost::partofspeech_cost() const +{ + // Compare left phone half of diphone + const EST_Item *targ_left_word = tc_get_word(targ); + const EST_Item *cand_left_word = tc_get_word(cand); + + if(!targ_left_word && !cand_left_word) + return 0; + if(!targ_left_word || !cand_left_word) + return 1; + + const EST_String targ_left_pos( simple_pos(targ_left_word->features().val("pos").String()) ); + const EST_String cand_left_pos( simple_pos(cand_left_word->features().val("pos").String()) ); + + if( targ_left_pos != cand_left_pos ) + return 1; + + // Compare right phone half of diphone + const EST_Item *targ_right_word = tc_get_word(targ->next()); + const EST_Item *cand_right_word = tc_get_word(cand->next()); + + if(!targ_right_word && !cand_right_word) + return 0; + if(!targ_right_word || !cand_right_word) + return 1; + + const EST_String targ_right_pos( simple_pos(targ_right_word->features().val("pos").String()) ); + const EST_String cand_right_pos( simple_pos(cand_right_word->features().val("pos").String()) ); + + if( targ_right_pos 
!= cand_right_pos ) + return 1; + + return 0; +} + +float EST_TargetCost::left_context_cost() const +{ + + EST_Item *targ_context = targ->prev(); + EST_Item *cand_context = cand->prev(); + + if ( !targ_context && !cand_context) + return 0; + if ( !targ_context || !cand_context) + return 1; + + return (targ_context->features().val("name").String() == cand_context->features().val("name").String()) ? 0 : 1; +} + +float EST_TargetCost::right_context_cost() const +{ + + EST_Item *targ_context = targ->next()->next(); + EST_Item *cand_context = cand->next()->next(); + + if ( !targ_context && !cand_context) + return 0; + if ( !targ_context || !cand_context) + return 1; + + return (targ_context->features().val("name").String() == cand_context->features().val("name").String()) ? 0 : 1; +} + +float EST_TargetCost::out_of_lex_cost() const +{ + static const EST_String ool_feat("bad_lex"); + + // bad_lex may at some stage be set on a target for resynthesis purposes. + if( cand->f_present(ool_feat) + != targ->f_present(ool_feat) ) + return 1.0; + + if( cand->next()->f_present(ool_feat) + != targ->next()->f_present(ool_feat) ) + return 1.0; + + return 0.0; +} + +float EST_TargetCost::bad_duration_cost() const +{ + static const EST_String bad_dur_feat("bad_dur"); + + // bad_dur may at some stage be set on a target for resynthesis purposes. + if( cand->f_present(bad_dur_feat) + != targ->f_present(bad_dur_feat) ) + return 1.0; + + if( cand->next()->f_present(bad_dur_feat) + != targ->next()->f_present(bad_dur_feat) ) + return 1.0; + // If the segments next to these segments are bad, then these ones are probably wrong too!
+ if( cand->prev() && targ->prev() && ( cand->prev()->f_present(bad_dur_feat) + != targ->prev()->f_present(bad_dur_feat) ) ) + return 1.0; + + if( cand->next()->next() && targ->next()->next() && ( cand->next()->next()->f_present(bad_dur_feat) + != targ->next()->next()->f_present(bad_dur_feat) ) ) + return 1.0; + + + return 0.0; +} + +float EST_TargetCost::bad_f0_cost() const +{ + // by default, the last element of join cost coef vector is + // the f0 (i.e. fv->a_no_check( fv->n()-1 ) ) + + const EST_Item *cand_left = cand; + const EST_Item *cand_right = cand_left->next(); + + const EST_String &left_phone( cand_left->features().val("name").String() ); + const EST_String &right_phone( cand_right->features().val("name").String() ); + + EST_FVector *fv = 0; + float penalty = 0.0; + + if( ph_is_vowel( left_phone ) + || ph_is_approximant( left_phone ) + || ph_is_liquid( left_phone ) + || ph_is_nasal( left_phone ) ){ + fv = fvector( cand_left->f("midcoef") ); + if( fv->a_no_check(fv->n()-1) == -1.0 ) // means unvoiced + penalty += 0.5; + } + + if( ph_is_vowel( right_phone ) + || ph_is_approximant( right_phone ) + || ph_is_liquid( right_phone ) + || ph_is_nasal( right_phone ) ){ + fv = fvector( cand_right->f("midcoef") ); + if( fv->a_no_check(fv->n()-1) == -1.0 ) // means unvoiced + penalty += 0.5; + } + + return penalty; +} + + +/* + * DERIVED CLASS: EST_DefaultTargetCost + * + * This is CSTR's proposed default target cost. Nothing special, if you think you can + * do better derive your own class. 
+ */ + +float EST_DefaultTargetCost::operator()(const EST_Item* targ, const EST_Item* cand) const +{ + set_targ_and_cand(targ,cand); + score = 0.0; + weight_sum = 0.0; + + score += add_weight(10.0)*stress_cost(); + score += add_weight(5.0)*position_in_syllable_cost(); + score += add_weight(5.0)*position_in_word_cost(); + score += add_weight(6.0)*partofspeech_cost(); + score += add_weight(15.0)*position_in_phrase_cost(); + score += add_weight(4.0)*left_context_cost(); + score += add_weight(3.0)*right_context_cost(); + + score /= weight_sum; + + // These are considered really bad, and will result in a score > 1. + score += 10.0*bad_duration_cost(); // see also join cost. + score += 10.0*bad_f0_cost(); + score += 10.0*punctuation_cost(); + score += 10.0*out_of_lex_cost(); + + return score ; +} + +/* + * DERIVED CLASS: EST_APMLTargetCost + * + */ + +float EST_APMLTargetCost::operator()(const EST_Item* targ, const EST_Item* cand) const +{ + set_targ_and_cand(targ,cand); + score = 0.0; + weight_sum = 0.0; + + score += add_weight(10.0)*stress_cost(); + score += add_weight(20.0)*apml_accent_cost(); // APML only! + score += add_weight(5.0)*position_in_syllable_cost(); + score += add_weight(5.0)*position_in_word_cost(); + score += add_weight(6.0)*partofspeech_cost(); + score += add_weight(4.0)*position_in_phrase_cost(); + score += add_weight(4.0)*left_context_cost(); + score += add_weight(3.0)*right_context_cost(); + + score /= weight_sum; + + score += 10.0*bad_duration_cost(); // see also join cost. 
+ score += 10.0*bad_f0_cost(); + score += 10.0*punctuation_cost(); + score += 10.0*out_of_lex_cost(); + + return score; + +} + +/* + * DERIVED CLASS: EST_SingingTargetCost + * + * Mostly default stuff, but tries to match pitch and duration + * specified on Tokens from the xxml + * + */ + +float EST_SingingTargetCost::pitch_cost() const +{ + + const EST_Item *targ_word = tc_get_word(targ); + const EST_Item *cand_word = tc_get_word(cand); + const EST_Item *next_targ_word = tc_get_word(targ->next()); + const EST_Item *next_cand_word = tc_get_word(cand->next()); + const float threshold = 0.1; + float targ_pitch, cand_pitch; + LISP l_tmp; + + float score = 0.0; + + if ( (targ_word && !cand_word) || (!targ_word && cand_word) ) + { + cout << "PITCH PENALTY WORD NON-WORD MISMATCH\n"; + score += 0.5; + } + else + if (targ_word && cand_word) + { + + l_tmp = lisp_val(parent(targ_word,"Token")->f("freq",est_val(0))); + + // This currently assumes one syllable words, need to process + // the list more for multiple syllable words, or move the info + // to the syllable. + if(CONSP(l_tmp)) + targ_pitch = get_c_float(car(l_tmp)); + else + targ_pitch = get_c_float(l_tmp); + cand_pitch = parent(cand_word,"Token")->F("freq",0.0); + + if ( ! threshold_equal(targ_pitch,cand_pitch,threshold)) + { + cout << "PP: " << targ_pitch << " " << cand_pitch << endl; + score += 0.5; + } + } + + if ( (next_targ_word && !next_cand_word) || (!next_targ_word && next_cand_word) ) + { + cout << "PITCH PENALTY NEXT WORD NON-WORD MISMATCH\n"; + score += 0.5; + } + else + if(next_targ_word && next_cand_word) + { + l_tmp = lisp_val(parent(next_targ_word,"Token")->f("freq",est_val(0))); + if(CONSP(l_tmp)) + targ_pitch = get_c_float(car(l_tmp)); + else + targ_pitch = get_c_float(l_tmp); + cand_pitch = parent(next_cand_word,"Token")->F("freq",0.0); + + if ( ! 
threshold_equal(targ_pitch,cand_pitch,threshold)) + { + cout << "NP: " << targ_pitch << " " << cand_pitch << endl; + score += 0.5; + } + } + + if (score == 0.0) + cout << "NO PITCH PENALTY\n"; + + return score; +} + +float EST_SingingTargetCost::duration_cost() const +{ + + const EST_Item *targ_word = tc_get_word(targ); + const EST_Item *cand_word = tc_get_word(cand); + const EST_Item *next_targ_word = tc_get_word(targ->next()); + const EST_Item *next_cand_word = tc_get_word(cand->next()); + float targ_dur, cand_dur; + LISP l_tmp; + + float score = 0.0; + + if ( (targ_word && !cand_word) || (!targ_word && cand_word) ) + score += 0.5; + else + if (targ_word && cand_word) + { + l_tmp = lisp_val(parent(targ_word,"Token")->f("dur",est_val(0))); + if(CONSP(l_tmp)) + targ_dur = get_c_float(car(l_tmp)); + else + targ_dur = get_c_float(l_tmp); + + cand_dur = parent(cand_word,"Token")->F("dur",0.0); + + if ( targ_dur != cand_dur ) + score += 0.5; + } + + if ( (next_targ_word && !next_cand_word) || (!next_targ_word && next_cand_word) ) + score += 0.5; + else + if(next_targ_word && next_cand_word) + { + l_tmp = lisp_val(parent(next_targ_word,"Token")->f("dur",est_val(0))); + if(CONSP(l_tmp)) + targ_dur = get_c_float(car(l_tmp)); + else + targ_dur = get_c_float(l_tmp); + cand_dur = parent(next_cand_word,"Token")->F("dur",0.0); + + if ( targ_dur != cand_dur ) + score += 0.5; + } + + return score; +} + + + +float EST_SingingTargetCost::operator()(const EST_Item* targ, const EST_Item* cand) const +{ + set_targ_and_cand(targ,cand); + score = 0.0; + weight_sum = 0.0; + + score += add_weight(50.0)*pitch_cost(); + score += add_weight(50.0)*duration_cost(); + score += add_weight(5.0)*stress_cost(); + score += add_weight(5.0)*position_in_syllable_cost(); + score += add_weight(5.0)*position_in_word_cost(); + score += add_weight(5.0)*partofspeech_cost(); + score += add_weight(5.0)*position_in_phrase_cost(); + score += add_weight(5.0)*punctuation_cost(); + score += 
add_weight(4.0)*left_context_cost(); + score += add_weight(3.0)*right_context_cost(); + score += add_weight(2.0)*bad_duration_cost(); // see also join cost. + + return score / weight_sum; +} + + + +/* + * DERIVED CLASS: EST_SchemeTargetCost + * + * This lets you implement your target cost in scheme, so you can + * change it on the fly. Great for development, but about 5 times as slow. + * + */ + + +float EST_SchemeTargetCost::operator()( const EST_Item* targ, const EST_Item* cand ) const + { + LISP r,l; + + l = cons(tc, + cons( siod(targ), cons( siod(cand), NIL) )); + r = leval(l,NIL); + if ((consp(r)) || (r == NIL) || !(numberp(r))) + { + cerr << "Lisp function: " << tc << + " did not return float score" << endl; + festival_error(); + } + else + score = get_c_float(r); + + return score; + } + + + +/* + * Auxiliary target cost functions + */ + + +static const EST_String simple_pos(const EST_String &s) +{ + if( s == "nn" || s == "nnp" || s == "nns" || s == "nnps" || s == "fw" || s == "sym" || s == "ls") + return "n"; + if( s == "vbd" || s == "vb" || s == "vbn" || s == "vbz" || s == "vbp" || s == "vbg") + return "v"; + if( s == "jj" || s == "jjr" || s == "jjs" || s == "1" || s == "2" || s == "rb" || + s == "rp" || s == "rbr" || s == "rbs") + return "other"; + return "func"; + } + +static const EST_Utterance *tc_get_utt(const EST_Item *seg) +{ + return seg->relation()->utt(); +} + +static const EST_Item *tc_get_syl(const EST_Item *seg) +{ + // if(!seg) + // return 0; + + return parent(seg,"SylStructure"); +} + +static const EST_Item *tc_get_word(const EST_Item *seg) + { + // if(!seg) + // return 0; + const EST_Item *syl = tc_get_syl(seg); + + if(syl) + return parent(syl,"SylStructure"); + else + return 0; + } + + +/* adapted from base/ff.cc */ +static EST_String ff_tobi_accent(const EST_Item *s) +{ + // First tobi accent related to syllable + EST_Item *nn = as(s,"Intonation"); + EST_Item *p; + + for (p=daughter1(nn); p; p=p->next()) + if (p->name().contains("*")) +
return p->name(); + return "NONE"; +} + +static EST_String ff_tobi_endtone(const EST_Item *s) +{ + // First tobi endtone (phrase accent or boundary tone) + EST_Item *nn = as(s,"Intonation"); + EST_Item *p; + + for (p=daughter1(nn); p; p=p->next()) + { + EST_String l = p->name(); + if ((l.contains("%")) || (l.contains("-"))) + return p->name(); + } + + return "NONE"; +} + +static bool threshold_equal(float a, float b, float threshold) +{ + if ( ( (a-b) < threshold ) && ( (a-b) > -threshold ) ) + return true; + else + return false; +} diff --git a/src/modules/MultiSyn/EST_TargetCost.h b/src/modules/MultiSyn/EST_TargetCost.h new file mode 100644 index 0000000..d3bc9b8 --- /dev/null +++ b/src/modules/MultiSyn/EST_TargetCost.h @@ -0,0 +1,188 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: October 2002 */ +/* --------------------------------------------------------------------- */ +/* Interface for family of target cost function objects which */ +/* calculate a target score for given candidate */ +/* */ +/* */ +/* */ +/* */ +/*************************************************************************/ + + +#ifndef __EST_TARGETCOST_H__ +#define __EST_TARGETCOST_H__ + +#include "siod.h" // for gc_protect( obj**) +class EST_Item; + +/**@name Interface for Target Cost function object +*/ + +//@{ + +/** Object oriented approach for better or for worse... +*/ + +/* Positional enum */ + +enum tcpos_t { + TCPOS_INITIAL, + TCPOS_MEDIAL, + TCPOS_FINAL, + TCPOS_INTER, + TCPOS_POSITIONS +}; + + +/* + * BASE CLASS: EST_TargetCost + */ +class EST_TargetCost { + public: + + EST_TargetCost() : defScore(0.0){}; + virtual ~EST_TargetCost() {}; + + // Base class operator() doesn't do much, but it will work. 
+ virtual float operator()( const EST_Item* targp, const EST_Item* candp ) const + { return defScore; } + + // Allow flatpacking + virtual const bool is_flatpack() const {return false;} + + protected: + float defScore; + // Temp variables for use while calculating cost (here for speed) + mutable float score; + mutable float weight_sum; + mutable const EST_Item *cand; + mutable const EST_Item *targ; + + inline void set_cand(const EST_Item* seg) const + { cand = seg; } + + inline void set_targ(const EST_Item* seg) const + { targ = seg; } + + inline void set_targ_and_cand(const EST_Item* tseg, const EST_Item* cseg) const + { set_targ(tseg); set_cand(cseg); } + + inline float add_weight(float w) const + { weight_sum += w ; return w; } + + // General cost functions that derived classes may want to use. + float apml_accent_cost() const; + float stress_cost() const; + float position_in_syllable_cost() const; + float position_in_word_cost() const; + float position_in_phrase_cost() const; + float partofspeech_cost() const; + float punctuation_cost() const; + float left_context_cost() const; + float right_context_cost() const; + float bad_duration_cost() const; + float out_of_lex_cost() const; + float bad_f0_cost() const; +}; + + +/* + * DERIVED CLASS: EST_DefaultTargetCost + */ +class EST_DefaultTargetCost : public EST_TargetCost { + + public: + float operator()(const EST_Item* targ, const EST_Item* cand) const; + +}; + +/* + * DERIVED CLASS: EST_APMLTargetCost + */ +class EST_APMLTargetCost : public EST_TargetCost { + + public: + float operator()(const EST_Item* targ, const EST_Item* cand) const; + +}; + +/* + * DERIVED CLASS: EST_SingingTargetCost + */ +class EST_SingingTargetCost : public EST_TargetCost { + + public: + float operator()(const EST_Item* targ, const EST_Item* cand) const; + + protected: + float pitch_cost() const; + float duration_cost() const; + +}; + + + + + +/* + * DERIVED CLASS: EST_SchemeTargetCost + */ +class EST_SchemeTargetCost : public 
EST_TargetCost { + + private: + LISP tc; + + public: + EST_SchemeTargetCost( LISP scheme_targetcost ) + : EST_TargetCost(), tc(scheme_targetcost) + { gc_protect( &tc ); } + + ~EST_SchemeTargetCost() + { gc_unprotect( &tc ); } + + float operator()(const EST_Item* targ, const EST_Item* cand) const; +}; + + +#endif // __EST_TARGETCOST_H__ + + + + + diff --git a/src/modules/MultiSyn/Makefile b/src/modules/MultiSyn/Makefile new file mode 100644 index 0000000..8890ea3 --- /dev/null +++ b/src/modules/MultiSyn/Makefile @@ -0,0 +1,78 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## (University of Edinburgh, UK) and ## +## Korin Richmond ## +## Copyright (c) 2002 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. 
IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +# # +# Contains implementation of Generalis[ed/able] synthesis # +# (like unit selection) # +# # +# Author: Korin Richmond (korin@cstr.ed.ac.uk) Aug 2002 # +########################################################################### +TOP=../../.. +DIRNAME=src/modules/MultiSyn + +LIB_BUILD_DIRS = inst_tmpl +BUILD_DIRS = $(LIB_BUILD_DIRS) + +ALL_DIRS = $(BUILD_DIRS) + +H = UnitSelection.h \ + VoiceBase.h DiphoneUnitVoice.h \ + VoiceModuleBase.h DiphoneVoiceModule.h \ + EST_TargetCost.h TargetCostRescoring.h \ + EST_JoinCost.h EST_JoinCostCache.h \ + DiphoneBackoff.h safety.h EST_DiphoneCoverage.h \ + EST_FlatTargetCost.h + + +SRCS = UnitSelection.cc \ + VoiceBase.cc DiphoneUnitVoice.cc \ + VoiceModuleBase.cc DiphoneVoiceModule.cc \ + EST_TargetCost.cc TargetCostRescoring.cc \ + EST_JoinCost.cc EST_JoinCostCache.cc \ + DiphoneBackoff.cc EST_DiphoneCoverage.cc \ + EST_FlatTargetCost.cc + +OBJS = $(SRCS:.cc=.o) + +FILES = Makefile $(SRCS) $(H) + +LOCAL_INCLUDES = -I../include + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/MultiSyn/TargetCostRescoring.cc b/src/modules/MultiSyn/TargetCostRescoring.cc new file mode 100644 index 0000000..ac550ec --- /dev/null +++ b/src/modules/MultiSyn/TargetCostRescoring.cc @@ -0,0 +1,121 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* 
Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author: Korin Richmond */ +/* Date: June 2006 */ +/* --------------------------------------------------------------------- */ +/* */ +/* Code to allow the possibility of changing the costs (thus ranking) */ +/* assigned to candidates units by the target cost function, typically */ +/* based on what those original costs are... 
*/ +/* */ +/*************************************************************************/ + +#include "TargetCostRescoring.h" +#include "ling_class/EST_Item.h" +#include "EST_TList.h" +#include "EST_viterbi.h" + +#include "DiphoneVoiceModule.h" //for getJoinTime(const EST_Item *it) + +// Do some additional target cost here, based on the acoustic +// properties of the units found matching this target unit. (Changes +// the candidate list "candidates" in place.) This currently is +// limited to looking at the durations of the top scoring candidates, +// but also could be extended to include components for the pitch and +// amplitude at various points for example... + +void rescoreCandidates( EST_VTCandidate *candidates, float beam_width, float mult ) +{ + // first calculate the stats for the "model" + float dur = 0.0; + EST_Item *ph1 = 0; + EST_Item *ph2 = 0; + //EST_FVector *ph1_mid = 0; + //EST_FVector *ph2_mid = 0; + + EST_TList<ScorePair> scores; + + // get all scores to work out what durations are "suitable" + for( EST_VTCandidate *it = candidates; it != 0; it=it->next ){ + ph1 = it->s; + ph2 = ph1->next(); + // ph1_mid = fvector( ph1->f( "midcoef" ) ); + // ph2_mid = fvector( ph2->f( "midcoef" ) ); + + dur = getJoinTime(ph2) - getJoinTime(ph1); // duration of diphone unit + scores.append( ScorePair(it->score,dur, it) ); + } + + sort( scores ); + //cerr << scores << endl; + + // calculate simple mean duration of some or all of candidates + float meandur = 0.0; + int n = 0; + + if( beam_width < 0 ){ // just average all of them + for( EST_Litem *li = scores.head(); li != 0; li = li->next() ){ + meandur += scores(li)._dur; + n++; + } + } + else{ + float score_cutoff = scores.first()._score + beam_width; + for( EST_Litem *li = scores.head(); li != 0; li = li->next() ){ + if( scores(li)._score > score_cutoff ) + break; + else{ + meandur += scores(li)._dur; + n++; + } + } + } + + meandur /= n; + + // then tweak the scores based on that + for( EST_Litem *li = scores.head(); li != 0;
li = li->next() ){ + float cand_dur = scores(li)._dur; + // cerr << scores(li)._cand->score << " "; + scores(li)._cand->score += (mult * abs( cand_dur - meandur ) ); + // cerr << scores(li)._cand->score << endl; + } +} + +ostream& operator << ( ostream& out, const ScorePair &sp ) +{ + out << sp._score << " " << sp._dur << "\n"; + return out; +} diff --git a/src/modules/MultiSyn/TargetCostRescoring.h b/src/modules/MultiSyn/TargetCostRescoring.h new file mode 100644 index 0000000..d77ddeb --- /dev/null +++ b/src/modules/MultiSyn/TargetCostRescoring.h @@ -0,0 +1,107 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. 
IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: June 2006 */ +/* --------------------------------------------------------------------- */ +/* */ +/* Code to allow the possibility of changing the costs (thus ranking) */ +/* assigned to candidates units by the target cost function, typically */ +/* based on what those original costs are... */ +/* */ +/* */ +/*************************************************************************/ + +#ifndef __TARGETCOSTRESCORING_H__ +#define __TARGETCOSTRESCORING_H__ + +#include +#include "EST_viterbi.h" + +// do some additional target cost, based on the acoustic properties of +// the units found matching this target unit. This function assumes the +// viterbi candidates linked list is in the correct format, with the +// correct information available. + +// toplevel function to call (Changes the candidate list in place.) +// beam_width specifies which candidates are included in the +// computation (i.e. only those candidates with target costs within +// "beam_width" of the best candidate's score). +// mult specifies an arbitrary weighting factor to be applied to the penalties +// added to the candidates' target cost scores by this function (e.g. a mult of +// 0.0 will stop the target cost scores being changed at all, whereas a mult of 10.0 +// will make it have a large effect). 
+void rescoreCandidates( EST_VTCandidate *candidates, float beam_width, float mult ); + +// our internal class for doing the work +class ScorePair { +public: + ScorePair( ) + : _score(0.0), + _dur(0.0), + _cand(0) + {}; + + ScorePair( float score, float duration, EST_VTCandidate* cand ) + : _score(score), + _dur(duration), + _cand(cand) + {}; + +public: + float _score; + float _dur; + EST_VTCandidate* _cand; +}; + +inline bool operator > ( const ScorePair &A, const ScorePair &B ) +{ + return ( A._score > B._score ) ? true : false; +} + +inline bool operator < ( const ScorePair &A, const ScorePair &B ) +{ + return ( A._score < B._score ) ? true : false; +} + +inline bool operator == ( const ScorePair &A, const ScorePair &B ) +{ + return ( A._score == B._score ) ? true : false; +} + +ostream& operator<<( ostream& out, const ScorePair &sp ); + + +#endif // __TARGETCOSTRESCORING_H__ diff --git a/src/modules/MultiSyn/UnitSelection.cc b/src/modules/MultiSyn/UnitSelection.cc new file mode 100644 index 0000000..d675a51 --- /dev/null +++ b/src/modules/MultiSyn/UnitSelection.cc @@ -0,0 +1,812 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. 
*/ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: August 2002 */ +/* --------------------------------------------------------------------- */ +/* Generalis[ed/able] Unit Selection synthesis */ +/* */ +/*************************************************************************/ + +#include "siod.h" +#include "EST.h" +#include "UnitSelection.h" +#include "DiphoneUnitVoice.h" +#include "DiphoneVoiceModule.h" +#include "EST_JoinCost.h" +#include "EST_TargetCost.h" +#include "EST_FlatTargetCost.h" +#include "safety.h" + + +static LISP FT_voice_get_units(LISP l_voice, LISP l_utt) +{ + EST_Utterance *u = get_c_utt(l_utt); + VoiceBase *v = voice( l_voice ); + + // Find units and put in utterance + v->getUnitSequence( u ); + + return l_utt; +} + + +/////////////////////////////////////////////////////////////////////////////// +// experimental candidate omission stuff ////////////////////////////////////// +static LISP FT_voice_reget_units(LISP l_duv, LISP l_utt) +{ + if( DiphoneUnitVoice *duv = dynamic_cast(voice(l_duv)) ){ + EST_Utterance *u = get_c_utt(l_utt); + duv->regetUnitSequence( u ); + } + else + EST_error( "du_voice_reget_units: expects 
DiphoneUnitVoice" ); + + return l_utt; +} + +static LISP FT_utt_tag_unit( LISP l_utt, LISP l_unitnum ) +{ + EST_Utterance *u = get_c_utt(l_utt); + const int n = get_c_int( l_unitnum ); + + if( n<1 ) + EST_error( "unit number must be at least 1" ); + + EST_Item *it = u->relation("Unit")->first(); + int i; + for( i=1; i<=n && it!= 0; i++ ) + it=it->next(); + + if( it == 0 ) // also catches running off the end on the last step + EST_error( "unit number greater than number of items in unit relation") ; + + ItemList* omitlist=0; + + if( !it->f_present("omitlist") ){ + omitlist = new ItemList; + CHECK_PTR(omitlist); + it->set_val( "omitlist", est_val(omitlist) ); + } + else + omitlist = itemlist( it->f("omitlist")); + + EST_Item *currentCandidateUsed = item(it->f("source_ph1")); + + printf( "setting omit flag on unit %d (item %p)\n", i-1, currentCandidateUsed ); + + omitlist->append( currentCandidateUsed ); + + return l_utt; +} + + +/////////////////////////////////////////////////////////////////////////////// + + +static LISP FT_voice_get_name(LISP l_voice) +{ + VoiceBase *v = voice( l_voice ); + EST_String n = v->name(); + + return strintern(n); +} + +static LISP FT_voice_set_name(LISP l_voice, LISP l_name) +{ + EST_String n = get_c_string(l_name); + VoiceBase *v = voice( l_voice ); + + v->set_name( n ); + + return NIL; +} + +static LISP FT_voice_init( LISP l_voice, LISP l_ignore_bad_tag ) +{ + VoiceBase *v = voice( l_voice ); + + bool ignore_bad_tag = false; + if( l_ignore_bad_tag != NIL ) + ignore_bad_tag = true; + + v->initialise( ignore_bad_tag ); + return NIL; +} + + +static LISP FT_voice_debugLevel( LISP l_voice, LISP l_level ) +{ + VoiceBase *v = voice( l_voice ); + + if( l_level != NIL ) + v->setVerbosity( get_c_int(l_level) ); + + return flocons( v->verbosity() ); +} + +static void parseVoiceDataParams( LISP l_dataparams, + EST_String *uttDir, + EST_String *wavDir, + EST_String *pmDir, + EST_String *coefDir, + EST_String *uttExt, + EST_String *wavExt, + EST_String *pmExt, + EST_String *coefExt ) +{ + int
listlen = siod_llength( l_dataparams ); + + if( listlen == 8 ){ + *uttExt = get_c_string( CAR1(CDR4(l_dataparams)) ); + *wavExt = get_c_string( CAR2(CDR4(l_dataparams)) ); + *pmExt = get_c_string( CAR3(CDR4(l_dataparams)) ); + *coefExt = get_c_string( CAR4(CDR4(l_dataparams)) ); + } + else if( listlen == 4 ){ //set some defaults + *uttExt = ".utt"; + *wavExt = ".wav"; + *pmExt = ".pm"; + *coefExt = ".coef"; + } + else + EST_error( "Incorrect number of voice data parameters" ); + + *uttDir = get_c_string( CAR1(l_dataparams) ); + *wavDir = get_c_string( CAR2(l_dataparams) ); + *pmDir = get_c_string( CAR3(l_dataparams) ); + *coefDir = get_c_string( CAR4(l_dataparams) ); +} + + +static LISP FT_make_du_voice( LISP l_bnames, LISP l_datadirs, LISP l_srate ) +{ + EST_String uttDir, wavDir, pmDir, coefDir; + EST_String uttExt, wavExt, pmExt, coefExt; + + int wav_srate = get_c_int( l_srate ); + if( wav_srate <= 0 ) + EST_error( "Waveform sample rate set to %d", wav_srate ); + + parseVoiceDataParams( l_datadirs, + &uttDir, &wavDir, &pmDir, &coefDir, + &uttExt, &wavExt, &pmExt, &coefExt ); + + EST_StrList bnames; + siod_list_to_strlist( l_bnames, bnames ); + + DiphoneUnitVoice *v; + v = new DiphoneUnitVoice( bnames, + uttDir, wavDir, pmDir, coefDir, + static_cast(wav_srate), + uttExt, wavExt, pmExt, coefExt ); + + CHECK_PTR(v); + + return siod( static_cast(v) ); +} + + +static LISP FT_make_du_voice_module( LISP l_bnames, LISP l_datadirs, LISP l_srate ) +{ + EST_String uttDir, wavDir, pmDir, coefDir; + EST_String uttExt, wavExt, pmExt, coefExt; + + int wav_srate = get_c_int( l_srate ); + if( wav_srate <= 0 ) + EST_error( "Waveform sample rate set to %d", wav_srate ); + + parseVoiceDataParams( l_datadirs, + &uttDir, &wavDir, &pmDir, &coefDir, + &uttExt, &wavExt, &pmExt, &coefExt ); + + EST_StrList bnames; + siod_list_to_strlist( l_bnames, bnames ); + + DiphoneVoiceModule *vm; + vm = new DiphoneVoiceModule( bnames, + uttDir, wavDir, pmDir, coefDir, + static_cast(wav_srate), + 
uttExt, wavExt, pmExt, coefExt ); + CHECK_PTR(vm); + + return siod( vm ); +} + +static LISP FT_voice_add_module( LISP l_duv, LISP l_bnames, LISP l_datadirs, LISP l_srate ) +{ + EST_String uttDir, wavDir, pmDir, coefDir; + EST_String uttExt, wavExt, pmExt, coefExt; + + + int wav_srate = get_c_int( l_srate ); + if( wav_srate <= 0 ) + EST_error( "Waveform sample rate set to %d", wav_srate ); + + parseVoiceDataParams( l_datadirs, + &uttDir, &wavDir, &pmDir, &coefDir, + &uttExt, &wavExt, &pmExt, &coefExt ); + + EST_StrList bnames; + siod_list_to_strlist( l_bnames, bnames ); + + if( DiphoneUnitVoice* duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_duv)) ){ + if( ! duv->addVoiceModule(bnames, uttDir, wavDir, pmDir, coefDir, + static_cast(wav_srate), + uttExt, wavExt, pmExt, coefExt ) ) + EST_error( "voice.addModule failed" ); + } + else + EST_error( "voice_add_module: expects DiphoneUnitVoice for now" ); + + return NIL; +} + + +static LISP FT_du_voice_function( LISP x ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(x)) ){ + // do things you can only do to a DiphoneUnitVoice + (void) duv; + } + else + EST_error( "du_voice_function: expects DiphoneUnitVoice" ); + + return NIL; +} + + +static LISP FT_du_voice_precomputeJoinCosts( LISP l_voice, LISP l_phones ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + EST_StrList phones; + siod_list_to_strlist( l_phones, phones ); + duv->precomputeJoinCosts( phones ); + } + else + EST_error( "du_voice_precomputeJoinCosts: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_set_pruning_beam( LISP l_voice, LISP l_width ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + duv->set_pruning_beam( get_c_float( l_width ) ); + } + else + EST_error( "du_voice_set_pruning_beam: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_set_ob_pruning_beam( LISP l_voice, LISP l_width ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + duv->set_ob_pruning_beam( get_c_float( l_width ) ); +
} + else + EST_error( "du_voice_set_ob_pruning_beam: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_set_tc_rescoring_beam( LISP l_voice, LISP l_width ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + duv->set_tc_rescoring_beam( get_c_float( l_width ) ); + } + else + EST_error( "du_voice_set_tc_rescoring_beam: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_set_tc_rescoring_weight( LISP l_voice, LISP l_weight ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + duv->set_tc_rescoring_weight( get_c_float( l_weight ) ); + } + else + EST_error( "du_voice_set_tc_rescoring_weight: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_set_target_cost_weight( LISP l_voice, LISP l_weight ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + duv->set_target_cost_weight( get_c_float( l_weight ) ); + } + else + EST_error( "du_voice_set_target_cost_weight: expects DiphoneUnitVoice" ); + + return NIL; +} + +// static LISP FT_du_voice_set_join_cost_weight( LISP l_voice, LISP l_weight ) +// { +// if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ +// duv->set_join_cost_weight( get_c_float( l_weight ) ); +// } +// else +// EST_error( "du_voice_set_join_cost_weight: expects DiphoneUnitVoice" ); + +// return NIL; +// } + +static LISP FT_du_voice_set_prosodic_modification( LISP l_voice, LISP l_mod ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + duv->set_prosodic_modification( get_c_int( l_mod ) ); + } + else + EST_error( "du_voice_set_prosodic_modification: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_prosodic_modification(LISP l_voice) +{ + int pm; + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ) + { + pm = duv->get_prosodic_modification(); + if (pm == 0) + return NIL; + else + return truth; + } + else + { + EST_error( "du_voice_prosodic_modification: expects DiphoneUnitVoice" ); + return NIL; + } 
+} + + + +static LISP FT_du_voice_set_diphonebackoff(LISP l_voice, LISP l_list) +{ + + DiphoneBackoff *dbo; + + + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ) + { + + dbo = new DiphoneBackoff(l_list); + CHECK_PTR(dbo); + duv->set_diphone_backoff(dbo); + } + else + EST_error( "du_voice_set_diphone_backoff: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_setTargetCost(LISP l_voice, LISP l_tc) +{ + EST_TargetCost *tc=0; + + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ) + { + if( l_tc == NIL ){ + tc = new EST_TargetCost(); + CHECK_PTR(tc); + } + else if(l_tc == truth){ + tc = new EST_DefaultTargetCost(); + CHECK_PTR(tc); + } + else if(TYPE(l_tc) == tc_closure) { + tc = new EST_SchemeTargetCost(l_tc); + CHECK_PTR(tc); + } + else if(streq(get_c_string(l_tc),"flat")){ + tc = new EST_FlatTargetCost(); + CHECK_PTR(tc); + } + else if(streq(get_c_string(l_tc),"apml")){ + tc = new EST_APMLTargetCost(); + CHECK_PTR(tc); + } + else if(streq(get_c_string(l_tc),"singing")){ + tc = new EST_SingingTargetCost(); + CHECK_PTR(tc); + } + else + EST_error( "du_voice_setTargetcost: Unknown targetcost type." 
); + + duv->setTargetCost(tc,true); + } + else + EST_error( "du_voice_setTargetcost: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_setJoinCost(LISP l_voice, LISP l_tc) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ) + { + EST_JoinCost *jc=0; + if( l_tc == truth ){ + jc = new EST_JoinCost(); + CHECK_PTR(jc); + } + else + EST_error( "du_voice_setJoinCost: currently t is the only supported second argument" ); + + duv->setJoinCost(jc,true); + } + else + EST_error( "du_voice_setJoinCost: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_voicemodule_getUtterance( LISP l_vm, LISP l_n ) +{ + EST_Utterance *utt = 0; + + if( DiphoneVoiceModule *dvm = dynamic_cast<DiphoneVoiceModule*>(voice(l_vm)) ){ + dvm->getUtterance( &utt, get_c_int(l_n) ); + } + else + EST_error( "voicemodule_getUtterance: expects DiphoneVoiceModule" ); + + EST_warning( "EST_Utterance = %p\n", static_cast<void*>(utt) ); + + return siod( utt ); +} + + +static LISP FT_voice_getUtteranceByFileID( LISP l_vm, LISP l_fileid ) +{ + EST_Utterance *utt = 0; + + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_vm)) ){ + duv->getCopyUnitUtterance( get_c_string(l_fileid), &utt ); + } + else + EST_error( "voice_getUtteranceByFileID: expects DiphoneUnitVoice" ); + + EST_warning( "EST_Utterance = %p\n", static_cast<void*>(utt) ); + + return siod( utt ); +} + + +static LISP FT_voice_unit_count( LISP l_voice ) +{ + VoiceBase *v = voice( l_voice ); + return flocons( static_cast<double>(v->numDatabaseUnits()) ); +} + +static LISP FT_voice_unit_type_count( LISP l_voice ) +{ + VoiceBase *v = voice( l_voice ); + return flocons( static_cast<double>(v->numUnitTypes()) ); +} + +static LISP FT_voice_unit_available( LISP l_voice, LISP l_unit ) +{ + VoiceBase *v = voice( l_voice ); + + if( v->unitAvailable( get_c_string( l_unit ) ) ) + return truth; + + return NIL; +} + +static LISP FT_voice_num_available_candidates( LISP l_voice, LISP l_unit ) +{ + VoiceBase *v = voice( l_voice ); + + unsigned int number = v->numAvailableCandidates( 
get_c_string(l_unit) ); + + return flocons( number ); +} + +static LISP FT_du_voice_diphone_coverage( LISP l_voice, LISP l_filename) +{ + DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)); + EST_String filename = get_c_string(l_filename); + + duv->diphoneCoverage(filename); + + return NIL; +} + +static LISP FT_du_voice_set_jc_f0_weight( LISP l_voice, LISP l_val ) +{ + + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + duv->set_jc_f0_weight( get_c_float( l_val ) ); + if (duv->get_jc()) + duv->get_jc()->set_f0_weight( get_c_float( l_val ) ); + } + else + EST_error( "du_voice_set_jc_f0_weight: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_get_jc_f0_weight( LISP l_voice ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + return flocons(duv->get_jc_f0_weight()); + } + else + EST_error( "du_voice_get_jc_f0_weight: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_set_jc_power_weight( LISP l_voice, LISP l_val ) +{ + + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + duv->set_jc_power_weight( get_c_float( l_val ) ); + if (duv->get_jc()) + duv->get_jc()->set_power_weight( get_c_float( l_val ) ); + } + else + EST_error( "du_voice_set_jc_power_weight: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_get_jc_power_weight( LISP l_voice ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + return flocons(duv->get_jc_power_weight()); + } + else + EST_error( "du_voice_get_jc_power_weight: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP FT_du_voice_set_jc_spectral_weight( LISP l_voice, LISP l_val ) +{ + + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + duv->set_jc_spectral_weight( get_c_float( l_val ) ); + if (duv->get_jc()) + duv->get_jc()->set_spectral_weight( get_c_float( l_val ) ); + } + else + EST_error( "du_voice_set_jc_spectral_weight: expects DiphoneUnitVoice" ); + + return NIL; +} + +static LISP 
FT_du_voice_get_jc_spectral_weight( LISP l_voice ) +{ + if( DiphoneUnitVoice *duv = dynamic_cast<DiphoneUnitVoice*>(voice(l_voice)) ){ + return flocons(duv->get_jc_spectral_weight()); + } + else + EST_error( "du_voice_get_jc_spectral_weight: expects DiphoneUnitVoice" ); + + return NIL; +} + + +void festival_MultiSyn_init(void) +{ + proclaim_module("MultiSyn"); + + init_subr_2("voice.getUnits", FT_voice_get_units, + "(voice.getUnits VOICE UTT)\n\ + Voice object VOICE looks at the segment relation in utterance UTT\n\ + and adds a suitable unit sequence in the Unit relation."); + + init_subr_2("utt.tag_unit", FT_utt_tag_unit, + "(utt.tag_unit UTT INT)\n\ + Tags the candidate used in Unit INT in the Unit relation for omission in\n\ + subsequent reruns of viterbi search for the unit sequence."); + + init_subr_2("du_voice.regetUnits", FT_voice_reget_units, + "(du_voice.regetUnits DU_VOICE UTT)\n\ + Voice object DU_VOICE looks at the unit relation in utterance UTT\n\ + and redoes the viterbi search, respecting candidates flagged for omission."); + + init_subr_1("voice.getName", FT_voice_get_name, + "(voice.getName VOICE)\n\ + Gets the name of a voice."); + + init_subr_2("voice.setName", FT_voice_set_name, + "(voice.setName VOICE NAME)\n\ + Sets the name of a voice."); + + init_subr_2("voice.debugLevel", FT_voice_debugLevel, + "(voice.debugLevel VOICE LEVEL)\n\ + Query and/or set the level of debugging for VOICE to LEVEL (positive int).\n\ + A level of 0 switches off all debugging messages in the voice. Leaving\n\ + level unspecified simply returns the current level."); + + init_subr_3( "make_du_voice", FT_make_du_voice, + "(make_du_voice BASENAMES DATADIRS SAMPLERATE)\n\ + Creates a Diphone UnitSelection Voice, using the list of file basenames\n\ + in LISP list BASENAMES, and the four directory strings in the DATADIRS list.\n\ + The voice waveform data files are sampled at SAMPLERATE." 
); + + init_subr_3( "make_du_voice_module", FT_make_du_voice_module, + "(make_du_voice_module BASENAMES DATADIRS SAMPLERATE)\n\ + Creates a Diphone UnitSelection Voice Module, using the list of file basenames\n\ + in LISP list BASENAMES, and the four directory strings in the DATADIRS list.\n\ + The voice waveform data files are sampled at SAMPLERATE." ); + + init_subr_4( "voice.addModule", FT_voice_add_module, + "(voice.addModule VOICE BASENAMES DATADIRS SAMPLERATE)\n\ + Creates a Diphone UnitSelection Voice Module, using the list of file basenames\n\ + in LISP list BASENAMES, and the four directory strings in the remaining\n\ + argument DATADIRS and adds it to the voice VOICE. The voice waveform data\n\ + files are sampled at SAMPLERATE." ); + + init_subr_2("voice.init", FT_voice_init, + "(voice.init VOICE IGNORE_BAD)\n\ + Perform any necessary initialisation for the UnitSelection Voice object VOICE.\n\ + If optional IGNORE_BAD is not nil, then phones marked with a \"bad\" feature\n\ + in the segment relation will not be added to the diphone inventory." ); + + init_subr_2("voice.getUtteranceByFileID", FT_voice_getUtteranceByFileID, + "(voice.getUtteranceByFileID VOICE FILEIDSTRING)\n\ + Returns a copy of the Utterance in the voice module database, with\n\ + all the Unit relation filled in, ready for synthesis."); + + init_subr_2("voicemodule.getUtterance", FT_voicemodule_getUtterance, + "(voicemodule.getUtterance VOICEMODULE UTTNUMBER)\n\ + Returns a copy of the UTTNUMBER Utterance in the voice module database." 
); + + init_subr_1("voice.numUnitTypes", FT_voice_unit_type_count, + "(voice.numUnitTypes VOICE)\n\ + Number of different unit types available in Voice object VOICE."); + + init_subr_1("voice.numUnits", FT_voice_unit_count, + "(voice.numUnits VOICE)\n\ + Total units available in Voice object VOICE."); + + init_subr_2("voice.unitAvailable", FT_voice_unit_available, + "(voice.unitAvailable VOICE UNIT)\n\ + Returns true or false whether speech fragment UNIT (string) is\n\ + present in the VOICE"); + + init_subr_2("voice.numAvailableCandidates", FT_voice_num_available_candidates, + "(voice.numAvailableCandidates VOICE UNIT)\n\ + Returns the number of instances of speech fragment UNIT (string)\n\ + present in the VOICE"); + + init_subr_1("du_voice_function", FT_du_voice_function, + "(du_voice_function DU_VOICE)\n\ + Does something to a DU_VOICE only"); + + init_subr_2("du_voice.precomputeJoinCosts", FT_du_voice_precomputeJoinCosts, + "(du_voice.precomputeJoinCosts DU_VOICE PHONELIST)\n\ + Calculate and store the join costs for all instances of phones present\n\ + in the phone list."); + + init_subr_2("du_voice.set_pruning_beam", FT_du_voice_set_pruning_beam, + "(du_voice.set_pruning_beam DU_VOICE BEAMFLOAT)\n\ + Sets the beam pruning parameter for Viterbi search"); + + init_subr_2("du_voice.set_ob_pruning_beam", FT_du_voice_set_ob_pruning_beam, + "(du_voice.set_ob_pruning_beam DU_VOICE BEAMFLOAT)\n\ + Sets the observation beam pruning parameter for Viterbi search"); + + init_subr_2("du_voice.set_tc_rescoring_beam", FT_du_voice_set_tc_rescoring_beam, + "(du_voice.set_tc_rescoring_beam DU_VOICE BEAMFLOAT)\n\ + Sets the target cost rescoring beam width for Viterbi search (set to -1.0 to disable)"); + + init_subr_2("du_voice.set_tc_rescoring_weight", FT_du_voice_set_tc_rescoring_weight, + "(du_voice.set_tc_rescoring_weight DU_VOICE WEIGHTFLOAT)\n\ + Sets the target cost rescoring weight for Viterbi search (set to 0.0 to disable)"); + + 
init_subr_2("du_voice.set_target_cost_weight", FT_du_voice_set_target_cost_weight, + "(du_voice.set_target_cost_weight DU_VOICE FLOAT)\n\ + Sets the target cost weight (default is 1)"); + + /* + * This is currently not implemented, due to problems of passing + * such a parameter to the viterbi extend path function. + * + * init_subr_2("du_voice.set_join_cost_weight", FT_du_voice_set_join_cost_weight, + * "(du_voice.set_join_cost_weight DU_VOICE FLOAT)\n \ + * Sets the join cost weight (default is 1)"); + */ + + init_subr_2("du_voice.set_jc_f0_weight", FT_du_voice_set_jc_f0_weight, + "(du_voice.set_jc_f0_weight DU_VOICE FLOAT)\n\ + Sets the joincost f0 weight (default 1)"); + + init_subr_1("du_voice.get_jc_f0_weight", FT_du_voice_get_jc_f0_weight, + "(du_voice.get_jc_f0_weight DU_VOICE)\n\ + Gets the joincost f0 weight"); + + init_subr_2("du_voice.set_jc_power_weight", FT_du_voice_set_jc_power_weight, + "(du_voice.set_jc_power_weight DU_VOICE FLOAT)\n\ + Sets the joincost power weight (default 1)"); + + init_subr_1("du_voice.get_jc_power_weight", FT_du_voice_get_jc_power_weight, + "(du_voice.get_jc_power_weight DU_VOICE)\n\ + Gets the joincost power weight"); + + init_subr_2("du_voice.set_jc_spectral_weight", FT_du_voice_set_jc_spectral_weight, + "(du_voice.set_jc_spectral_weight DU_VOICE FLOAT)\n\ + Sets the joincost spectral weight (default 1)"); + + init_subr_1("du_voice.get_jc_spectral_weight", FT_du_voice_get_jc_spectral_weight, + "(du_voice.get_jc_spectral_weight DU_VOICE)\n\ + Gets the joincost spectral weight"); + + init_subr_2("du_voice.set_prosodic_modification", FT_du_voice_set_prosodic_modification, + "(du_voice.set_prosodic_modification DU_VOICE INT)\n\ + Turns prosodic modification on or off (default is 0 [off])\n\ + This will only work if durations and f0 targets are provided"); + + init_subr_1("du_voice.prosodic_modification", FT_du_voice_prosodic_modification, + "(du_voice.prosodic_modification DU_VOICE)\n\ + Status of prosodic modification on or off."); + + 
init_subr_2("du_voice.setDiphoneBackoff", FT_du_voice_set_diphonebackoff, + "(du_voice.setDiphoneBackoff DU_VOICE LIST)\n\ + Adds diphone backoff rules to the voice."); + + init_subr_2("du_voice.setJoinCost", FT_du_voice_setJoinCost, + "(du_voice.setJoinCost DU_VOICE JOINCOST)\n\ + Sets the voice joincost function.\n\ + If t is specified then the default joincost is used."); + + init_subr_2("du_voice.setTargetCost", FT_du_voice_setTargetCost, + "(du_voice.setTargetCost DU_VOICE TARGETCOST)\n\ + Sets the voice targetcost function.\n\ + If t is specified then the default targetcost is used.\n\ + If nil is specified then a null targetcost is used.\n\ + If a closure is specified, this is called as the target cost.\n\ + If 'apml is specified then an apml targetcost is used."); + + init_subr_2("du_voice.getDiphoneCoverage", FT_du_voice_diphone_coverage, + "(du_voice.getDiphoneCoverage DU_VOICE FILENAME)\n\ + prints diphone coverage information for this voice\n\ + use filename '-' for stdout."); + + +} + + + diff --git a/src/modules/MultiSyn/UnitSelection.h b/src/modules/MultiSyn/UnitSelection.h new file mode 100644 index 0000000..1e7fedb --- /dev/null +++ b/src/modules/MultiSyn/UnitSelection.h @@ -0,0 +1,59 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. 
*/ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: Aug 2002 */ +/* --------------------------------------------------------------------- */ +/* */ +/*************************************************************************/ + + +#ifndef __UNITSELECTION_H__ +#define __UNITSELECTION_H__ + +#include "festival.h" + +/**@name Generalis[ed/able] Unit Selection concatenative synthesis +*/ + +//@{ + +/** Object oriented approach for better or for worse... +*/ + +void register_UnitSelection_features(void); + +#endif // __UNITSELECTION_H__ + diff --git a/src/modules/MultiSyn/VoiceBase.cc b/src/modules/MultiSyn/VoiceBase.cc new file mode 100644 index 0000000..801dd80 --- /dev/null +++ b/src/modules/MultiSyn/VoiceBase.cc @@ -0,0 +1,51 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: Aug 2002 */ +/* --------------------------------------------------------------------- */ +/* Abstract base class for "voices" - a top level interface to any code */ +/* which knows how to take a preprocessed utterance object and fill in */ +/* the information for subsequent synthesis by later festival modules */ +/*************************************************************************/ + +#include "VoiceBase.h" +#include "siod_est.h" + + +SIOD_REGISTER_CLASS(voice,VoiceBase) +VAL_REGISTER_CLASS(voice,VoiceBase) + + diff --git a/src/modules/MultiSyn/VoiceBase.h b/src/modules/MultiSyn/VoiceBase.h new file mode 100644 index 0000000..ef07f59 --- /dev/null +++ b/src/modules/MultiSyn/VoiceBase.h @@ -0,0 +1,103 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: Aug 2002 */ +/* --------------------------------------------------------------------- */ +/* Abstract base class for "voices" - a top level interface to any code */ +/* which knows how to take a preprocessed utterance object and fill in */ +/* the information for subsequent synthesis by later festival modules */ +/*************************************************************************/ + + +#ifndef __VOICEBASE_H__ +#define __VOICEBASE_H__ + +#include "EST_Val.h" +#include "EST_Val_defs.h" + +// EST_TKVL.h is necessary because of header dependencies +// which should probably be fixed at root, then this include +// could be removed... 
+#include "EST_TKVL.h" +#include "siod_defs.h" + +class EST_Utterance; + +class VoiceBase { +public: + VoiceBase() : _verbosity(0), _name( EST_String::Empty ) {}; + virtual ~VoiceBase() {}; + virtual void initialise( bool ignore_bad_tag=false ) = 0; + + virtual EST_String name() { return _name; } + virtual void set_name(EST_String n) { _name = n;} + + virtual void setVerbosity( unsigned int level ) { _verbosity=level; } + virtual unsigned int verbosity() const { return _verbosity; } + + //virtual bool synthesiseWave( EST_Utterance *utt ) = 0; + // this function should at best be moved to concatenative voice + // subclass and the above one implemented instead in order to + // generalise to non concatenative synthesis methods + virtual void getUnitSequence( EST_Utterance *utt )=0; + + virtual unsigned int numDatabaseUnits() const = 0; + virtual unsigned int numUnitTypes() const = 0; + virtual bool unitAvailable( const EST_String &unit ) const = 0; + virtual unsigned int numAvailableCandidates( const EST_String &unit ) const =0; + + +private: + //EST_Features params; + unsigned int _verbosity; + EST_String _name; +}; + +SIOD_REGISTER_CLASS_DCLS(voice,VoiceBase) +VAL_REGISTER_CLASS_DCLS(voice,VoiceBase) + + + +/**@name Generalis[ed/able] notion of a "voice" within festival +*/ + +//@{ + +/** Object oriented approach for better or for worse... +*/ + +#endif // __VOICEBASE_H__ + diff --git a/src/modules/MultiSyn/VoiceModuleBase.cc b/src/modules/MultiSyn/VoiceModuleBase.cc new file mode 100644 index 0000000..9c81ba3 --- /dev/null +++ b/src/modules/MultiSyn/VoiceModuleBase.cc @@ -0,0 +1,49 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: Aug 2002 */ +/* --------------------------------------------------------------------- */ +/* Abstract base class for "voice modules" - a top level interface to */ +/* any collection of speech material in the voice database */ +/*************************************************************************/ + +#include "VoiceModuleBase.h" +#include "siod_est.h" + +SIOD_REGISTER_CLASS(voicemodule,VoiceModuleBase) +VAL_REGISTER_CLASS(voicemodule,VoiceModuleBase) + + diff --git a/src/modules/MultiSyn/VoiceModuleBase.h b/src/modules/MultiSyn/VoiceModuleBase.h new file mode 100644 index 0000000..8e8424b --- /dev/null +++ b/src/modules/MultiSyn/VoiceModuleBase.h @@ -0,0 +1,94 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +//*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: Aug 2002 */ +/* --------------------------------------------------------------------- */ +/* Abstract base class for "voice modules" - a top level interface to */ +/* any collection of speech material in the voice database */ +/*************************************************************************/ + +#ifndef __VOICEMODULEBASE_H__ +#define __VOICEMODULEBASE_H__ + +#include "EST_Val.h" +#include "EST_Val_defs.h" + +// EST_TKVL.h is necessary because of header dependencies +// which should probably be fixed at root, then this include +// could be removed... 
+#include "EST_TKVL.h" +#include "siod_defs.h" + +#include "EST_FlatTargetCost.h" + + +class EST_Utterance; +class VoiceBase; + +class VoiceModuleBase { +public: + VoiceModuleBase( VoiceBase *parent = 0 ): _parentVoice(parent) {}; + virtual ~VoiceModuleBase() {}; + virtual void initialise(const EST_TargetCost *tc, bool ignore_bad_tag=false) = 0; + virtual unsigned int numModuleUnits() const = 0; + virtual unsigned int numUnitTypes() const = 0; + virtual unsigned int numAvailableCandidates( const EST_String &unit ) const =0; + +private: + //EST_Features params; + const VoiceBase *_parentVoice; +}; + +SIOD_REGISTER_CLASS_DCLS(voicemodule,VoiceModuleBase) +VAL_REGISTER_CLASS_DCLS(voicemodule,VoiceModuleBase) + + + +/**@name Generalis[ed/able] notion of a unit selection "voice" module +within festival. +*/ + +//@{ + +/** Object oriented approach for better or for worse... +This is primarily intended to be a "module" from the point of view of a +collection of *related* speech database material (such as a "weather" module) +or a "news broadcast" module. At the very least, though, it is just a +collection of speech database material. +*/ + +#endif // __VOICEMODULEBASE_H__ + diff --git a/src/modules/MultiSyn/inst_tmpl/Makefile b/src/modules/MultiSyn/inst_tmpl/Makefile new file mode 100644 index 0000000..79aef09 --- /dev/null +++ b/src/modules/MultiSyn/inst_tmpl/Makefile @@ -0,0 +1,63 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## (University of Edinburgh, UK) and ## +## Korin Richmond ## +## Copyright (c) 2002 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +# Makefile for base class instantiations # +# # +# Author: Korin Richmond (korin@cstr.ed.ac.uk) Aug 2002 # +########################################################################### + +TOP=../../../.. 
+DIRNAME=src/modules/MultiSyn/inst_tmpl + +LOCAL_INCLUDES = -I$(TOP)/src/modules/MultiSyn + +TSRCS = hash_s_itemlistp_t.cc \ + hash_itemp_tcdatap_t.cc \ + list_uttp_t.cc \ + list_itemp_t.cc \ + list_voicemodulep_t.cc \ + list_strlist_t.cc \ + vector_jccp_t.cc \ + list_scorepair_t.cc + +CPPSRCS = $(TSRCS) + +SRCS = $(CPPSRCS) +OBJS = $(CPPSRCS:.cc=.o) +FILES = $(SRCS) Makefile + +ALL = .buildlib $(BUILD_DIRS) + +include $(TOP)/config/common_make_rules + diff --git a/src/modules/MultiSyn/inst_tmpl/hash_itemp_tcdatap_t.cc b/src/modules/MultiSyn/inst_tmpl/hash_itemp_tcdatap_t.cc new file mode 100644 index 0000000..1c4d1e5 --- /dev/null +++ b/src/modules/MultiSyn/inst_tmpl/hash_itemp_tcdatap_t.cc @@ -0,0 +1,59 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Rob Clark */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. 
IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Rob Clark (robert@cstr.ed.ac.uk) */ +/* Date: June 2006 */ +/* ----------------------------------------------------------------------*/ +/* Instantiate THash for Item* TCData* */ +/*************************************************************************/ + +#include "EST_THash.h" +#include "EST_String.h" +#include "EST_FlatTargetCost.h" // for TCData + +//Declare_TStringHash_T(ItemList*,ItemListP) +//above macro has been disabled, so do it manually. +template <> EST_Item* EST_THash< EST_Item*, TCData* >::Dummy_Key=0; +template <> TCData* EST_THash< EST_Item*, TCData* >::Dummy_Value=0; + +#if defined(INSTANTIATE_TEMPLATES) + +#include "../base_class/EST_THash.cc" + +Instantiate_THash_T(EST_Item*, TCData*, TCDataP) + +#endif + + diff --git a/src/modules/MultiSyn/inst_tmpl/hash_s_itemlistp_t.cc b/src/modules/MultiSyn/inst_tmpl/hash_s_itemlistp_t.cc new file mode 100644 index 0000000..b2a443a --- /dev/null +++ b/src/modules/MultiSyn/inst_tmpl/hash_s_itemlistp_t.cc @@ -0,0 +1,60 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond (korin@cstr.ed.ac.uk) */ +/* Date: September 2002 */ +/* ----------------------------------------------------------------------*/ +/* Instantiate TStringHash for ItemList* */ +/*************************************************************************/ + +#include "EST_THash.h" +#include "EST_String.h" +#include "EST_Val.h" +#include "DiphoneUnitVoice.h" //for ItemList + +//Declare_TStringHash_T(ItemList*,ItemListP) +//above macro has been disabled, so do it manually. 
+template <> EST_String EST_THash< EST_String, ItemList* >::Dummy_Key="DUMMY"; +template <> ItemList* EST_THash< EST_String, ItemList* >::Dummy_Value=0; + +#if defined(INSTANTIATE_TEMPLATES) + +#include "../base_class/EST_THash.cc" + +Instantiate_TStringHash_T(ItemList *, ItemListP) + +#endif + + diff --git a/src/modules/MultiSyn/inst_tmpl/list_itemp_t.cc b/src/modules/MultiSyn/inst_tmpl/list_itemp_t.cc new file mode 100644 index 0000000..45f6f18 --- /dev/null +++ b/src/modules/MultiSyn/inst_tmpl/list_itemp_t.cc @@ -0,0 +1,60 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. 
IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Date: September 2002 */ +/* --------------------------------------------------------------------- */ +/* Instantiate class list of pointers to EST_Item */ +/* */ +/* Author: Korin Richmond (korin@cstr.ed.ac.uk) */ +/*************************************************************************/ + +#include "EST_TList.h" +#include "EST_TSortable.h" +#include "EST_Val.h" + +class EST_Item; + + +Declare_TList_T(EST_Item *, EST_ItemP) + +#if defined(INSTANTIATE_TEMPLATES) + +#include "../base_class/EST_TList.cc" +#include "../base_class/EST_TSortable.cc" + +Instantiate_TList_T(EST_Item *, EST_ItemP) + +#endif + diff --git a/src/modules/MultiSyn/inst_tmpl/list_scorepair_t.cc b/src/modules/MultiSyn/inst_tmpl/list_scorepair_t.cc new file mode 100644 index 0000000..b03fe22 --- /dev/null +++ b/src/modules/MultiSyn/inst_tmpl/list_scorepair_t.cc @@ -0,0 +1,53 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2004 */ +/* All Rights Reserved.
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: June 2006 */ +/* --------------------------------------------------------------------- */ +/* Template instantiations for target cost rescoring internal work class */ +/*************************************************************************/ + +#include "TargetCostRescoring.h" + +// manual template instantiation hassle +Declare_TList(ScorePair) +Declare_TSortable(ScorePair) + +#if defined(INSTANTIATE_TEMPLATES) +#include "../base_class/EST_TList.cc" +#include "../base_class/EST_TSortable.cc" +Instantiate_TList(ScorePair); +Instantiate_TSortable(ScorePair); +#endif + diff --git a/src/modules/MultiSyn/inst_tmpl/list_strlist_t.cc b/src/modules/MultiSyn/inst_tmpl/list_strlist_t.cc new file mode 100644 index 0000000..5ccf452 --- /dev/null +++ b/src/modules/MultiSyn/inst_tmpl/list_strlist_t.cc @@ -0,0 +1,51 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 2004 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Rob Clark */ +/* Date: Jan 2004 */ +/* --------------------------------------------------------------------- */ +/* Diphone backing off procedure */ +/*************************************************************************/ + + +#include "DiphoneBackoff.h" + + +// Template rubbish + +Declare_TList_T(EST_TList,STR_LIST) + +#if defined(INSTANTIATE_TEMPLATES) +#include "../base_class/EST_TList.cc" + Instantiate_TList_T(EST_StrList,STR_LIST); +#endif diff --git a/src/modules/MultiSyn/inst_tmpl/list_uttp_t.cc b/src/modules/MultiSyn/inst_tmpl/list_uttp_t.cc new file mode 100644 index 0000000..f0fbb8a --- /dev/null +++ b/src/modules/MultiSyn/inst_tmpl/list_uttp_t.cc @@ -0,0 +1,60 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved.
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Date: September 2002 */ +/* ------------------------------------------------------------------- */ +/* Instantiate class list of pointers to EST_Utterance */ +/* */ +/* Author: Korin Richmond (korin@cstr.ed.ac.uk) */ +/*************************************************************************/ + +#include "EST_TList.h" +#include "EST_TSortable.h" +#include "EST_Val.h" + +class EST_Utterance; + + +Declare_TList_T(EST_Utterance *, EST_UtteranceP) + +#if defined(INSTANTIATE_TEMPLATES) + +#include "../base_class/EST_TList.cc" +#include "../base_class/EST_TSortable.cc" + +Instantiate_TList_T_MIN(EST_Utterance *, EST_UtteranceP) + +#endif + diff --git a/src/modules/MultiSyn/inst_tmpl/list_voicemodulep_t.cc b/src/modules/MultiSyn/inst_tmpl/list_voicemodulep_t.cc new file mode 100644 index 0000000..6258004 --- /dev/null +++ b/src/modules/MultiSyn/inst_tmpl/list_voicemodulep_t.cc @@ -0,0 +1,59 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. 
The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Date: February 2002 */ +/* ------------------------------------------------------------------- */ +/* Instantiate class list of pointers to DiphoneVoiceModule */ +/* */ +/* Author: Korin Richmond (korin@cstr.ed.ac.uk) */ +/*************************************************************************/ + +#include "EST_TList.h" +#include "EST_TSortable.h" +#include "EST_Val.h" + +class DiphoneVoiceModule; + +Declare_TList_T(DiphoneVoiceModule *, DiphoneVoiceModuleP) + +#if defined(INSTANTIATE_TEMPLATES) + +#include "../base_class/EST_TList.cc" +#include "../base_class/EST_TSortable.cc" + +Instantiate_TList_T_MIN(DiphoneVoiceModule *, DiphoneVoiceModuleP) + +#endif + diff --git a/src/modules/MultiSyn/inst_tmpl/vector_jccp_t.cc b/src/modules/MultiSyn/inst_tmpl/vector_jccp_t.cc new file mode 100644 index 0000000..b243b5b --- /dev/null +++ b/src/modules/MultiSyn/inst_tmpl/vector_jccp_t.cc @@ -0,0 +1,60 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2004 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Date: February 2004 */ +/* -------------------------------------------------------------------- */ +/* Instantiate class vector of pointers to EST_JoinCostCache */ +/* */ +/* Author: Korin Richmond (korin@cstr.ed.ac.uk) */ +/*************************************************************************/ + +#include "EST_TVector.h" +#include "EST_TSimpleVector.h" + +class EST_JoinCostCache; + +Declare_TVector_Base_T(EST_JoinCostCache*,0,0,EST_JoinCostCacheP) +Declare_TSimpleVector_T(EST_JoinCostCache*,EST_JoinCostCacheP) + +#if defined(INSTANTIATE_TEMPLATES) + +#include "../base_class/EST_TSimpleVector.cc" +#include "../base_class/EST_TVector.cc" + +Instantiate_TVector_T_MIN(EST_JoinCostCache*,EST_JoinCostCacheP) +Instantiate_TSimpleVector(EST_JoinCostCache*) + +#endif + diff --git a/src/modules/MultiSyn/safety.h b/src/modules/MultiSyn/safety.h new file mode 100644 index 0000000..3dde447 --- /dev/null +++ b/src/modules/MultiSyn/safety.h @@ -0,0 +1,57 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* (University of Edinburgh, UK) and */ +/* Korin Richmond */ +/* Copyright (c) 2002 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4.
The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Korin Richmond */ +/* Date: Sept 2002 */ +/* --------------------------------------------------------------------- */ +/* */ +/*************************************************************************/ + +#ifndef __SAFETY_H__ +#define __SAFETY_H__ + +#include "EST_error.h" + +#define CHECK_NULL 1 + +#ifdef CHECK_NULL +#define CHECK_PTR(p) if((p)==0){\ + EST_error("memory allocation failed (file %s, line %d)",\ + __FILE__,__LINE__);} +#else +#define CHECK_PTR(p) +#endif //CHECK_NULL + +#endif //__SAFETY_H__ diff --git a/src/modules/Text/Makefile b/src/modules/Text/Makefile new file mode 100644 index 0000000..fa5ee42 --- /dev/null +++ b/src/modules/Text/Makefile @@ -0,0 +1,51 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +TOP=../../.. 
+DIRNAME=src/modules/Text +H = tokenP.h + +NOOPTSRCS = text_modes.cc + +SRCS = text_aux.cc token.cc text.cc tok_ext.cc \ + token_pos.cc xxml.cc $(NOOPTSRCS) +OBJS = $(SRCS:.cc=.o) + +FILES = Makefile $(SRCS) $(H) + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/Text/text.cc b/src/modules/Text/text.cc new file mode 100644 index 0000000..0fb5d63 --- /dev/null +++ b/src/modules/Text/text.cc @@ -0,0 +1,300 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Basic text utilities */ +/* */ +/* This seems to be the only language-specific part that cannot be */ +/* reasonably parameterized. I'd like to change this but I'm not sure */ +/* of the best way. Language-specific token processing modules */ +/* generating Words (lexical items) from Tokens are currently written */ +/* as FT_*_Token_Utt functions. A language-independent one, */ +/* FT_Any_Token_Utt, which depends heavily on the lexicon, can be used */ +/* when you don't have the language-specific version.
*/ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "text.h" + +static void tts_raw_token(EST_Item *t); +static void tts_raw_utt(LISP utt); + +LISP FT_Text_Utt(LISP utt) +{ + // Parse text into words + EST_Utterance *u = get_c_utt(utt); + EST_String text; + EST_TokenStream ts; + LISP ws,punc,scs; + EST_Token tok; + + *cdebug << "Text module\n"; + + text = get_c_string(utt_iform(*u)); + + u->create_relation("Token"); + + ts.open_string(text); + ts.set_SingleCharSymbols(EST_Token_Default_SingleCharSymbols); + ts.set_PunctuationSymbols(EST_Token_Default_PunctuationSymbols); + ts.set_PrePunctuationSymbols(EST_Token_Default_PrePunctuationSymbols); + if ((ws = siod_get_lval("token.whitespace",NULL)) == NIL) + ts.set_WhiteSpaceChars(EST_Token_Default_WhiteSpaceChars); + else + ts.set_WhiteSpaceChars(get_c_string(ws)); + if ((punc = siod_get_lval("token.punctuation",NULL)) == NIL) + ts.set_PunctuationSymbols(EST_Token_Default_PunctuationSymbols); + else + ts.set_PunctuationSymbols(get_c_string(punc)); + if ((punc = siod_get_lval("token.prepunctuation",NULL)) == NIL) + ts.set_PrePunctuationSymbols(EST_Token_Default_PrePunctuationSymbols); + else + ts.set_PrePunctuationSymbols(get_c_string(punc)); + if ((scs = siod_get_lval("token.singlecharsymbols",NULL)) == NIL) + ts.set_SingleCharSymbols(EST_Token_Default_SingleCharSymbols); + else + ts.set_SingleCharSymbols(get_c_string(scs)); + + for (ts >> tok; tok.string() != ""; ts >> tok) + add_token(u,tok); + + return utt; +} + +LISP tts_file(LISP filename,LISP mode) +{ + LISP user_text_modes,t_mode; + + user_text_modes = siod_get_lval("tts_text_modes",NULL); + + if ((mode == NIL) || + (streq(get_c_string(mode),"text")) || + (streq(get_c_string(mode),"fundamental"))) + tts_file_raw(filename); // Simple text file + else + { + t_mode = siod_assoc_str(get_c_string(mode),user_text_modes); + if (t_mode == NIL) + { + // Attempt to load it + 
leval(cons(rintern("request"), + cons(strintern(EST_String(get_c_string(mode))+ + "-mode"),NIL)),NIL); + // get it again, and see if it's defined + user_text_modes = siod_get_lval("tts_text_modes",NULL); + } + t_mode = siod_assoc_str(get_c_string(mode),user_text_modes); + if (t_mode == NIL) + { + cerr << "tts_file: can't find mode description \"" + << get_c_string(mode) << "\" using raw mode instead" << endl; + tts_file_raw(filename); // so read it as simple text file + } + else + tts_file_user_mode(filename,car(cdr(t_mode))); + } + + return NIL; +} + +void tts_file_raw(LISP filename) +{ + // Say the contents of a named file + EST_TokenStream ts; + LISP ws,prepunc,punc,scs; + LISP lutt,eou_tree; + LISP stream = NULL; + + + stream = fopen_c(get_c_string(filename), "rb"); + if (ts.open(stream->storage_as.c_file.f, FALSE) == -1) + { + cerr << "tts_file: can't open file \"" << filename << "\"\n"; + festival_error(); + } + ts.set_SingleCharSymbols(EST_Token_Default_SingleCharSymbols); + ts.set_PunctuationSymbols(EST_Token_Default_PunctuationSymbols); + ts.set_PrePunctuationSymbols(EST_Token_Default_PrePunctuationSymbols); + if ((ws = siod_get_lval("token.whitespace",NULL)) == NIL) + ts.set_WhiteSpaceChars(EST_Token_Default_WhiteSpaceChars); + else + ts.set_WhiteSpaceChars(get_c_string(ws)); + if ((punc = siod_get_lval("token.punctuation",NULL)) == NIL) + ts.set_PunctuationSymbols(EST_Token_Default_PunctuationSymbols); + else + ts.set_PunctuationSymbols(get_c_string(punc)); + if ((prepunc = siod_get_lval("token.prepunctuation",NULL)) == NIL) + ts.set_PrePunctuationSymbols(EST_Token_Default_PrePunctuationSymbols); + else + ts.set_PrePunctuationSymbols(get_c_string(prepunc)); + if ((scs = siod_get_lval("token.singlecharsymbols",NULL)) == NIL) + ts.set_SingleCharSymbols(EST_Token_Default_SingleCharSymbols); + else + ts.set_SingleCharSymbols(get_c_string(scs)); + eou_tree = siod_get_lval("eou_tree","No end of utterance tree set"); + + lutt =
tts_chunk_stream(ts,tts_raw_token,tts_raw_utt,eou_tree,0); + + // The last one is returned because the chunker doesn't know if this + // is truly the end of an utterance or not, but here we do know. + tts_raw_utt(lutt); + + ts.close(); + if (stream) + fclose_l(stream); +} + +static void tts_raw_token(EST_Item *t) +{ + // Do something to token, in this case nothing + (void)t; +} + +static void tts_raw_utt(LISP utt) +{ + // Do (simple) tts on this utt + LISP lutt; + + // There are some pessimal cases when the utterance is empty + if ((utt == NIL) || + (get_c_utt(utt)->relation("Token")->length() == 0)) + return; // in this case do nothing. + + lutt = quote(utt); + lutt = cons(rintern("apply_hooks"), + cons(rintern("tts_hooks"), + cons(lutt,NIL))); + + + + lutt = cons(rintern("set!"), + cons(rintern("utt_tts"), + cons(lutt,NIL))); + + // Synth and Play it + lutt = leval(lutt,NIL); + user_gc(NIL); +} + +LISP new_token_utt(void) +{ + // An empty utterance ready to take Tokens + EST_Utterance *u = new EST_Utterance; + u->f.set("type","Tokens"); + u->create_relation("Token"); + return siod(u); +} + +LISP tts_chunk_stream(EST_TokenStream &ts, + TTS_app_tok app_tok, + TTS_app_utt app_utt, + LISP eou_tree, + LISP utt) +{ + // Get tokens from ts and cummulate them in u. + // Apply app_tok to each token + // Apply app_utt to each utt signalled + // Return untermitated utterance potentially for next call + // Uses the wagon tree eou_tree to predict utterance termination on + // penultimate token. 
+ EST_Item *tok, *ebo; + EST_Token t; + if (utt == NIL) + utt = new_token_utt(); + EST_Utterance *u = get_c_utt(utt); + + while (!ts.eof()) + { + t = ts.get(); + tok = add_token(u,t); + app_tok(tok); // do what you do with the token + ebo = as(tok,"Token")->prev(); // end but one token + if ((ebo != 0) && + (wagon_predict(ebo,eou_tree) == 1)) + { + // Remove that extra token + remove_item(tok,"Token"); + app_utt(utt); // do what you do with the utt + utt = new_token_utt(); + u = get_c_utt(utt); + add_token(u,t); // add that last token to the new utt. + } + } + + return utt; +} + +#if 0 +LISP memon(void) +{ + printf("memon\n"); + putenv("MALLOC_TRACE=mallfile"); + mtrace(); + return NIL; +} + +LISP memoff(void) +{ + muntrace(); + printf("memoff\n"); + return NIL; +} +#endif + +void festival_Text_init(void) +{ + festival_token_init(); + festival_def_utt_module("Text",FT_Text_Utt, + "(Text UTT)\n\ + From string in input form tokenize and create a token stream."); + init_subr_2("tts_file",tts_file, + "(tts_file FILE MODE)\n\ + Low level access to tts function, you probably want to use the function\n\ + tts rather than this one. Render data in FILE as speech. Respect\n\ + MODE. Currently modes are defined through the variable tts_text_modes."); +#if 0 + init_subr_0("memon",memon, + "(tts_file FILE MODE)"); + init_subr_0("memoff",memoff, + "(tts_file FILE MODE)"); +#endif + init_subr_3("extract_tokens",extract_tokens, + "(extract_tokens FILE TOKENS OUTFILE)\n\ + Find all occurrences of TOKENS in FILE and output specified context around\n\ + the token. 
Results are appended to OUTFILE, if OUTFILE is nil, output\n\ + goes to stdout."); +} + diff --git a/src/modules/Text/text_aux.cc b/src/modules/Text/text_aux.cc new file mode 100644 index 0000000..47cd01e --- /dev/null +++ b/src/modules/Text/text_aux.cc @@ -0,0 +1,60 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Basic text utilities */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "text.h" + +EST_Item *add_token(EST_Utterance *u,EST_Token &t) +{ + // Add a token stream item to the end of Token relation + EST_Item *item = u->relation("Token")->append(); + + item->set_name(t.string()); + if (t.punctuation() != "") + item->set("punc",t.punctuation()); + item->set("whitespace",t.whitespace()); + item->set("prepunctuation",t.prepunctuation()); + + return item; +} + + + + diff --git a/src/modules/Text/text_modes.cc b/src/modules/Text/text_modes.cc new file mode 100644 index 0000000..9e4c7f1 --- /dev/null +++ b/src/modules/Text/text_modes.cc @@ -0,0 +1,159 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : November 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Support for general user tts modes */ +/* Each mode consists of user definable parameters for (at least) the */ +/* the following */ +/* filter external Unix program filter */ +/* utterance chunk tree: decision tree to determine end of utterance */ +/* punctuation */ +/* whitespace */ +/* token analysis rule */ +/* init function run before mode is applied */ +/* exit function run after mode is applied */ +/* */ +/*=======================================================================*/ +#include +#include "EST_unix.h" +#include "festival.h" +#include "text.h" +#include "lexicon.h" + +static void um_apply_filter(const EST_String &filtername, + const EST_String &infile, + const EST_String &outname); + +void tts_file_user_mode(LISP filename, LISP params) +{ + + volatile EST_String tmpname = make_tmp_filename(); + volatile EST_String inname = (EST_String)get_c_string(filename); + volatile EST_String filter; + volatile EST_TokenStream ts; + volatile LISP func; + jmp_buf *old_errjmp = est_errjmp; + int old_errjmp_ok = errjmp_ok; + + func = get_param_lisp("init_func",params,NIL); + if (func != NIL) + leval(cons(func,NIL),NIL); // run initial function if 
specified + + errjmp_ok = 1; + est_errjmp = walloc(jmp_buf,1); + + if (setjmp(*est_errjmp)) + { + cerr << "festival: text modes, caught error and tidying up\n"; + if (siod_ctrl_c == TRUE) + { + wfree(est_errjmp); + est_errjmp = old_errjmp; + errjmp_ok = old_errjmp_ok; + err("forwarded ctrl_c",NIL); + } + } + else + { + + filter.ignore_volatile() = get_param_str("filter",params,""); + um_apply_filter(filter.ignore_volatile(),inname.ignore_volatile(),tmpname.ignore_volatile()); + + if (streq("xxml",get_param_str("analysis_type",params,""))) + tts_file_xxml(strintern(tmpname.ignore_volatile())); + else if (streq("xml",get_param_str("analysis_type",params,""))) + { + // As xml support is optional we call it through a LISP + // function which wont be defined if its not in this installation + leval(cons(rintern("tts_file_xml"),cons(strintern(tmpname.ignore_volatile()),NIL)),NIL); + } + else + tts_file_raw(strintern(tmpname.ignore_volatile())); + } + wfree(est_errjmp); + est_errjmp = old_errjmp; + errjmp_ok = old_errjmp_ok; + + unlink(tmpname.ignore_volatile()); + + func = get_param_lisp("exit_func",params,NIL); + if (func != NIL) + leval(cons(func,NIL),NIL); // run end function if specified +} + +void um_apply_filter(const EST_String &filtername, + const EST_String &infile, + const EST_String &outfile) +{ + // filter the file into standard form + EST_String command; + + if (access(infile,R_OK) != 0) + { + cerr << "TTS user mode: \"" << infile << "\" cannot be accessed" << + endl; + festival_error(); + } + + if (filtername == "") + { // don't bother forking + FILE *fdin, *fdout; + char buff[256]; + int n; + if ((fdin = fopen(infile,"rb")) == NULL) + { + cerr << "TTS user mode: \"" << infile << "\" cannot be read from" + << endl; + festival_error(); + } + if ((fdout = fopen(outfile,"wb")) == NULL) + { + cerr << "TTS user mode: \"" << outfile << "\" cannot be written to" + << endl; + festival_error(); + } + + while ((n = fread(buff,1,256,fdin)) > 0) + 
fwrite(buff,1,n,fdout); + fclose(fdin); + fclose(fdout); + } + else + { + command = filtername + " '" + infile + "' > " + outfile; + system(command); // should test if this is successful or not + } +} + diff --git a/src/modules/Text/tok_ext.cc b/src/modules/Text/tok_ext.cc new file mode 100644 index 0000000..9e24bf2 --- /dev/null +++ b/src/modules/Text/tok_ext.cc @@ -0,0 +1,166 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : February 1997 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* EST_Token extract */ +/* */ +/* Extract tokens and related features from text files. */ +/* */ +/* This is designed to find examples of particular tokens from large */ +/* text corpora (10s millions of words). This initial use is for */ +/* Resolving homographs using David Yarowsky's techniques. */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "lexicon.h" +#include "text.h" + +static int rhc = 10; +static int lhc = 10; + +static void search_file(const EST_String &filename, LISP tokens, LISP ofile); +static EST_Item *next_token(EST_TokenStream &ts, EST_Relation &ps, + EST_Item *s); +static void append_token(EST_Relation &ps, const EST_Token &s); +static void output_find(const EST_String &filename, + EST_Item *s,LISP v,LISP tfeats, FILE* fd); + +LISP extract_tokens(LISP file, LISP tokens, LISP ofile) +{ + // Extract occurrences of tokens in file + + search_file(get_c_string(file),tokens, ofile); + + return NIL; +} + +static void search_file(const EST_String &filename, LISP tokens, LISP ofile) +{ + // Step through the file token by token with a window of tokens + // output info when current token matchs on of those required + EST_TokenStream ts; + EST_Relation ps; + EST_Item *s = 0; + LISP l,v; + FILE *ofd; + + if (ts.open(filename) == -1) + { + cerr << "Extract_tokens: can't open file \"" << + filename << "\" for reading\n"; + festival_error(); + } + ts.set_PunctuationSymbols(EST_Token_Default_PunctuationSymbols); + ts.set_PrePunctuationSymbols(EST_Token_Default_PrePunctuationSymbols); + + if (ofile == NIL) + ofd = stdout; + else if ((ofd = fopen(get_c_string(ofile),"a")) == NULL) + { + cerr << "extract_tokens: cannot open \"" << get_c_string(ofile) + << "\" 
for appending" << endl; + festival_error(); + } + + for (s = next_token(ts,ps,s); s != 0; s=next_token(ts,ps,s)) + { + for (l=tokens; l != NIL; l=cdr(l)) +// if (s->name().matches(make_regex(get_c_string(car(car(l)))))) + { + v = leval(cons(car(car(l)),cons(siod(s),NIL)),NIL); + if (v != NIL) + output_find(filename,s,v,car(l),ofd); + } + } + + ts.close(); + if (ofd != stdout) + fclose(ofd); + +} + +static void output_find(const EST_String &filename, + EST_Item *s,LISP v,LISP tfeats, FILE* fd) +{ + // Found a match so output info + LISP t; + + fprintf(fd,"%s %s ",get_c_string(v), + (const char *)filename); + for (t=cdr(tfeats); t != NIL; t=cdr(t)) + fprintf(fd,"%s ",(const char *) + ffeature(s,get_c_string(car(t))).string()); + fprintf(fd,"\n"); +} + +static EST_Item *next_token(EST_TokenStream &ts, + EST_Relation &ps, + EST_Item *s) +{ + // return next EST_Token as stream item extending right hand context + // and deleting left hand context as required. + EST_Item *ns; + int i; + + if (s == 0) + { + // at start so fill rhc + for (i=0; i < lhc; i++) + append_token(ps,EST_Token("*lhc*")); + append_token(ps,ts.get()); + ns = ps.last(); + for (i=0; i < rhc; i++) + append_token(ps,ts.get()); + return ns; + } + + // As token can be "" if there is trailing whitespace before eof + // I ignore it. 
+ if ((!ts.eof()) && (ts.peek() != "")) + append_token(ps,ts.get()); + remove_item(ps.first(),"Token"); + + return s->next(); +} + +static void append_token(EST_Relation &ps, const EST_Token &t) +{ + // Append s as stream cell on end of ps + EST_Item *item = ps.append(); + + item->set_name(t.string()); + item->set("filepos",t.filepos()); +} diff --git a/src/modules/Text/token.cc b/src/modules/Text/token.cc new file mode 100644 index 0000000..7360bc7 --- /dev/null +++ b/src/modules/Text/token.cc @@ -0,0 +1,674 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : November 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Tokenizing */ +/* */ +/* This provides tokenizing methods for tokens into words. All that */ +/* special rules stuff for analysizing numbers, dates, acronyms etc. */ +/* Much of this is still too specific and although easy to add to it */ +/* be better if the rules could be specified externally */ +/* */ +/* Note only English tokenization has any substance at present */ +/* */ +/*=======================================================================*/ +#include + +using namespace std; + +#include "festival.h" +#include "lexicon.h" +#include "modules.h" +#include "text.h" +#include "tokenP.h" + +static EST_Regex numpointnum("[0-9]*\\.[0-9]+"); +static EST_Regex RXintcommaed("[0-9][0-9]?[0-9]?,\\([0-9][0-9][0-9],\\)*[0-9][0-9][0-9]\\(\\.[0-9]+\\)?"); +static EST_Regex RXintord("[0-9]*\\(1st\\|2nd\\|3rd\\|[0-9]th\\)"); +static EST_Regex RXdottedabbrev("\\([A-Za-z]\\.\\)+[A-Za-z]\\.?"); +static EST_Regex RXapostropheS(".*'[sS]$"); +static EST_String PunctuationChars("'`.,:;!?{}[]()-\""); +static EST_Regex RXpunctuation("\\(\\]\\|[-[.,!?]\\)+"); +static EST_String remove_punct(const EST_String &tok); +static int only_punc(const EST_String &tok); +static 
LISP num_2_words(int iword); +static LISP say_num_as_words(const EST_String &num); +static LISP word_it(EST_Item *t,const EST_String tok); +static LISP builtin_word_it(EST_Item *token, EST_String tok); +static LISP say_as_letters(const EST_String &word); +static LISP say_num_as_ordinal(const EST_String &num); +static LISP say_num_as_year(const EST_String &num); +static LISP say_as_digits(const EST_String &word); + +LISP FT_English_Token_Utt(LISP utt); +LISP FT_Welsh_Token_Utt(LISP utt); +LISP FT_Spanish_Token_Utt(LISP utt); +LISP FT_Any_Token_Utt(LISP utt); + +static LISP user_token_to_word_func = NIL; + +LISP FT_Welsh_Token_Utt(LISP utt) +{ + return FT_Any_Token_Utt(utt); +} + +LISP FT_Spanish_Token_Utt(LISP utt) +{ + (void)utt; + cerr << "TOKEN: Spanish tokenization not yet supported\n"; + festival_error(); + + // never happens + return NULL; +} + +LISP FT_Any_Token_Utt(LISP utt) +{ + // Language independent EST_Token to Word module. Uses user specified + // token to word function of simply creates a word for each token. + EST_Utterance *u = get_c_utt(utt); + LISP words,w; + EST_Item *t; + EST_Item *new_word; + + user_token_to_word_func = siod_get_lval("token_to_words",NULL); + u->create_relation("Word"); + + for (t=u->relation("Token")->first(); t != 0; t = t->next()) + { + if (user_token_to_word_func != NIL) + { + words = word_it(t,t->name()); + for (w=words; w != NIL; w=cdr(w)) + { + new_word = add_word(u,car(w)); + append_daughter(t,"Token",new_word); + } + } + else + { // No user token_to_word function so just do it directly + new_word = add_word(u,t->name()); + append_daughter(t,"Token",new_word); + } + } + user_token_to_word_func = NIL; // reset this + + return utt; +} + +LISP FT_English_Token_Utt(LISP utt) +{ + // This module generates a word stream from a token stream + // Tokens may go to zero or more words, tokens retain information + // about their punctuation and spacing on the page. 
+ // Preceeding and succeeding punctuation become words + EST_Utterance *u = get_c_utt(utt); + EST_Item *t; + LISP words,w,eou_tree,l; + EST_Item *new_word; + + *cdebug << "Token module (English)" << endl; + + eou_tree = siod_get_lval("eou_tree","No end of utterance tree"); + user_token_to_word_func = siod_get_lval("token_to_words",NULL); + u->create_relation("Word"); + + for (t=u->relation("Token")->first(); t != 0; t = t->next()) + { + words = word_it(t,t->name()); + // Initial punctuation becomes words + new_word = 0; + if ((t->f("prepunctuation") != "0") && + (t->f("prepunctuation") != "")) + { + l = symbolexplode(strintern(t->f("prepunctuation").string())); + for (w=l; w != NIL; w=cdr(w)) + { + new_word = add_word(u,car(w)); + append_daughter(t,"Token",new_word); + } + } + + // Words become words + for (w=words; w != NIL; w=cdr(w)) + { + new_word = add_word(u,car(w)); + append_daughter(t,"Token",new_word); + } + + // final word gets punctuation marking + if ((new_word != 0) && (ffeature(t,"punc") != "0")) + { + if ((ffeature(t,"punc") == ".") && + (wagon_predict(t,eou_tree) == 0)) + { // It wasn't a really punctuation mark + t->set("punc","0"); + } + else + { + l = symbolexplode(strintern(ffeature(t,"punc").string())); + for (w=l; w != NIL; w=cdr(w)) + { + new_word = add_word(u,car(w)); + append_daughter(t,"Token",new_word); + } + } + } + } + + user_token_to_word_func = NIL; + + return utt; + +} + +static LISP word_it(EST_Item *token, const EST_String tok) +{ + // The user may specify their own addition;a token to word rules + // through the variable user_token_to_word_func if so we must + // call that, which may or may not call the builtin version + // The builtin version if bound in LISP so that recursion works + // properly + // This takes a LISP utt as an argument as creating a new wraparound + // will cause gc to fail. 
+ LISP tok_string = strcons(tok.length(),tok); + + if (user_token_to_word_func != NIL) // check user's rules + return leval(cons(user_token_to_word_func, + cons(siod(token), + cons(tok_string,NIL))),NIL); + else + return builtin_word_it(token,tok); +} + +static LISP builtin_word_it(EST_Item *token, EST_String tok) +{ + // Return a list of words for this token + EST_String token_pos; + + if (tok == "") + return NIL; + else if (in_current_lexicon(downcase(tok),NIL)) // if in lexicon use as is + { + if ((tok != token->name()) && // mainly to catch internal "a" + (tok.length() == 1)) + { + LISP let_pos = siod_get_lval("token.letter_pos",NULL); + return cons(cons(make_param_str("name",tok), + cons(make_param_lisp("pos",let_pos),NIL)), + NIL); + } + else + return cons(strintern(tok),NIL); + } + else if ((token_pos = (EST_String)ffeature(token,"token_pos")) == "ordinal") + return say_num_as_ordinal(tok); + else if (token_pos == "year") + return say_num_as_year(tok); + else if ((token_pos == "digits") || + (tok.matches(make_regex("0[0-9]+")))) + return say_as_digits(tok); + else if (tok.matches(RXint)) + return say_num_as_words(tok); + else if (tok.matches(RXintord)) + return say_num_as_ordinal(tok.at(0,tok.length()-2)); + else if (tok.matches(RXintcommaed)) // containing commas at thousands + { + if (tok.contains(".")) + return word_it(token,remove_punct(tok.before("."))+ + "."+tok.after(".")); + else + return say_num_as_words(remove_punct(tok)); + } + else if (tok.matches(RXapostropheS)) + { + return append(word_it(token,tok.at(0,tok.length()-2)), + cons(strintern("'s"),NIL)); + } + else if (tok.matches(numpointnum)) + { + EST_String afterpoint = tok.after("."); + LISP ap = NIL; + int i; + for (i=0; i < afterpoint.length(); i++) + ap = append(say_num_as_words(afterpoint.at(i,1)),ap); + return append(say_num_as_words(tok.before(".")), + cons(strintern("point"),reverse(ap))); + } + else if ((tok.matches(make_regex("[A-Z][A-Z]+"))) && + 
((!tok.contains(make_regex("[AEIOUY]"))) || + ((!tok.contains(make_regex("[^AEIOU][AEIOU][^AEIOU]"))) && + (tok.length() < 5)))) // an acronym + return say_as_letters(tok); + else if (tok.matches(RXdottedabbrev)) + return say_as_letters(remove_punct(tok)); + else if ((tok.matches(RXalpha)) && + !(tok.matches(make_regex(".*[AEIOUYaeiouy].*")))) + return say_as_letters(tok); // no vowels so spell it + else if (tok.matches(RXalpha)) // as is, some sort of word + return cons(strintern(tok),NIL); + else if (only_punc(tok)) + return stringexplode(tok); + else if (tok.contains("-")) + return append(word_it(token,tok.before("-")), + word_it(token,tok.after("-"))); + else if (tok.contains(".")) // internet address + { + LISP a=NIL; + EST_String tok2 = tok; + for ( ; tok2.contains("."); tok2 = tok2.after(".")) + a = append(a, + append(word_it(token,tok2.before(".")), + cons(strintern("dot"),NIL))); + a = append(a,word_it(token,tok2)); + return a; + } + else if (tok.contains("/")) + return append(word_it(token,tok.before("/")), + cons(strintern("slash"), + word_it(token,tok.after("/")))); + else if (tok.contains("&")) + return append(word_it(token,tok.before("&")), + cons(strintern("ampersand"), + word_it(token,tok.after("&")))); + else if (tok.contains("_")) + return append(word_it(token,tok.before("_")), + cons(strintern("underscore"), + word_it(token,tok.after("_")))); + else if (tok.contains("'")) + return word_it(token,tok.before("'")+tok.after("'")); + else if (tok.contains("`")) + return append(word_it(token,tok.before("`")), + word_it(token,tok.after("`"))); + else if (tok.contains("\"")) + return append(word_it(token,tok.before("\"")), + word_it(token,tok.after("\""))); + else if (tok.contains(",")) + return append(word_it(token,tok.before(",")), + word_it(token,tok.after(","))); + else if (tok.contains("(")) + return append(word_it(token,tok.before("(")), + word_it(token,tok.after("("))); + else if (tok.contains(")")) + return append(word_it(token,tok.before(")")), 
+ word_it(token,tok.after(")"))); + else if (tok.matches(make_regex("^[^a-zA-Z].+"))) // incrementally remove + return append(say_as_letters(tok.at(0,1)),// num/symbol from front + word_it(token,tok.at(1,tok.length()-1))); + else if (tok.matches(make_regex(".+[^a-zA-Z]$"))) // incrementally remove rear + return append(word_it(token,tok.at(0,tok.length()-1)), + say_as_letters(tok.at((int)tok.length()-1,1))); + else // could try harder + return say_as_letters(remove_punct(tok)); +} + +static LISP say_as_digits(const EST_String &word) +{ + // Should be string of digits, but I wont require it + // This isn't really correct for telephone numbers (oh/zero/double) + LISP l; + LISP lets = stringexplode(word); + LISP let_pos = siod_get_lval("token.letter_pos",NULL); + + for (l=lets; l != NIL; l=cdr(l)) + { + if (streq(get_c_string(car(l)),"0")) + CAR(l) = strintern("zero"); + else if (streq(get_c_string(car(l)),"1")) + CAR(l) = strintern("one"); + else if (streq(get_c_string(car(l)),"2")) + CAR(l) = strintern("two"); + else if (streq(get_c_string(car(l)),"3")) + CAR(l) = strintern("three"); + else if (streq(get_c_string(car(l)),"4")) + CAR(l) = strintern("four"); + else if (streq(get_c_string(car(l)),"5")) + CAR(l) = strintern("five"); + else if (streq(get_c_string(car(l)),"6")) + CAR(l) = strintern("six"); + else if (streq(get_c_string(car(l)),"7")) + CAR(l) = strintern("seven"); + else if (streq(get_c_string(car(l)),"8")) + CAR(l) = strintern("eight"); + else if (streq(get_c_string(car(l)),"9")) + CAR(l) = strintern("nine"); + else + CAR(l) = cons(make_param_lisp("name",car(l)), + cons(make_param_lisp("pos",let_pos),NIL)); + } + + return lets; +} + +static LISP say_as_letters(const EST_String &word) +{ + // Explode letters in word and say them, marking them as nouns + // This is particularly designed so that A/a doesn't come out as + // the determiner a which in typically schwa'd + LISP l; + LISP lets = stringexplode(word); + LISP let_pos = 
siod_get_lval("token.letter_pos",NULL); + + for (l=lets; l != NIL; l=cdr(l)) + { + EST_String name = EST_String(get_c_string(car(l))); + if (name.matches(make_regex("[0-9]"))) + CAR(l) = car(say_as_digits(get_c_string(car(l)))); +// else if (name.matches(make_regex("[^a-zA-Z]"))) +// // Not sure, probably a bug to get here +// CAR(l) = cons(make_param_str("name","symbol"), +// cons(make_param_lisp("pos",let_pos),NIL)); + else + CAR(l) = cons(make_param_lisp("name",car(l)), + cons(make_param_lisp("pos",let_pos),NIL)); + } + + return lets; +} + +static int only_punc(const EST_String &tok) +{ + // If this token consists solely of punctuation chars + // If this is true I'm probably suppose to say some of them + int i; + EST_String np; + const char *tokch = tok; + + for (i=0; i 9) + { + if (num(0) == '-') + return cons(strintern("minus"),say_as_digits(num.after("-"))); + else + return say_as_digits(num); + } + else + return num_2_words(atoi(num)); +} + +static LISP say_num_as_year(const EST_String &num) +{ + int iword = atoi(num); + + if (num.length() > 4) + return say_num_as_words(num); + else if (num.matches(make_regex("00"))) + { + return cons(strintern("o"), + cons(strintern("o"),NIL)); + } + else if (num.matches(make_regex("0[0-9]"))) + { + return cons(strintern("o"), + num_2_words(iword)); + } + else if (iword < 100) + return num_2_words(iword); + else if ((iword % 1000) < 10) + { + if ((iword % 1000) == 0) + return append(num_2_words(iword/1000), + cons(strintern("thousand"),NIL)); + else + return append(num_2_words(iword/1000), + cons(strintern("thousand"), + cons(strintern("and"), + num_2_words(iword%1000)))); + } + else if ((iword % 100) == 0) + return append(num_2_words(iword/100), + cons(strintern("hundred"),NIL)); + else if ((iword % 100) < 10) + return append(num_2_words(iword/100), + cons(strintern("o"), + num_2_words(iword%100))); + else + return append(num_2_words(iword/100), + num_2_words(iword%100)); +} + +static LISP num_2_words(int iword) +{ + // 
Convert number of list of words + int tens, units; + LISP s_tens, lang_stype=NIL; + + if (iword < 0) + return cons(strintern("minus"),num_2_words(-iword)); + else if (iword < 20) + switch (iword) { + case 0: return cons(strintern("zero"),NIL); + case 1: return cons(strintern("one"),NIL); + case 2: return cons(strintern("two"),NIL); + case 3: return cons(strintern("three"),NIL); + case 4: return cons(strintern("four"),NIL); + case 5: return cons(strintern("five"),NIL); + case 6: return cons(strintern("six"),NIL); + case 7: return cons(strintern("seven"),NIL); + case 8: return cons(strintern("eight"),NIL); + case 9: return cons(strintern("nine"),NIL); + case 10: return cons(strintern("ten"),NIL); + case 11: return cons(strintern("eleven"),NIL); + case 12: return cons(strintern("twelve"),NIL); + case 13: return cons(strintern("thirteen"),NIL); + case 14: return cons(strintern("fourteen"),NIL); + case 15: return cons(strintern("fifteen"),NIL); + case 16: return cons(strintern("sixteen"),NIL); + case 17: return cons(strintern("seventeen"),NIL); + case 18: return cons(strintern("eighteen"),NIL); + case 19: return cons(strintern("nineteen"),NIL); + default: return cons(siod_get_lval("token.unknown_word_name",NULL), + NIL); + } + else if (iword < 100) + { + tens = iword / 10; + units = iword % 10; + switch (tens) + { + case 2: s_tens = strintern("twenty"); break; + case 3: s_tens = strintern("thirty"); break; + case 4: s_tens = strintern("forty"); break; + case 5: s_tens = strintern("fifty"); break; + case 6: s_tens = strintern("sixty"); break; + case 7: s_tens = strintern("seventy"); break; + case 8: s_tens = strintern("eighty"); break; + case 9: s_tens = strintern("ninety"); break; + default: return cons(siod_get_lval("token.unknown_word_name",NULL), + NIL); + } + if (units != 0) + return cons(s_tens,num_2_words(units)); + else + return cons(s_tens,NIL); + } + else if (iword < 1000) + { + lang_stype = ft_get_param("Language"); + if 
(streq("americanenglish",get_c_string(lang_stype))) + return append(num_2_words(iword/100), + cons(strintern("hundred"), + (((iword % 100) != 0) ? + num_2_words(iword % 100) : + NIL))); + else + return append(num_2_words(iword/100), + cons(strintern("hundred"), + (((iword % 100) != 0) ? + cons(strintern("and"), + num_2_words(iword % 100)) : + NIL))); + } +#if 0 + // We don't depend on this hack now. + else if ((iword > 1910) && + (iword < 2000)) // hacky date condition + return append(num_2_words(iword/100), + num_2_words(iword%100)); +#endif + else if (iword < 1000000) + return append(num_2_words(iword/1000), + cons(strintern("thousand"), + (((iword % 1000) != 0) ? + ((((iword % 1000)/100) == 0) ? + cons(strintern("and"),num_2_words(iword % 1000)): + num_2_words(iword % 1000)) : + NIL))); + else if (iword >= 1000000) + return append(num_2_words(iword/1000000), + cons(strintern("million"), + (((iword % 1000000) != 0) ? + num_2_words(iword % 1000000) : + NIL))); + else + return cons(strintern("bignum"),NIL); +} + +static LISP l_word_it(LISP token, LISP tok) +{ + // Lisp wrap around for word_it + EST_Item *t = get_c_item(token); + EST_String tok_name = get_c_string(tok); + + return builtin_word_it(t,tok_name); +} + +void festival_token_init(void) +{ + festival_def_utt_module("Token_English",FT_English_Token_Utt, + "(Token_English UTT)\n\ + Build a Word stream from the Token stream, for English (American and\n\ + British English), analyzing compound words, numbers, etc. 
as tokens\n\ + into words."); + festival_def_utt_module("Token_Welsh",FT_Welsh_Token_Utt, + "(Token_Welsh UTT)\n\ + Build a Word stream from the Token stream, for Welsh, analyzing\n\ + compound words, numbers etc as tokens into words."); + festival_def_utt_module("Token_Spanish",FT_Spanish_Token_Utt, + "(Token_Spanish UTT)\n\ + Build a Word stream from the Token stream, for Castilian Spanish,\n\ + analyzing compound words, numbers etc as tokens into words."); + festival_def_utt_module("Token_Any",FT_Any_Token_Utt, + "(Token_Any UTT)\n\ + Build a Word stream from the Token stream, in a language independent way,\n\ + which means that all simple tokens should be in the lexicon, or analysed\n\ + by letter to sound rules."); + festival_def_utt_module("Token_POS",FT_Token_POS_Utt, + "(Token_POS UTT)\n\ + Assign feature token_pos to tokens that match CART trees in the\n\ + variable token_pos_cart_trees. These are used for gross level pos\n\ + such as identifying how numbers should be pronounced."); + init_subr_2("builtin_english_token_to_words",l_word_it, + "(english_token_to_words TOKENSTREAM TOKENNAME)\n\ + Returns a list of words expanded from TOKENNAME. Note that as this\n\ + function may be called recursively TOKENNAME may not be the name of\n\ + TOKENSTREAM."); +} diff --git a/src/modules/Text/tokenP.h b/src/modules/Text/tokenP.h new file mode 100644 index 0000000..b0f997f --- /dev/null +++ b/src/modules/Text/tokenP.h @@ -0,0 +1,45 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : March 1997 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Private declarations for tokens */ +/* */ +/*=======================================================================*/ +#ifndef __TOKENP_H__ +#define __TOKENP_H__ + +LISP FT_Token_POS_Utt(LISP utt); + +#endif diff --git a/src/modules/Text/token_pos.cc b/src/modules/Text/token_pos.cc new file mode 100644 index 0000000..52dad37 --- /dev/null +++ b/src/modules/Text/token_pos.cc @@ -0,0 +1,73 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : March 1997 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* This assigns a form of part of speech to tokens. It is typically used */ +/* to assign gross level pos for things before they are converted to */ +/* words, particularly, numbers (ordinals, digitized, years, numbers) */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "tokenP.h" + +LISP FT_Token_POS_Utt(LISP utt) +{ + // This module assigns token_pos feature to each token based on the + // Assoc list of Regex to CART trees in token_pos_cart_trees. 
+ EST_Utterance *u = get_c_utt(utt); + EST_Item *t; + LISP trees,l; + + trees = siod_get_lval("token_pos_cart_trees",NULL); + if (trees == NIL) return utt; + + for (t=u->relation("Token")->first(); t != 0; t = t->next()) + { + if (t->f("token_pos","0") == "0") + for (l=trees; l != NIL; l=cdr(l)) // find a tree that matches + { + if (t->name().matches(make_regex(get_c_string(car(car(l)))))) + { + t->set_val("token_pos", + wagon_predict(t,car(cdr(car(l))))); + break; + } + } + } + + return utt; + +} diff --git a/src/modules/Text/xxml.cc b/src/modules/Text/xxml.cc new file mode 100644 index 0000000..0fa9470 --- /dev/null +++ b/src/modules/Text/xxml.cc @@ -0,0 +1,293 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : August 1997 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* There are just too many different versions of sgml-based mark up */ +/* and none of them are stable so this allows arbitrary mapping */ +/* of tags to Lisp functions so any of them can be implemented in Lisp */ +/* That is people can worry about the actual content later and do not */ +/* need to change the C++ */ +/* */ +/* Of course once I give you this functionality you'll just want more */ +/* */ +/*=======================================================================*/ +#include "EST_unix.h" +#include "festival.h" +#include "text.h" +#include "lexicon.h" + +static LISP xxml_get_attribute(const EST_String &remainder); +static char *xxml_process_line(const char *line); +static void tts_xxml_token(EST_Item *t); +static void tts_xxml_utt(LISP lutt); + +static LISP xxml_word_features = NIL; +static LISP xxml_token_hooks = NIL; + +void tts_file_xxml(LISP filename) +{ + // For stml ssml rsml and jsml etc + // filename contains *output* from something like nsgml + EST_String inname = get_c_string(filename); + EST_String line, type, remainder; + EST_TokenStream ts; + LISP atts, element_defs; + LISP utt = NIL; // for accumulation of tokens + + if 
(ts.open(inname) == -1) + { + cerr << "xxml: unable to open output from SGML parser" << endl; + festival_error(); + } + ts.set_WhiteSpaceChars(" \t\r\n"); + ts.set_SingleCharSymbols(""); + ts.set_PunctuationSymbols(""); + ts.set_PrePunctuationSymbols(""); + + element_defs = siod_get_lval("xxml_elements",NULL); + atts = NIL; + + if (ts.peek() != get_c_string(car(car(element_defs)))) + { + cerr << "xxml parse error: " << get_c_string(filename) << + " Expected " << get_c_string(car(car(element_defs))) + << " but found " << ts.peek() << endl; + festival_error(); + } + while (ts.peek() != get_c_string(car(car(cdr(element_defs))))) + { + if (ts.eof()) + { + cerr << "xxml parse error: unexpected end of file \n"; + festival_error(); + } + line = (EST_String)ts.get_upto_eoln(); + type = line.at(0,1); + remainder = line.after(0); + if (type == "-") + { // Segments into utterances as it goes along + utt = xxml_get_tokens(remainder, + siod_get_lval("xxml_word_features",NULL), + utt); + } + else if (type == "A") // general attribute + { + atts = cons(xxml_get_attribute(remainder),atts); + } + else if ((type == "(") || (type == ")")) + { + utt = xxml_call_element_function(type+remainder,atts, + element_defs,utt); + atts = NIL; + } + else + { + cerr << "xxml parse error: unexpected token found " + << line << endl; + festival_error(); + } + } + // Last call (should synthesize trailing tokens + utt = xxml_call_element_function(ts.get().string(),atts,element_defs,utt); + + ts.close(); +} + +LISP xxml_call_element_function(const EST_String &element, + LISP atts, LISP elements, LISP utt) +{ + // Form the call to the defined element function, with the attributes + // and the utterance, returns the utterance + LISP def,l; + + def = siod_assoc_str(element,elements); + + if (def != NIL) + { + // You get two arguments, ATTLIST and UTTERANCE + l = cons( + make_param_lisp("ATTLIST", + cons(rintern("quote"),cons(atts,NIL))), + cons( + make_param_lisp("UTT", + 
cons(rintern("quote"),cons(utt,NIL))), + NIL)); + return leval(cons(rintern("let"), + cons(l,cdr(cdr(def)))),NIL); + } + else // no definition, so do nothing + return utt; +} + +static LISP xxml_get_attribute(const EST_String &remainder) +{ + EST_TokenStream ts; + LISP tokens=NIL,att=NIL; + EST_String name; + EST_Token t; + + ts.open_string(remainder); + name = (EST_String)ts.get(); + if ((t=ts.get()) == "IMPLIED") + att = cons(rintern(name),cons(NIL,NIL)); + else if (t == "TOKEN") + { + EST_Token v = ts.get(); + att = cons(rintern(name),cons(cons(rintern(v.string()),NIL),NIL)); + } + else if (t == "CDATA") + { + while (!ts.eof()) + tokens = cons(rintern(ts.get().string()),tokens); + att = cons(rintern(name),cons(reverse(tokens),NIL)); + } + else + { + cerr << "XXML: unknown attribute type " << remainder << endl; + festival_error(); + } + + ts.close(); + return att; +} + +static char *xxml_process_line(const char *line) +{ + // STML (sgml) data lines have a number of special escape characters + // this undoes them, namely "\\n" to "\n" + char *procline = walloc(char,strlen(line)+1); + int i,j; + + for (i=j=0; line[i] != '\0'; j++,i++) + { + if (line[i] == '\\') + { + i++; + if (line[i] == 'n') + procline[j] = '\n'; + else if (line[i] == '\\') + procline[j] = '\\'; + else if ((line[i] == '0') || // it's an octal number + (line[i] == '1')) + { + int k,oct = 0; + for (k=0; k < 3; k++,i++) + oct = (oct*8)+(line[i]-'0'); + procline[j] = oct; + i--; + } + else + { + procline[j] = line[i]; // no change + i--; + } + } + else + procline[j] = line[i]; // no change + } + procline[j] = '\0'; + return procline; +} + +static void tts_xxml_token(EST_Item *t) +{ + // Add xxml_word features to t + LISP a; + + for (a=xxml_word_features; a != NIL; a=cdr(a)) + if ((car(cdr(car(a))) != NIL) && + (!streq("NAME",get_c_string(car(car(a)))))) + { + if (cdr(cdr(car(a))) == NIL) + t->set(get_c_string(car(car(a))), + get_c_string(car(cdr(car(a))))); + else + { + // It's more complex than a single 
atom so save the list + t->set(get_c_string(car(car(a))), + siod_sprint(car(cdr(car(a))))); + } + } + + apply_hooks(xxml_token_hooks,siod(t)); +} + +LISP xxml_get_tokens(const EST_String &line,LISP feats,LISP utt) +{ + // Read from here until end of line collects all the tokens + // Note tokens are in reverse order until they are made into an + // utterance + EST_TokenStream ls; + EST_Token t; + LISP eou_tree; + char *processed_line; + processed_line = xxml_process_line(line); + ls.open_string(processed_line); + ls.set_SingleCharSymbols( + get_c_string(siod_get_lval("token.singlecharsymbols", + "token.singlecharsymbols unset"))); + ls.set_PunctuationSymbols( + get_c_string(siod_get_lval("token.punctuation", + "token.punctuation unset"))); + ls.set_PrePunctuationSymbols( + get_c_string(siod_get_lval("token.prepunctuation", + "token.prepunctuation unset"))); + ls.set_WhiteSpaceChars( + get_c_string(siod_get_lval("token.whitespace", + "token.whitespace unset"))); + + eou_tree = siod_get_lval("eou_tree","No end of utterance tree set"); + + xxml_word_features = feats; + xxml_token_hooks = siod_get_lval("xxml_token_hooks",NULL); + + // Segment and synth as much as appropriate + utt = tts_chunk_stream(ls,tts_xxml_token,tts_xxml_utt,eou_tree,utt); + + return utt; +} + +static void tts_xxml_utt(LISP lutt) +{ + // Build and utterance with these tokens and apply xxml synth function + + if ((lutt == NIL) || + (get_c_utt(lutt)->relation("Token")->length() == 0)) + return; // in this case do nothing. + + leval(cons(rintern("xxml_synth"), + cons(quote(lutt),NIL)),NIL); +} + diff --git a/src/modules/UniSyn/Makefile b/src/modules/UniSyn/Makefile new file mode 100644 index 0000000..7074365 --- /dev/null +++ b/src/modules/UniSyn/Makefile @@ -0,0 +1,54 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### + +TOP=../../.. 
+DIRNAME=src/modules/UniSyn +H = UniSyn.h us_features.h us_synthesis.h + +TSRCS = +CPPSRCS = UniSyn.cc us_prosody.cc us_unit.cc ps_synthesis.cc \ + us_mapping.cc us_features.cc $(TSRCS) +SRCS = $(CPPSRCS) +OBJS = $(CPPSRCS:.cc=.o) + +FILES = Makefile $(SRCS) $(H) + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + +UniSyn.o: + $(CXX_COMMAND) $(TDPSOLA_FLAGS) UniSyn.cc -o UniSyn.o + diff --git a/src/modules/UniSyn/UniSyn.cc b/src/modules/UniSyn/UniSyn.cc new file mode 100644 index 0000000..b52b6fd --- /dev/null +++ b/src/modules/UniSyn/UniSyn.cc @@ -0,0 +1,379 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Paul Taylor */ +/* Date: February 1998 */ +/* --------------------------------------------------------------------- */ +/* Waveform Generation Scheme Interface File */ +/* */ +/*************************************************************************/ +#include "siod.h" +#include "EST.h" +#include "UniSyn.h" +#include "us_synthesis.h" +#include "Phone.h" + +VAL_REGISTER_TYPE(ivector,EST_IVector) +VAL_REGISTER_TYPE(wavevector,EST_WaveVector); + +SIOD_REGISTER_TYPE(wavevector, EST_WaveVector); + +void map_to_relation(EST_IVector &map, EST_Relation &r, + const EST_Track &source_pm, + const EST_Track &target_pm); + +EST_Features *scheme_param(const EST_String& param, const EST_String &path) +{ + EST_Features *f, *p; + + f = feats(siod_get_lval(param, "Couldn't find scheme parameter named: " + + param)); + + p = (path == "") ? 
f : &f->A(path); + return p; +} + + +LISP FT_us_linear_smooth_amplitude( LISP lutt ) +{ + EST_Utterance *utt = get_c_utt( lutt ); + + us_linear_smooth_amplitude( utt ); + + return lutt; +} + + +static LISP FT_wavevector_get_wave( LISP l_wavevector, LISP l_framenum ) +{ + EST_WaveVector *wv = wavevector( l_wavevector ); + int i = get_c_int( l_framenum ); + + if( i<0 || i>=wv->length() ) + EST_error( "index out of bounds" ); + + return siod( &((*wv)[i]) ); +} + + +LISP FT_us_unit_concat(LISP lutt) +{ + EST_String window_name; + float window_factor; + bool window_symmetric; + + EST_Features *f = scheme_param("Param", "unisyn"); + + window_name = f->S("window_name"); + window_factor = f->F("window_factor"); + + window_symmetric = (f->I("window_symmetric",1) == 0) ? false : true; + + us_unit_concat(*get_c_utt(lutt), window_factor, window_name, false, window_symmetric); + return lutt; +} + +LISP FT_us_unit_raw_concat(LISP lutt) +{ + us_unit_raw_concat(*get_c_utt(lutt)); + return lutt; +} + + +LISP FT_us_energy_normalise(LISP lutt, LISP lrname) +{ + EST_Utterance *utt = get_c_utt(lutt); + EST_String rname = get_c_string(lrname); + + us_energy_normalise(*utt->relation(rname)); + return lutt; +} + +LISP FT_us_generate_wave(LISP lutt, LISP l_f_method, LISP l_o_method) +{ + EST_String filter_method = get_c_string(l_f_method); + EST_String ola_method = get_c_string(l_o_method); + EST_Utterance *utt = get_c_utt(lutt); + + EST_Features *f = scheme_param("Param", "unisyn"); + if(f->I("window_symmetric",1) == 0){ + ola_method = "asymmetric_window"; + } + us_generate_wave(*utt, filter_method, ola_method); + + return lutt; +} + +LISP FT_us_mapping(LISP lutt, LISP method) +{ + us_mapping(*get_c_utt(lutt), get_c_string(method)); + return lutt; +} + +LISP FT_us_get_copy_wave(LISP lutt, LISP l_sig_file, LISP l_pm_file, + LISP l_seg_file) +{ + EST_Utterance *utt = get_c_utt(lutt); + EST_Relation seg; + EST_String sig_file = get_c_string(l_sig_file); + EST_String seg_file = 
get_c_string(l_seg_file); + EST_String pm_file = get_c_string(l_pm_file); + + EST_Track *pm = new EST_Track; + EST_Wave *sig = new EST_Wave; + + if (pm->load(pm_file) != format_ok) + return NIL; + + if (sig->load(sig_file) != format_ok) + return NIL; + + if (seg.load(seg_file) != format_ok) + return NIL; + + if (!ph_is_silence(seg.tail()->f("name"))) + { + EST_Item *n = seg.tail()->insert_after(); + n->set("name", ph_silence()); + n->set("end", seg.tail()->prev()->F("end") + 0.1); + } + + us_get_copy_wave(*utt, *sig, *pm, seg); + return lutt; +} + + +LISP FT_f0_to_pitchmarks(LISP lutt, LISP l_f0_name, LISP l_pm_name, + LISP l_end_time) +{ + EST_Utterance *utt = get_c_utt(lutt); + int num_channels=0; + const float default_f0 = 100.0; + EST_Relation *f0_rel=0, *pm_rel=0; + EST_Track *f0=0, *pm=0; + EST_Item *a; + + float end_time = (l_end_time == NIL) ? -1 : get_c_float(l_end_time); + + f0_rel = utt->relation(get_c_string(l_f0_name), 1); + pm_rel = utt->create_relation(get_c_string(l_pm_name)); + + f0 = track(f0_rel->head()->f("f0")); + pm = new EST_Track; + + a = pm_rel->append(); + a->set_val("coefs", est_val(pm)); + a = pm_rel->append(); + + if (utt->relation_present("SourceCoef")) + { + EST_Track *source_coef = + track(utt->relation("SourceCoef")->head()->f("coefs")); + num_channels = source_coef->num_channels(); + } + + f0_to_pitchmarks(*f0, *pm, num_channels, default_f0, end_time); + + return lutt; +} + +LISP FT_map_to_relation(LISP lutt, LISP lsource_name, LISP ltarget_name, + LISP lrel_name) +{ + EST_Utterance *utt = get_c_utt(lutt); + EST_Track *source_pm = 0; + EST_Track *target_pm = 0; + EST_IVector *map = 0; + target_pm = + track(utt->relation(get_c_string(ltarget_name))->head()->f("coefs")); + source_pm = + track(utt->relation(get_c_string(lsource_name))->head()->f("coefs")); + map = ivector(utt->relation("US_map")->head()->f("map")); + + utt->create_relation(get_c_string(lrel_name)); + + map_to_relation(*map, *utt->relation(get_c_string(lrel_name)), + 
*source_pm, *target_pm); + + return NIL; +} + +void festival_UniSyn_init(void) +{ + proclaim_module("UniSyn"); + + register_unisyn_features(); + + init_subr_2( "wavevector.getwave", FT_wavevector_get_wave, + "(wavevector.getwave WAVEVECTOR FRAMENUM)\n\ + Retrieves an EST_Wave frame (int FRAMENUM) from a wavevector."); + + init_subr_1("us_linear_smooth_amplitude", FT_us_linear_smooth_amplitude, + "(us_linear_smooth_amplitude UTT)\n\ + Perform linear amplitude smoothing on diphone joins."); + + init_subr_1("us_unit_raw_concat", FT_us_unit_raw_concat, + "(us_unit_raw_concat UTT)."); + + init_subr_2("us_energy_normalise", FT_us_energy_normalise, + "(us_energy_normalise UTT RELATION)\n\ + Normalise the energy of the units in relation RELATION of \n\ + utterance UTT."); + + init_subr_3("us_generate_wave", FT_us_generate_wave, + "(us_generate_wave UTT FILTER_METHOD OLA_METHOD)\n\ + Generate the waveform for utterance UTT using FILTER_METHOD and \n\ + OLA_METHOD for the UniSyn pitch-synchronous synthesizer."); + + init_subr_2("us_mapping", FT_us_mapping, + "(us_mapping UTT method)\n\ + Create the mapping between the SourceCoef and TargetCoef tracks \n\ + of utterance UTT using mapping method METHOD."); + + init_subr_1("us_unit_concat", FT_us_unit_concat, + "(us_unit_concat UTT)\n\ + Concat coef and wave information in unit stream into a single \n\ + Frames structure storing the result in the Frame relation"); + + init_subr_4("us_f0_to_pitchmarks", FT_f0_to_pitchmarks, + "(us_f0_to_pitchmarks UTT F0_relation PM_relation END_TIME)\n\ + From the F0 contour in F0_relation, create a set of pitchmarks\n\ + in PM_relation. If END_TIME is not nil, extra pitchmarks will be \n\ + created at the default interval up to this point"); + + init_subr_4("map_to_relation", FT_map_to_relation, + "(map_to_relation UTT Source_relation Target_relation new_relation)\n\ + From the F0 contour in F0_relation, create a set of pitchmarks\n\ + in PM_relation. 
If END_TIME is not nil, Extra pitchmarks will be \n\ + created at the default interval up to this point"); + + init_subr_4("us_get_copy_wave", FT_us_get_copy_wave, + "(warp_utterance UTT (Wavefile Pitchmark_file))\n\ + Change waveform to match prosodic specification of utterance."); + + +#ifdef HAVE_US_TDPSOLA_TM + us_init_tdpsola(); +#endif + +} + +/* + + init_subr_2("us_F0targets_to_pitchmarks", FT_us_F0targets_to_pitchmarks, + "(us_F0targets_to_pitchmarks UTT Segment_Relation)\n\ + Make set of pitchmarks according to F0 target specification"); + +LISP FT_merge_pitchmarks(LISP lutt, LISP l_pm1, LISP l_pm2, + LISP l_guide_name) +{ + EST_Utterance *utt = get_c_utt(lutt); + + EST_Track *pm1 = + track(utt->relation(get_c_string(l_pm1), 1)->head()->f("coefs", 1)); + EST_Track *pm2 = + track(utt->relation(get_c_string(l_pm2), 1)->head()->f("coefs", 1)); + + EST_Relation *guide = utt->relation(get_c_string(l_guide_name), 1); + + EST_Relation *pm_rel = utt->create_relation("TargetCoefs"); + + EST_Track *target_pm = new EST_Track; + + EST_Item *a = pm_rel->append(); + a->fset_val("coefs", est_val(target_pm)); + + merge_pitchmarks(*get_c_utt(lutt), *pm1, *pm2, *target_pm, *guide); + + return lutt; +} +LISP FT_warp_pitchmarks(LISP lutt, LISP l_pm_file, LISP l_seg_file) +{ + EST_Utterance *utt = get_c_utt(lutt); + + EST_String pm_file = get_c_string(l_pm_file); + EST_String seg_file = get_c_string(l_seg_file); + + EST_Track *pm = new EST_Track; + EST_Relation seg; + + if (pm->load(pm_file) != format_ok) + return NIL; + + if (seg.load(seg_file) != format_ok) + return NIL; + + warp_pitchmarks(*utt, pm, seg, *utt->relation("Segment")); + + return lutt; +} + + init_subr_3("us_warp_pitchmarks", FT_warp_pitchmarks, + "(warp_utterance UTT (Wavefile Pitchmark_file))\n\ + Change waveform to match prosodic specification of utterance."); + +LISP FT_us_load_utt_segments(LISP l_utt, LISP l_filename) +{ + EST_String filename = get_c_string(l_filename); + EST_Utterance tu; + EST_Utterance 
*u = get_c_utt(l_utt); + EST_Item *s, *t; + + if (tu.load(filename) != format_ok) + festival_error(); + + u->relation("Segment")->clear(); + + for (s = tu.relation("Segment")->head(); s; s = s->next()) + { + t = u->relation("Segment")->append(); + t->fset("name", s->fS("name")); + t->fset("end", s->fS("end")); + } + + return l_utt; +} + +void us_F0targets_to_pitchmarks(EST_Utterance &utt, + const EST_String &seg_relation); + +LISP FT_us_F0targets_to_pitchmarks(LISP lutt, LISP lseg) +{ + EST_String s = (lseg == NIL) ? "" : get_c_string(lseg); + us_F0targets_to_pitchmarks(*get_c_utt(lutt), s); + + return lutt; +} + + +*/ diff --git a/src/modules/UniSyn/UniSyn.h b/src/modules/UniSyn/UniSyn.h new file mode 100644 index 0000000..77cdc2a --- /dev/null +++ b/src/modules/UniSyn/UniSyn.h @@ -0,0 +1,403 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Paul Taylor */ +/* Date: March 1998 */ +/* --------------------------------------------------------------------- */ +/* */ +/*************************************************************************/ + + +#ifndef __UNISYN_H__ +#define __UNISYN_H__ + +#include "festival.h" + + +typedef EST_TVector<EST_Wave> EST_WaveVector; + +#ifdef HAVE_US_TDPSOLA_TM +void us_init_tdpsola(); +void us_tdpsola_synthesis(EST_Utterance &utt, + const EST_String &ola_method); +#endif + + +void us_linear_smooth_amplitude( EST_Utterance *utt ); + +/**@name Functions for Concatenating Units + +*/ + +//@{ + +/** Iterate through the Unit relation and create the + SourceCoef relation, which contains a series of + windowed frames of speech and a track of pitch-synchronous + coefficients. + + SourceCoef contains a single item + with two features, coefs and + frame + + coefs' value is a track with all the + concatenated pitchmarks and coefficients from the units. + + us_unit_concat is where the pitch synchronous + windowing of the frames in each Unit is performed and the result + of this is stored as the value of frame + + Require: Unit + + Provide: SourceCoef + + + + @param utt: utterance + + @param window_factor: This specifies + how large the analysis window is in relation to the local pitch + period. 
A value of 1.0 is often used, as this means each frame + approximately extends from the previous pitch mark to the next. + + @param window_name: This specifies + the type of window used. "hanning" is standard but any window type + available from the signal processing library can be used. + + @param window_symmetric: if this is set to true, then symmetric + analysis windows are used centred at each pitch mark, with size + determined by the time difference between current and previous + pitchmarks. + + @param no_waveform: if this is set to true, only the coefficients + are copied into SourceCoef - no waveform analysis is performed. +*/ + +void us_unit_concat(EST_Utterance &utt, float window_factor, + const EST_String &window_name, + bool no_waveform=false, + bool window_symmetric=true); + + +/** This function provides the setup for copy resynthesis. In copy +resynthesis, a natural waveform is used as the source speech for +synthesis rather than diphones or other concatenated units. This is +often useful for testing a prosody module or for altering the pitch or +duration of a natural waveform for an experiment. (As such, this +function should really be thought of as a very simple unit selection +module.) + + In addition to the speech waveform itself, the function +requires a set of pitchmarks in the standard form, and a set of labels +which mark segment boundaries. The Segment +relation must already exist in the utterance prior to calling this +function. + +First, the function creates a Unit relation with +a single item containing the waveform and the pitchmarks. Next it adds +a set of source_end features to each item in +the Segment relation. It does this by +calculating a mapping between the Segment +relation and the input labels. This mapping is performed by dynamic +programming, as often the two sets of labels don't match exactly. + + +The final result, therefore, is a Unit relation and a Segment relation +with source_end features.
As this is exactly the same output as that of the +standard concatenative synthesis modules, from here on the utterance +can be processed as if the units were from a genuine synthesizer. + + +Copy synthesis itself can be performed by .... + + + Require:Segment + + Provide:Unit + + + + @param utt: utterance + + @param source_sig: waveform + @param source_pm: pitchmarks belonging to waveform + @param source_seg: set of items with end times referring to points + in the waveform + +*/ + +void us_get_copy_wave(EST_Utterance &utt, EST_Wave &source_sig, + EST_Track &source_pm, EST_Relation &source_seg); + +/** This function produces a waveform from the Unit relation without +prosodic modification. In effect, this function simply concatenates +the waveform parts of the units in the Unit relation. An overlap-add +operation is performed at unit boundaries so that waveform +discontinuities don't occur. + + +*/ +void us_unit_raw_concat(EST_Utterance &utt); + +/** Items in the Unit relation can take an optional +flag energy_factor, which scales the amplitude +of the unit waveform. This is useful because units often have +different energy levels due to different recording circumstances. An +energy_factor of 1.0 leaves the waveform +unchanged. + +*/ + +void us_energy_normalise(EST_Relation &unit); + +//@} + + +/**@name Functions for Producing Mappings + +*/ + +//@{ + +/** This function produces the mapping between the SourceCoef track +and TargetCoef track. The mapping is controlled by two types of +modification, duration and pitch. + +Duration is specified by the Segment +relation. Each item in this relation has two features, +source_end and +target_end. source_end +marks the end point of that segment in the concatenated set of source +coefficients, while target_end marks the +desired end of that segment. + + Pitch modification is specified by the patterns of pitchmarks +in the SourceCoef track and +TargetCoef track.
While these tracks actually +represent periods, their reciprocals represent the source and target +F0 contours. + + +The mapping is an integer array with one element for every pitchmark in +the TargetCoef track. Therefore, every target pitchmark has a mapping +element, and the value of that element is the frame number in the +SourceCoef track which should be used to generate the frame of speech +for that target pitchmark. Depending on the mapping, source frames can +be duplicated or skipped. + + If the duration is constant, a higher target pitch will +mean source frames are duplicated. If the pitch is constant, a longer +target duration will also mean source frames are duplicated. The +duration and pitch modifications are calculated at the same time, +leading to a single mapping. + +Require:SourceCoef, TargetCoef, Segment + + + Provide:US_map + +*/ + + +void us_mapping(EST_Utterance &utt, const EST_String &method); + + +// for graphical display only: +void map_to_relation(EST_IVector &map, EST_Relation &r, + const EST_Track &source_pm, + const EST_Track &target_pm); +//@} + +/**@name Functions for Generating Waveforms + +*/ + +//@{ + +/** Standard waveform generation function. This function generates the +actual synthetic speech waveform, using information in the SourceCoef, +TargetCoef and US_map relations. + + +The first stage involves time domain processing, whereby a speech +waveform or residual waveform is generated. The second (optional) +stage passes this waveform through the set of filter coefficients +specified in the TargetCoef track. The output synthetic waveform is +put in the Wave relation. + + +LPC resynthesis is performed by the lpc_filter_fast function +(lpc_filter_1 is a slower but cleaner alternative). + + + + Require:SourceCoef, TargetCoef, + US_map + Provide:Wave + + + @param utt: utterance + @param filter_method: type of filter used - normally "lpc" or none ("") + @param ola_method: type of time domain synthesis - "asymmetric_window", + "synth_period", or any other value for the default overlap-add.
+*/ + +void us_generate_wave(EST_Utterance &utt, + const EST_String &filter_method, + const EST_String &ola_method); + +/** This copies coefficients from source_coef +into target_coef according to the frame mapping +specified by +map. target_coef should +already have been allocated, and the pitchmarks in the time array set +to appropriate values. (this can be done by the f0_to_pitchmarks function). + +*/ + +void map_coefs(EST_Track &source_coef, EST_Track &target_coef, + EST_IVector &map); + +/** Time domain resynthesis. + +Generate a speech waveform by copying frames into a set of time +positions given by target_pm. The frame used for each time position is +given by map, and the frames themselves are stored individually as +waveforms in frames. + + +@param target_sig: output waveform +@param target_pm: new pitchmark positions +@param frames: array containing waveforms, each representing a single analysis + frame +@param map: mapping between target_pm and frames. + +*/ + +void td_synthesis(EST_WaveVector &frames, + EST_Track &target_pm, EST_Wave &target_sig, + EST_IVector &map); + + +/** Variant of td_synthesis, where each frame is re-windowed according to the +size of the local synthesis pitch period. + +@param target_sig: output waveform +@param target_pm: new pitchmark positions +@param frames: array containing waveforms, each representing a single analysis + frame +@param map: mapping between target_pm and frames. + +*/ + +void td_synthesis2(EST_WaveVector &frames, + EST_Track &target_pm, EST_Wave &target_sig, + EST_IVector &map); + +//@} + + +void asymmetric_window_td_synthesis(EST_WaveVector &frames, + EST_Track &target_pm, + EST_Wave &target_sig, + EST_IVector &map, + EST_IVector &frame_pm_indices); + + +/**@name Pitchmark Functions + +*/ +//@{ + +/** This function generates the target pitchmarks from the target F0 +contour. 
The pitchmarks are generated by reading a value, \(f_{0}\), +off the F0 contour at time \(t\), calculating the local pitch period +\(\tau = 1/f_{0}\), and placing a pitchmark at time \(t + \tau\). The +process is then repeated by reading the F0 value at this new point and +so on. + + The F0 contour must be continuous in all regions, that is, +unvoiced regions must have pseudo F0 values also. Although artificial +contours are best generated in this way to begin with, the function +\ref{**} can be used to interpolate through unvoiced regions for +non-continuous contours. + + + As the last F0 value in the contour may not be the end of the +utterance (for example if the last phone is unvoiced), the pitchmarks may be extended past the end of the contour. + + + +After processing, the generated track only contains the target +pitchmarks, but later functions may fill the amplitude array of the +track with target coefficients, and hence the space for these can be +allocated at this stage. + + +@param fz: input F0 contour. + +@param pm: set of pitchmarks to be generated. These are set to the +correct size in the function. + +@param num_channels: (optional) number of coefficients used in further +processing. + +@param default_f0: (optional) F0 value for interpolated end values. + +@param target_end: (optional) fill from the end of the contour to this +point with default F0 values. + +*/ +void f0_to_pitchmarks(EST_Track &fz, EST_Track &pm, int num_channels=0, + float default_f0=100.0, float target_end=-1); + + + +/** This is a utility function for converting a set of pitchmarks back +to an F0 contour and is usually used in system development etc. The +generated F0 is evenly spaced. + +@param pm: input set of pitchmarks. + +@param fz: output F0 contour. + +@param shift: frame shift of generated contour in seconds.
+*/ + +void pitchmarks_to_f0(EST_Track &pm, EST_Track &fz, float shift); + +//@} + +void register_unisyn_features(void); + +#endif // __UNISYN_H__ + diff --git a/src/modules/UniSyn/ps_synthesis.cc b/src/modules/UniSyn/ps_synthesis.cc new file mode 100644 index 0000000..b3db579 --- /dev/null +++ b/src/modules/UniSyn/ps_synthesis.cc @@ -0,0 +1,302 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Paul Taylor */ +/* Date: 6 Jan 1998 */ +/* --------------------------------------------------------------------- */ +/* LPC residual synthesis alternative version */ +/* */ +/*************************************************************************/ + +#include "us_synthesis.h" +#include "UniSyn.h" +#include "siod.h" +#include "EST_sigpr.h" +#include "EST_error.h" +#include +#include "EST_inline_utils.h" + +void us_generate_wave(EST_Utterance &utt, + const EST_String &filter_method, + const EST_String &ola_method) +{ + EST_IVector *map, *frame_pm_indices; + EST_WaveVector *frames; + EST_Track *source_coef, *target_coef; + EST_Wave *sig; + EST_FVector gain; + + frames = wavevector(utt.relation("SourceCoef", 1)->head()->f("frame")); + + source_coef = track(utt.relation("SourceCoef", 1)->head()->f("coefs")); + target_coef = track(utt.relation("TargetCoef", 1)->head()->f("coefs")); + + map = ivector(utt.relation("US_map", 1)->head()->f("map")); + + sig = new EST_Wave; + + if ( ola_method == "asymmetric_window" ){ + frame_pm_indices = ivector(utt.relation("SourceCoef", 1)->head()->f("pm_indices")); + asymmetric_window_td_synthesis( *frames, *target_coef, *sig, *map, *frame_pm_indices ); + } + else if ( ola_method == "synth_period" ) + td_synthesis2(*frames, *target_coef, *sig, *map); + else + td_synthesis(*frames, *target_coef, *sig, *map); + + if (filter_method == "lpc") + { + map_coefs(*source_coef, *target_coef, *map); + // fast version + lpc_filter_fast(*target_coef, *sig, *sig); + // slower version (but cleaner) + //lpc_filter_1(*target_coef, *sig, *sig); + } + + add_wave_to_utterance(utt, *sig, "Wave"); +} + + +void map_coefs(EST_Track &source_coef, EST_Track &target_coef, + EST_IVector &map) +{ + int i, j; + int m; + + if (source_coef.num_channels() != target_coef.num_channels()) + EST_error("Different numbers of channels in LPC resynthesis: " + "source %d, 
target %d\n", source_coef.num_channels(), + target_coef.num_channels()); + + if (map.n() > target_coef.num_frames()) + m = target_coef.num_frames(); + else + m = map.n(); + + for (i = 0; i < m; ++i) + for (j = 0; j < target_coef.num_channels(); ++j) + target_coef.a_no_check(i, j) = + source_coef.a_no_check(map.a_no_check(i), j); + // There can be one or two frames at the end of target_coef without + // a map. Here we zero them, blindly assuming they are in silence + for ( ; i < target_coef.num_frames(); i++) + for (j = 0; j < target_coef.num_channels(); ++j) + target_coef.a_no_check(i, j) = 0; +} + +void td_synthesis(EST_WaveVector &frames, + EST_Track &target_pm, EST_Wave &target_sig, + EST_IVector &map) +{ + int t_start; + int i, j; + float sr; + int last_sample=0; + int map_n = map.n(); + + if( (frames.length()>0) && (map_n>0) ){ + sr = (float)frames(0).sample_rate(); + last_sample = (int)(rint(target_pm.end()*sr)) + + ((frames(frames.length()-1).num_samples() - 1)/2);//window_signal guarantees odd + target_sig.resize(last_sample+1); + target_sig.fill(0); + target_sig.set_sample_rate((int)sr); + + for( i=0; i=0 ) + target_sig.a_no_check(j + t_start) += frame.a_no_check(j); + } + } +} + +void asymmetric_window_td_synthesis(EST_WaveVector &frames, + EST_Track &target_pm, + EST_Wave &target_sig, + EST_IVector &map, + EST_IVector &frame_pm_indices) +{ + int t_start; + int i, j; + float sr; + int last_sample=0; + int map_n = map.n(); + + + +#if defined(EST_DEBUGGING) + cerr << "(maplength framelength pm_indiceslength) " + << map_n << " " + << frames.n() << " " + << frame_pm_indices.n() << endl; +#endif + + + if( (frames.length()>0) && (map_n>0) ){ + sr = (float)frames(0).sample_rate(); + + last_sample = (int)(rint(target_pm.end()*sr)) + + (frames(map(map_n-1)).num_samples() - frame_pm_indices(map(map_n-1)) - 1); + + target_sig.resize(last_sample+1, EST_ALL, 0); //0 -> don't set values + target_sig.fill(0); + target_sig.set_sample_rate((int)sr); + + for( i=0; i=0 + //
(it might be less than one where prosodic modification + // has moved the first pitch mark relative to wave start) + for( j=-min(0,t_start); j window; + EST_FVector f; + + float s_window_factor= Param().F("unisyn.window_factor", 1.0); + + if (frames.length()> 0) + sr = (float)frames(0).sample_rate(); + else + sr = 16000; // sort of, who cares, its going to be a 0 sample waveform + + if (map.n() > 0) + last_sample = (int)(target_pm.end() * sr) + + (frames(map(map.n()-1)).num_samples() / 2); + + target_sig.resize(last_sample); + target_sig.fill(0); + target_sig.set_sample_rate((int)sr); + + for (i = 0; i < map.n(); ++i) + { + const EST_Wave &frame = frames(map(i)); + + s_period = + (int)(get_frame_size(target_pm, i, (int)sr) * s_window_factor); +// cout << "period: " << s_period << endl; + + // start of window is mid point of analysis window + // minus local synth period + window_start = (frame.num_samples() / 2) - s_period; + + EST_Window::window_signal(frame, "hanning", window_start, + s_period * 2, f, 1); + + t_start = ((int)(target_pm.t(i) * sr) + - (f.n() / 2)); + + for (j = 0; j < f.n(); ++j) + if (j + t_start>=0) + target_sig.a_no_check(j + t_start) += (short)f.a_no_check(j); + } +} + + + +/*static void debug_options(EST_Relation &source_lab, EST_Relation &unit, + EST_Track &source_coef, + EST_Track &target_coef, EST_TVector &frames) +{ + EST_IVector map; + EST_Wave sig; + EST_Relation pm_lab; + + if (siod_get_lval("us_debug_save_source", NULL) != NIL) + { + make_segment_double_mapping(source_lab, source_coef, source_lab, + source_coef, map); + td_synthesis(source_coef, frames, source_coef, sig, map); + sig.save("source_sig.wav", "nist"); + } + + if (siod_get_lval("us_debug_save_source_coefs", NULL) != NIL) + source_coef.save("source_coefs.pm", "est"); + + if (siod_get_lval("us_debug_save_target_pm", NULL) != NIL) + target_coef.save("target_coefs.pm"); + + if (siod_get_lval("us_debug_save_source_lab", NULL) != NIL) + source_lab.save("source.lab"); + + if 
(siod_get_lval("us_debug_save_target_unit_lab", NULL) != NIL) + unit.save("target_unit.lab"); + +} +*/ + diff --git a/src/modules/UniSyn/us_features.cc b/src/modules/UniSyn/us_features.cc new file mode 100644 index 0000000..98517bd --- /dev/null +++ b/src/modules/UniSyn/us_features.cc @@ -0,0 +1,285 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black and Paul Taylor */ +/* Date : February 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* An implementation of Metrical Tree Phonology */ +/* */ +/*=======================================================================*/ + +#include +#include "festival.h" +#include "EST_error.h" +#include "us_features.h" + +void add_feature_function(EST_Relation &r, const EST_String &fname, + const EST_String &funcname) +{ + for (EST_Item *p = r.head(); p; p = p->next()) + p->set_function(fname, funcname); +} + +void add_non_terminal_features(EST_Item *s, + EST_Features &f) +{ + EST_Features::Entries a; + + for (EST_Item *p = s; p; p = p->next()) + { + if (daughter1(p) != 0) + { + add_non_terminal_features(daughter1(p), f); + for (a.begin(f); a; ++a) + p->set_val(a->k, a->v); + } + } +} + +void add_non_terminal_features(EST_Relation &r, + EST_Features &f) +{ + add_non_terminal_features(r.head(), f); +} + + +EST_Item *named_daughter(EST_Item *syl, const EST_String &fname, + const EST_String &fval) +{ + if ((daughter1(syl) != 0) && (daughter1(syl)->f(fname) == fval)) + return daughter1(syl); + if ((daughter2(syl) != 0) && (daughter2(syl)->f(fname) == fval)) + return daughter2(syl); + return 0; +} + +EST_Item *syl_nucleus(EST_Item *syl_struct) +{ + EST_Item *t; + if (syl_struct == 0) + return 0; + + if ((t = named_daughter(syl_struct, "sylval", "Rhyme")) != 0) + { +// cout << "rhyme: " << *t << endl; + t = named_daughter(t, "sylval", "Nucleus"); +// cout << "nucleus: " << *t << endl; + return daughter1(t); + } + + return 0; +} + + +EST_Item *nth(EST_Relation &r, int n) +{ + int i = 1; + for (EST_Item *s = r.head(); s; s = s->next(), ++i) + if (n == i) + return s; + + cerr << "Couldn't find item " << n << " in relation " << r.name() + << " of length " << r.length() << endl; + festival_error(); + return 0; +} + +EST_Item 
*nth_leaf(EST_Item *r, int n) +{ + int i = 1; + EST_Item *p; + + for (p = first_leaf_in_tree(r); + p != next_leaf(last_leaf_in_tree(r)); p = next_leaf(p), ++i) + if (n == i) + return p; + + cerr << "Couldn't find leaf " << n << " in relation " + << r->relation()->name() <F("end") - s->F("start"); +} + +EST_Val usf_start(EST_Item *s) +{ + s = s->as_relation("Segment"); + //cout << "in usf_start\n"; + //cout << *s << endl; + /* EST_Relation *r = s->relation(); + if (r->f.S("timing_style") != "segment") + EST_warning("Attempted to use start() feature function " + "in non segment relation\n"); + */ + return (prev(s) == 0) ? 0.0 : prev(s)->F("end"); +} + +EST_Val usf_tilt_phrase_position(EST_Item *s) +{ + EST_String rel_name = s->S("time_path"); + EST_Item *t, *a; + + if ((t = s->as_relation(rel_name)) == 0) + { + cerr << "item: " << *s << endl; + EST_error("No relation %s for item\n", (const char *) rel_name); + } + + a = parent(t); + + cout << "us features phrase pos\n"; + //cout << "dereferencing syllable: " << *a << endl; + cout << "start: " << a->F("start") << endl; + cout << "end: " << a->F("end") << endl; + + if (s->S("name") == "phrase_start") + return a->F("start"); + else + return a->F("end"); +} + +EST_Val usf_tilt_event_position(EST_Item *s) +{ + EST_String rel_name = s->S("time_path"); + EST_Item *t, *a; + + if ((t = s->as_relation(rel_name)) == 0) + EST_error("No relation %s for item\n", (const char *) rel_name); + + a = parent(t); + + cout << "us features tilt pos\n"; + cout << "dereferencing syllable: " << *a << endl; + cout << "vowel_start: " << a->F("vowel_start") << endl; + cout << "start: " << a->F("start") << endl; + cout << "end: " << a->F("end") << endl; + + return a->F("vowel_start") + s->F("rel_pos",0.0); +} + +EST_Val usf_leaf_end(EST_Item *s) +{ + if (!s->f_present("time_path")) + EST_error("Attempted to use leaf end() feature function on " + "item with no time_path feature set: %s\n", + (const char *)s->relation()->name()); + + EST_String 
rel_name = s->S("time_path"); + EST_Item *t, *a; + + if ((t = s->as_relation(rel_name)) == 0) + EST_error("No relation %s for item\n", (const char *) rel_name); + + a = last_leaf_in_tree(t); + return a->F("end"); +} + +EST_Val usf_leaf_start(EST_Item *s) +{ + if (!s->f_present("time_path")) + EST_error("Attempted to use leaf start() feature function on " + "item with no time_path feature set: %s\n", + (const char *)s->relation()->name()); + + EST_String rel_name = s->S("time_path"); + EST_Item *t, *a; + + if ((t = s->as_relation(rel_name)) == 0) + EST_error("No relation %s for item\n", (const char *) rel_name); + + a = first_leaf_in_tree(t); + // cout << "this is the first node of the tree\n"; + //cout << *a << endl; + return a->F("start"); +} + +EST_Val usf_int_start(EST_Item *s) +{ + EST_String rel_name = "IntonationPhrase"; + EST_Item *t, *a; + + if ((t = s->as_relation(rel_name)) == 0) + EST_error("No relation %s for item\n", (const char *) rel_name); + + a = first_leaf_in_tree(parent(t)->as_relation("MetricalTree")); + return a->F("start"); +} + +EST_Val usf_int_end(EST_Item *s) +{ + EST_String rel_name = "IntonationPhrase"; + EST_Item *t, *a; + + if ((t = s->as_relation(rel_name)) == 0) + EST_error("No relation %s for item\n", (const char *) rel_name); + + a = last_leaf_in_tree(parent(t)->as_relation("MetricalTree")); + return a->F("end"); +} + +#endif + +EST_Val usf_vowel_start(EST_Item *s) +{ + if (!s->f_present("time_path")) + EST_error("Attempted to use vowel_time() feature function " + "in relation with no time_relation feature defined\n"); + + EST_String rel_name = s->S("time_path"); + + EST_Item *n = syl_nucleus(s->as_relation(rel_name)); + + n = n->as_relation("Segment"); + + return n->F("start"); +} + +void register_unisyn_features(void) +{ +// register_featfunc("unisyn_duration", usf_duration); +// register_featfunc("unisyn_start", usf_start); + register_featfunc("unisyn_vowel_start", usf_vowel_start); +// register_featfunc("unisyn_leaf_end", 
usf_leaf_end); +// register_featfunc("unisyn_leaf_start", usf_leaf_start); +// register_featfunc("unisyn_int_end", usf_int_end); +// register_featfunc("unisyn_int_start", usf_int_start); +// register_featfunc("unisyn_tilt_event_position", usf_tilt_event_position); +// register_featfunc("unisyn_tilt_phrase_position", usf_tilt_phrase_position); +} diff --git a/src/modules/UniSyn/us_features.h b/src/modules/UniSyn/us_features.h new file mode 100644 index 0000000..08d1bb8 --- /dev/null +++ b/src/modules/UniSyn/us_features.h @@ -0,0 +1,68 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black and Paul Taylor */ +/* Date : February 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* An implementation of Metrical Tree Phonology */ +/* */ +/*=======================================================================*/ + +#ifndef __US_FEATURES_H__ +#define __US_FEATURES_H__ + +#include "festival.h" + +EST_Val usf_vowel_start(EST_Item *s); + +void add_feature_function(EST_Relation &r, const EST_String &fname, + const EST_String &f); + +EST_Item *syl_nucleus(EST_Item *syl_struct); +EST_Item *named_daughter(EST_Item *syl, const EST_String &fname, + const EST_String &fval); + +EST_Val vowel_start_time(EST_Item *s); + +bool verify_utterance_relations(EST_Utterance &u, const EST_String &names, + int err); +bool verify_relation_features(EST_Relation &r, const EST_String &names, + int err); +bool verify_relation_features(EST_Utterance &u, const EST_String rname, + const EST_String &names, int err); + +EST_Item *nth_leaf(EST_Item *r, int n); +EST_Item *nth(EST_Relation &r, int n); + + +#endif diff --git a/src/modules/UniSyn/us_mapping.cc b/src/modules/UniSyn/us_mapping.cc new file mode 100644 index 0000000..1f3e1b3 --- /dev/null +++ b/src/modules/UniSyn/us_mapping.cc @@ -0,0 +1,856 @@ 
+/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Paul Taylor */ +/* Date: 6 Jan 1998 */ +/* --------------------------------------------------------------------- */ +/* LPC residual synthesis alternative version */ +/* */ +/*************************************************************************/ + +#include "EST_error.h" +#include "EST_inline_utils.h" +#include "us_synthesis.h" + +#include "Phone.h" + +#include + +// void make_segment_single_mapping(EST_Relation &source_lab, +// EST_Track &source_pm, +// EST_Track &target_pm, EST_IVector &map) +// { +// int i = 0; +// int s_i_start, s_i_end, t_i_start, t_i_end; +// EST_Item *s; +// float s_end, s_start, t_end, t_start, f, m; +// map.resize(target_pm.num_frames()); + +// s_start = t_start = 0.0; + +// if (target_pm.t(target_pm.num_frames() - 1) < +// source_lab.tail()->F("end",0)) +// { +// EST_warning("Target pitchmarks end before end of target segment " +// "timings (%f vs %f). 
Expect a truncated utterance\n", +// target_pm.t(target_pm.num_frames() - 1), +// source_lab.tail()->F("end",0.0)); +// } + +// //cout << "Source_pm" << source_pm.equal_space() << endl << endl; +// //cout << "Target_pm" << target_pm.equal_space() << endl << endl; + +// for (s = source_lab.head(); s; s = s->next()) +// { +// s_end = s->F("source_end"); +// t_end = s->F("end"); + +// s_i_start = source_pm.index_below(s_start); +// s_i_end = source_pm.index_below(s_end); +// t_i_start = target_pm.index_below(t_start); +// t_i_end = target_pm.index_below(t_end); + +// // fudge to make sure that at least one frame is available +// if (s_i_end <= s_i_start) +// s_i_end += 1; + +// //printf("%d %d %d %d\n", s_i_start, s_i_end, t_i_start, t_i_end); +// //printf("%f %f %f %f\n\n", s_start, s_end, t_start, t_end); + +// m = float (s_i_end - s_i_start)/ float(t_i_end - t_i_start); +// for (i = t_i_start, f = 0.0; i < t_i_end; ++i, ++f) +// map[i] = EST_NINT(f * m) + s_i_start; +// s_start = s->F("source_end"); +// t_start = s->F("end"); +// } +// if (i == 0) +// map.resize(0); // nothing to synthesize +// else +// map.resize(i - 1); +// } + + +void make_segment_single_mapping(EST_Relation &source_lab, + EST_Track &source_pm, + EST_Track &target_pm, EST_IVector &map) +{ + int i = 0; + int s_i_start, s_i_end, t_i_start, t_i_end; + EST_Item *s; + float s_end, s_start, t_end, t_start, m; + map.resize(target_pm.num_frames()); + + s_start = t_start = 0.0; + s_i_start = t_i_start = 0; + + if (target_pm.t(target_pm.num_frames() - 1) < + source_lab.tail()->F("end",0)) + { + EST_warning("Target pitchmarks end before end of target segment " + "timings (%f vs %f). 
Expect a truncated utterance\n", + target_pm.t(target_pm.num_frames() - 1), + source_lab.tail()->F("end",0.0)); + } + + + + for (s = source_lab.head(); s; s = s->next()) + { + + // printf( "*********************************************\nphone %s\n", s->S("name").str()); + + s_end = s->F("source_end"); + t_end = s->F("end"); + + s_i_end = source_pm.index_below(s_end); + t_i_end = target_pm.index_below(t_end); + + // make sure that at least one frame is available + if (s_i_end <= s_i_start) + s_i_end += 1; + +// printf("(s_i_start s_i_end t_i_start t_i_end) %d %d %d %d\n", +// s_i_start, s_i_end, t_i_start, t_i_end ); +// printf("(s_i_start-s_i_end t_i_start-t_i_end) %d %d\n", +// s_i_end - s_i_start, t_i_end - t_i_start); + + + + // OK for time alignment mapping function here to be single + // linear across subcomponents?... + // m = float(t_i_end-t_i_start+1)/float (s_i_end-s_i_start+1); + m = (t_end-t_start)/(s_end-s_start); + //m =1.0; + // printf( "m=%f\n", m ); + + // time offsets for relative times + float apm_t_off = (s_i_start==0) ? 0.0 : source_pm.t(s_i_start-1); + float tpm_t_off = (t_i_start==0) ? 
0.0 : target_pm.t(t_i_start-1); + +// printf("apm_t_off = %f\ntpm_t_off = %f\n", apm_t_off, tpm_t_off); + + int apm_i = s_i_start; // analysis pitch mark index + float apm_t = source_pm.t(apm_i)-apm_t_off;// analysis pitch mark time + float next_apm_t = source_pm.t(apm_i+1)-apm_t_off; + + for( i=t_i_start; i<=t_i_end; ++i ){ + float tpm_t = target_pm.t(i)-tpm_t_off; // target pitch mark time + + // find closest pitchmark (assume only need forward search from current + // point, since pitchmarks should always be increasing) + while( (apm_i<=s_i_end) && (fabs((next_apm_t*m)-tpm_t) <= fabs((apm_t*m)-tpm_t)) ){ +// printf("(next_apm_t apm_t) %f %f\n", +// fabs((next_apm_t*m)-tpm_t), fabs((apm_t*m)-tpm_t) ); + apm_t = next_apm_t; + ++apm_i; + next_apm_t = source_pm.t(apm_i+1)-apm_t_off; + } + +// // printf( "tpm %d = apm %d\n", i, apm_i ); + +// int slow_index = source_pm.index( target_pm(i) ); + +// printf( "(my slow) %d %d\n", apm_i, slow_index ); + + map[i] = apm_i; + } + + // for next loop + s_i_start = s_i_end+1; + t_i_start = t_i_end+1; + s_start = source_pm.t(s_i_start); + t_start = target_pm.t(t_i_start); + + } + if (i == 0) + map.resize(0); // nothing to synthesize + else + map.resize(i); +} + + +void make_linear_mapping(EST_Track &pm, EST_IVector &map) +{ + int pm_num_frames = pm.num_frames(); + + map.resize(pm_num_frames); + + for (int i = 0; i < pm_num_frames; ++i) + map[i] = i; +} + + +static bool contiguous( const EST_Item*left, const EST_Item* right ) +{ + if( (item(left->f("source_ph1")))->next() == item(right->f("source_ph1")) ) + return true; + + return false; +} + + +// note this function overlooks the spacing of the 1st pitchmark (could +// assume previous pitchmark at 0.0 time...) 
+static void pitchmarksToSpaces( const EST_Track &pm, EST_IVector *spaces, + int start_pm, int end_pm, int wav_srate ) +{ + int left_pm, right_pm; + int num_frames = end_pm-start_pm; + spaces->resize( num_frames, 0 ); + + left_pm = (int) rint( pm.t(start_pm)*wav_srate ); //should always be > 0 + for( int i=0; if("sig"))->sample_rate(); + + // currently, the pitchmarks are just moved, there's no deletion or + // insertion + int target_pm_length = source_pm.length(); + target_pm.resize(target_pm_length, source_pm.num_channels()); + + // places to keep temporary debugging information + EST_IVector source_spacing(target_pm_length); + EST_IVector target_spacing(target_pm_length); + EST_IVector voicing(target_pm_length); + + // handle special case of first half of first diphone unit + EST_Item *diphone_left = units.head(); + + int left_start_index = diphone_left->I("middle_frame"); + int left_end_index = source_pm.index(diphone_left->F("end")); + + for( int i=0; inext(); + diphone_right; + diphone_right=diphone_left->next() ){ + + printf( "%s\t%f\n", diphone_left->S("name").str(), diphone_left->F("end")); + + int right_start_index = left_end_index + 1; + int right_end_index = right_start_index + diphone_right->I("middle_frame"); + + printf( "%d %d %d %d (l_start, l_end, r_start, r_end\n", + left_start_index, + left_end_index, + right_start_index, + right_end_index ); + + EST_String join_phone_name = item(diphone_left->f("ph1"))->next()->S("name"); + + cerr << "phone contigous " << contiguous(diphone_left,diphone_right) << endl; + + ///////////DEBUG////////////////////////////////////////////// + int voicing_val; + if( ph_is_sonorant( join_phone_name ) && + ! 
ph_is_silence( join_phone_name ) ){ + voicing_val = 1; + } + else + voicing_val = 0; + + for( int i=left_start_index; i=0; --i ){ +// int ns = (spaces[i]+spaces[i+1])/2; +// spaces[i+1] += spaces[i] - ns; +// spaces[i] = ns; +// } + +// for( int i=right_start_index-left_start_index; iF("end") ); + diphone_left = diphone_right; + } + + // copy the remaining pitchmarks + for( int i=left_start_index; if("sig"))->sample_rate(); + + // currently, the pitchmarks are just moved, there's no deletion or + // insertion + int target_pm_length = source_pm.length(); + target_pm.resize(target_pm_length, source_pm.num_channels()); + + // places to keep temporary debugging information + EST_IVector source_spacing(target_pm_length); + EST_IVector target_spacing(target_pm_length); + EST_IVector voicing(target_pm_length); + + // handle special case of first half of first diphone unit + EST_Item *diphone_left = units.head(); + + int left_start_index = diphone_left->I("middle_frame"); + int left_end_index = source_pm.index(diphone_left->F("end")); + + for( int i=0; inext(); + diphone_right; + diphone_right=diphone_left->next() ){ + + printf( "%s\t%f\n", diphone_left->S("name").str(), diphone_left->F("end")); + + int right_start_index = left_end_index + 1; + int right_end_index = right_start_index + diphone_right->I("middle_frame"); + + printf( "%d %d %d %d (l_start, l_end, r_start, r_end\n", + left_start_index, + left_end_index, + right_start_index, + right_end_index ); + + EST_String join_phone_name = item(diphone_left->f("ph1"))->next()->S("name"); + + cerr << "phone contigous " << contiguous(diphone_left,diphone_right) << endl; + + ///////////DEBUG////////////////////////////////////////////// + int voicing_val; + if( ph_is_sonorant( join_phone_name ) && + ! 
ph_is_silence( join_phone_name ) ){ + voicing_val = 1; + } + else + voicing_val = 0; + + for( int i=left_start_index; i 0 ){ +// //////////////////////////////////////////////////////////////////////// +// // filter them to modify + + +// //////////////////////////////////////////////////////////////////////// +// // copy modified pitchmark positions back into correct region of +// // target_pm track +// printf( "** using modified spaces ** \n" ); + +// for( int i=left_start_index; iF("end") ); + diphone_left = diphone_right; + } + + // copy the remaining pitchmarks + for( int i=left_start_index; i(source_pm).save( "/home/korin/projects/smoothing_temp/f0/sourceCoef.est" ) + != write_ok ) + EST_warning( "couldn't write sourceCoef.est file" ); +} + + +void us_mapping(EST_Utterance &utt, const EST_String &method) +{ + EST_Relation *source_lab, *target_lab; + EST_IVector *map; + EST_Track *source_coef=0, *target_coef=0; + + source_coef = track(utt.relation("SourceCoef")->head()->f("coefs")); + target_coef = track(utt.relation("TargetCoef")->head()->f("coefs")); + + map = new EST_IVector; + +// cout << "mapping method: " << method << endl; + if (method != "segment_single") + source_lab = utt.relation("SourceSegments"); + target_lab = utt.relation("Segment", 1); + +/* if (method == "segment") + make_segment_double_mapping(*source_lab, *source_coef, *target_lab, + *target_coef, *map); + else if (method == "dp_segment") + make_dp_mapping(*source_lab, *source_coef, *target_lab, + *target_coef, "Match", *map); + */ + if (method == "linear") + make_linear_mapping(*source_coef, *map); + else if (method == "segment_single") + make_segment_single_mapping(*target_lab, *source_coef, + *target_coef, *map); + else if (method == "interpolate_joins"){ + cerr << "Doing interpolate_joins\n"; + EST_Relation *units = utt.relation("Unit"); + make_join_interpolate_mapping(*source_coef, *target_coef, *units,*map); + } + else if (method == "interpolate_joins2"){ + cerr << "Doing 
interpolate_joins2\n"; + EST_Relation *units = utt.relation("Unit"); + make_join_interpolate_mapping2(*source_coef, *target_coef, *units,*map); + } + else + EST_error("Mapping method \"%s\" not found\n", (const char *)method); + + utt.create_relation("US_map"); + EST_Item *item = utt.relation("US_map")->append(); + item->set_val("map", est_val(map)); +} + + +void add_wave_to_utterance(EST_Utterance &u, EST_Wave &sig, + const EST_String &name) +{ + u.create_relation(name); + EST_Item *item = u.relation(name)->append(); + item->set_val("wave", est_val(&sig)); +} + +void map_to_relation(EST_IVector &map, EST_Relation &r, + const EST_Track &source_pm, + const EST_Track &target_pm) +{ + EST_Item *s, *t, *a=NULL; + EST_Utterance *u = r.utt(); + int i; + +// cout << "source: " << source_pm; +// cout << "target: " << target_pm; + + u->create_relation("smap"); + u->create_relation("tmap"); + + for (i = 0; i < source_pm.num_frames(); ++i) + { + s = u->relation("smap")->append(); + s->set("index", i); + s->set("end", source_pm.t(i)); + } + + for (i = 0; i < target_pm.num_frames(); ++i) + { + s = u->relation("tmap")->append(); + s->set("index", i); + s->set("end", target_pm.t(i)); + } + + EST_Item *last_s = 0; + + for (s = u->relation("smap")->head(); s; s = s->next()) + { + int n = s->I("index"); + for (t = u->relation("tmap")->head(); t; t = t->next()) + { + if (map(t->I("index")) == n) + { + if (last_s != s) + a = u->relation("lmap")->append(s); + last_s = s; + a->append_daughter(t); + t->set("map", n); + } + } + } +} + +/* +void make_segment_double_mapping(EST_Relation &source_lab, + EST_Track &source_pm, + EST_Relation &target_lab, + EST_Track &target_pm, EST_IVector &map) +{ + int i = 0; + int s_i_start, s_i_end, t_i_start, t_i_end; + EST_Item *s, *t; + float s_end, s_start, t_end, t_start, f, m; + map.resize(target_pm.num_frames()); + + s_start = t_start = 0.0; + + if (target_pm.t(target_pm.num_frames() - 1) < + target_lab.tail()->F("end")) + EST_warning("Target 
pitchmarks end before end of target segment " + "timings. Expect a truncated utterance.\n"); + + for (s = source_lab.head(), t = target_lab.head(); s && t; + s = s->next(), t = t->next()) + { + if (s->S("name") != t->S("name")) + cerr << "Warning: Source and Target segment names do not match: " + << s->S("name") << " " << t->S("name") << endl; + + s_end = s->F("end"); + t_end = t->F("end"); + + s_i_start = source_pm.index_below(s_start); + s_i_end = source_pm.index_below(s->F("end")); + t_i_start = target_pm.index_below(t_start); + t_i_end = target_pm.index_below(t->F("end")); + + // fudge to make sure that at least one frame is available + if (s_i_end <= s_i_start) + s_i_end += 1; + + // printf("%d %d %d %d\n", s_i_start, s_i_end, t_i_start, t_i_end); + // printf("%f %f %f %f\n", s_start, s_end, t_start, t_end); + + m = float (s_i_end - s_i_start)/ float(t_i_end - t_i_start); + for (i = t_i_start, f = 0.0; i < t_i_end; ++i, ++f) + map[i] = EST_NINT(f * m) + s_i_start; + + s_start = s->F("end"); + t_start = t->F("end"); + } + if (i == 0) + map.resize(0); // nothing to synthesize + else + map.resize(i - 1); +} + + +void make_dp_mapping(EST_Relation &source_lab, EST_Track &source_pm, + EST_Relation &target_lab, EST_Track &target_pm, + const EST_String &match_name, EST_IVector &map) +{ + int i = 0, j; + int s_i_start, s_i_end, t_i_start, t_i_end; + EST_Item *s, *t; + float s_end, s_start, t_end, t_start, f, m, prev_end; + map.resize(target_pm.num_frames()); + + map.fill(-1); + + s_start = t_start = 0.0; + + // should really be replaced by feature functions. + for (prev_end = 0.0, s = source_lab.head(); s; s = s->next()) + { + s->set("start", prev_end); + prev_end = s->F("end"); + } + + // should really be replaced by feature functions. 
+ for (prev_end = 0.0, s = target_lab.head(); s; s = s->next()) + { + s->set("start", prev_end); + prev_end = s->F("end"); + } + + if (target_pm.t(target_pm.num_frames() - 1) < + target_lab.tail()->F("end", 1)) + EST_warning("Target pitchmarks end before end of target segment " + "timings. Expect a truncated utterance.\n"); + + for (s = source_lab.head(); s; s = s->next()) + { + s_start = s->F("start"); + + cout << "source: " << *s << endl; + + while (s && (!s->in_relation(match_name))) + s = s->next(); + + cout << "active source: " << *s << endl; + + s_end = s->F("end"); + + cout << "daughter: " << daughter1(s->as_relation(match_name)) << endl; + cout << "parent: " << parent(s->as_relation(match_name)) << endl; + + t = parent(s->as_relation(match_name)); + + cout << "active target: " << *t << endl; + + t_end = t->F("end"); + t_start = t->F("start"); + + s_i_start = source_pm.index_below(s_start); + s_i_end = source_pm.index_below(s->F("end")); + t_i_start = target_pm.index_below(t_start); + t_i_end = target_pm.index_below(t->F("end")); + + // fudge to make sure that at least one frame is available + if (s_i_end <= s_i_start) + s_i_end += 1; + + //printf("%d %d %d %d\n", s_i_start, s_i_end, t_i_start, t_i_end); + //printf("%f %f %f %f\n", s_start, s_end, t_start, t_end); + + m = float (s_i_end - s_i_start)/ float(t_i_end - t_i_start); + for (i = t_i_start, f = 0.0; i < t_i_end; ++i, ++f) + map[i] = EST_NINT(f * m) + s_i_start; + + cout << endl; + + } + + for (i = 0, j = 0; i < target_pm.num_frames(); ++i) + { + cout << map(i) << " "; + if (map(i) != -1) + { + map[j] = map(i); + cout << map(j) << " "; + target_pm.t(j++) = target_pm.t(i); + } + } + + if (j == 0) + map.resize(0); // nothing to synthesize + else + map.resize(j); +} +*/ diff --git a/src/modules/UniSyn/us_prosody.cc b/src/modules/UniSyn/us_prosody.cc new file mode 100644 index 0000000..10cb85e --- /dev/null +++ b/src/modules/UniSyn/us_prosody.cc @@ -0,0 +1,541 @@ 
+/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Paul Taylor */ +/* Date: 6 Jan 1998 */ +/* --------------------------------------------------------------------- */ +/* UniSyn prosody manipulation functions */ +/* */ +/*************************************************************************/ + +#include "us_synthesis.h" +#include "Phone.h" + +//static void add_end_silences(EST_Relation &segment); +//static void add_end_silences(EST_Relation &segment, EST_Relation &target); + +void pitchmarks_to_f0(EST_Track &pm, EST_Track &fz, float shift) +{ + int i; + float period; + + fz.resize((int)(pm.end()/shift), 1); + fz.fill_time(shift); + + for (i = 0; i < fz.num_frames() -1 ; ++i) + { + period = get_time_frame_size(pm, pm.index_below(fz.t(i))); + fz.a(i) = 1.0 /period; + } +} + +void f0_to_pitchmarks(EST_Track &fz, EST_Track &pm, int num_channels, + float default_f0, float target_end) +{ + int i; + float max = 0.0; + float fz_end; + + // Its impossible to guess the length of the pitchmark array before + // hand. 
Here we find the upper limit and resize at the end + for (i = 0; i < fz.num_frames(); ++i) + { + if (fz.a_no_check(i) < 0) + fz.a_no_check(i) = 0; + if (fz.a_no_check(i) > 500) + fz.a_no_check(i) = fz.a_no_check(i-1); + if (fz.a_no_check(i) > max) + max = fz.a_no_check(i); + } + + // Coefficients will also be placed in here, so its best allocate + // space for their channels now + fz_end = fz.end(); + pm.resize(int(max * (Gof(fz_end, target_end))) + 10, num_channels); + + + int fz_len = fz.length(); + float t1 = 0.0; //first pitchmark convention + float t2; + + float f1 = fz.a_no_check(0); //arbitrary init + float f2; + + double area = 0.5; // init value + int pm_i = 0; + int pm_len = pm.length(); + for( int i=0; i= 1.0) && (pm_i < pm_len) ){ + area -= 1.0; + float discriminant = f2*f2 - 2.0 * area * slope; + if (discriminant < 0.0) discriminant = 0.0; + pm.t(pm_i++) = t2 - 2.0 * area / (f2 + sqrt (discriminant)); + } + t1 = t2; + f1 = f2; + } + + float default_shift = 1.0 / default_f0; + if (target_end > fz_end) + for (; t1 < target_end; ++pm_i) + t1 = pm.t(pm_i) = t1 + default_shift; + + pm.resize(pm_i-1, num_channels); +} + + + +/* Convert an F0 contour into a set of pitchmarks. This is done by the +obvious iterative function. + +Space before the first defined F0 value is filled with regularly space +pitchmarks at intervals 1/def_f0. If the target_end value is +specified, more default pitchmarks are placed after the end of the +last f0 value until time target_end has been reached. +*/ + +void f0_to_pitchmarks_orig(EST_Track &fz, EST_Track &pm, int num_channels, + float default_f0, float target_end) +{ + int i; + float max = 0.0, prev_pm = 0.0, val; + float fz_end; + +// cout << "fz end: " << fz.end() << endl; +// cout << "fz n fg: " << fz.num_frames() << endl; + + // Its impossible to guess the length of the pitchmark array before + // hand. 
Here we find the upper limit and resize at the end + for (i = 0; i < fz.num_frames(); ++i) + { + if (fz.a_no_check(i) < 0) + fz.a_no_check(i) = 0; + if (fz.a_no_check(i) > 500) + fz.a_no_check(i) = fz.a_no_check(i-1); + if (fz.a_no_check(i) > max) + max = fz.a_no_check(i); + } + + // Coefficients will also be placed in here, so its best allocate + // space for their channels now + fz_end = fz.end(); + pm.resize(int(max * (Gof(fz_end, target_end))) + 10, num_channels); + +// cout << "fz end: " << fz.end() << endl; +// cout << "fz n fg: " << fz.num_frames() << endl; +// cout << "pmn fg: " << pm.num_frames() << endl; + + for (i = 0; prev_pm < fz_end; ++i) + { + val = fz.a(prev_pm) > 0.0 ? fz.a(prev_pm) : default_f0; + pm.t(i) = prev_pm + (1.0 / val); + prev_pm = pm.t(i); + } + + if (target_end > fz_end) + for (; prev_pm < target_end; ++i) + { + pm.t(i) = prev_pm + (1.0 / default_f0); + prev_pm = pm.t(i); + } + + pm.resize(i - 1, num_channels); +} + +// not sure if this is useful +void linear_pitchmarks(EST_Track &source_pm, EST_Track &target_pm, + float start_f0, float end_f0) +{ + int i; + float m, length, pitch; + target_pm.resize(source_pm.num_frames(), source_pm.num_channels()); + + length = (float)source_pm.num_frames() / (end_f0 - start_f0); + + target_pm.t(0) = 0.0; + m = (end_f0 - start_f0) / length; + + for(i = 1; i < target_pm.num_frames(); ++i) + { + pitch = (((float)i / (float) target_pm.num_frames()) + * (end_f0 - start_f0)) + start_f0; + target_pm.t(i) = target_pm.t(i - 1) + (1 /pitch); + } +} + +// not sure if this is useful +void stretch_f0_time(EST_Track &f0, float stretch, + float s_last_time, float t_last_time) +{ + for (int i = 0 ; i < f0.num_frames(); ++i) + { +// cout << i << " o t:" << f0.t(i) << endl; + f0.t(i) = ((f0.t(i) - s_last_time) * stretch) + t_last_time; +// cout << i << " m t:" << f0.t(i) << endl; + } +} + +// make target F0 from source F0, with same F0 values as original, +// but durations specified by target_seg. 
+ +/* +void us_F0targets_to_pitchmarks(EST_Utterance &utt, + const EST_String &seg_relation) +{ + utt.create_relation("TargetCoef"); + EST_Track *target_coef = new EST_Track; + EST_Item *end_seg; + int num_channels = 0; + float end; + + if (utt.relation_present("SourceCoef")) + { + EST_Track *source_coef = + track(utt.relation("SourceCoef")->head()->f("coefs")); + num_channels = source_coef->num_channels(); + } + + if (seg_relation == "") + end_seg = utt.relation("Segment", 1)->last(); + else + end_seg = utt.relation(seg_relation, 1)->last(); + + if (end_seg) + end = end_seg->F("end"); + else + end = 0; + + targets_to_pitchmarks(*(utt.relation("Target")), *target_coef, + num_channels,end); + + EST_Item *item = utt.relation("TargetCoef")->append(); + item->set("name", "coef"); + item->set_val("coefs",est_val(target_coef)); + +} + +void targets_to_pitchmarks(EST_Relation &targ, EST_Track &pitchmarks, + int num_channels,float end) +{ + EST_Item *s; + float time, f0, prev_time, prev_f0, m, max; + int i; + + // Its impossible to guess the length of the pitchmark array before + // hand. Here we find the upper limit and resize at the end + for (max = 0.0, s = targ.first_leaf(); s; s = next_leaf(s)) + if (s->F("f0") > max) + max = s->F("f0"); + + pitchmarks.resize((int)(max * 1.1 * end)+1, num_channels); + + prev_time = 0; + prev_f0 = targ.first_leaf() ? 
targ.first_leaf()->F("f0") : 120; + pitchmarks.t(0) = 0.0; + + for (i = 1, s = targ.first_leaf(); s; s = next_leaf(s)) + { + time = s->f("pos"); + f0 = s->F("f0"); + + if (f0 < 30) // to protect against with duff IntTarget algorithms + continue; + if (time == prev_time) + continue; + else if (time < prev_time) + { + cerr << "UniSyn: warning target in wrong order at " << prev_time; + cerr << " ignored" << endl; + continue; + } + m = (f0 - prev_f0) / (time - prev_time); + + + { + f0 = (m * (pitchmarks.t(i - 1) - prev_time)) + prev_f0; + pitchmarks.t(i) = pitchmarks.t(i - 1) + 1.0/f0; + } + prev_time = time; + prev_f0 = f0; + } + // Ensure pitch marks go to the end of the utterance + // This will effectively mean the last half diphone will be extend over + // the whol final segment. This will only be reasonable if the + // final segment is a silence. + for (; pitchmarks.t(i - 1) < end; ++i) + pitchmarks.t(i) = pitchmarks.t(i - 1) + 1.0/prev_f0; + pitchmarks.resize(i, pitchmarks.num_channels()); +} +*/ + + +/*static void add_end_silences(EST_Relation &segment, EST_Relation &target) +{ + EST_Item *t, *n; + float shift = 0.0; + const float pause_duration = 0.1; + + t = segment.head(); + if (!ph_is_silence(t->f("name"))) + { + n = t->insert_before(); + n->set("name", ph_silence()); + n->set("dur", pause_duration); + shift += pause_duration; + } + + t = segment.tail(); + if (!ph_is_silence(t->f("name"))) + { + n = t->insert_after(); + n->set("name", ph_silence()); + n->set("dur", pause_duration); + shift += pause_duration; + } + dur_to_end(segment); + + target.tail()->set("pos", (target.tail()->F("pos") + shift)); +} + +void merge_pitchmarks(EST_Utterance &u, EST_Track &pm1, + EST_Track &pm2, EST_Track &target_pm, + EST_Relation &guide) +{ + EST_Item *s; + float s_end, s_start; + int s_i_start, s_i_end; + int i, j = 0; + (void) u; + + target_pm.resize(1000000, 0); + s_start = 0.0; + + for (s = guide.head(); s; s = s->next()) + { + s_end = s->F("end", 1); + if 
(s->fI("use_pm") == 1) + { + s_i_start = pm1.index_below(s_start); + s_i_end = pm1.index_below(s_end); + for (i = s_i_start; i < s_i_end; ++i, ++j) + target_pm.t(j) = pm1.t(i); + } + else + { + s_i_start = pm2.index_below(s_start); + s_i_end = pm2.index_below(s_end); + for (i = s_i_start; i < s_i_end; ++i, ++j) + target_pm.t(j) = pm2.t(i); + } + s_start = s_end; + } +} + +void warp_f0(EST_Track &source_f0, EST_Relation &source_seg, + EST_Track &target_f0, EST_Relation &target_seg) +{ + EST_Item *s, *t; + float prev_source_end = 0.0, prev_target_end = 0.0; + EST_Track part; + int frame_start, frame_end; + float stretch, t_last_time = 0, s_last_time = 0; + EST_Relation match("Match"); + EST_Item xx; + EST_Track str; + int i = 0; + + dp_match(target_seg, source_seg, match, local_cost, &xx); + + target_f0 = source_f0; + frame_start = 0; + frame_end = 0; + + str.resize(target_seg.length(), 1); + + cout << "tag: " << target_seg << endl; + + for (t = target_seg.head(); t; t = t->next()) + { + s = daughter1(t,"Match"); + if (s == 0) // ie extra phone in target specification + continue; + + frame_end = source_f0.index(s->f("end")); + if ((frame_end - frame_start) < 1) + { + cout << "Warning no frames for: " << *t << endl; + continue; + } + target_f0.sub_track(part, frame_start, (frame_end - frame_start + 1), + 0, EST_ALL); + + stretch = (t->F("end") - prev_target_end) / + (s->F("end") - prev_source_end); + + str.a(i) = stretch; + str.t(i++) = t->F("end"); + + cout << "\nstretch: " << stretch << endl; + cout << "source: " << *s << endl; + cout << "target: " << *t << endl; + cout << "frames: " << frame_start << " " << frame_end << endl; + + stretch_f0_time(part, stretch, s_last_time, t_last_time); + + prev_target_end = t->f("end"); + prev_source_end = s->f("end"); + frame_start = frame_end + 1; + t_last_time = part.end(); + s_last_time = source_f0.t(frame_end); + cout << "last time = " << s_last_time << " " << t_last_time << endl; + } + target_f0.resize(frame_end, 1); + 
target_f0.a(target_f0.num_frames() - 1) = 100; + str.save("zz_stretch"); +} + +void warp_pitchmarks(EST_Utterance &utt, EST_Track *source_pm, + EST_Relation &source_seg, EST_Relation &target_seg) +{ + EST_Track source_f0, target_f0, *target_pm; + + target_pm = new EST_Track; + + cout << "tag: "<< target_seg << endl; + + add_end_silences(target_seg); + + + cout << "tag 2: "<< target_seg << endl; + + pitchmarks_to_f0(*source_pm, source_f0, 0.01); + + cout << "tag 3: "<< target_seg << endl; + + warp_f0(source_f0, source_seg, target_f0, target_seg); + + f0_to_pitchmarks(target_f0, *target_pm); + + utt.create_relation("TargetCoef"); + utt.create_relation("SourceSegments"); + + *utt.relation("SourceSegments") = source_seg; + + EST_Item *item = utt.relation("TargetCoef")->append(); + + target_f0.save("tt_tar.f0", "est"); + target_seg.save("tt_tar.lab"); + source_seg.save("tt_sou.lab"); + source_f0.save("tt_sou.f0", "est"); + + target_pm->save("target_coef_a.pm","est"); + item->set("name", "coefs"); + item->set_val("coefs", est_val(target_pm)); +} + +float local_cost(const EST_Item *s1, const EST_Item *s2) +{ +<<<<<<< us_prosody.cc + utt.create_relation("TargetCoef"); + EST_Track *target_coef = new EST_Track; + EST_Item *end_seg; + int num_channels = 0; + float end; + + if (utt.relation_present("SourceCoef")) + { + EST_Track *source_coef = + track(utt.relation("SourceCoef")->head()->f("coefs")); + num_channels = source_coef->num_channels(); + } +======= + float insertion_cost = get_c_int(siod_get_lval("met_insertion", NULL)); + float deletion_cost = get_c_int(siod_get_lval("met_deletion", NULL)); + float substitution_cost = + get_c_int(siod_get_lval("met_substitution", NULL)); +>>>>>>> 1.14 + + EST_String null_sym = "nil"; + + // otherwise cost is either insertion cost, or cost_matrix value + if (s1->name() == s2->name()) + return 0; + else + { + if (s1->name() == null_sym) + return insertion_cost; + else if (s2->name() == null_sym) + return deletion_cost; + else + return 
substitution_cost; + } +} +typedef +float (*local_cost_function)(const EST_Item *item1, + const EST_Item *item2); + +bool dp_match(const EST_Relation &lexical, + const EST_Relation &surface, + EST_Relation &match, + local_cost_function lcf, + EST_Item *null_syl); + + + +*/ + +/*static void add_end_silences(EST_Relation &segment) +{ + EST_Item *t, *n; + + t = segment.head(); + if (!ph_is_silence(t->f("name"))) + { + n = t->insert_before(); + n->set("name", ph_silence()); + } + + t = segment.tail(); + if (!ph_is_silence(t->f("name"))) + { + n = t->insert_after(); + n->set("name", ph_silence()); + } +} + +*/ diff --git a/src/modules/UniSyn/us_synthesis.h b/src/modules/UniSyn/us_synthesis.h new file mode 100644 index 0000000..ddcf910 --- /dev/null +++ b/src/modules/UniSyn/us_synthesis.h @@ -0,0 +1,132 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Author: Paul Taylor */ +/* Date: March 1998 */ +/* --------------------------------------------------------------------- */ +/* */ +/*************************************************************************/ + + +#ifndef __US_SYNTHESIS_H__ +#define __US_SYNTHESIS_H__ + +#include "EST.h" + +typedef EST_TVector<EST_Wave> EST_WaveVector; + +VAL_REGISTER_TYPE_DCLS(wavevector,EST_WaveVector) +VAL_REGISTER_TYPE_DCLS(ivector,EST_IVector) + +SIOD_REGISTER_TYPE_DCLS( wavevector, EST_WaveVector) + +void add_wave_to_utterance(EST_Utterance &u, EST_Wave &sig, + const EST_String &name); + + +/*void us_lpc_synthesis(EST_Utterance &utt); + +void us_mapping(EST_Utterance &utt, const EST_String &method); +void us_td_synthesis(EST_Utterance &utt, const EST_String &filter_method, + const EST_String &ola_method); + +void lpc_synthesis(EST_Track &source_coef, EST_Track &target_coef, + EST_TVector<EST_Wave> &frames, EST_Wave &sig, + EST_IVector &map); + +void window_signal(EST_Wave &sig, EST_Track &pm, + EST_TVector<EST_Wave> &frames, int &i, float scale, + float window_factor, + const EST_String &window_name); + +void window_signal(EST_Relation &unit_stream, + EST_TVector<EST_Wave> &frames, + float window_factor, + EST_String window_name); + +void concatenate_coefs(EST_Relation &unit_stream, + EST_Track &source_lpc); + +void 
us_unit_copy_wave(EST_Utterance &utt, EST_Wave &source_sig, + EST_Track *source_pm); + +void us_unit_concat(EST_Utterance &utt); + +void us_full_cut(EST_Relation &unit); + +void us_energy_normalise(EST_Relation &unit); + +void us_unit_raw_concat(EST_Utterance &utt); + + +void pitchmarks_to_f0(EST_Track &pm, EST_Track &fz, float shift); + +void f0_to_pitchmarks(EST_Track &fz, EST_Track &pm, float target_end = -1.0); + +void targets_to_pitchmarks(EST_Relation &targ, EST_Track &pitchmarks, + int num_channels,float end); + +void targets_to_f0(EST_Relation &targ, EST_Track &f0, const float shift); + +void stretch_f0_time(EST_Track &f0, float stretch, + float s_last_time, float t_last_time); + +void warp_f0(EST_Track &source_f0, EST_Relation &source_seg, + EST_Track &target_f0, EST_Relation &target_seg); + +void warp_pitchmarks(EST_Utterance &utt, EST_Track *source_pm, + EST_Relation &source_seg, EST_Relation &target_seg); + +void us_F0targets_to_pitchmarks(EST_Utterance &utt, + const EST_String &seg_relation); + +void add_wave_to_utterance(EST_Utterance &u, EST_Wave &sig, + const EST_String &name); + +void make_linear_mapping(EST_Track &pm, EST_IVector &map); + +void make_dp_mapping(EST_Relation &source_lab, EST_Track &source_pm, + EST_Relation &target_lab, EST_Track &target_pm, + const EST_String &match_name, EST_IVector &map); + +void make_segment_double_mapping(EST_Relation &source_lab, EST_Track &source_pm, + EST_Relation &target_lab, + EST_Track &target_pm, EST_IVector &map); + +void make_segment_single_mapping(EST_Relation &source_lab, EST_Track &source_pm, + EST_Track &target_pm, EST_IVector &map); +*/ + + + +#endif // __US_SYNTHESIS_H__ diff --git a/src/modules/UniSyn/us_unit.cc b/src/modules/UniSyn/us_unit.cc new file mode 100644 index 0000000..d90e922 --- /dev/null +++ b/src/modules/UniSyn/us_unit.cc @@ -0,0 +1,631 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University 
of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Paul Taylor */ +/* Date: 6 Jan 1998 */ +/* --------------------------------------------------------------------- */ +/* Acoustic Unit Concatenation */ +/* */ +/*************************************************************************/ + + +#include "siod.h" +#include "EST_sigpr.h" +#include "EST_wave_aux.h" +#include "EST_track_aux.h" +#include "EST_ling_class.h" +#include "us_synthesis.h" +#include + +#include "Phone.h" + +void merge_features(EST_Item *from, EST_Item *to, int keep_id); + +void dp_time_align(EST_Utterance &utt, const EST_String &source_name, + const EST_String &target_name, + const EST_String &time_name, + bool do_start); + +void concatenate_unit_coefs(EST_Relation &unit_stream, EST_Track &source_lpc); +void us_unit_raw_concat(EST_Utterance &utt); + +void window_units(EST_Relation &unit_stream, + EST_TVector<EST_Wave> &frames, + float window_factor, + EST_String window_name, + bool window_symmetric, + EST_IVector *pm_indices=0); + +bool dp_match(const EST_Relation &lexical, + const EST_Relation &surface, + EST_Relation &match, + float ins, float del, float sub); + +void map_match_times(EST_Relation &target, const EST_String &match_name, + const EST_String &time_name, bool do_start); + + +static void window_frame(EST_Wave &frame, EST_Wave &whole, float scale, + int start, int end, EST_WindowFunc *window_function, + int centre_index=-1) +{ + int i, j, send; + EST_TBuffer<float> window; + int window_length = (end-start)+1; + + if (frame.num_samples() != (window_length)) + frame.resize(window_length); + frame.set_sample_rate(whole.sample_rate()); + // Ensure we have a safe end + if (end < whole.num_samples()) + send = end; + else + send = whole.num_samples(); + + + int print_centre; + if ( centre_index < 0 ){ + window_function( window_length, window, -1 ); + print_centre = (window_length-1)/2+start; + } + else{ + window_function( window_length, window, 
(centre_index-start)); + print_centre = centre_index; + } + + +#if defined(EST_DEBUGGING) + cerr << "(start centre end window_length wholewavelen) " + << start << " " + << print_centre << " " + << end << " " + << window_length << " " + << whole.num_samples() << endl; +#endif + + + // To allow a_no_check access we do this in three stages + for (i = 0, j = start; j < 0; ++i, ++j) + frame.a_no_check(i) = 0; + for ( ; j < send; ++i, ++j) + frame.a_no_check(i) = (int)((float)whole.a_no_check(j) * window(i) * scale); + for ( ; j < end; ++j,++i) + frame.a_no_check(i) = 0; + + +#if defined(EST_DEBUGGING) + // It's not always very nice to resynthesise speech from + // inserted zeros! These checks should alert the user (me ;) + if( start<0 ) + EST_warning( "padded start of pitch period with zeros (index %d)", i ); + + if( end>whole.num_samples() ) + EST_warning( "padded end of pitch period with zeros (frame %d)", i ); +#endif +} + + +// The window_signal function has been changed in several ways: +// +// *) The function now has an asymmetric window mode. +// +// In this mode, asymmetric windows are used from pitchmark at t-1 +// to pitchmark at time t+1, with the maximum value of 1.0 at +// pitchmark at time t. +// +// *) In the original symmetric mode: +// +// The first change is to ensure the window frames always have an +// odd number of samples (a convention for how to handle rounding +// problems when converting from times (float) to sample numbers +// (int)). The centre sample corresponds to the pitch mark time. +// +// The second change is that the estimate of local pitch period is +// always based in current and *previous* pitchmark. In the case +// of the first pitch mark in track pm, the previous pitchmark is +// assumed to be at zero time. Hopefully, this won't break much. +// However, if this convention is not used everywhere else that +// it's needed and some things break, then arguably those +// things need to be fixed to adhere to this same convention... 
+void window_signal(EST_Wave &sig, EST_Track &pm, + EST_WaveVector &frames, int &i, float scale, + float window_factor, + EST_WindowFunc *window_function, + bool window_symmetric, + EST_IVector *pm_indices=0) +{ + float first_pos, period=0.0; + float prev_pm, current_pm; + int first_sample, centre_sample, last_sample; + int sample_rate = sig.sample_rate(); + int pm_num_frames = pm.num_frames(); + + // estimate first period as pitchmark time itself (i.e. assume a previous + // pitchmark at 0.0 time, waveform sample 0) + prev_pm = 0.0; + + + if( window_symmetric ) + { + if (pm_num_frames < 1 ) + EST_error( "Attempted to Window around less than 1 pitchmark" ); + + for( int j=0; jn() << endl; +#endif + + ++i; + } + } +} + +void window_units( EST_Relation &unit_stream, + EST_TVector &frames, + float window_factor, + EST_String window_name, + bool window_symmetric, + EST_IVector *pm_indices ) +{ + int i; + EST_Wave *sig; + EST_Item *u; + EST_Track *coefs; + int num = 0; + float scale; + EST_WindowFunc *window_function; + + for (u = unit_stream.head(); u; u = u->next()) + num += track(u->f("coefs"))->num_frames(); + frames.resize(num); + + if( pm_indices != 0 ) + pm_indices->resize(num); + + if (window_name == "") + window_name = "hanning"; + + window_function = EST_Window::creator(window_name); + + for (i = 0, u = unit_stream.head(); u; u = u->next()) + { + sig = wave(u->f("sig")); + coefs = track(u->f("coefs")); + scale = (u->f_present("scale") ? 
u->F("scale") : 1.0); + + window_signal(*sig, *coefs, frames, i, scale, window_factor, + window_function, window_symmetric, pm_indices); + } +} + + +void us_unit_concat(EST_Utterance &utt, float window_factor, + const EST_String &window_name, + bool no_waveform=false, + bool window_symmetric=true) + +{ + EST_Relation *unit_stream; + EST_Track *source_coef = new EST_Track; + EST_WaveVector *frames = new EST_WaveVector; + EST_IVector *pm_indices = 0; + + unit_stream = utt.relation("Unit", 1); + + concatenate_unit_coefs(*unit_stream, *source_coef); + + utt.create_relation("SourceCoef"); + EST_Item *item = utt.relation("SourceCoef")->append(); + item->set("name", "coef"); + item->set_val("coefs", est_val(source_coef)); + + if (!no_waveform){ + if( !window_symmetric ) + pm_indices = new EST_IVector; + + window_units(*unit_stream, *frames, + window_factor, window_name, window_symmetric, pm_indices); + + item->set_val("frame", est_val(frames)); + + if( !window_symmetric ) + item->set_val("pm_indices", est_val(pm_indices)); + } +} + + +void us_get_copy_wave(EST_Utterance &utt, EST_Wave &source_sig, + EST_Track &source_coefs, EST_Relation &source_seg) +{ + EST_Item *s, *n; + + if (!utt.relation_present("Segment")) + EST_error("utterance must have \"Segment\" relation\n"); + + utt.create_relation("TmpSegment"); + + for (s = source_seg.head(); s; s = s->next()) + { + n = utt.relation("TmpSegment")->append(); + merge_features(n, s, 0); + } + + utt.relation("Segment")->remove_item_feature("source_end"); + + dp_time_align(utt, "TmpSegment", "Segment", "source_", 0); + + utt.create_relation("Unit"); + EST_Item *d = utt.relation("Unit")->append(); + + + EST_Wave *ss = new EST_Wave; + *ss = source_sig; + + EST_Track *c = new EST_Track; + *c = source_coefs; + + d->set_val("sig", est_val(ss)); + d->set_val("coefs", est_val(c)); + + utt.remove_relation("TmpSegment"); +} + + +void us_energy_normalise(EST_Relation &unit) +{ + EST_Wave *sig; + + for (EST_Item *s = unit.head(); s; s = 
s->next()) + { + sig = wave(s->f("sig")); + if (s->f_present("energy_factor")) + sig->rescale(s->F("energy_factor")); + } +} + +void us_unit_raw_concat(EST_Utterance &utt) +{ + EST_Wave *sig, *unit_sig; + EST_Track *unit_coefs=0; + float window_factor; + int i, j, k; + int first_pm, last_pm, last_length; + float first_pos, last_pos; + + window_factor = get_c_float(siod_get_lval("window_factor", + "UniSyn: no window_factor")); + sig = new EST_Wave; + + sig->resize(1000000); + sig->fill(0); + j = 0; + + for (EST_Item *s = utt.relation("Unit", 1)->head(); s; s = s->next()) + { + unit_sig = wave(s->f("sig")); + unit_coefs = track(s->f("coefs")); + + first_pos = unit_coefs->t(1); + first_pm = (int)(first_pos * (float)unit_sig->sample_rate()); + + last_pos = unit_coefs->t(unit_coefs->num_frames()-2); + last_pm = (int)(last_pos * (float)unit_sig->sample_rate()); + last_length = unit_sig->num_samples() - last_pm; + +// cout << "first pm: " << first_pm << endl; +// cout << "last pm: " << last_pm << endl; +// cout << "last length: " << last_length << endl; + + j -= first_pm; + + for (i = 0; i < first_pm; ++i, ++j) + sig->a_safe(j) += (short)((((float) i)/ (float)first_pm) *(float)unit_sig->a_safe(i)+0.5); + + for (; i < last_pm; ++i, ++j) + sig->a(j) = unit_sig->a(i); + + for (k = 0; i < unit_sig->num_samples(); ++i, ++j, ++k) + sig->a_safe(j) += (short)((1.0 - (((float) k) / (float) last_length)) + * (float)unit_sig->a_safe(i) + 0.5); + +// j -= last_length; +// j += 2000; + } + + sig->resize(j); + sig->set_sample_rate(16000); + + add_wave_to_utterance(utt, *sig, "Wave"); +} + + +void concatenate_unit_coefs(EST_Relation &unit_stream, EST_Track &source_lpc) +{ + int num_source_frames = 0; + int num_source_channels = 0;; + float prev_time, abs_offset, rel_offset, period, offset; + int i, j, k, l; + EST_Track *coefs; + + EST_Item *u = unit_stream.head(); + if( u == 0 ){ + //sometimes we are just asked to synthesise empty utterances, and + //code elsewhere wants us to 
continue... + source_lpc.resize(0,0); + } + else{ + EST_Track *t = 0; + for ( ; u; u = u->next()) + { + t = track(u->f("coefs")); + num_source_frames += t->num_frames(); + } + + num_source_channels = t->num_channels(); + + source_lpc.resize(num_source_frames, num_source_channels); + source_lpc.copy_setup(*t); + + prev_time = 0.0; + // copy basic information + for (i = 0, l = 0, u = unit_stream.head(); u; u = u->next()) + { + coefs = track(u->f("coefs")); + + for (j = 0; j < coefs->num_frames(); ++j, ++i) + { + for (k = 0; k < coefs->num_channels(); ++k) + source_lpc.a_no_check(i, k) = coefs->a_no_check(j, k); + source_lpc.t(i) = coefs->t(j) + prev_time; + } + + prev_time = source_lpc.t(i - 1); + u->set("end", prev_time); + u->set("num_frames", coefs->num_frames()); + } + } + + // adjust pitchmarks + abs_offset = 0.0; + rel_offset = 0.0; + // absolute offset in seconds + abs_offset = get_c_float(siod_get_lval("us_abs_offset", "zz")); + // relative offset as a function of local pitch period + rel_offset = get_c_float(siod_get_lval("us_rel_offset", "zz")); + + if( abs_offset!=0.0 || rel_offset!=0.0 ){ + cerr << "Adjusting pitchmarks" << endl; + for (i = 0; i < source_lpc.num_frames(); ++i){ + period = get_time_frame_size(source_lpc, (i)); + offset = abs_offset + (rel_offset * period); + source_lpc.t(i) = source_lpc.t(i) + offset; + } + } +} + +// jointimes specifies centre of last pitch period in each +// concatenated unit +// void us_linear_smooth_amplitude( EST_Wave *w, +// const EST_Track &pm, +// const EST_FVector &jointimes) +// { +// int num_joins = jointimes.length(); + +// EST_Track *factor_contour = new EST_Track( num_joins ); + +// for( int i=0; iresize( pp_length, 1 ); + + for( int i=0; ia_no_check(i,0) = 0.0, j=0; ja_no_check( i, 0 ) += pow( float(frame.a_no_check( j )), float(2.0) ); + + contour->a_no_check(i,0) = sqrt( contour->a_no_check(i,0) / (float)j ); + contour->t(i) = pm.t(i); + } + + return contour; +} + +EST_Val ffeature(EST_Item *item,const 
EST_String &fname); + +void us_linear_smooth_amplitude( EST_Utterance *utt ) +{ + EST_WaveVector *pp = wavevector(utt->relation("SourceCoef")->first()->f("frame")); + EST_Track *pm = track(utt->relation("SourceCoef")->first()->f("coefs")); + + EST_Track *energy = us_pitch_period_energy_contour( *pp, *pm ); + energy->save( "./energy_track.est", "est" ); + + FILE *ofile = fopen( "./join_times.est", "w" ); + EST_Relation *units = utt->relation("Unit"); + for( EST_Item *u=units->head(); u; u=u->next() ){ + + EST_Item *diphone_left = u; + // EST_Item *diphone_right = u->next(); + + fprintf( ofile, "%s\t%f\n", diphone_left->S("name").str(), diphone_left->F("end")); + + EST_Item *join_phone_left = item(diphone_left->f("ph1"))->next(); + EST_String phone_name = join_phone_left->S("name"); + if( ph_is_sonorant( phone_name ) && !ph_is_silence( phone_name )){ + + //if( (ffeature(join_phone_left, "ph_vc")).S() == "+"){ // ideally for sonorants + + cerr << "smoothing phone " << join_phone_left->S("name") << "\n"; + + // EST_Item *join_phone_right = item(diphone_right->f("ph1")); + + int left_end_index = energy->index(diphone_left->F("end")); + int right_start_index = left_end_index + 1; + float left_power = energy->a(left_end_index,0); + float right_power = energy->a(right_start_index,0); + + float mean_power = (left_power+right_power)/2.0; + float left_factor = left_power/mean_power; + float right_factor = right_power/mean_power; + + int smooth_start_index = left_end_index-5; + int smooth_end_index = right_start_index+5; + + + // rescale left pitch periods + float factor = 1.0; + float factor_incr = (left_factor-1.0)/(float)(left_end_index - smooth_start_index); + for( int i=smooth_start_index; i<=left_end_index; ++i, factor+=factor_incr ){ + (*pp)[i].rescale( factor, 0 ); + cerr << "rescaled frame " << i << "(factor " << factor << ")\n"; + } + + // rescale right pitch periods + factor = right_factor; + factor_incr = 
(1.0-right_factor)/(float)(smooth_end_index-right_start_index); + for( int i=right_start_index; i<=smooth_end_index; ++i, factor+=factor_incr){ + (*pp)[i].rescale( factor, 0 ); + cerr << "rescaled frame " << i << "(factor " << factor << ")\n"; + } + } + else + cerr << "no smoothing for " << join_phone_left->S("name") << "\n"; + + cerr <params; +} + +LISP us_diphone_init(LISP args) +{ + EST_String x; + USDiphIndex *d_index = new USDiphIndex; + d_index->grouped = false; + d_index->params = args; + d_index->name = get_param_str("name",args,"name"); + d_index->index_file = get_param_str("index_file",args,""); + + read_diphone_index(d_index->index_file, *d_index); + + // This is needed because there is no get_param_EST_String function + x = get_param_str("grouped",args,""); + if (x == "true") + { + d_index->grouped = true; + if (d_index->ts.open(d_index->index_file) != 0) + { + cerr << "US DB: can't open grouped diphone file " + << d_index->index_file << endl; + festival_error(); + } + // set up the character constant values for this stream + d_index->ts.set_SingleCharSymbols(";"); + } + else + { + *cdebug << ":" << get_param_str("grouped",args,"") << ":" << endl; + *cdebug << "index grouped:" << d_index->grouped << endl; + *cdebug << "true:" << true << endl; + *cdebug << "false:" << false << endl; + + d_index->coef_dir = get_param_str("coef_dir",args,""); + d_index->sig_dir = get_param_str("sig_dir",args,""); + + d_index->coef_ext = get_param_str("coef_ext",args,""); + d_index->sig_ext = get_param_str("sig_ext",args,""); + } + + us_add_diphonedb(d_index); + + return rintern(d_index->name); +} + +LISP FT_us_full_cut(LISP lutt, LISP lrname) +{ + EST_Utterance *utt = get_c_utt(lutt); + EST_String rname = get_c_string(lrname); + + us_full_cut(*utt->relation(rname)); + +// parse_diphone_times(*(utt->relation(rname)), +// *(utt->relation("SourceSegments"))); + + return lutt; +} + +void festival_UniSyn_diphone_init(void) +{ + proclaim_module("UniSyn_diphone"); + + 
init_subr_0("us_list_dbs", us_list_dbs, + "(us_list_dbs)\n\ + List names of UniSyn databases currently loaded."); + + init_subr_0("us_db_params", us_db_params, + "(us_db_params)\n\ + Return parameters of current UniSyn database."); + + init_subr_1("us_db_select", us_select_db, + "(us_db_select NAME)\n\ + Select named UniSyn database."); + + init_subr_1("us_get_diphones", FT_us_get_diphones, + "(us_get_diphones UTT)\n\ + Construct a unit stream in UTT comprising suitable diphones. The unit \n\ + stream produced is suitable for immediate use in us_ps_synthesis."); + + init_subr_2("us_make_group_file", us_make_group_file, + "(us_make_group_file FILENAME PARAMS)\n\ + Make a group file from the currently specified diphone set. PARAMS \n\ + is an optional assoc list and allows specification of the \n\ + track_file_format (default est_binary), sig_file_format (default \n\ + snd) and sig_sample_format (default mulaw). This is recommended \n\ + for LPC databases. For PSOLA based databases the sig_sample_format \n\ + should probably be set to short."); + + init_subr_2("us_full_cut", FT_us_full_cut, + "(us_full_cut UTT RELNAME)\n\ + Run us_full_cut on the relation named RELNAME in utterance UTT \n\ + and return UTT."); + + init_subr_1("us_diphone_init", us_diphone_init, + "(us_diphone_init DIPHONE_NAME)\n\ + Initialise UniSyn diphone synthesizer with database DIPHONE_NAME."); + + init_subr_1("diphone_present", us_check_diphone_presence, + "(diphone_present? 
STR)\n\ + Checks whether the given STRing corresponds to any diphone in the\n\ + current database."); + +} diff --git a/src/modules/UniSyn_diphone/us_diphone.h b/src/modules/UniSyn_diphone/us_diphone.h new file mode 100644 index 0000000..4acc0f0 --- /dev/null +++ b/src/modules/UniSyn_diphone/us_diphone.h @@ -0,0 +1,97 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Paul Taylor */ +/* Date: 6 Jan 1998 */ +/* --------------------------------------------------------------------- */ +/* LPC residual synthesis alternative version */ +/* */ +/*************************************************************************/ + +#ifndef __US_DIPHONE_H__ +#define __US_DIPHONE_H__ + +#include "EST_FileType.h" +#include "EST_TVector.h" +#include "EST_THash.h" +#include "EST_Token.h" +#include "ling_class/EST_Relation.h" + +SIOD_REGISTER_CLASS_DCLS(us_db,USDiphIndex) +VAL_REGISTER_CLASS_DCLS(us_db,USDiphIndex) + +class USDiphIndex { +public: + USDiphIndex(); + ~USDiphIndex(); + + EST_String name; + EST_String index_file; + EST_String group_file; + EST_String track_file_format; + EST_String sig_file_format; + bool grouped; + int index_offset; + EST_TokenStream ts; // for grouped diphones + + EST_String coef_dir; + EST_String sig_dir; + + EST_String coef_ext; + EST_String sig_ext; + LISP params; + + EST_TVector<EST_Item> diphone; + EST_TStringHash<int> dihash; +}; + +int read_diphone_database(const EST_String &filename, USDiphIndex &index); +void check_us_db(); + +void load_separate_diphone(int unit, bool keep_full, + const EST_String &cut_type="all"); + +int find_diphone_index(const EST_Item &d); + +void parse_diphone_times(EST_Relation &diphone_stream, + EST_Relation &source_lab); + +void us_get_diphones(EST_Utterance &utt); + +void us_full_cut(EST_Relation &unit); + +LISP us_check_diphone_presence(LISP name); + + +#endif // __US_DIPHONE_H__ + diff --git a/src/modules/UniSyn_diphone/us_diphone_index.cc b/src/modules/UniSyn_diphone/us_diphone_index.cc new file mode 100644 index 0000000..99cb68f --- /dev/null +++ b/src/modules/UniSyn_diphone/us_diphone_index.cc @@ -0,0 +1,605 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 
1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Paul Taylor */ +/* Date: 1998, 1999 */ +/* --------------------------------------------------------------------- */ +/* LPC residual synthesis alternative version */ +/* */ +/*************************************************************************/ + +#include "siod.h" +#include "EST.h" +#include "us_diphone.h" +#include "Phone.h" + +static bool US_full_coefs = false; +USDiphIndex *diph_index = 0; +extern LISP us_dbs; + +static void us_get_all_diphones(EST_Relation &diphone); +static void load_grouped_diphone(int unit); +void get_diphone(EST_Item &d); +static int find_diphone_index_simple(const EST_String &d,USDiphIndex &di); +void us_check_db(); +void us_add_diphonedb(USDiphIndex *db); +void load_full_diphone(int unit); + +VAL_REGISTER_CLASS(us_db,USDiphIndex) +SIOD_REGISTER_CLASS(us_db,USDiphIndex) + +USDiphIndex::USDiphIndex() : dihash(1500) +{ + gc_protect(&params); +} + +USDiphIndex::~USDiphIndex() +{ + gc_unprotect(&params); + +} + +void us_check_db() +{ + if (diph_index == 0) + EST_error("US DB: no diphone database loaded\n"); + diph_index->ts.restart(); +} + +void awb_free_diph_index() +{ + /* chasing memory leaks */ + if (diph_index != 0) + { + delete diph_index; + diph_index = 0; + } +} + +/*static EST_String get_diphone_name(EST_Item *item,const EST_String dir) +{ + // Get diphone name which may differ from phone name + // Looks for us_diphone_<dir>, us_diphone, or name in that order + EST_String d1; + static EST_String dname = "us_diphone"; + + if (!item) + return ""; + else if ((d1 = item->S(dname+"_"+dir)) != "0") + return d1; + else if ((d1 = item->S(dname)) != "0") + return d1; + else + return item->S("name"); +} +*/ + +static EST_String get_diphone_name(EST_Item *item,const EST_String dir) +{ + // Get diphone name which may differ from phone name + // Looks for us_diphone_<dir>, us_diphone, or name in that order + EST_String d1; + static EST_String dname = 
"us_diphone"; + static EST_String def = "0"; + + if (!item) + return ""; + else if ((d1 = (EST_String)item->f(dname+"_"+dir,def)) != "0") + return d1; + else if ((d1 = (EST_String)item->f(dname,def)) != "0") + return d1; + else + return item->f("name","0").string(); +} + + +void us_get_diphones(EST_Utterance &utt) +{ + // Create unit stream with coefficients and signals for each + // diphone + EST_Item *p, *d; + EST_String name1, name2, file; + + us_check_db(); + + if (!utt.relation_present("Unit")) + utt.create_relation("Unit"); + + US_full_coefs = (siod_get_lval("us_full_coefs", NULL) == NIL) + ? false : true; + + p = utt.relation("Segment")->head(); + name1 = get_diphone_name(p,"left"); // left part of first diphone + + utt.relation("Unit")->f.set("grouped", ((diph_index->grouped) ? 1 : 0)); + + if (!diph_index->grouped) + { + utt.relation("Unit")->f.set("coef_dir", diph_index->coef_dir); + utt.relation("Unit")->f.set("sig_dir", diph_index->sig_dir); + utt.relation("Unit")->f.set("coef_ext", diph_index->coef_ext); + utt.relation("Unit")->f.set("sig_ext", diph_index->sig_ext); + } + + for (p = p->next(); p; p = p->next()) + { + d = utt.relation("Unit")->append(); + name2 = get_diphone_name(p,"right"); + d->set("name", (name1 + "-" + name2)); + get_diphone(*d); + name1 = get_diphone_name(p,"left"); + } + +// utt.create_relation("SourceSegments"); +// for (p = utt.relation("Segment", 1)->head(); p; p = p->next()) +// { +// d = utt.relation("SourceSegments")->append(); +// d->set_name(p->name()); +// } + + if (!US_full_coefs) + parse_diphone_times(*(utt.relation("Unit", 1)), + *(utt.relation("Segment", 1))); +} + +LISP us_check_diphone_presence(LISP name) +{ + /* requested by "Nicholas Volk" */ + int x = find_diphone_index_simple(get_c_string(name),*diph_index); + if ( x < 0 ) + return NIL; + else + return name; +} + +LISP us_make_group_file(LISP lname, LISP params) +{ + EST_String group_file, index_file; + EST_String track_file_format, sig_file_format, 
sig_sample_format; + EST_Relation diphone; + EST_TokenStream ts; + EST_Item *d; + EST_Wave *sig; + EST_Track *tr; + FILE *fp, *fp_group; + const int block_size = 1024; + int pos; + + us_check_db(); + + track_file_format = get_param_str("track_file_format",params,"est_binary"); + sig_file_format = get_param_str("sig_file_format",params,"snd"); + sig_sample_format = get_param_str("sig_sample_format",params,"mulaw"); + + group_file = make_tmp_filename(); + group_file += ".group"; + index_file = get_c_string(lname); + us_get_all_diphones(diphone); + + if ((fp = fopen(group_file, "wb")) == NULL) + EST_error("US DB: failed to open group file as temporary file\n"); + + for (d = diphone.head(); d; d = d->next()) + { + sig = wave(d->f("sig")); + tr = track(d->f("coefs")); + + pos = ftell(fp); + d->set("track_start", pos); + tr->save(fp, track_file_format); + + pos = ftell(fp); + d->set("wave_start", pos); + sig->save_file(fp, sig_file_format, sig_sample_format, EST_NATIVE_BO); + } + fclose(fp); + + if ((fp = fopen(index_file, "wb")) == NULL) + EST_error("US DB: failed to open group file \"%s\" for writing\n", + (const char *) index_file); + + fprintf(fp, "EST_File index\n"); + fprintf(fp, "DataType ascii\n"); + fprintf(fp, "NumEntries %d\n", diphone.length()); + fprintf(fp, "IndexName %s\n", (const char *)diph_index->name); + fprintf(fp, "DataFormat grouped\n"); + fprintf(fp, "Version 2\n"); + fprintf(fp, "track_file_format %s\n",(const char *)track_file_format); + fprintf(fp, "sig_file_format %s\n",(const char *)sig_file_format); + fprintf(fp, "EST_Header_End\n"); + + for (d = diphone.head(); d; d = d->next()) + fprintf(fp, "%s %d %d %d\n", (const char *)d->S("name"), + d->I("track_start"), d->I("wave_start"), + d->I("middle_frame")); + + // Copy binary data from temporary group file to end of + // real group file + char buf[block_size]; + int r; + + if ((fp_group = fopen(group_file, "rb")) == NULL) + { + fprintf(stderr,"Unexpected lost temporary group file from \"%s\"\n", 
+ (const char *)group_file); + return NIL; + } + + while ((r = fread(buf, sizeof(char), block_size, fp_group)) != 0) + fwrite(buf, sizeof(char), r, fp); + + fclose(fp); + fclose(fp_group); + unlink(group_file); + + return NIL; +} + +static void us_get_all_diphones(EST_Relation &diphone) +{ + EST_Item *d; + EST_String name1; + + for (int i = 0; i < diph_index->diphone.n(); ++i) + { + d = diphone.append(); + d->set("name", diph_index->diphone[i].S("name")); + get_diphone(*d); + } +} + +int read_diphone_index(const EST_String &filename, + USDiphIndex &di) +{ + EST_TokenStream ts; + int i, ref; + int num_entries; + EST_Option hinfo; + EST_EstFileType t; + EST_String v, pointer, n; + bool ascii; + EST_read_status r; + EST_Item d; + + di.index_offset = 0; + + if (ts.open(filename) != 0) + { + cerr << "US DB: can't open index file " << filename << endl; + return misc_read_error; + } + // set up the character constant values for this stream + ts.set_SingleCharSymbols(";"); + + if ((r = read_est_header(ts, hinfo, ascii, t)) != format_ok) + return r; + if (t != est_file_index) + return misc_read_error; + + num_entries = hinfo.ival("NumEntries"); + di.grouped = (hinfo.val("DataFormat") == "grouped") ? 
true : false; + di.diphone.resize(num_entries); + + if (di.grouped) + { + di.track_file_format = hinfo.val_def("track_file_format","est"); + di.sig_file_format = hinfo.val_def("sig_file_format","est"); + for (i = 0; i < num_entries; ++i) + { + di.diphone[i].set("name", ts.get().string()); + di.diphone[i].set("count", 0); + di.diphone[i].set("track_start", atoi(ts.get().string())); + di.diphone[i].set("wave_start", atoi(ts.get().string())); + di.diphone[i].set("middle_frame", atoi(ts.get().string())); + di.dihash.add_item(di.diphone[i].f("name"),i); + } + di.index_offset = ts.tell(); + // cout << "index offset = " << di.index_offset << endl; + } + else + { + int n_num_entries = num_entries; + for (i = 0; i < n_num_entries; ++i) + { + if (ts.eof()) + EST_error("US DB: unexpected EOF in \"%s\": %d entries " + "instead of %d\n", (const char *)filename, + i, num_entries); + + n = ts.get().string(); + di.diphone[i].set("name", n); + di.diphone[i].set("filename", ts.get().string()); + di.diphone[i].set("count", 0); + if (!di.diphone[i].S("filename").contains("&", 0)) + { + di.diphone[i].set("start", atof(ts.get().string())); + di.diphone[i].set("middle", atof(ts.get().string())); + di.diphone[i].set("end", ts.get().string()); + if ((di.diphone[i].F("start")>=di.diphone[i].F("middle"))|| + (di.diphone[i].F("middle") >= di.diphone[i].F("end"))) + { + cerr << "US DB: diphone index for " << n << + " start middle end not in order, ignored " << endl; + i--; + n_num_entries--; + } + } + di.dihash.add_item(n,i); + } + + // now copy reference entries + for (i = 0; i < n_num_entries; ++i) + if (di.diphone[i].S("filename").contains("&", 0)) + { + pointer = di.diphone[i].S("filename").after("&", 0); +// cout << "pointer: = " << pointer << endl; + if ((ref = find_diphone_index_simple(pointer,di)) == -1) + { + cerr << "US DB: Illegal diphone pointer in index file: " + << i << " " + << di.diphone[i].S("name") << " -> " << + di.diphone[i].S("filename") << endl; + EST_error(""); + } + 
di.diphone[i].set("filename", + di.diphone[ref].S("filename")); + di.diphone[i].set("start", di.diphone[ref].S("start")); + di.diphone[i].set("middle", di.diphone[ref].S("middle")); + di.diphone[i].set("end", di.diphone[ref].S("end")); + // accurate values of these depend on where the + // pitchmarks are placed. + // di.diphone[i].fset("first_dur",di.diphone[ref].fS("first_dur")); + // di.diphone[i].fset("second_dur", + //di.diphone[ref].fS("second_dur")); + } + } + + return format_ok; +} + +void get_diphone(EST_Item &d) +{ + int unit; + + unit = find_diphone_index(d); + + if (diph_index->diphone[unit].f("count") == 0) + { + if (diph_index->grouped) + load_grouped_diphone(unit); + else + { + if (US_full_coefs) + load_full_diphone(unit); + else + load_separate_diphone(unit, false, "all"); + } + diph_index->diphone[unit].set("count", d.I("count", 0) + 1); + } + + if (US_full_coefs) + { + d.set_val("full_sig", diph_index->diphone[unit].f("full_sig")); + d.set_val("full_coefs", diph_index->diphone[unit].f("full_coefs")); + } + else + { + d.set_val("sig", diph_index->diphone[unit].f("sig")); + d.set_val("coefs", diph_index->diphone[unit].f("coefs")); + d.set_val("middle_frame", + diph_index->diphone[unit].f("middle_frame")); + } + + if (!diph_index->grouped) + { + d.set_val("filename", diph_index->diphone[unit].f("filename")); + d.set_val("diphone_start", diph_index->diphone[unit].F("start")); + d.set_val("diphone_middle", diph_index->diphone[unit].F("middle")); + d.set_val("diphone_end", diph_index->diphone[unit].F("end")); + } +} + +static void load_grouped_diphone(int unit) +{ + int middle_frame; + EST_Track *coefs; + EST_Wave *sig; + int wave_start, track_start; + + coefs = new EST_Track; + sig = new EST_Wave; + + track_start = diph_index->diphone[unit].I("track_start"); + wave_start = diph_index->diphone[unit].I("wave_start"); + middle_frame = diph_index->diphone[unit].I("middle_frame"); + + diph_index->ts.seek(track_start + diph_index->index_offset); + 
coefs->load(diph_index->ts); // type is self determined at present + + diph_index->ts.seek(wave_start + diph_index->index_offset); + sig->load(diph_index->ts,diph_index->sig_file_format); + + diph_index->diphone[unit].set_val("coefs", est_val(coefs)); + diph_index->diphone[unit].set("middle_frame", middle_frame); + + diph_index->diphone[unit].set_val("sig", est_val(sig)); +} + + +static int find_diphone_index_simple(const EST_String &d,USDiphIndex &di) +{ + int found,r; + + r = di.dihash.val(d,found); + if (found) + return r; + else + return -1; +} + +int find_diphone_index(const EST_Item &d) +{ + // Find the appropriate entry in the diphone index table + // mapping to alternates if required. + int index; + EST_String diname = d.f("name"); + + // If all goes well the diphone will be found directly in the index + index=find_diphone_index_simple(diname,*diph_index); + if ((index=find_diphone_index_simple(diname,*diph_index)) != -1) + return index; + + // But for various reasons it might not be there so allow + // a run time specification of alternates. 
This isn't optimal + // but is better than falling immediately back on just a default + // diphone + LISP alt_left = get_param_lisp("alternates_left",diph_index->params,NIL); + LISP alt_right = get_param_lisp("alternates_right",diph_index->params,NIL); + EST_String di_left = diname.before("-"); + EST_String di_right = diname.after("-"); + EST_String di_left_alt = get_param_str(di_left,alt_left,di_left); + EST_String di_right_alt = get_param_str(di_right,alt_right,di_right); + EST_String di_alt = di_left_alt+"-"+di_right_alt; + + if ((index=find_diphone_index_simple(di_alt,*diph_index)) != -1) + { +// cout << "UniSyn: using alternate diphone " << di_alt << " for " << +// diname << endl; + return index; + } + + // It really isn't there so return the default one and print an error + // now + EST_String default_diphone = + get_param_str("default_diphone",diph_index->params,""); + + if (default_diphone != "") + { + index = find_diphone_index_simple(default_diphone,*diph_index); + if (index == -1) + { + cerr << "US DB: can't find diphone " << d.f("name") + << " and even default diphone (" << default_diphone + << ") doesn't exist" << endl; + EST_error(""); + } + else + cerr << "UniSyn: using default diphone " << default_diphone << + " for " << diname << endl; + return index; + } + else + { + cerr << "US DB: can't find diphone " << d.f("name") << + " nor alternatives" << endl; + EST_error(""); + } + return -1; +} + +void us_full_cut(EST_Relation &unit) +{ + EST_Track *full_coefs, *sub_coefs; + EST_Wave *full_sig, sub_sig; + EST_Item *s; + int pm_start, pm_end, pm_middle; + int samp_start, samp_end; + float start_time; + + for (s = unit.head(); s; s = s->next()) + { + sub_coefs = new EST_Track; + + full_coefs = track(s->f("full_coefs")); + full_sig = wave(s->f("full_sig")); + + pm_start = full_coefs->index(s->F("diphone_start")); + pm_middle = full_coefs->index(s->F("diphone_middle")); + pm_end = full_coefs->index(s->F("diphone_end")); + + 
full_coefs->copy_sub_track(*sub_coefs, pm_start, + pm_end - pm_start + 1); + + start_time = full_coefs->t(Gof((pm_start - 1), 0)); + + for (int j = 0; j < sub_coefs->num_frames(); ++j) + sub_coefs->t(j) = sub_coefs->t(j) - start_time; + + + s->set("middle_frame", pm_middle - pm_start -1); + s->set_val("coefs", est_val(sub_coefs)); + + // go to the periods before and after + samp_start = (int)(full_coefs->t(Gof((pm_start - 1), 0)) + * (float)full_sig->sample_rate()); + if (pm_end+1 < full_coefs->num_frames()) + pm_end++; + samp_end = (int)(full_coefs->t(pm_end) + * (float)full_sig->sample_rate()); + + + full_sig->sub_wave(sub_sig, samp_start, samp_end - samp_start + 1); + EST_Wave *sig = new EST_Wave(sub_sig); + + s->set_val("sig", est_val(sig)); + + } +} + + +void us_add_diphonedb(USDiphIndex *db) +{ + // Add this to list of loaded diphone dbs and select it + LISP lpair; + + if (us_dbs == NIL) + gc_protect(&us_dbs); + + lpair = siod_assoc_str(db->name,us_dbs); + + if (lpair == NIL) + { // new diphone db of this name + us_dbs = cons(cons(rintern(db->name), + cons(siod(db),NIL)), + us_dbs); + } + else + { // already one of this name + cerr << "US_db: warning redefining diphone database " + << db->name << endl; + setcar(cdr(lpair),siod(db)); + } + + diph_index = db; +} + diff --git a/src/modules/UniSyn_diphone/us_diphone_unit.cc b/src/modules/UniSyn_diphone/us_diphone_unit.cc new file mode 100644 index 0000000..b8210de --- /dev/null +++ b/src/modules/UniSyn_diphone/us_diphone_unit.cc @@ -0,0 +1,278 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* */ +/* Author: Paul Taylor */ +/* Date: 1998, 1999 */ +/* --------------------------------------------------------------------- */ +/* LPC residual synthesis alternative version */ +/* */ +/*************************************************************************/ + +#include "siod.h" +#include "EST.h" +#include "us_diphone.h" +#include "Phone.h" + +extern USDiphIndex *diph_index; + +void dur_to_end(EST_Relation &r) +{ + float prev_end = 0; + + for (EST_Item *p = r.head(); p ; p = p->next()) + { + p->set("end", p->F("dur") + prev_end); + prev_end = p->F("end"); + } +} + +void add_end_silences(EST_Relation &segment, EST_Relation &target) +{ + EST_Item *t, *n; + float shift = 0.0; + const float pause_duration = 0.1; + + t = segment.head(); + if (!ph_is_silence(t->f("name"))) + { + n = t->insert_before(); + n->set("name", ph_silence()); + n->set("dur", pause_duration); + shift += pause_duration; + } + + t = segment.tail(); + if (!ph_is_silence(t->S("name"))) + { + n = t->insert_after(); + n->set("name", ph_silence()); + n->set("dur", pause_duration); + shift += pause_duration; + } + dur_to_end(segment); + + target.tail()->set("pos", (target.tail()->F("pos") + shift)); +} + +void add_end_silences(EST_Relation &segment) +{ + EST_Item *t, *n; + + t = segment.head(); + if (!ph_is_silence(t->S("name"))) + { + n = t->insert_before(); + n->set("name", ph_silence()); + } + + t = segment.tail(); + if (!ph_is_silence(t->S("name"))) + { + n = t->insert_after(); + n->set("name", ph_silence()); + } +} + +void parse_diphone_times(EST_Relation &diphone_stream, + EST_Relation &source_lab) +{ + EST_Item *s, *u; + EST_Track *pm; + int e_frame, m_frame = 0; + float dur_1 = 0.0, dur_2 = 0.0, p_time; + float t_time = 0.0, end; + p_time = 0.0; + + for (s = source_lab.head(), u = diphone_stream.head(); u; u = u->next(), + s = s->next()) + { + pm = track(u->f("coefs")); + + e_frame = pm->num_frames() - 
1; + m_frame = u->I("middle_frame"); + + if (m_frame < 0) m_frame=0; + dur_1 = pm->t(m_frame); + if (e_frame < m_frame) e_frame=m_frame; + dur_2 = pm->t(e_frame) - dur_1; + + s->set("source_end", (dur_1 + p_time)); + + p_time = s->F("source_end") + dur_2; + + end = dur_1 + dur_2 + t_time; + t_time = end; + u->set("end", t_time); + } + if (s) + s->set("source_end", (dur_2 + p_time)); +} + +void load_separate_diphone(int unit, bool keep_full, + const EST_String &cut_type) +{ + // Load in the coefficients and signame for this diphone + // It caches the results in the diphone index entry, though + // someone else may clear them. Note the full file is loaded + // each time which isn't optimal if there are multiple diphones + // is the same file + int samp_start, samp_end; + int pm_start, pm_end, pm_middle; + EST_Track full_coefs, dcoefs, *coefs; +// float q_start, q_middle, q_end; + + if (full_coefs.load(diph_index->coef_dir + "/" + + diph_index->diphone[unit].S("filename") + + diph_index->coef_ext) != format_ok) + { + cerr << "US DB: failed to read coefs file from " << + diph_index->coef_dir + "/" + + diph_index->diphone[unit].S("filename") + + diph_index->coef_ext << endl; + EST_error(""); + } + + pm_start = full_coefs.index(diph_index->diphone[unit].f("start")); + pm_middle = full_coefs.index(diph_index->diphone[unit].f("middle")); + pm_end = full_coefs.index(diph_index->diphone[unit].f("end")); + + // option for taking half a diphone only + if (cut_type == "first_half") + pm_end = pm_middle; + else if (cut_type == "second_half") + pm_start = pm_middle; + + // find time of mid-point, i.e. 
boundary between phones + full_coefs.sub_track(dcoefs, pm_start, pm_end - pm_start + 1, 0, EST_ALL); + // Copy coefficients so the full coeffs can be safely deleted + coefs = new EST_Track(dcoefs); + for (int j = 0; j < dcoefs.num_frames(); ++j) + coefs->t(j) = dcoefs.t(j) - full_coefs.t(Gof((pm_start - 1), 0)); + + diph_index->diphone[unit].set("first_dur", + full_coefs.t(pm_middle) - + full_coefs.t(pm_start)); + + diph_index->diphone[unit].set("second_dur", + full_coefs.t(pm_end) - + full_coefs.t(pm_middle)); + + if (keep_full) + { + EST_Track *f = new EST_Track; + *f = full_coefs; + diph_index->diphone[unit].set_val("full_coefs",est_val(f)); + } + + diph_index->diphone[unit].set_val("coefs", est_val(coefs)); + diph_index->diphone[unit].set("middle_frame", pm_middle - pm_start -1); + + EST_Wave full_sig, sub_sig; + + if (diph_index->sig_dir == "none") + return; + + if (full_sig.load(diph_index->sig_dir + "/" + + diph_index->diphone[unit].f("filename") + + diph_index->sig_ext) != format_ok) + { + cerr << "US DB: failed to read signal file from " << + diph_index->sig_dir + "/" + + diph_index->diphone[unit].f("filename") + + diph_index->sig_ext << endl; + EST_error(""); + } + + // go to the periods before and after + samp_start = (int)(full_coefs.t(Gof((pm_start - 1), 0)) + * (float)full_sig.sample_rate()); + if (pm_end+1 < full_coefs.num_frames()) + pm_end++; + + samp_end = (int)(full_coefs.t(pm_end) * (float)full_sig.sample_rate()); + full_sig.sub_wave(sub_sig, samp_start, samp_end - samp_start + 1); + EST_Wave *sig = new EST_Wave(sub_sig); + + diph_index->diphone[unit].set_val("sig", est_val(sig)); + + if (keep_full) + { + EST_Wave *s = new EST_Wave; + *s = full_sig; + diph_index->diphone[unit].set_val("full_sig", est_val(s)); + } +} + +void load_full_diphone(int unit) +{ + // Load in the coefficients and signame for this diphone + // It caches the results in the diphone index entry, though + // someone else may clear them. 
Note the full file is loaded + // each time which isn't optimal if there are multiple diphones + // is the same file + int pm_start, pm_end, pm_middle; + EST_Track *full_coefs; + + full_coefs = new EST_Track; + + if (full_coefs->load(diph_index->coef_dir + "/" + + diph_index->diphone[unit].f("filename") + + diph_index->coef_ext) != format_ok) + { + cerr << "US DB: failed to read coefs file from " << + diph_index->coef_dir + "/" + + diph_index->diphone[unit].f("filename") + + diph_index->coef_ext << endl; + EST_error(""); + } + + pm_start = full_coefs->index(diph_index->diphone[unit].f("start")); + pm_middle = full_coefs->index(diph_index->diphone[unit].f("middle")); + pm_end = full_coefs->index(diph_index->diphone[unit].f("end")); + + diph_index->diphone[unit].set_val("full_coefs", est_val(full_coefs)); + + EST_Wave *full_sig = new EST_Wave; + + if (full_sig->load(diph_index->sig_dir + "/" + + diph_index->diphone[unit].f("filename") + + diph_index->sig_ext) != format_ok) + { + cerr << "US DB: failed to read signal file from " << + diph_index->sig_dir + "/" + + diph_index->diphone[unit].f("filename") + + diph_index->sig_ext << endl; + EST_error(""); + } + diph_index->diphone[unit].set_val("full_sig", est_val(full_sig)); +} diff --git a/src/modules/UniSyn_phonology/Makefile b/src/modules/UniSyn_phonology/Makefile new file mode 100644 index 0000000..738df94 --- /dev/null +++ b/src/modules/UniSyn_phonology/Makefile @@ -0,0 +1,56 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## Metrical Trees ## +########################################################################### + +TOP=../../.. 
+DIRNAME=src/modules/UniSyn_phonology +H = us_duration.h +TSRCS = us_aux.cc +CPPSRCS = UniSyn_phonology.cc mettree.cc syllabify.cc subword.cc \ + UniSyn_build.cc us_duration.cc unisyn_tilt.cc $(TSRCS) +SRCS = $(CPPSRCS) + +OBJS = $(CPPSRCS:.cc=.o) + +FILES=Makefile $(SRCS) $(H) unisyn_phonology.mak + +LOCAL_INCLUDES = -I../include + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/UniSyn_phonology/UniSyn_build.cc b/src/modules/UniSyn_phonology/UniSyn_build.cc new file mode 100644 index 0000000..4beb939 --- /dev/null +++ b/src/modules/UniSyn_phonology/UniSyn_build.cc @@ -0,0 +1,987 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Paul Taylor */ +/* Date : June 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Metrical Tree based Phonology system */ +/* */ +/*=======================================================================*/ + +#include +#include +#include +#include +#include "festival.h" + +#include "../UniSyn/us_features.h" + +void merge_features(EST_Item *from, EST_Item *to, int keep_id); +void insert_schwa(EST_Item *n); + +extern EST_Features phone_def; +void subword_metrical_tree(EST_Item *w, EST_Relation &syllable, + EST_Relation &metricaltree); + +void lex_to_phones(const EST_String &name, const EST_String &pos, + EST_Relation &phone); +void trans_to_phones(EST_Item *w, EST_Relation &trans, + EST_Relation &phone); + +void fix_syllables(EST_Item *nw, EST_Utterance &word); + +typedef +float (*local_cost_function)(const EST_Item *item1, + const EST_Item *item2); + +void add_metrical_functions(EST_Utterance &utt); +bool dp_match(const EST_Relation &lexical, + const EST_Relation &surface, + EST_Relation &match, + local_cost_function lcf, + EST_Item *null_syl); +float local_cost(const EST_Item *s1, const EST_Item *s2); + +void add_times(EST_Relation &lexical, EST_Relation &surface, + EST_Relation &match); + +void add_initial_silence(EST_Relation 
&lexical, EST_Relation &surface, + EST_Relation &match); + +void subword_metrical_tree(EST_Relation &syllable, + EST_Relation &metricaltree); + +void add_metrical_functions(EST_Utterance &utt); + +int syllabify_word(EST_Item *nw, EST_Relation &phone, + EST_Relation &sylstructure, EST_Relation &syl, int flat); + +void add_even_segment_times(EST_Item *w, EST_Relation &phone); +void lex_to_phones(EST_Utterance &u, const EST_String &relname); +void phonemic_trans(EST_Relation &trans); + +void add_single_phrase(EST_Utterance &utt, EST_Item *t); + +LISP FT_add_trans_metrical_tree(LISP l_utt, LISP lf_input, LISP lf_output); +#if 0 +static void add_trans_phrase(EST_Utterance &utt, const EST_String &i_name, + const EST_String &s_name); +#endif + +float local_cost(const EST_Item *s1, const EST_Item *s2) +{ + float insertion_cost = get_c_int(siod_get_lval("met_insertion", NULL)); + float deletion_cost = get_c_int(siod_get_lval("met_deletion", NULL)); + float substitution_cost = + get_c_int(siod_get_lval("met_substitution", NULL)); + + EST_String null_sym = "nil"; + + // otherwise cost is either insertion cost, or cost_matrix value + if (s1->name() == s2->name()) + return 0; + else + { + if (s1->name() == null_sym) + return insertion_cost; + else if (s2->name() == null_sym) + return deletion_cost; + else + return substitution_cost; + } +} + +void trans_to_phones(EST_Item *w, EST_Relation &trans, EST_Relation &phone) +{ + int prev_phone; + EST_Item *t, *p; + int r; + + prev_phone = w->prev() ? 
prev(w)->I("phon_ref") : -1; + r = w->I("phon_ref"); + + for (t = trans.head(); t; t = t->next()) + { + if ((t->f("name") == "sil") || (t->f("name") == "pau")) + continue; + if ((t->I("ref") > prev_phone) && (t->I("ref") <= r)) + { + p = phone.append(); + p->set("name", t->S("name")); + p->set_val("end", t->f("end")); + p->set_val("start", t->f("start")); + p->set("df", phone_def.A(p->S("name"))); + } + } +} + +void add_trans_intonation(EST_Utterance &utt, const EST_String &i_name, + const EST_String &s_name, int add_words) +{ + EST_Item *s, *w, *t, *a, *b; + EST_String wref; + int s_num; + EST_String w_num; + EST_String is_name = i_name + s_name; + + utt.relation(i_name)->f.set("intonation_style", "tilt"); + + utt.create_relation(is_name); + cout << "created : " << is_name << endl; + + // optional feature to add intonation events to words rather than syllables + if (add_words) + utt.create_relation("IntonationWord"); + + for (t = utt.relation(i_name, 1)->head(); t; t = t->next()) + { + t->f_remove("end"); + if (!t->f_present("word_ref")) + add_single_phrase(utt, t); + else + { + w_num = t->S("word_ref"); + s_num = t->I("syl_num"); + + for (w = utt.relation("Word", 1)->head(); w; w = w->next()) + { + if (w->S("id") == w_num) + break; + } + if (w == 0) + { + cerr << "Error: couldn't find word ref " << endl; + cerr << "For intonation event " << *t << endl; + festival_error(); + } + if (add_words) + { + if (!w->in_relation("IntonationWord")) + b = utt.relation("IntonationWord")->append(w); + else + b = w->as_relation("IntonationWord"); + b->append_daughter(t); + } + + // cout << "matching word: " << w->name() << endl; + if ((b = w->as_relation("WordStructure")) == 0) + EST_error("Item is not in WordStructure\n"); + if ((s = nth_leaf(b, s_num)) == 0) + { + cerr << "Intonation element " << *t << + "\nis linked to syllable " << s_num << + " but word \"" << w->S("name") << "\"" + " has only " << num_leaves(b) << " syllables\n"; + } + // cout << "here is s\n"; + // cout << 
"matching syllable: " << *s << endl; + + if (!s->in_relation(is_name)) + a = utt.relation(is_name)->append(s); + else + a = s->as_relation(is_name); + a->append_daughter(t); + +// cout << "s1: " << s->S("id", "XX") << endl; + s = s->as_relation(s_name); + if (s == 0) + cerr << "Syllable with id " << nth_leaf(b, s_num)->S("id") << "exists " + "but is not in syllable relation. Suspect corrupted " + "lexical conversion\n"; + +// cout << "s2: " << s->S("id", "XX") << endl; + + // change to relative positions if not already specified + if (!t->f_present("rel_pos")) + t->set("rel_pos", t->F("time") - s->F("vowel_start")); + + t->set("time_path", is_name); + t->set_function("time", + "standard+unisyn_tilt_event_position"); +// cout << "end syl:" << endl; + t->f_remove("word_ref"); + t->f_remove("syl_num"); + } + } +// add_trans_phrase(utt, i_name, s_name); +} + +void syl_to_word_intonation(EST_Utterance &utt) +{ + EST_Item *s, *w, *t=0, *b; + + utt.create_relation("IntonationWord"); + + for (s = utt.relation("Syllable", 1)->head(); s; s = s->next()) + { + if (!s->in_relation("IntonationSyllable")) + continue; + + w = root(s, "WordStructure"); + + if (w == 0) + { + cerr << "Error: couldn't find word ref " << endl; + cerr << "For intonation event " << *t << endl; + festival_error(); + } + if (!w->in_relation("IntonationWord")) + b = utt.relation("IntonationWord")->append(w); + else + b = w->as_relation("IntonationWord"); + + for (t = daughter1(s->as_relation("IntonationSyllable")); t; t = t->next()) + b->append_daughter(t); + } +} + +static bool legal_daughter(EST_Item *r, const EST_String &iname, + const EST_StrList &valid) +{ + if (!r->in_relation(iname)) + return false; + if (strlist_member(valid, daughter1(r->as_relation(iname))->S("name", ""))) + return true; + return false; +} + +void intonation_diagnostics(EST_Utterance &ref, EST_Utterance &test, + const EST_String &rel, const EST_StrList &valid) +{ + EST_Item *r, *t; + EST_String iname = "Intonation" + rel; + + for 
(r = ref.relation(rel, 1)->head(), t = test.relation(rel, 1)->head(); r && t; + r = r->next(), t = t->next()) + { + if (legal_daughter(r, iname, valid) && legal_daughter(t, iname, valid)) + t->set("i_status", "COR"); + else if (legal_daughter(r, iname, valid) && (!legal_daughter(t, iname, valid))) + t->set("i_status", "DEL"); + else if (!legal_daughter(r, iname, valid) && legal_daughter(t, iname, valid)) + t->set("i_status", "INS"); +// else +// t->set("i_status", "0"); + } +} + +#if 0 +static void add_trans_phrase(EST_Utterance &utt, const EST_String &i_name, + const EST_String &s_name) +{ + EST_Item *s, *t, *a, *p; + float pos, max, d; + EST_String is_name = i_name + s_name; + + for (t = utt.relation(i_name, 1)->head(); t; t = t->next()) + { + if (t->in_relation(is_name)) + continue; + pos = t->F("time"); + max = 100000.0; + + cout << "here 1\n"; + + for (p = utt.relation(s_name)->head(); p; p = p->next()) + { + if (t->S("name","0") == "phrase_end") + d = fabs(pos - p->end()); + else + d = fabs(pos - p->start()); + if (d < max) + { + max = d; + s = p; + } + } + a = utt.relation(is_name)->append(t); + } +} +#endif + +LISP FT_add_trans_intonation(LISP l_utt, LISP lf_int, LISP l_add_words) +{ + EST_String int_file = get_c_string(lf_int); + EST_Utterance *u = get_c_utt(l_utt); + EST_Relation lab; + EST_Item *s, *n; + int add_words = (l_add_words == NIL) ? 
0 : 1; + + u->create_relation("Intonation"); + + if (lab.load(int_file) != format_ok) + EST_error("Couldn't load file %s\n", (const char *) int_file); + + for (s = lab.head(); s; s = s->next()) + { + n = u->relation("Intonation")->append(); + merge_features(n, s, 1); + if (n->S("name") =="afb") + n->set("name", "a"); + else if (n->S("name") == "m") + { + n->set("name", "a"); + n->set("minor", 1); + } + else if ((n->S("name") == "a") || (n->S("name") == "arb") + || (n->S("name") == "rb") || (n->S("name") == "phrase_end") + || (n->S("name") == "phrase_start") + || (n->S("name") == "fb") ) // tmp check (awb) + continue; + else + EST_error("Illegal intonation name \"%s\"\n", (const char *) n->S("name")); + } + + add_trans_intonation(*u, "Intonation", "Syllable", add_words); + return l_utt; +} + +LISP FT_add_trans_word(LISP l_utt, LISP lf_word, LISP keep_times) +{ + EST_String word_file = get_c_string(lf_word); + EST_Utterance *u = get_c_utt(l_utt); + EST_Relation lab; + EST_Item *s, *n; + float p_end = 0; + + u->create_relation("Word"); + + if (lab.load(word_file) != format_ok) + EST_error("Couldn't load file %s\n", (const char *) word_file); + + for (s = lab.head(); s; s = s->next()) + { + s->set("start", p_end); + p_end = s->F("end"); + if ((s->S("name") == "pau") || (s->S("name") == "sil")) + continue; + n = u->relation("Word")->append(); + merge_features(n, s, 0); + if (keep_times == NIL) + { + n->f_remove("end"); + n->f_remove("start"); + } + } + + return l_utt; +} + +LISP FT_add_f0_points(LISP l_utt, LISP lf_f0) +{ + EST_String f0_file = get_c_string(lf_f0); + EST_Utterance *u = get_c_utt(l_utt); + EST_Track f0; + EST_Item *s; + float prev_mid, next_mid; + + if (f0.load(f0_file) != format_ok) + EST_error("Couldn't load file %s\n", (const char *) f0_file); + + for (s = u->relation("Segment")->head(); s; s = s->next()) + { + prev_mid = s->prev() ? + (prev(s)->F("end") + prev(s)->F("start"))/2.0 : 0.0; + next_mid = s->next() ? 
+ (next(s)->F("end") + next(s)->F("start"))/2.0 : 0.0; + + s->set("prev_mid_f0", f0.a(f0.index(prev_mid))); + s->set("start_f0", f0.a(f0.index(s->F("start")))); + s->set("mid_f0", f0.a(f0.index((s->F("end") + s->F("start"))/2.0))); + s->set("end_f0", f0.a(f0.index(s->F("end")))); + s->set("next_mid_f0", f0.a(f0.index(next_mid))); + } + + return l_utt; +} + +LISP FT_add_coefs(LISP l_utt, LISP lf_coef) +{ + EST_String coef_file = get_c_string(lf_coef); + + EST_Utterance *u = get_c_utt(l_utt); + EST_Track coef; + EST_Item *s; + float prev_mid, next_mid; + EST_FVector *frame; + + cout << "loading\n"; + if (coef.load(coef_file) != format_ok) + EST_error("Couldn't load file %s\n", (const char *) coef_file); + cout << "done\n"; + + frame = new EST_FVector; + frame->fill(0.0); // special case for first frame. + + for (s = u->relation("Segment")->head(); s; s = s->next()) + { + prev_mid = s->prev() ? + (prev(s)->F("end") + prev(s)->F("start"))/2.0 : 0.0; + next_mid = s->next() ? + (next(s)->F("end") + next(s)->F("start"))/2.0 : 0.0; + + frame = new EST_FVector; + coef.copy_frame_out(coef.index((s->F("end") + s->F("start"))/2.0), + *frame); + s->set_val("mid_coef", est_val(frame)); + + frame = new EST_FVector; + coef.copy_frame_out(coef.index(s->F("end")), *frame); + s->set_val("end_coef", est_val(frame)); + + frame = new EST_FVector; + coef.copy_frame_out(coef.index(s->F("start")), *frame); + s->set_val("start_coef", est_val(frame)); + + frame = new EST_FVector; + coef.copy_frame_out(coef.index(prev_mid), *frame); + s->set_val("prev_mid_coef", est_val(frame)); + + frame = new EST_FVector; + coef.copy_frame_out(coef.index(next_mid), *frame); + s->set_val("next_mid_coef", est_val(frame)); + } + + return l_utt; + +// imid = coef.index((s->F("end") + s->F("start"))/2.0); +// iend = coef.index(s->F("end")); + +} + +LISP FT_add_xml_relation(LISP l_utt, LISP xml_file) +{ + EST_Utterance *u, tmp; + + u = get_c_utt(l_utt); + + tmp.clear(); + tmp.load(get_c_string(xml_file)); + + 
EST_Features::Entries p; + + for (p.begin(tmp.relations); p; ++p) + { + relation(p->v)->remove_item_feature("actuate"); + relation(p->v)->remove_item_feature("estExpansion"); + relation(p->v)->remove_item_feature("xml:link"); + relation(p->v)->remove_item_feature("href"); + relation(p->v)->remove_item_feature("show"); + } + + utterance_merge(*u, tmp, "id"); + + return l_utt; +} + +void fix_syllables(EST_Item *nw, EST_Utterance &word) +{ + EST_Item *t, *n, *s, *m; + + if (word.relation("Syllable")->length() == word.relation("SurfaceSyllable")->length()) + return; + + cout << "Word \"" << word.relation("Word")->head()->name() << "\" has " + << word.relation("Syllable")->length() << + " lexical syllables and " << + word.relation("SurfaceSyllable")->length() << + " surface syllables\n"; + + for (s = word.relation("Syllable")->head(); s; s = s->next()) + { + t = s->as_relation("SylStructure"); + n = syl_nucleus(t); + + m = daughter1(n->as_relation("Match")); + if (m == 0) + insert_schwa(n->as_relation("Segment")); + } + + word.relation("SylStructure")->clear(); + word.relation("Syllable")->clear(); + word.relation("Match")->clear(); + + syllabify_word(nw, *word.relation("Segment"), + *word.relation("SylStructure"), + *word.relation("Syllable"), 0); + +// syllabify_word(nw, *word.relation("SurfacePhone"), +// *word.relation("SurfaceSylStructure"), +// *word.relation("SurfaceSyllable")); + + EST_Item xx; + + dp_match(*word.relation("Segment"), + *word.relation("SurfacePhone"), + *word.relation("Match"), local_cost, &xx); + +} + + +/* Add segment durations from file */ +void add_trans_duration(EST_Utterance &utt, const EST_String &segfile) +{ + EST_Utterance word; + EST_Relation phone, lab; + EST_Item *s, *n; + EST_StrList plist; + float phone_start; + EST_Item xx; + + if (lab.load(segfile) != format_ok) + EST_error("Couldn't load file %s\n", (const char *) segfile); + + phone_start = 0.0; + utt.create_relation("LabelSegment"); + utt.create_relation("Match"); + + for (s = 
lab.head(); s; s = s->next()) + { + if (!phone_def.present(s->S("name"))) + EST_error("Phone %s is not defined in phone set\n", (const char *) + s->S("name")); + n = utt.relation("LabelSegment")->append(); + merge_features(n, s, 1); + n->set("start", phone_start); + phone_start = s->F("end"); + n->set("dur", n->F("end") - n->F("start")); + } + + dp_match(*utt.relation("Segment"), *utt.relation("LabelSegment"), + *utt.relation("Match"), local_cost, &xx); + + add_times(*utt.relation("Segment"), *utt.relation("LabelSegment"), + *utt.relation("Match")); + + for (s = utt.relation("Segment")->head(); s; s = s->next()) + { + s->set("target_dur", (s->F("end") - s->F("start"))); + s->f_remove("end"); + s->f_remove("dur"); + s->f_remove("start"); + } +} + +static void add_silences(EST_Utterance &utt,EST_Item *w) +{ + EST_Item *s; + int r; + + if (w == 0) // insert initial silence + { + s = utt.relation("LabelSegment")->head(); + if ((s != 0) && (s->name() == "pau")) + { + EST_Item *sil = utt.relation("Segment")->append(); + sil->set("name","pau"); + sil->set("start",s->F("start")); + sil->set("end",s->F("end")); + } + return; + } + + cout << "Looking at inserting\n"; + // Intermediate silences + r = w->I("phon_ref"); + for (s=utt.relation("LabelSegment")->head(); s; s=s->next()) + { + if (r == s->I("ref")) + { + if ((next(s) != 0) && (next(s)->name() == "pau")) + { + cout << "actually inserting\n"; + EST_Item *sil = utt.relation("Segment")->append(); + sil->set("name","pau"); + sil->set("start",s->F("end")); + sil->set("end",next(s)->F("end")); + } + return; + } + } +} + +void add_trans_seg(EST_Utterance &utt, const EST_String &segfile) +{ + EST_Utterance word; + EST_Relation phone, lab; + EST_Item *s, *w, *nw, *n; + EST_StrList plist; + float phone_start; + LISP lutt; + int i; + LISP l_pdef; + + l_pdef = siod_get_lval("darpa_fs", NULL); + lisp_to_features(l_pdef, phone_def); + + utt.create_relation("LabelSegment"); + utt.create_relation("tmpSegment"); + utt.create_relation("Syllable"); + 
utt.create_relation("Segment"); + utt.create_relation("WordStructure"); + + if (lab.load(segfile) != format_ok) + EST_error("Couldn't load file %s\n", (const char *) segfile); + + phone_start = 0.0; + + for (s = lab.head(); s; s = s->next()) + { + if (!phone_def.present(s->S("name"))) + EST_error("Phone %s is not defined in phone set\n", (const char *) + s->S("name")); + n = utt.relation("LabelSegment")->append(); +// cout << "append ls id " << n->S("id") << endl; + merge_features(n, s, 1); +// cout << "keep ls id " << n->S("id") << endl; + n->set("start", phone_start); + phone_start = s->F("end"); + } + +// phonemic_trans(*utt.relation("LabelSegment")); +/* for (w = utt.relation("Word")->head(); w != 0; w = n) + { + n = w->next(); + w->f_remove("end"); + if ((w->f("name") == "sil") || (w->f("name") == "pau")) + utt.relation("Word")->remove_item(w); + } +*/ + gc_protect(&lutt); + + word.create_relation("Word"); + word.create_relation("Match"); + word.create_relation("NewMatch"); + word.create_relation("Segment"); + word.create_relation("tmpSegment"); + word.create_relation("SylStructure"); + word.create_relation("Syllable"); + word.create_relation("WordStructure"); + + // Note starts are hardwired here because feature function thing + // isn't fully operational and because deleting silence messes + // it up. 
+ +/* s = utt.relation("LabelSegment")->head(); + if ((s->f("name") == "pau") || (s->f("name") == "sil")) + { + w = utt.relation("SurfacePhone")->append(); + + w->set("name", "pau"); + w->set("end", s->F("end")); + w->set("start", s->F("start")); + w->set("df", phone_def.A("pau")); + } +*/ + + add_silences(utt,0); + + for (i = 0, w = utt.relation("Word")->head(); w != 0; w = w->next(), ++i) + { + word.clear_relations(); + word.f.set("max_id", 0); + cout << "word: " << *w << endl; + lex_to_phones(w->f("name"), w->f("pos", ""), + *word.relation("Segment")); + trans_to_phones(w, *utt.relation("LabelSegment"), + *word.relation("tmpSegment")); + + nw = word.relation("Word")->append(); + nw->set("name", w->S("name")); + + syllabify_word(nw, *word.relation("Segment"), + *word.relation("SylStructure"), + *word.relation("Syllable"), 0); + +// subword_list(nw, *word.relation("Syllable"), +// *word.relation("MetricalTree")); + + if (siod_get_lval("mettree_debug", NULL) != NIL) + word.save("word_lex.utt", "est"); + + EST_Item xx; + dp_match(*word.relation("Segment"), *word.relation("tmpSegment"), + *word.relation("Match"), local_cost, &xx); + + +// fix_syllables(nw, word); + + subword_metrical_tree(nw, *word.relation("Syllable"), + *word.relation("WordStructure")); +// cout << "C2\n"; + + if (siod_get_lval("mettree_debug_word", NULL) != NIL) + word.save("word_dp.utt", "est"); + + if (siod_get_lval("mettree_debug_word", NULL) != NIL) + if (get_c_int(siod_get_lval("mettree_debug_word", NULL)) == i) + word.save("word_nth.utt", "est"); + + word.remove_relation("SurfaceSylStructure"); + word.remove_relation("SurfaceMetrcialTree"); + word.remove_relation("SurfaceSyllable"); + //cout << "32\n"; + EST_String wid = w->S("id"); + utterance_merge(utt, word, w, word.relation("Word")->head()); + + w->set("id", wid); + + add_silences(utt,w); + } + cout << "time2\n"; + +// utt.save("test.utt"); + + +/* add_initial_silence(*utt.relation("Segment"), + *utt.relation("SurfacePhone"), + 
*utt.relation("Match")); + */ + + add_times(*utt.relation("Segment"), *utt.relation("LabelSegment"), + *utt.relation("Match")); + + // utt.relation("Word")->f.set("timing_style", "segment"); + // cout << "here d\n"; + +// add_feature_function(*utt.relation("SurfacePhone"), "dur", +// usf_duration); + + add_metrical_functions(utt); + + // if silences aren't wanted we still have to build with them so that + // start times before pauses are done properly. + if (!siod_get_lval("unisyn_build_with_silences",NULL)) + for (s = next(utt.relation("Segment")->head());s;s = s->next()) + if ((prev(s)->S("name") != "pau") && (prev(s)->S("name") != "sil")) + s->set_function("start", "standard+unisyn_start"); + else + utt.relation("Segment")->remove_item(prev(s)); + + + + utt.relation("Segment")->remove_item_feature("stress_num"); + utt.relation("Word")->remove_item_feature("phon_ref"); + + utt.remove_relation("tmpSegment"); + + if (siod_get_lval("mettree_debug", NULL) != NIL) + utt.save("met_data.utt", "est"); + + gc_unprotect(&lutt); + cout << "here c\n"; +} + + +LISP FT_add_trans_seg(LISP l_utt, LISP lf_seg) +{ + add_trans_seg(*get_c_utt(l_utt), get_c_string(lf_seg)); + return l_utt; +} + +LISP FT_add_trans_duration(LISP l_utt, LISP lf_seg) +{ + add_trans_duration(*get_c_utt(l_utt), get_c_string(lf_seg)); + return l_utt; +} + +LISP FT_syl_to_word_intonation(LISP l_utt) +{ + syl_to_word_intonation(*get_c_utt(l_utt)); + return l_utt; +} + +LISP FT_intonation_diagnostics(LISP l_ref, LISP l_test, LISP l_rel_name, LISP l_valid) +{ + EST_StrList valid; + + siod_list_to_strlist(l_valid, valid); + intonation_diagnostics(*get_c_utt(l_ref), *get_c_utt(l_test), + get_c_string(l_rel_name), valid); + + return NIL; +} + + +/*LISP FT_metrical_data(LISP lf_word, LISP lf_seg, LISP lf_int, LISP lf_met) +{ + EST_Utterance *u = new EST_Utterance; + LISP l_utt = siod_make_utt(u); + EST_String word_file = get_c_string(lf_word); + + u->f.set("fileroot", basename(word_file, "*")); + + if (lf_met) + { 
+ if (siod_get_lval("us_xml_metrical_trees", NULL) != NIL) + data_metrical_tree(*u, get_c_string(lf_met), "xml"); + else + data_metrical_tree(*u, get_c_string(lf_met), ""); + } + else + syntax_metrical_tree(*u, get_c_string(lf_word)); + + if (lf_int) + data_metrical_lex(*get_c_utt(l_utt), get_c_string(lf_seg), + get_c_string(lf_int)); + else + data_metrical_lex(*get_c_utt(l_utt), get_c_string(lf_seg), ""); + + return l_utt; +} +*/ + + + +/* Specific to s-expression weather. + Should be replaced when metrical trees go into XML. + +void data_metrical_tree(LISP l, EST_Item *met_parent, EST_Relation &word) +{ + EST_String mv, name; + int id, phon_ref; + LISP a; + EST_Item *m; + + //cout << "full entry\n"; +// lprint(l); + //cout << "now parsing\n"; + + mv = get_c_string(car(l)); + //cout << "adding node strength: " << mv << endl; + // root nodes are added in calling routine. + if (mv != "r") + m = met_parent->append_daughter(); + else + m = met_parent; + m->set("MetricalValue", mv); + + if (siod_atomic_list(cdr(l))) + { + // cout << "atomic cdr is: "; +// lprint(cdr(l)); + a = cdr(l); + name = get_c_string(car(a)); + phon_ref = get_c_int(car(cdr(a))); + id = get_c_int(car(cdr(cdr(a)))); + m->set("name", name); + m->set("phon_ref", phon_ref); + m->set("id", id); + word.append(m); + + //cout << "adding " << name << " on id: " << id << endl; + return; + } + //cout << "\ndoing left branch\n"; + data_metrical_tree(car(cdr(l)), m, word); + //cout << "\ndoing right branch\n"; + data_metrical_tree(car(cdr(cdr(l))), m, word); +} +*/ + +int lisp_tree_to_xml(ofstream &outf, LISP l) +{ + int id; + LISP a; + EST_String mv; + + //cout << "full entry\n"; +// lprint(l); + //cout << "now parsing\n"; + + mv = get_c_string(car(l)); + + outf << "\n"; + return 0; + } + else + outf << ">\n"; + + if (lisp_tree_to_xml(outf, car(cdr(l)))) + outf << "\n"; + if (lisp_tree_to_xml(outf, car(cdr(cdr(l))))) + outf << "\n"; + + return 1; +} + +LISP FT_add_trans_metrical_tree(LISP l_utt, LISP 
lf_input, LISP lf_output) +{ + EST_Utterance *utt; + LISP lmet, l; + EST_Item *m; + + utt = get_c_utt(l_utt); + + utt->create_relation("Word"); + utt->create_relation("Token"); + utt->create_relation("MetricalTree"); + + lmet = vload(get_c_string(lf_input), 1); + + + ofstream outf; + outf.open(get_c_string(lf_output)); + + outf << "\n"; + outf << "]>\n"; + + outf << "n"; + +// lprint(lmet); + + // have to ensure that next id of build nodes is greater than + // any in the file. This should be done properly sometime. + utt->set_highest_id(10000); + + for (l = lmet; l ; l = cdr(l)) + { + m = utt->relation("MetricalTree")->append(); +// cout << "\nNew Tree\n"; + if (lisp_tree_to_xml(outf, car(l))) + outf << "" << endl; + } + + outf << "\n"; + + return l_utt; +} diff --git a/src/modules/UniSyn_phonology/UniSyn_phonology.cc b/src/modules/UniSyn_phonology/UniSyn_phonology.cc new file mode 100644 index 0000000..a9e2c1f --- /dev/null +++ b/src/modules/UniSyn_phonology/UniSyn_phonology.cc @@ -0,0 +1,653 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. 
The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Paul Taylor */ +/* Date : June 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Metrical Tree based Phonology system */ +/* */ +/*=======================================================================*/ + +#include +#include "festival.h" +//#include "development/EST_FeatureData.h" +#include "../UniSyn/us_features.h" + +/**** FUNCTIONS FOR UNISYN BUILD ****/ +LISP FT_add_trans_seg(LISP l_utt, LISP lf_seg); +LISP FT_add_trans_duration(LISP l_utt, LISP lf_seg); +LISP FT_add_trans_metrical_tree(LISP l_utt, LISP lf_input, LISP lf_output); +LISP FT_add_xml_relation(LISP l_utt, LISP xml_file); +LISP FT_add_trans_word(LISP l_utt, LISP lf_word); +LISP FT_add_trans_intonation(LISP l_utt, LISP lf_int, LISP l_add_words); +LISP FT_add_f0_points(LISP l_utt, LISP lf_f0); +LISP FT_add_coefs(LISP l_utt, LISP lf_f0); + +LISP FT_syl_to_word_intonation(LISP l_utt); +LISP FT_intonation_diagnostics(LISP l_ref, LISP l_test, LISP l_rel_name, LISP l_valid); +/**** END FUNCTIONS FOR UNISYN BUILD ****/ + +void extend_tree(EST_Item *m, EST_Item *p, const EST_String &terminal, + const EST_String &second_tree); + 
+void parse_words(EST_Utterance &utt); + +LISP FT_add_trans_word(LISP l_utt, LISP lf_word, LISP keep_times); + +LISP FT_US_add_intonation(LISP utt); +LISP FT_focus_nth_item(LISP utt, LISP lrel, LISP w); +LISP FT_focus_nth_tree_item(LISP utt, LISP lrel, LISP w); +LISP FT_foot_nth_item(LISP utt, LISP w); + +void make_prosodic_tree(EST_Item *m, EST_Item *p); + +void phrase_factor(EST_Utterance &u, const EST_String &base_stream, + const EST_String &mettree); + +void stress_factor1(EST_Utterance &u, const EST_String &base_stream, + const EST_String &m); +void stress_factor2(EST_Utterance &u, const EST_String &base_stream, + const EST_String &m); + +void main_stress(EST_Item *s); +void footing(EST_Item *n1); +void add_monotone_targets(EST_Utterance &u, float start_f0, + float end_f0); + +void parse_words(EST_Utterance &utt); + +void syntax_to_metrical_words(EST_Utterance &utt); + +void binaryize_tree(EST_Utterance &utt, const EST_String &base_tree, + const EST_String &new_tree); + +/*void assign_phone_z_scores(EST_Utterance &u, const EST_String &seg_name); + +void promote_vowel_z_score(EST_Utterance &u, const EST_String &st_name, + const EST_String &syl_name); + + +void promote_mean_z_score(EST_Utterance &u, const EST_String &st_name, + const EST_String &syl_name); + +void find_cvc_words(EST_Utterance &u); +*/ +void tilt_to_f0(EST_Utterance &u); +void scale_tilt(EST_Relation &ev, float shift, float scale); + +void vowel_tilt_to_abs_tilt(EST_Utterance &u); +void targets_to_f0(EST_Relation &targ, EST_Track &f0, const float shift); + +void syntax_metrical_tree(EST_Utterance &utt, const EST_String &wordfile); + + +void tilt_to_f0(EST_Relation &intonation, EST_Relation &f0); + +void legal_metrical_tree(EST_Item *s); + +LISP FT_focus_nth_item(LISP utt, LISP lrel, LISP w) +{ + EST_Utterance *u = get_c_utt(utt); + int f = get_c_int(w); + EST_String relname = get_c_string(lrel); + EST_Item *n; + int i; + + cout << "Focusing item " << f << " in relation " << relname << endl; + + 
for (i = 1, n = u->relation(relname)->head(); n; n = n->next(), ++i) + if (i == f) + break; + + if (n == 0) + { + cerr << "Error: Can't focus node " << f << + " in a relation with only " << i << " items\n"; + return NIL; + } + + main_stress(n->as_relation("MetricalTree")); + if (siod_get_lval("mettree_debug", NULL) != NIL) + u->save("focus.utt", "est"); + + return utt; +} + +LISP FT_focus_nth_tree_item(LISP utt, LISP lrel, LISP w) +{ + EST_Utterance *u = get_c_utt(utt); + EST_String dir; + EST_String relname = get_c_string(lrel); + EST_Item *n; + LISP l; + + n = u->relation(relname)->head(); + for (l = w; (l != NIL) && (n); l = cdr(l)) + { + dir = get_c_string(car(l)); + cout << "dir = " << dir << endl; + if (dir == "l") + n = daughter1(n); + else + if (dir == "r") + n = daughter2(n); + else + cerr << "Bad instruction: " << dir << endl; + } + + main_stress(n->as_relation("MetricalTree")); + if (siod_get_lval("mettree_debug", NULL) != NIL) + u->save("focus.utt", "est"); + + return utt; +} + +LISP FT_foot_nth_item(LISP utt, LISP w) +{ + EST_Utterance *u = get_c_utt(utt); + int f = get_c_int(w); + (void) f; + (void)u; + EST_Item *n; + int i; + + cout << "Footing item " << f << endl; + + for (i = 1, n = u->relation("Syllable")->head(); n; n = n->next(), ++i) + if (i == f) + break; + + if (n == 0) + { + cerr << "Error: Can't foot node " << f << + " in a relation with only " << i << " items\n"; + return NIL; + } + + footing(n->as_relation("MetricalTree")); + + if (siod_get_lval("mettree_debug", NULL) != NIL) + u->save("foot.utt", "est"); + + return utt; +} + +LISP FT_US_add_intonation(LISP utt) +{ + (void) utt; + EST_Utterance *u = get_c_utt(utt); + EST_String base_int; + + /* lt = siod_get_lval("us_acc_thresh", NULL); + t = (lt == NIL) ? 
DEF_THRESH : get_c_float(lt); + + lt = siod_get_lval("us_base_int", NULL); + if (lt == NIL) + base_int = "Syllable"; + else + { + char *x = get_c_string(lt); + base_int = x; + } + + add_intonation(*u, base_int, t); + */ + + float start_f0 = get_c_float(siod_get_lval("us_start_f0", NULL)); + float end_f0 = get_c_float(siod_get_lval("us_end_f0", NULL)); + + u->create_relation("f0"); + EST_Track *f0 = new EST_Track; + EST_Item *a = u->relation("f0")->append(); + a->set_val("f0",est_val(f0)); + + add_monotone_targets(*u, start_f0, end_f0); + + targets_to_f0(*u->relation("Target"), *f0, 0.01); + + return utt; +} + +LISP FT_extend_tree(LISP l_utt, LISP largs) +{ + EST_Item *p, *m; + EST_Utterance *u = get_c_utt(l_utt); + + EST_String new_tree = get_c_string(car(largs)); + EST_String first_tree = get_c_string(car(cdr(largs))); + EST_String second_tree = get_c_string(car(cdr(cdr(largs)))); + EST_String terminal = get_c_string(car(cdr(cdr(cdr(largs))))); + + u->create_relation(new_tree); + + for (m = u->relation(first_tree)->head(); m; m = m->next()) + { + p = u->relation(new_tree)->append(m); + extend_tree(m, p, terminal, second_tree); + } + return l_utt; +} + +static void add_keep_nodes(EST_Item *n, EST_String keep) +{ + if (n == 0) + return; + + n->set(keep, 1); + + for (EST_Item *p = daughter1(n); p; p = p->next()) + add_keep_nodes(p, keep); +} + +static void remove_sisters(EST_Item *n) +{ + EST_Item *m; + EST_Item *p = parent(n); + if (p == 0) + return; + + for (EST_Item *s = daughter1(p); s; s = m) + { + m = s->next(); + if (s != n) + s->unref_all(); + } + move_sub_tree(n, p); + remove_sisters(p); +} + +LISP FT_copy_sub_tree(LISP l_utt, LISP l_id, LISP l_relation) +{ + EST_Utterance *u = get_c_utt(l_utt); + EST_Utterance *new_utt = new EST_Utterance; + + new_utt = u; + + EST_Item *n = new_utt->id(get_c_string(l_id))-> + as_relation(get_c_string(l_relation)); + + remove_sisters(n); + + // reset n - should now be a root node. 
+ n = new_utt->id(get_c_string(l_id))-> + as_relation(get_c_string(l_relation)); + + // remove other root nodes. + EST_Item *s, *m; + + for (s = new_utt->relation(get_c_string(l_relation))->head(); + s; s = m) + { + m = s->next(); + if (s != n) + s->unref_all(); + } + + n = new_utt->id(get_c_string(l_id))-> + as_relation(get_c_string(l_relation)); + + add_keep_nodes(n, "keep"); + + for (s = new_utt->relation("Segment")->head(); s; s = m) + { + m = s->next(); + if (!s->f_present("keep")) + s->unref_all(); + } + + for (s = new_utt->relation("Syllable")->head(); s; s = m) + { + m = s->next(); + if (!s->f_present("keep")) + s->unref_all(); + } + + for (s = new_utt->relation("Word")->head(); s; s = m) + { + m = s->next(); + if (!s->f_present("keep")) + s->unref_all(); + } + +/* for (s = new_utt->relation("Segment")->head(); + s->S("id") != first_leaf(n)->S("id"); s = m) + { + m = s->next(); + cout << "deleting segment :" << s->S("name") << endl; + s->unref_all(); + } + + for (s = next(last_leaf(n)->as_relation("Segment")); s; s = m) + { + m = s->next(); + cout << "deleting segment :" << s->S("name") << endl; + s->unref_all(); + } +*/ + + cout << "h1\n"; + LISP n_utt; + cout << "h1\n"; + n_utt = siod(new_utt); + cout << "h1\n"; + + return n_utt; +} + +void add_syllable_name(EST_Item *syl, const EST_String &fname); +void add_non_terminal_features(EST_Item *s, + EST_Features &f); + +LISP FT_add_match_features(LISP l_utt) +{ + EST_Item *p; + EST_Utterance *u = get_c_utt(l_utt); + + for (p = u->relation("Word")->head(); p; p = p->next()) + p->set("match", p->S("name")); + + for (p = u->relation("Segment")->head(); p; p = p->next()) + p->set("match", p->S("name")); + + for (p = u->relation("Syllable")->head(); p; p = p->next()) + add_syllable_name(p, "match"); + + EST_Features tf; + tf.set_function("end", "standard+unisyn_leaf_end"); + tf.set_function("start", "standard+unisyn_leaf_start"); + tf.set_function("dur", "standard+duration"); + + tf.set("time_path", 
"ProsodicTree"); + tf.set("time_path", "ProsodicTree"); + + add_non_terminal_features(u->relation("ProsodicTree")->head(), tf); + + return l_utt; +} + +LISP FT_tilt_to_f0(LISP l_utt, LISP l_f0_name) +{ + EST_String f0_name = get_c_string(l_f0_name); + EST_Utterance *u = get_c_utt(l_utt); + + EST_Relation *f0 = u->create_relation(f0_name); + + tilt_to_f0(*u->relation("Intonation"), *f0); + return l_utt; +} + +LISP FT_scale_tilt(LISP l_utt, LISP l_shift, LISP l_scale) +{ + scale_tilt(*(get_c_utt(l_utt)->relation("Intonation")), + get_c_float(l_shift), get_c_float(l_scale)); + return l_utt; +} + +LISP FT_vowel_tilt_to_abs_tilt(LISP l_utt) +{ + vowel_tilt_to_abs_tilt(*get_c_utt(l_utt)); + return l_utt; +} + +LISP FT_phrase_factor(LISP l_utt, LISP l_base_name, LISP l_met_name) +{ + phrase_factor(*get_c_utt(l_utt), get_c_string(l_base_name), + get_c_string(l_met_name)); + return l_utt; +} + +LISP FT_stress_factor(LISP l_utt, LISP l_rel_name, LISP l_num) +{ + if (get_c_int(l_num) == 1) + stress_factor1(*get_c_utt(l_utt), get_c_string(l_rel_name), + "LexicalMetricalTree"); + else + stress_factor2(*get_c_utt(l_utt), get_c_string(l_rel_name), + "LexicalMetricalTree"); + return l_utt; +} + +LISP FT_smooth_f0(LISP l_utt, LISP l_rel_name, LISP l_num) +{ + if (get_c_int(l_num) == 1) + stress_factor1(*get_c_utt(l_utt), get_c_string(l_rel_name), + "LexicalMetricalTree"); + else + stress_factor2(*get_c_utt(l_utt), get_c_string(l_rel_name), + "LexicalMetricalTree"); + return l_utt; +} + +LISP FT_legal_metrical_tree(LISP l_utt) +{ + EST_Utterance *u = get_c_utt(l_utt); + EST_Item *s; + + for (s = u->relation("MetricalTree")->head(); s; s= s->next()) + legal_metrical_tree(s); + + return l_utt; +} +void auto_metrical_lex(EST_Utterance &utt); + +LISP FT_unisyn_lex(LISP l_utt) +{ + EST_Utterance *u = get_c_utt(l_utt); + auto_metrical_lex(*u); + + return l_utt; +} + + +/*LISP FT_MetricalTree_Utt(LISP l_utt) +{ + EST_Utterance *u = get_c_utt(l_utt); + + auto_metrical_tree(*u); + 
auto_metrical_lex(*u); + + return l_utt; +} +*/ + + + +LISP FT_syntax_to_metrical_words(LISP l_utt) +{ + syntax_to_metrical_words(*get_c_utt(l_utt)); + return l_utt; +} + +LISP FT_binaryize_tree(LISP l_utt, LISP old_tree, LISP new_tree) +{ + binaryize_tree(*get_c_utt(l_utt), get_c_string(old_tree), + get_c_string(new_tree)); + return l_utt; +} + +LISP FT_parse_words(LISP l_utt) +{ + parse_words(*get_c_utt(l_utt)); + return l_utt; +} + +void festival_UniSyn_phonology_init(void) +{ + init_subr_1("US_add_intonation", FT_US_add_intonation, + "(US_add_intonation UTT)"); + + init_subr_1("vowel_tilt_to_abs_tilt", FT_vowel_tilt_to_abs_tilt, "."); + + init_subr_2("tilt_to_f0", FT_tilt_to_f0, "."); + + init_subr_3("scale_tilt", FT_scale_tilt, + "(scale_tilt UTT shift scale)\n" + "Add shift Hz to each event and increase range by a factor\n" + "of scale. (scale_tilt UTT 0.0 1.0) leaves the tilt parameters\n" + "unaffected\n"); + +/* init_subr_3("promote_vowel_z_score", FT_promote_vowel_z_score, + "(promote_vowel_z_score UTT RELATION)\n\ + Focus nth item in relation."); + + init_subr_3("promote_mean_z_score", FT_promote_mean_z_score, + "(promote_mean_z_score UTT RELATION)\n\ + Focus nth item in relation."); + + init_subr_2("assign_phone_z_scores", FT_assign_phone_z_scores, + "(assign_phone_z_scores UTT RELATION RELATION)\n\ + Focus nth item in relation."); +*/ + + init_subr_3("focus_nth_item", FT_focus_nth_item, + "(focus_nth_item UTT RELATION WordNumber)\n\ + Focus nth item in relation."); + + init_subr_3("focus_nth_tree_item", FT_focus_nth_tree_item, + "(focus_nth_tree_item UTT RELATION PATH)\n\ + Focus the item reached by following PATH, a list of l and r steps."); + + init_subr_3("copy_sub_tree", FT_copy_sub_tree, + "(copy_sub_tree UTT ID RELATION)\n\ + Keep only the subtree rooted at the item with id ID in RELATION."); + + init_subr_2("foot_nth_item", FT_foot_nth_item, + "(foot_nth_item UTT WordNumber)\n\ + Foot nth item in relation."); + + init_subr_1("legal_metrical_tree", FT_legal_metrical_tree, + "(legal_metrical_tree UTT)\n\ + Run legal_metrical_tree on each root item of the MetricalTree relation."); + 
+ init_subr_1("add_match_features", FT_add_match_features, + "(legal_metrical_tree UTT))\n\ + load data."); + + init_subr_2("extend_tree", FT_extend_tree, + "(extend_tree utt (new_tree first_tree second_tree terminal))"); + + init_subr_3("phrase_factor", FT_phrase_factor, + "(phrase_factor utt)\n\ + Apply phrase factor algorithm."); + + init_subr_3("stress_factor", FT_stress_factor, + "(stress_factor utt RELATION VERSION)\n\ + Apply version VERSION of stress factor algorithm to RELATION."); + + init_subr_1("syntax_to_metrical_words", FT_syntax_to_metrical_words, + "(add_trans_intonation UTT Tilt file)\n\ + Foot nth item in relation."); + + init_subr_3("binaryize_tree", FT_binaryize_tree, + "(add_trans_intonation UTT Tilt file)\n\ + Foot nth item in relation."); + + init_subr_1("parse_words", FT_parse_words, + "(add_trans_intonation UTT Tilt file)\n\ + Foot nth item in relation."); + + init_subr_1("unisyn_lex", FT_unisyn_lex, + "(unisyn_lex UTT Tilt file)\n\ + Foot nth item in relation."); + + +/**** FUNCTIONS FOR UNISYN BUILD ****/ + + init_subr_3("add_trans_intonation", FT_add_trans_intonation, + "(add_trans_intonation UTT Tilt_file ADD_WORDS)\n\ + Foot nth item in relation."); + + init_subr_2("add_f0_points", FT_add_f0_points, + "(add_f0_points UTT f0 file)\n\ + Foot nth item in relation."); + + init_subr_2("add_coefs", FT_add_coefs, + "(add_coefs UTT f0 file)\n\ + Foot nth item in relation."); + + init_subr_2("add_trans_segment", FT_add_trans_seg, + "(add_trans_segment UTT label_file)\n\ + Add segment information\n\ + Foot nth item in relation."); + + init_subr_2("add_trans_duration", FT_add_trans_duration, + "(add_trans_duration UTT label_file)\n\ + Foot nth item in relation."); + + init_subr_3("add_trans_word", FT_add_trans_word, + "(add_trans_word UTT word_label_file)\n\ + Foot nth item in relation."); + + init_subr_2("add_xml_relation", FT_add_xml_relation, + "(add_xml_relation UTT xml_file))\n\ + load data."); + + init_subr_1("syl_to_word_intonation", 
FT_syl_to_word_intonation, + "(add_xml_relation UTT xml_file))\n\ + load data."); + + init_subr_4("intonation_diagnostics", FT_intonation_diagnostics, + "(intonation_diagnostics UTT + Foot nth item in relation."); + + // semi redundant + init_subr_3("add_trans_metrical_tree", FT_add_trans_metrical_tree, + "(add_trans_intonation UTT Tilt file)\n\ + Foot nth item in relation."); + + + +/**** END FUNCTIONS FOR UNISYN BUILD ****/ +} + +/* +LISP FT_assign_phone_z_scores(LISP l_utt, LISP l_rel_name) +{ + assign_phone_z_scores(*get_c_utt(l_utt), get_c_string(l_rel_name)); + return l_utt; +} + +LISP FT_promote_mean_z_score(LISP l_utt, LISP l_st_name, LISP l_syl_name) +{ + promote_mean_z_score(*get_c_utt(l_utt), get_c_string(l_st_name), + get_c_string(l_syl_name)); + return l_utt; +} + +LISP FT_promote_vowel_z_score(LISP l_utt, LISP l_st_name, LISP l_syl_name) +{ + promote_vowel_z_score(*get_c_utt(l_utt), get_c_string(l_st_name), + get_c_string(l_syl_name)); + return l_utt; +} +*/ diff --git a/src/modules/UniSyn_phonology/mettree.cc b/src/modules/UniSyn_phonology/mettree.cc new file mode 100644 index 0000000..5abe5b4 --- /dev/null +++ b/src/modules/UniSyn_phonology/mettree.cc @@ -0,0 +1,1338 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. 
Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black and Paul Taylor */ +/* Date : February 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* An implementation of Metrical Tree Phonology */ +/* */ +/*=======================================================================*/ + +#include +#include "festival.h" +#include "lexicon.h" +#include "../UniSyn/us_features.h" + +EST_Features phone_def; + +float local_cost(const EST_Item *s1, const EST_Item *s2); + +static void mettree_add_words(EST_Utterance &u); + +void construct_metrical_tree(EST_Utterance &word); +void add_end_silences(EST_Relation &segment); +void StrListtoString(EST_StrList &l, EST_String &s, EST_String sep=" "); + +void parse_wsj_syntax(void); +static void apply_nsr(EST_Utterance &u, const EST_String &tree); +#if 0 +static void remove_punctuation(EST_Utterance &u); +static void add_intonation(EST_Utterance &u, const EST_String &base_int, + float threshold); +#endif + +void subword_list(EST_Item *w, EST_Relation &syllable, + EST_Relation &metricaltree); + +void 
add_non_terminal_features(EST_Relation &r, + EST_Features &f); + +void stress_factor1(EST_Utterance &u, const EST_String &base_stream, + const EST_String &m); +void stress_factor2(EST_Utterance &u, const EST_String &base_stream, + const EST_String &m); + +void phrase_factor(EST_Utterance &u); + +void main_stress(EST_Item *s); + +void end_to_dur(EST_Relation &r); + +void footing(EST_Item *n1); + +LISP FT_Classic_Phrasify_Utt(LISP args); +LISP FT_Classic_POS_Utt(LISP args); +LISP FT_PParse_Utt(LISP args); +LISP FT_MultiParse_Utt(LISP utt); +void MultiParse(EST_Utterance &u); + +void add_feature_string(EST_Relation &r, const EST_String &fname, + const EST_String &f); + +void add_monotone_targets(EST_Utterance &u, float start_f0, + float end_f0); + + +void clear_feature(EST_Relation &r, const EST_String &name); + +typedef +float (*local_cost_function)(const EST_Item *item1, + const EST_Item *item2); + + + +void subword_phonology(EST_Utterance &word); +void lex_to_phones(EST_Utterance &u, const EST_String &relname); + +bool dp_match(const EST_Relation &lexical, + const EST_Relation &surface, + EST_Relation &match, + local_cost_function lcf, + EST_Item *null_syl); + +//bool dp_match(EST_Relation &a, EST_Relation &b, EST_Relation &c, +// local_cost_function lcf, const EST_String &null_sym); +void lex_to_phones(const EST_String &name, const EST_String &pos, + EST_Relation &phone); + +void subword_metrical_tree(EST_Relation &syllable, + EST_Relation &metricaltree); + + +int syllabify_word(EST_Item *nw, EST_Relation &phone, + EST_Relation &sylstructure, EST_Relation &syl, int flat); + +void subword_metrical_tree(EST_Item *w, EST_Relation &syllable, + EST_Relation &metricaltree); + + +/*void phonemic_trans(EST_Relation &trans) +{ + EST_Item *s, *n; + EST_String a; + +// cout << "trans: " << trans << endl; + + for (s = trans.head(); s; s = s->next()) + { + n = s->next(); +// cout << *s; + if (s->S("name").contains("cl")) + { + a = s->S("name").before("cl"); + if ((next(s) != 
0) && (next(s)->S("name") == a)) + trans.remove_item(s); + else if ((next(s) != 0) && (a == "dcl" ) + && (next(s)->S("name") == "jh")) + trans.remove_item(s); + else if ((next(s) != 0) && (a == "tcl" ) + && (next(s)->S("name") == "ch")) + trans.remove_item(s); + else + s->set("name", a); +// cout << "here1: " << a << "\n"; +// s->set("name", s->S("name").before("cl")); + + } + } +} +*/ + + +EST_Item *prev_match(EST_Item *n) +{ + EST_Item *p = n->prev(); + if (p == 0) + return 0; + + // keep searching backwards until an item with a match is found + if (daughter1(p->as_relation("Match")) == 0) + return prev_match(p); + + return daughter1(p->as_relation("Match")); +} + +void insert_schwa(EST_Item *n) +{ + EST_Item *p, *s; + float pp_end = 0; + float schwa_length = 0.01; + + if ((p = prev_match(n)) == 0) + { + cout << "Couldn't insert dummy schwa after " << *n << endl; + return; + } + + p = p->as_relation("SurfacePhone"); + pp_end = (prev(p) != 0) ? prev(p)->F("end",0.0) : 0.0; + + s = p->insert_after(); + + s->set("name", "ax"); + s->set("stress_num", "0"); + + if ((p->F("end",0) - pp_end) < schwa_length) + schwa_length = p->F("dur") / 2.0; + + s->set("end", p->F("end",0)); + p->set("end", p->F("end",0) - schwa_length); + s->set("start", p->F("end",0)); + + s->set("df", phone_def.A("ax")); + +// cout << "end 1:" << s->f("end") << endl; +// cout << "end 2:" << p->f("end") << endl; +} + +void add_initial_silence(EST_Relation &lexical, EST_Relation &surface, + EST_Relation &match) +{ + EST_Item *s, *p, *n, *m; + + s = lexical.head(); + if ((s->f("name") != "pau") && (s->f("name") != "sil")) + { + p = s->insert_before(); + p->set("name", "pau"); + p->set("df", phone_def.A("pau")); + + n = surface.head(); + if ((n->f("name") == "pau") || (n->f("name") == "sil")) + { + m = match.head()->insert_before(p); + m->append_daughter(n); + } + + } +} + +void add_even_segment_times(EST_Item *w, EST_Relation &phone) +{ + EST_Item *s; + int i; + float start,dur=0,n,div; + + start = w->F("start"); + dur = w->F("end") - start; + n = (float)phone.length(); +
div = dur/n; + + for (i = 0, s = phone.head(); s; s = s->next(), ++i) + { + s->set("start", start + div * (float) i); + s->set("end", start + div * (float) (i + 1)); + } +} + +#if 0 +static void add_trans_phrase_phrase(EST_Utterance &utt) +{ + EST_Item *s, *t, *a, *r; + EST_Item *first_accent = 0, *last_accent = 0; + bool exist; + + // This looks insanely complicated, but all it really does is + // add phrase_start and phrase_end items to the root node of + // each metrical tree and then place these in the right position + // in the intonation relation. + + utt.create_relation("IntonationPhrase"); + + for (s = utt.relation("MetricalTree", 1)->head(); s; s = s->next()) + { + for (r = first_leaf_in_tree(s); + r != next_leaf(last_leaf_in_tree(s)); r = next_leaf(r)) + if (r->in_relation("IntonationSyllable")) + { + if (first_accent == 0) + first_accent = + parent(r->as_relation("IntonationSyllable")) + ->as_relation("Intonation"); + last_accent = + parent(r->as_relation("IntonationSyllable")) + ->as_relation("Intonation"); + } + + exist = false; + // cout << "\nroot node: " <<*s << endl; + + if (first_accent) + { + cout << "first accent: " << *first_accent << endl; + a = first_accent->prev(); + + if (a->S("name","") != "phrase_start") + a = first_accent->insert_before(); + } + else + { + if (a == 0) + a = utt.relation("Intonation")->prepend(); + else + a = a->insert_after(); + } + + if (a->S("name","") != "phrase_start" ) // i.e.
it's a new one + { + a->set("name", "phrase_start"); + a->set("ev:f0", 100); + } + // re-write position as relative to metrical tree + a->set("position", usf_int_start); + // add this as daughter to root node + t = utt.relation("IntonationPhrase")->append(s); + t->append_daughter(a); + exist = false; + //cout << "appended phrase start\n"; + + if (last_accent) + { + cout << "last accent: " << *last_accent << endl; + a = last_accent->next(); + if (a->S("name","") != "phrase_end") + a = last_accent->insert_after(); + } + else + a = a->insert_after(); + + if (a->S("name","") != "phrase_end") + { + a->set("name", "phrase_end"); + a->set("ev:f0", 100); + } + // re-write position as relative to metrical tree + a->set("position", usf_int_end); + + // add this as daughter to root node + t->append_daughter(a); + //cout << "appended phrase end\n"; + first_accent = 0; // trigger for first time operation of loop + } + + // now join any other marked phrase_start/ends to intermediate + // nodes in metrical tree.
+ + /* for (s = u.relation("Intonation", 1)->head(); s; s = s->next()) + { + if (!s->in_relation("IntonationPhrase") && + !s->in_relation("IntonationSyllable")) + { + pos = s->F("position"); + + + + } + */ + +} +#endif + + +void add_single_phrase(EST_Utterance &utt, EST_Item *t) +{ + EST_Item *s=0, *a, *p; + float pos, max, d = 0, start = 0.0; + + pos = t->F("time"); + max = 100000.0; + + for (p = utt.relation("Syllable")->head(); p; p = p->next()) + { + if (t->S("name") == "phrase_end") + d = fabs(pos - p->F("end")); + else + d = fabs(pos - start); + + if (d < max) + { + max = d; + s = p; + } + start = p->F("end"); + } + +/* if (s) + cout << "joining syllable " << *s << endl; + else + cout << "No legal syllable " << endl; + cout << "to " << *t << endl; + + cout << "d = " << d << endl; +*/ + + if (!s->in_relation("IntonationSyllable")) + a = utt.relation("IntonationSyllable")->append(s); + else + a = s->as_relation("IntonationSyllable"); + a->append_daughter(t); + t->set("time_path", "IntonationSyllable"); + t->set_function("position", "standard+unisyn_tilt_phrase_position"); +} + +void add_times(EST_Relation &lexical, EST_Relation &surface, + EST_Relation &match) +{ + (void) surface; + (void) match; + EST_Item *s, *t, *p; + float prev_end, inc, first_end, last_end; + int i; + + // first pass, copy times as appropriate, and find first + // and last defined ends + // This is hacky and certainly won't work for many cases + + first_end = -1.0; + prev_end = 0.0; + last_end = 0.0; + +// cout << "surface: " << surface << endl; + + for (s = lexical.head(); s; s = s->next()) + { + if ((t = daughter1(s->as_relation("Match"))) != 0) + { + s->set("end", t->F("end")); + s->set("start", t->F("start")); + + last_end = t->F("end"); + if (first_end < 0.0) + first_end = t->F("end"); + } + } + + if (!lexical.head()->f_present("end")) + { + lexical.head()->set("end", first_end / 2.0); + lexical.head()->set("start", 0.0); + } + + if (!lexical.tail()->f_present("end")) + { + 
lexical.tail()->set("end", last_end + 0.01); + lexical.tail()->set("start", last_end); + } + + for (s = lexical.head(); s; s = s->next()) + { + if (!s->f_present("end")) + { +// cout << "missing end feature for " << *s << endl; + for (i = 1, p = s; p; p = p->next(), ++i) + if (p->f_present("end")) + break; + inc = (p->F("end") - prev_end) / ((float) i); +// cout << "inc is : " << inc << endl; + +// cout << "stop phone is " << *p << endl; + + for (i = 1; s !=p ; s = s->next(), ++i) + { + s->set("end", (prev_end + ((float) i * inc))); + s->set("start", (prev_end + ((float) (i - 1 )* inc))); + } + } + prev_end = s->F("end"); + } +} + + +static void met_error(EST_Item *s) +{ + cerr << "Illegally named daughters of metrical node\n" + << "daughter1 : " << *daughter1(s) << endl + << "daughter2 : " << *daughter2(s) << endl; + EST_error(""); +} + +void legal_metrical_tree(EST_Item *s) +{ + if (s == 0) + return; + + if ((daughter1(s) == 0) || (daughter2(s) == 0)) + return; + + if ((daughter1(s)->S("MetricalValue") == "s") + && (daughter2(s)->S("MetricalValue") != "w")) + met_error(s); + else if ((daughter1(s)->S("MetricalValue") == "w") + && (daughter2(s)->S("MetricalValue") != "s")) + met_error(s); + else if ((daughter1(s)->S("MetricalValue") != "w") + && (daughter1(s)->S("MetricalValue") != "s")) + met_error(s); + + legal_metrical_tree(daughter1(s)); + legal_metrical_tree(daughter2(s)); +} + +void parse_words(EST_Utterance &utt) +{ + utt.create_relation("Token"); + + FT_Classic_POS_Utt(siod(&utt)); + FT_Classic_Phrasify_Utt(siod(&utt)); + MultiParse(utt); + + utt.relation("Syntax")->remove_item_feature("pos_index"); + utt.relation("Syntax")->remove_item_feature("pos_index_score"); + utt.relation("Syntax")->remove_item_feature("phr_pos"); + utt.relation("Syntax")->remove_item_feature("pbreak_index"); + utt.relation("Syntax")->remove_item_feature("pbreak_index_score"); + utt.relation("Syntax")->remove_item_feature("pbreak"); + 
utt.relation("Syntax")->remove_item_feature("blevel"); + utt.relation("Syntax")->remove_item_feature("prob"); +} + +void binaryize_tree(EST_Item *t) +{ + // terminating condition + if (daughter1(t) == 0) + return; + + // nodes with single children should be merged + if (daughter2(t) == 0) + { +// cout << "Single daughter: " << *t << endl; + EST_Item *d = daughter1(t); + move_sub_tree(d, t); + } + + for (EST_Item *p = daughter1(t); p; p = p->next()) + binaryize_tree(p); +} + +void binaryize_tree(EST_Utterance &utt, const EST_String &base_tree, + const EST_String &new_tree) +{ + utt.create_relation(new_tree); + copy_relation(*utt.relation(base_tree), *utt.relation(new_tree)); + + for (EST_Item *p = utt.relation(new_tree)->head(); p; p = p->next()) + binaryize_tree(p); +} + +void syntax_to_metrical_words(EST_Utterance &utt) +{ + utt.create_relation("MetricalWord"); + // copy syntax tree while merging single daughter nodes + binaryize_tree(utt, "Syntax", "MetricalWord"); + // add strong and weak values + apply_nsr(utt, "MetricalWord"); +} + +void add_metrical_functions(EST_Utterance &utt) +{ + // Note that we don't add "start" functions here as this depends on + // pause behaviour + add_feature_function(*utt.relation("Syllable"), + "vowel_start", + "unisyn_vowel_start"); + + add_feature_function(*utt.relation("Syllable"), + "end", "standard+unisyn_leaf_end"); + add_feature_function(*utt.relation("Syllable"), + "start", "standard+unisyn_leaf_start"); + + for (EST_Item *s = utt.relation("Syllable")->head(); s; s = s->next()) + s->set("time_path", "SylStructure"); + + EST_Features tf; + tf.set_function("end", "standard+unisyn_leaf_end"); + tf.set_function("start","standard+unisyn_leaf_start"); + tf.set_function("dur","standard+unisyn_duration"); + + tf.set("time_path", "MetricalTree"); + tf.set("time_path", "MetricalTree"); + +// add_non_terminal_features(*utt.relation("MetricalTree"), tf); + + tf.set("time_path", "SylStructure"); + 
add_non_terminal_features(*utt.relation("SylStructure"), tf); + + add_feature_function(*utt.relation("Segment"), + "dur", + "standard+duration"); + + +} + +void auto_metrical_lex(EST_Utterance &utt) +{ + LISP l_pdef; + + utt.create_relation("Syllable"); + utt.create_relation("Segment"); + + l_pdef = siod_get_lval("darpa_fs", NULL); + lisp_to_features(l_pdef, phone_def); + + mettree_add_words(utt); + + LISP lt = siod_get_lval("us_base_int", NULL); + EST_String base_int; + if (lt == NIL) + base_int = "Syllable"; + else + { + const char *x = get_c_string(lt); + base_int = x; + } + + // add_end_silences(*utt.relation("Segment")); + + add_metrical_functions(utt); +} + +void extend_tree(EST_Item *m, EST_Item *p, const EST_String &terminal, + const EST_String &second_tree) +{ + EST_Item *d, *e; + + if (!daughter1(m)) + { + if (m->in_relation(terminal)) // ie. really hit the bottom + return; + m = m->as_relation(second_tree); // swap to a new tree + } + + for (d = daughter1(m); d; d = d->next()) + { + e = p->append_daughter(d); + extend_tree(d, e, terminal, second_tree); + } +} + + +static void nsr(EST_Item *n) +{ + EST_Item *left, *right; + left = daughter1(n); + right = daughter2(n); + if (left == 0) + return; + else + { + nsr(left); + left->set("MetricalValue","w"); + } + + if (right == 0) + return; + else + { + nsr(right); + right->set("MetricalValue","s"); + } +} + +static void apply_nsr(EST_Utterance &u, const EST_String &tree) +{ + EST_Item *n; + + for (n = u.relation(tree)->head(); n; n = n->next()) + nsr(n); +} + +EST_Item *other_daughter(EST_Item *parent, EST_Item *daughter) +{ + return (daughter1(parent) == daughter) ? 
daughter2(parent) : + daughter1(parent); +} + +static void stress_factor1(EST_Item *s, int max_depth) +{ + EST_Item *a; + EST_String val, pad; + char *str; + long n, i; + float max; + + val = ""; + + for (a = s; parent(a); a = parent(a)) + if (a->f("MetricalValue") == "s") + val += "2"; + else + val += "0"; + + // cout << "\nSyllable " << s << " has value " << val << endl; + + if (val.length() < max_depth) + for (pad = "", i = 0; i < (max_depth - val.length()); ++i) + pad += "2"; + + val += pad; + // cout << "Syllable " << s << " has padded value " << val << endl; + + str = strdup(val); + max = pow(3.0, (float)max_depth) - 1.0; + n = strtol(str, (char **)NULL, 3); + // cout << "decimal value: " << n; + // cout << " normalised: " << (float)n/max << endl; + s->set("StressFactor1", ((float)n/max)); +} + +EST_Item * find_apex(EST_Item *n, int &num_nodes) +{ + EST_Item *p; + p = parent(n); + if (p == 0) + return n; + if (daughter2(p) == n) + return find_apex(p, ++num_nodes); + + return p; +} + +void find_leaf(EST_Item *n, int &num_nodes) +{ + if (n == 0) + return; + find_leaf(daughter1(n), ++num_nodes); +} + +static void phrase_factor(EST_Item &syl, const EST_String &met_name) +{ + EST_Item *p; + EST_String val, pad; + int num_nodes = 1; + + // cout << "Terminal Syl = " << syl << " f:" << syl.f << endl; + + p = find_apex(syl.as_relation(met_name), num_nodes); + // cout << "up nodes: " << num_nodes; + // cout << "Apex = " << *p << endl; + find_leaf(daughter2(p), num_nodes); + // cout << " downp nodes: " << num_nodes << endl; + + syl.set("PhraseIndex", num_nodes); +} + +static int max_tree_depth(EST_Utterance &u, const EST_String &base_stream, + const EST_String &mettree) +{ + EST_Item *s, *a; + int depth; + int max_depth = 0; + + for (s = u.relation(base_stream)->head(); s; s = s->next()) + { + depth = 0; + for (a = s->as_relation(mettree); parent(a); a = parent(a)) + ++depth; + if (depth > max_depth) + max_depth = depth; + } + return max_depth; +} + +void 
stress_factor1(EST_Utterance &u, const EST_String &base_stream, + const EST_String &mettree) +{ + EST_Item *s; + int max_depth = max_tree_depth(u, base_stream, mettree); + + for (s = u.relation(base_stream)->head(); s; s = s->next()) + stress_factor1(s->as_relation(mettree), max_depth); +} + + +EST_Item *strong_daughter(EST_Item *n) +{ + if (daughter1(n) == 0) + return 0; + return (daughter1(n)->f("MetricalValue") == "s") + ? daughter1(n) : daughter2(n); +} + +EST_Item *weak_daughter(EST_Item *n) +{ + if (daughter1(n) == 0) + return 0; + return (daughter1(n)->f("MetricalValue") == "w") + ? daughter1(n) : daughter2(n); +} + +static void fill_mini_tree(EST_Item *s, int val) +{ + if (s->f("MetricalValue") == "s") + s->set("StressVal", val); + else + s->set("StressVal", 0); + if (strong_daughter(s)) + fill_mini_tree(strong_daughter(s), val); + + if (weak_daughter(s)) + fill_mini_tree(weak_daughter(s), val - 1); +} + +void stress_factor2(EST_Utterance &u, const EST_String &base_stream, + const EST_String &mettree) +{ + EST_Item *s; + int sv = -1; + float b; + (void) base_stream; + + s = u.relation(mettree)->head(); + fill_mini_tree(s, sv); + + // normalise values + sv = 0; + for (s = u.relation(base_stream)->head(); s; s = s->next()) + sv = Lof(s->I("StressVal"), sv); + + cout << "Max Stress: " << sv << endl; + + for (s = u.relation(base_stream)->head(); s; s = s->next()) + { + b = (float)(s->I("StressVal") - sv + 1); + if (s->f("MetricalValue") == "s") + s->set("StressFactor2", (b / float(sv)) * -1.0); + else + s->set("StressFactor2", 0); + } +} + +void phrase_factor(EST_Utterance &u, const EST_String &base_stream, + const EST_String &mettree) +{ + EST_Item *s; + float max_pf = 0; + + for (s = u.relation(base_stream)->head(); s; s = s->next()) + phrase_factor(*s, mettree); + + for (s = u.relation(base_stream)->head(); s; s = s->next()) + if (s->I("PhraseIndex") > max_pf) + max_pf = s->I("PhraseIndex"); + + for (s = u.relation(base_stream)->head(); s; s = s->next()) + { 
+ s->set("PhraseFactor", + (float)s->I("PhraseIndex")/max_pf); + // cout << *s << " pf = " << + // s->F("PhraseFactor") << endl; + } + +} + +#if 0 +static void remove_punctuation(EST_Utterance &u) +{ + // The syntactic grammar has unary rules for the preterminals; + // these would make the metrical tree have an extra layer + // at the word level. So here we remove that extra layer. + EST_Item *w; + EST_Item *a, *b, *c, *od; + + for (w = u.relation("Word")->head(); w != 0; w = w->next()) + { + if (w->f("pos") == "punc") + { + a = w->as_relation("Syntax"); + b = parent(a); + c = parent(b); + od = other_daughter(c, b); + remove_item(b, "Syntax"); + move_sub_tree(od, c); + remove_item(w, "Word"); + } + } +} + +static void add_intonation(EST_Utterance &u, const EST_String &base_stream, + float threshold) +{ + EST_Item *e, *s; + + cout << "Threshold = " << threshold << endl; + + for (s = u.relation(base_stream)->head(); s; s = s->next()) + { + if (s->F("StressFactor") > threshold) + { + // cout << *s <<" **stress factor:" << s->F("StressFactor") << endl; + e = u.relation("IntSyl")->append(); + e->insert_below(s); + e->set_name("Accent"); + e->set("prominence", s->F("StressFactor")); + u.relation("Intonation")->append(e); + } + } +} + +#endif + +void add_monotone_targets(EST_Utterance &u, float start_f0, + float end_f0) +{ + EST_Item *t; + float end; + + end = u.relation("Segment")->tail()->f("end"); + + cout << "Phone ends\n"; + cout << *u.relation("Segment"); + + cout << "last position is :" << end << endl; + + u.create_relation("Target"); + + t = u.relation("Target")->append(); + t->set("f0", start_f0); + t->set("pos", 0.0); + + // temporary - should disappear when awb changes code + // t->set("name", ftoString(start_f0)); + // t->set("end", 0.0); + + t = u.relation("Target")->append(); + t->set("f0", end_f0); + t->set("pos", end); + + // temporary - should disappear when awb changes code + // t->set("name", ftoString(end_f0)); + // t->set("end", end); +} + +static
void mettree_add_words(EST_Utterance &u) +{ + EST_Utterance word; + EST_Item *w; + + word.create_relation("Word"); + word.create_relation("Segment"); + word.create_relation("SylStructure"); + word.create_relation("Syllable"); + word.create_relation("WordStructure"); + + for (w = u.relation("Word")->head(); w != 0; w = w->next()) + { + word.clear_relations(); + + cout << "N:"; + cout << w->f("name") << " " << w->f("pos", "") << endl; + lex_to_phones(w->f("name"), w->f("pos", "0"), + *word.relation("Segment")); + + EST_Item *nw = word.relation("Word")->append(); + nw->set("name", w->S("name")); + + syllabify_word(nw, *word.relation("Segment"), + *word.relation("SylStructure"), + *word.relation("Syllable"), 0); + + subword_metrical_tree(nw, *word.relation("Syllable"), + *word.relation("WordStructure")); + + utterance_merge(u, word, w, word.relation("Word")->head()); + } +} + +void add_metrical_nodes(EST_Utterance &u, EST_Item *n, LISP lpos); + +EST_String strip_vowel_num(EST_String p) +{ + if (p.contains(RXint)) + p = p.before(RXint); + return p; +} + + +static void percolate(EST_Item *start) +{ + EST_Item *n; + + for (n = start; n; n = parent(n)) + { + // cout << "altering sister\n"; + if (prev(n) != 0) + prev(n)->set("MetricalValue", "w"); + else if (next(n) != 0) + next(n)->set("MetricalValue", "w"); + } +} + + +void main_stress(EST_Item *s) +{ + EST_Item *n; + + for (n = s; parent(n); n = parent(n)) + n->set("MetricalValue", "s"); + + n = s; + percolate(n); +} + +void footing(EST_Item *n1) +{ + EST_Item *n2, *n3, *n4, *p1, *p3, *r; + + r = parent(n1); // root node + p1 = daughter2(r); + n2 = daughter1(p1); + n3 = daughter2(p1); + + if (p1 == 0) + { + cerr << "Error: Empty 3rd node after " << *n1 << " in footing\n"; + return; + } + if (n2 == 0) + { + cerr << "Error: Empty 3rd node after " << *n1 << " in footing\n"; + return; + } + if (n3 == 0) + { + cerr << "Error: Empty 3rd node after " << *n1 << " in footing\n"; + return; + } + + cout << "n1: " << *n1 << endl << 
endl; + cout << "n2: " << *n2 << endl << endl; + cout << "n3: " << *n3 << endl << endl; + cout << "p1: " << *p1 << endl << endl; + + p3 = n1->insert_parent(); + n1 = daughter1(p3); + n4 = p3->append_daughter(); + + move_sub_tree(n2, n4); + move_sub_tree(n3, p1); + + p3->set("MetricalValue", "w"); + p3->set("Altered_a", "DONE_W"); + + n1->set("MetricalValue", "s"); + n1->set("Altered_b", "DONE_S"); +} + + +#if 0 +LISP FT_metrical_data(LISP lf_word, LISP lf_seg, LISP lf_int) + { + + EST_Utterance word, *u = new EST_Utterance; + EST_Relation phone; + EST_Item *s, *p, *w, *nw, *n; + EST_StrList plist; + float phone_start, mid; + LISP lutt; + EST_Track fz; + int i; + + u->create_relation("Word"); + u->create_relation("Segment"); + u->create_relation("Syllable"); + u->create_relation("MetricalTree"); + u->create_relation("LexicalMetricalTree"); + u->create_relation("SurfacePhone"); + u->create_relation("Surface"); + u->create_relation("Intonation"); + u->create_relation("IntonationSyllable"); + + EST_String segfile = get_c_string(lf_seg); + EST_String wordfile = get_c_string(lf_word); + + if (u->relation("Word")->load(wordfile) != format_ok) + { + cerr << "Couldn't load file " << get_c_string(lf_word) << endl; + festival_error(); + } + + if ((segfile != "dummy") &&(u->relation("Segment")-> + load(get_c_string(lf_seg)) != format_ok)) + { + cerr << "Couldn't load file " << get_c_string(lf_seg) << endl; + festival_error(); + } + + if (lf_int != NIL) + if (u->relation("Intonation")->load(get_c_string(lf_int)) != format_ok) + { + cerr << "Couldn't load file " << get_c_string(lf_int) << endl; + festival_error(); + } + + u->f.set("fileroot", basename(wordfile, "*")); + + // cout << "Words: " << *u->relation("Word"); + + if (segfile != "dummy") + phonemic_trans(*u->relation("Segment")); + // u->relation("Intonation")->load(get_c_string(lf_int)); + + // tmp hack + float prev_end = 0.0; + + for (w = u->relation("Word")->head(); w != 0; w = n) + { + n = w->next(); + // 
w->set("start", prev_end); + w->f_remove("end"); + // prev_end = w->F("end"); + if ((w->f("name") == "sil") || (w->f("name") == "pau")) + u->relation("Word")->remove_item(w); + } + + gc_protect(&lutt); + lutt = siod_make_utt(u); + + cout << *u->relation("Word") << endl; + + FT_POS_Utt(lutt); + FT_Phrasify_Utt(lutt); + MultiParse(*u); + + // remove_punctuation(*u); + + // Copy Syntax tree into a new Metrical Tree + copy_relation(*u->relations.val("Syntax"), + *u->relations.val("MetricalTree")); + // flatten preterminal unary rules + flatten_preterminals(*u); + + apply_nsr(*u); + + copy_relation(*u->relations.val("MetricalTree"), + *u->relations.val("LexicalMetricalTree")); + + word.create_relation("Word"); + word.create_relation("Match"); + word.create_relation("NewMatch"); + word.create_relation("Segment"); + word.create_relation("SurfacePhone"); + word.create_relation("LexicalSylStructure"); + word.create_relation("SurfaceSylStructure"); + word.create_relation("LexicalSyllable"); + word.create_relation("SurfaceSyllable"); + + word.create_relation("LexicalMetricalTree"); + word.create_relation("SurfaceMetricalTree"); + + phone_start = 0.0; + + // Note starts are hardwired here because feature function thing + // isn't fully operational and because deleting silence messes + // it up. 
+ + // u->save("zz_parse.utt", "est"); + + if (segfile != "dummy") + { + for (s = u->relation("Segment")->head(); s; s = s->next()) + { + s->set("start", phone_start); + phone_start = s->F("end"); + } + phone_start = 0.0; + + s = u->relation("Segment")->head(); + if ((s->f("name") == "pau") || (s->f("name") == "sil")) + { + w = u->relation("SurfacePhone")->append(); + w->set("name", "pau"); + w->set("end", s->F("end")); + w->set("start", s->F("start")); + } + } + + // cout <<"Surface 1:" << *u->relation("SurfacePhone") << endl; + + for (i = 0, w = u->relation("Word")->head(); w != 0; w = w->next(), ++i) + { + word.clear_relations(); + + lex_to_phones(w->f("name"), w->f("pos"), + *word.relation("Segment")); + + if (segfile == "dummy") + *word.relation("SurfacePhone") = *word.relation("Segment"); + else + trans_to_phones(w, *u->relation("Segment"), + *word.relation("SurfacePhone")); + + // cout << "lex phones: " << *word.relation("LexicalPhone") << endl; + // cout << "sur phones: " << *word.relation("SurfacePhone") << endl; + + if (siod_get_lval("mettree_phones_debug", NULL) != NIL) + { + cout << "phones for word" << *w << endl; + cout << *word.relation("SurfacePhone") << endl; + } + + nw = word.relation("Word")->append(); + nw->set("name", w->S("name")); + + syllabify_word(nw, *word.relation("LexicalPhone"), + *word.relation("LexicalSylStructure"), + *word.relation("LexicalSyllable")); + + subword_metrical_tree(nw, *word.relation("LexicalSyllable"), + *word.relation("LexicalMetricalTree")); + + if (siod_get_lval("mettree_debug", NULL) != NIL) + word.save("word_lex.utt", "est"); + + // copy_relation(*word.relation("LexicalMetricalTree"), + // *word.relation("HackMT")); + + EST_Item xx; + dp_match(*word.relation("LexicalPhone"), + *word.relation("SurfacePhone"), + *word.relation("Match"), local_cost, &xx); + + if (syllabify_word(nw, *word.relation("SurfacePhone"), + *word.relation("SurfaceSylStructure"), + *word.relation("SurfaceSyllable")) < 1) + { + cerr << 
"Pronuciation for \"" << w->S("name") + << "\" doesn't contain a vowel: " << + *word.relation("SurfacePhone") << endl; + // festival_error(); + } + + fix_syllables(nw, word); + + subword_metrical_tree(nw, *word.relation("SurfaceSyllable"), + *word.relation("SurfaceMetricalTree")); + + + if (siod_get_lval("mettree_debug_word", NULL) != NIL) + word.save("word_dp.utt", "est"); + + if (siod_get_lval("mettree_debug_word", NULL) != NIL) + if (get_c_int(siod_get_lval("mettree_debug_word", NULL)) == i) + word.save("word_nth.utt", "est"); + + + utterance_merge(*u, word, w, "LexicalMetricalTree"); + } + + // u->save("zz_parse2.utt", "est"); + + // u->save("test.utt"); + + // cout <<"Surface 2:" << *u->relation("SurfacePhone") << endl; + + add_initial_silence(*u->relation("LexicalPhone"), + *u->relation("SurfacePhone"), + *u->relation("Match")); + + // cout <<"Surface 3:" << *u->relation("SurfacePhone") << endl; + + add_times(*u->relation("LexicalPhone"), *u->relation("SurfacePhone"), + *u->relation("Match")); + + u->relation("LexicalPhone")->f.set("timing_style", "segment"); + u->relation("SurfacePhone")->f.set("timing_style", "segment"); + // u->relation("Word")->f.set("timing_style", "segment"); + + u->relation("LexicalSyllable")->f.set("timing_style", "segment"); + u->relation("LexicalSyllable")->f.set("time_path", + "LexicalSylStructure"); + + // u->relation("LexicalSylStructure")->f.set("timing_style", "segment"); + // u->relation("LexicalSylStructure")->f.set("time_relation", + // "LexicalPhone"); + + + u->relation("LexicalMetricalTree")->f.set("timing_style", "segment"); + + + // if (lf_int != NIL) + // add_feature_function(*u->relation("LexicalSyllable"),"vowel_start", + // vowel_start_time); + + // add_feature_function(*u->relation("LexicalPhone"), "start", + // ff_start_time); + // add_feature_function(*u->relation("SurfacePhone"), "start", + // ff_start_time); + // add_feature_function(*u->relation("SurfacePhone"), "dur", + // duration_time); + + // 
add_feature_function(*u->relation("LexicalSyllable"),"end", leaf_end_time); + + EST_Features tf; + tf.set("time_path", "LexicalMetricalTree"); + tf.set("end", leaf_end_time); + + // add_feature_string(*u->relation("LexicalMetricalTree"), "time_path", + // "LexicalMetricalTree"); + // add_feature_string(*u->relation("LexicalSylStructure"), "time_path", + // "LexicalSylStructure"); + // + + add_non_terminal_features(*u->relation("LexicalMetricalTree"), + tf); + + tf.set("time_path", "LexicalSylStructure"); + + add_non_terminal_features(*u->relation("LexicalSylStructure"), + tf); + + + // add_feature_function(*u->relation("LexicalSyllable"),"start", + // ff_start_time); + // add_feature_function(*u->relation("LexicalSyllable"),"dur", duration_time); + + + // cout << "ADDED Features to phone\n\n"; + // cout << *(u->relation("LexicalPhone")) << endl << endl; + + // cout << "ADDED Features\n\n"; + // cout << *(u->relation("LexicalSyllable")); + + // cout << "\nfinished\n\n"; + + // if (lf_int != NIL) + // add_trans_intonation(*u); + + // cout <<"Lexical 3:" << *u->relation("LexicalPhone") << endl; + + // end_to_dur(*u->relation("SurfacePhone")); + // end_to_dur(*u->relation("LexicalPhone")); + + + + + // cout <<"Lexical 3:" << *u->relation("LexicalPhone") << endl; + + // clear_feature(*u->relation("SurfacePhone"), "end"); + // clear_feature(*u->relation("LexicalPhone"), "end"); + + if (siod_get_lval("mettree_debug", NULL) != NIL) + u->save("met_data.utt", "est"); + + gc_unprotect(&lutt); + + // u->save("zz_parse3.utt", "est"); + + return lutt; + } +#endif diff --git a/src/modules/UniSyn_phonology/subword.cc b/src/modules/UniSyn_phonology/subword.cc new file mode 100644 index 0000000..718375b --- /dev/null +++ b/src/modules/UniSyn_phonology/subword.cc @@ -0,0 +1,216 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998 */ +/* All Rights 
Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black and Paul Taylor */ +/* Date : February 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* An implementation of Metrical Tree Phonology */ +/* */ +/*=======================================================================*/ + +#include "festival.h" + +void main_stress(EST_Item *s); + +void subword_metrical_tree(EST_Relation &syllable, + EST_Relation &metricaltree); + +EST_Item *make_foot(EST_Item *w, EST_Item *met_node, EST_Item *next_syl_node); +void subword_metrical_tree(EST_Item *w, EST_Relation &syllable, + EST_Relation &metricaltree); + +static void all_stress(EST_Relation &syllable, EST_Relation &mettree); + +static void make_super_foot(EST_Item *w, EST_Item *met_node, + EST_Item *next_syl_node); + +// Note: this function doesn't work as it should: currently a +// monosyllabic word ends up being the same item as the syllable, when +// a single daughter item would be better. 
+ +void subword_list(EST_Item *w, EST_Relation &syllable, + EST_Relation &metricaltree) +{ + EST_Item *s, *n; + + n = metricaltree.append(w); + + if (next(syllable.head()) == 0) + return; + + for (s = syllable.head(); s ; s = s->next()) + { + cout << "appending syl\n"; + n->append_daughter(s); + } +} + +void subword_metrical_tree(EST_Item *w, EST_Relation &syllable, + EST_Relation &metricaltree) +{ + EST_Item *s; + EST_Item *new_leaf; + + // single syllable + if (next(syllable.head()) == 0) + { + new_leaf = metricaltree.append(w); + return; + } + + // absorb initial unstressed syllables + for (s = syllable.head(); s && (s->f("stress_num") == 0); s = s->next()) + { + new_leaf = metricaltree.append(s); + new_leaf->set("MetricalValue", "w"); + } + + while (s) + { + new_leaf = metricaltree.append(s); + new_leaf->set("MetricalValue", "s"); + s = make_foot(w, new_leaf, next(s)); + } + + if (siod_get_lval("mettree_debug", NULL) != NIL) + metricaltree.utt()->save("foot.utt", "est"); + + s = metricaltree.head(); + make_super_foot(w, s, next(s)); + + if (siod_get_lval("mettree_debug", NULL) != NIL) + metricaltree.utt()->save("super_foot.utt", "est"); + + all_stress(syllable, metricaltree); +} + + +EST_Item *make_foot(EST_Item *w, EST_Item *met_node, EST_Item *next_syl_node) +{ + EST_Item *new_parent; + EST_Item *fl; + + if (next_syl_node == 0) + return 0; + + if (next_syl_node->f("stress_num") == 0) + { + met_node->set("MetricalValue", "s"); + next_syl_node->set("MetricalValue", "w"); + + fl = first_leaf(met_node); + + if (next(next_syl_node)) + new_parent = met_node->insert_parent(); + else + { + if (prev(fl)) + new_parent = met_node->insert_parent(); + else + { + // cout << "making met node word node in foot\n"; + // cout << "foot root:" << *w << endl; + new_parent = met_node->insert_parent(); +// new_parent = met_node->insert_above(w); + merge_item(new_parent, w); + // cout << "foot root:" << *w << endl; + // cout << "foot root:" << *new_parent << endl; + } + + } + 
new_parent->append_daughter(next_syl_node); + + next_syl_node = make_foot(w, new_parent, next(next_syl_node)); + } + return next_syl_node; +} + +// construct left branching unlabelled tree using feet roots as terminals +static void make_super_foot(EST_Item *w, EST_Item *met_node, + EST_Item *next_syl_node) +{ + EST_Item *new_parent; + + if (next_syl_node == 0) + return; + + // make sure root node is w, i.e. word + if (next(next_syl_node)) + new_parent = met_node->insert_parent(); + else + { + // cout << "inserted word as root in super foot:" << *w << endl; + new_parent = met_node->insert_parent(); + + // KTH this crashes in linux + merge_item(new_parent, w); + // cout << "after inserted word as root in super foot:" << *w << endl; + // cout << "after inserted word as root in super foot:" << + // *new_parent << endl; + // w = new_parent->as_relation("Word"); + } + + new_parent->append_daughter(next_syl_node); + + make_super_foot(w, new_parent, next(new_parent)); +} + + +static void all_stress(EST_Relation &syllable, EST_Relation &mettree) +{ + EST_Item *n, *s; + int stress_num = -1; + + for (s = syllable.head(); s; s = s->next()) + if (s->I("stress_num",0) > stress_num) + stress_num = s->I("stress_num"); + + // cout << "max stress num:" << stress_num << endl; + + for (; stress_num > 0; --stress_num) + { + for (s = syllable.head(); s; s = s->next()) + if (s->I("stress_num",0) == stress_num) + break; + + if (s == 0) + { + cerr << "No main stress found in definition of lexical entry\n"; + festival_error(); + } + + n = s->as_relation(mettree.name()); + main_stress(n); + } +} + diff --git a/src/modules/UniSyn_phonology/syllabify.cc b/src/modules/UniSyn_phonology/syllabify.cc new file mode 100644 index 0000000..bf02ecf --- /dev/null +++ b/src/modules/UniSyn_phonology/syllabify.cc @@ -0,0 +1,666 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright 
(c) 1998 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black and Paul Taylor */ +/* Date : February 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* An implementation of Metrical Tree Phonology */ +/* */ +/*=======================================================================*/ + +#include +#include "festival.h" +#include "lexicon.h" + +extern EST_Features phone_def; + +static int nucleus_count(EST_Relation &phone) +{ + int v = 0; + for (EST_Item *l = phone.head(); l; l = l->next()) + if (l->S("df.syllabic") == "+") + ++v; + + return v; +} + +static bool legal_c1(EST_Item *x) +{ + return (x->S("df.syllabic") == "-"); +} + +static bool legal_c2(EST_Item *c1, EST_Item *x) +{ + if (x->S("df.syllabic") == "+") + return false; + + if ((x->name() == "s") && ((c1->name() == "l") || (c1->name() == + "w") || (c1->name() == "p") || (c1->name() == + "t") || (c1->name() == "k") || (c1->name() == + "m") || (c1->name() == "n") || (c1->name() == + "r") || (c1->name() == "f"))) + return true; + + + if ((c1->S("df.manner") == "approximant") && + (x->S("df.manner") == "stop") || (x->name() == "th") + || (x->name() == "f")) +// if (ph_is_semivowel(c1->name()) && (ph_is_stop(x->name()) +// || (x->name() == "th") +// || (x->name() == "f"))) + { + if (x->name() == "y") + return false; + + if ((c1->name() == "l") && ((x->name() == "t") || + (x->name() == "d") || (x->name() == "th"))) + return false; + + if ((c1->name() == "w") && ((x->name() == "p") || + (x->name() == "b") || (x->name() == "f"))) + return false; + return true; + } + // for "vroom" + if ((c1->name() == "r") && (x->name() == "v")) + return true; + return false; +} +static bool legal_c3(EST_Item * c1, EST_Item * c2, EST_Item *pos) +{ + (void) c1; + if (pos->name() != "s") + return false; + if ((c2->S("df.manner") == "stop") && (c2->S("df.voicing") == "-")) + return true; + return false; +} + +EST_Item *make_onset(EST_Item 
*syl_struct_root, EST_Item *nucleus, int flat) +{ + EST_Item *c1, *c2, *c3; + EST_Item *p, *onset; + + // if first syllable in word, put all prevocalic segments in onset + if ((prev(syl_struct_root) == 0) && prev(nucleus)) + { + if (flat) + onset = syl_struct_root; + else + { + onset = daughter1(syl_struct_root)->insert_before(); + onset->set("sylval", "Onset"); + } + // temporary hack because of lack of prepend daughter fn. + EST_Item *s; + for (s = nucleus->prev(); prev(s); s = s->prev()); + for (c1 = s; c1 != nucleus; c1 = next(c1)) + onset->append_daughter(c1); + return 0; + } + + c1 = prev(nucleus); + if (c1 == 0) + return c1; + +// if (ph_is_vowel(c1->name())) +// return next(c1); + if (c1->S("df.syllabic") == "+") + return next(c1); + +// if (c1->S("df.type") == "vowel") +// return next(c1); + + if (flat) + onset = syl_struct_root; + else + { + onset = daughter1(syl_struct_root)->insert_before(); + onset->set("sylval", "Onset"); + } + + // add first consonant + if (legal_c1(c1)) + p = onset->append_daughter(c1); + else + return nucleus; + + // add second consonant + c2 = prev(c1); + if (c2 == 0) + return 0; + if (legal_c2(c1, c2)) + p = p->insert_before(c2); + else + return c1; + + // add third consonant (s) + c3 = prev(c2); + if (c3 == 0) + return 0; + + // add third consonant +// if (legal_c3(c1->name(), c2->name(), c3->name())) +// p = p->insert_before(c3); + if (legal_c3(c1, c2, c3)) + p = p->insert_before(c3); + else + return c2; + + return c3; +} + +static void make_nucleus(EST_Item *syl_struct_root, EST_Item *nucleus, + int flat) +{ + EST_Item *m; + + if (flat) + m = syl_struct_root; + else + { + // add rhyme + m = syl_struct_root->append_daughter(); + m->set("sylval", "Rhyme"); + + // add nucleus + m = m->append_daughter(); + m->set("sylval", "Nucleus"); + } + + m->append_daughter(nucleus); +} + +static void make_coda(EST_Item *first_coda, EST_Item *first_onset, + EST_Item *syl_struct_root, int flat) +{ + EST_Item *m=0;; + + if ((first_coda != 0) && 
(first_coda != first_onset)) + { + if (flat) + m = syl_struct_root; + else + { + m = daughter1(syl_struct_root); + if (m->f("sylval") != "Rhyme") + m = daughter2(syl_struct_root); + + m = m->append_daughter(); + m->set("sylval", "Coda"); + } + } + + for (; (first_coda != 0) && (first_coda != first_onset); + first_coda = first_coda->next()) + m->append_daughter(first_coda); +} + +// if "flat" is non-zero a flat list-like structure is built, otherwise +// a tree syllable structure (Onset / Rhyme / Nucleus / Coda) is built. + +int syllabify_word(EST_Item *w, EST_Relation &phone, + EST_Relation &sylstructure, + EST_Relation &syllable, + int flat) +{ + EST_Item *prev_syl, *this_syl, *l, *this_struct, *prev_struct; + EST_Item *first_onset, *first_coda = 0; + EST_String v; + +// cout << "phones: " << phone << endl; + + int count = nucleus_count(phone); + + prev_struct =first_onset = 0; + + if (count < 1) + return 0; + + for (prev_syl = 0, l = phone.head(); l; l = l->next()) + { +// cout << "type " << l->S("name") << ": " << l->S("df.type") << endl; + if (l->S("df.syllabic") == "+") +// if (ph_is_vowel(l->name())) + { + + if (count == 1) + this_syl = syllable.append(w); + else + this_syl = syllable.append(); + this_struct = sylstructure.append(this_syl); + this_syl->set("stress_num", l->I("stress_num")); + + // note: it must be in this order +// cout << "this struct: " << *this_syl << endl; +// cout << "this struct: " << *this_struct << endl; + make_nucleus(this_struct, l, flat); + + first_onset = make_onset(this_struct, l, flat); + + make_coda(first_coda, first_onset, prev_struct, flat); + + prev_syl = this_syl; + prev_struct = this_struct; + first_coda = l->next(); + } + } + + make_coda(first_coda, first_onset, prev_struct, flat); + return count; +} + +void fix_lex_string(LISP lpos, EST_StrList &s) +{ + LISP a, b, c; + EST_String p; + + for (a = car(lpos); a != NIL; a = cdr(a)) + { +// cout << "1:\n"; +// lprint(a); +// for (b = car(a); b != NIL; b = cdr(b)) +// { + b = car(a); +// cout << "0:\n"; +// 
lprint(b); + for (c = car(b); c != NIL; c = cdr(c)) + { + p = get_c_string(car(c)); + if (ph_is_vowel(p)) + p += "0"; + s.append(p); + } + } +// cout << "def list: " << s << endl; +} + +//Adds phoneme name to syllable as a string +void add_syllable_name(EST_Item *syl, const EST_String &fname) +{ + EST_Item *s, *p; + + s = syl->as_relation("SylStructure"); + + for (p = first_leaf_in_tree(s); p != next_leaf(last_leaf_in_tree(s)); + p = next_leaf(p)) + if (p == first_leaf_in_tree(s)) + s->set(fname, p->S("name")); + else + s->set(fname, s->S(fname) + " " + p->S("name")); +} + +void lex_to_phones(const EST_String &name, const EST_String &pos, + EST_Relation &phone) +{ + LISP entry, lpos; + EST_Item *p; + EST_StrList lex_def; + EST_String lex_phone; + + if (pos != "0") + lpos = rintern(pos); + else + lpos = NIL; + entry = lex_lookup_word(name, lpos); + + lpos = cdr(cdr(entry)); + + if (!siod_atomic_list(car(lpos))) + fix_lex_string(lpos, lex_def); + else + siod_list_to_strlist(car(lpos), lex_def); + + for (EST_Litem *sl = lex_def.head(); sl; sl = sl->next()) + { + p = phone.append(); + lex_phone = lex_def(sl); + if (lex_phone.contains(RXint)) + { + p->set("name", lex_phone.before(RXint)); + p->set("stress_num", lex_phone.at(RXint)); + } + else + p->set("name", lex_phone); + + // df = "distinctive features" + if (phone_def.present(p->S("name"))) + p->set("df", phone_def.A(p->S("name"))); + else + EST_error("Word \"%s\" has illegal phoneme \"%s\" in lexicon\n", + (const char *)name, (const char *)p->S("name")); + } +} + + +/*static bool legal_c3(EST_String c1, EST_String c2, EST_String pos) +{ + (void) c1; + if (pos != "s") + return false; + if (ph_is_stop(c2) && (!ph_is_voiced(c2))) + return true; + return false; +} +*/ + + + +/*static int vowel_count(EST_StrList &full) +{ + int v = 0; + EST_Litem *l; + + for (l = full.head(); l; l = l->next()) + if (ph_is_stress_vowel(full(l))) + ++v; + return v; +} + + +static bool ph_is_s(const EST_String &p) +{ + return (p == "s") ? 
true : false; +} +*/ + +/*void subword_phonology(EST_Utterance &word) +{ + word.create_relation("MetricalTree"); + word.create_relation("SylStructure"); + word.create_relation("Syllable"); + + syllabify_word(word); + subword_metrical_tree(word); + + if (siod_get_lval("mettree_debug", NULL) != NIL) + word.save("word.utt", "est"); +} +*/ + +/*bool vowel(EST_String p) +{ + if (p.contains(RXint)) + p = p.before(RXint); + return ph_is_vowel(p); +} +*/ + +// Add phones, and sylstructure for a single syllable + + +/*bool met_node_is_leaf(EST_Item *met_node) +{ + return met_node->in_relation("Syllable"); +} +*/ + + +/*static int sonority(EST_String p) +{ + if (p.contains(RXint)) + p = p.before(RXint); + if (ph_is_vowel(p)) + return 6; + if (ph_is_liquid(p) || ph_is_approximant(p)) + return 5; + if (ph_is_nasal(p)) + return 4; + if (ph_is_fricative(p) && (!ph_is_s(p))) + return 3; + if (ph_is_stop(p)) + return 2; + return 1; +} + +// Parse arbitrary phone string with numbered vowels into +// syllable, phone and sylstructure relations + +static bool ph_is_semivowel(EST_String c1) +{ + return (ph_is_liquid(c1) || (ph_is_approximant(c1))); +} + +static bool ph_is_s(EST_String c1) +{ + return (c1 == "s"); +} +*/ +/*static bool ph_is_stress_vowel(EST_String p) +{ + if (p.contains(RXint)) + p = p.before(RXint); +// cout << "p = " << p << endl; + return (ph_is_vowel(p)); +} + + +static int vowel_count(EST_Relation &phone) +{ + int v = 0; + for (EST_Item *l = phone.head(); l; l = l->next()) + if (ph_is_vowel(l->name())) + ++v; + return v; +} +*/ +/*static void syllabify_word(EST_Utterance &word) +{ + EST_Item *prev_syl, *this_syl, *l; + EST_Item *first_onset, *first_coda = 0; + EST_String v; + + first_onset = 0; + + if (nucleus_count(*word.relation("Phone")) < 1) + { + cerr << "Error: Pronunciation for " << + *(word.relation("Word")->head()) << " does not contain vowel\n"; + festival_error(); + } + + for (prev_syl = 0, l = word.relation("Phone")->head(); l; l = l->next()) + { + 
cout << "syl: " << l->S("name") << ": " << l->S("df.syllabic", 1) + << endl; + if (l->S("df.syllabic") == "+") + { + this_syl = word.relation("Syllable")->append(); + word.relation("SylStructure")->append(this_syl); + this_syl->set("stress_num", l->I("stress_num")); + + // note: it must be in this order + make_nucleus(this_syl->as_relation("SylStructure"), l); + + first_onset = make_onset(this_syl->as_relation("SylStructure"),l); + + make_coda(first_coda, first_onset, + prev_syl->as_relation("SylStructure")); + + prev_syl = this_syl; + first_coda = l->next(); + } + } + + make_coda(first_coda, first_onset, prev_syl->as_relation("SylStructure")); +} +*/ + + +/*void convert_cmu_lex_to_utt(EST_Utterance &u, EST_Item *w) +{ + LISP entry, lpos; + EST_String pos; + EST_Utterance word; + EST_Item *nw, *p; + EST_StrList lex_def; + + pos = w->f("pos"); + if (pos != "0") + lpos = rintern(pos); + else + lpos = NIL; + entry = lex_lookup_word(w->name(), lpos); + + lprint(entry); + + lpos = cdr(cdr(entry)); + + printf("lpos\n"); + lprint(lpos); + + word.create_relation("Word"); + word.create_relation("MetricalTree"); + word.create_relation("SylStructure"); + word.create_relation("Syllable"); + word.create_relation("Phone"); + + nw = word.relation("Word")->append(); + nw->set("name", w->S("name")); + + // parse_lex_string(word, car(lpos)); + EST_String def, lex_phone; + + // cout << "DEF\n"; + lprint(lpos); + + + // make phone relation + siod_list_to_strlist(car(lpos), lex_def); + + for (EST_Litem *sl = lex_def.head(); sl; sl = sl->next()) + { + p = word.relation("Phone")->append(); + lex_phone = lex_def(sl); + if (lex_phone.contains(RXint)) + { + p->set("name", lex_phone.before(RXint)); + p->set("stress_num", lex_phone.at(RXint)); + } + else + p->set("name", lex_phone); + } + + syllabify_word(word); + + // cout << "before MT: " << *w << " F:" << w->f << endl; + // cout << "before MT: " << *(word.relation("Syllable")->head()->Info()) + // << " F:" << 
word.relation("Syllable")->head()->Info()->f << endl; + + subword_metrical_tree(word); + + // temporary hack because single syllable words don't get a metrical + // node + // if ((word.relation("MetricalTree")->head() == 0) && + // (word.relation("Syllable")->head() != 0)) + // word.relation("MetricalTree")->append(word.relation("Syllable")->head()->Info()); + + if (siod_get_lval("mettree_debug", NULL) != NIL) + word.save("word.utt", "est"); + // u.save("before.utt", "est"); + // cout << "before merge: " << *w << " F:" << w->f << endl; + // cout << "before syl: " << *(word.relation("Syllable")->head()->Info()) + // << " F:" << word.relation("Syllable")->head()->Info()->f << endl; + + utterance_merge(u, word, w, "MetricalTree"); + // cout << "after merge: " << *w << " F:" << w->f << endl; + // cout << "after syl: " << *(word.relation("Syllable")->head()->Info()) + // << " F:" << word.relation("Syllable")->head()->Info()->f << endl; + + // u.save("after.utt", "est"); +} + +*/ + +/*static void subword_metrical_tree(EST_Utterance &word) +{ + EST_Item *s; + EST_Item *new_leaf; + + // cout << endl<< endl << *(word.relation("Word")->head()->Info()) << endl; + + // cout << "head: " << word.relation("Syllable")->head() << endl; + // cout << "head: " << *(word.relation("Syllable")->head()->Info()) << endl; + // cout << "pre foot iteration:" << *(s->Info()) << " stress: " << s->Info()->f("stress_num") << endl << endl; + + // absorb initial unstressed syllables + for (s = word.relation("Syllable")->head(); + s && (s->f("stress_num") == 0); s = s->next()) + { + // cout << "**1 syl:" << *s << endl; + new_leaf = word.relation("MetricalTree")->append(s); + new_leaf->set("MetricalValue", "w"); + } + + // cout << "utt to now 1c: " << word << endl; + // In a multi-syllable word + + if (next(word.relation("Syllable")->head())) + { + //s = word.relation("Syllable")->head(); + // cout << "**2 syl:" << *s << endl; + for (; s;) + { + cout << "**3 syl:" << *s << endl; + cout << "foot 
iteration\n" << *s << endl << endl; + new_leaf = word.relation("MetricalTree")->append(s); + new_leaf->set("MetricalValue", "s"); + // s = make_foot(new_leaf, next(s)); + } + } + + else if (s) // For single syllable words + { + // cout << "adding single node\n" << *s << endl << endl; + new_leaf = word.relation("MetricalTree")->append(s); + // cout << "added node\n" << *s << endl << endl; + } + + // cout << "utt to now 2: " << word << endl; + + if (siod_get_lval("mettree_debug", NULL) != NIL) + word.save("sub_word.utt", "est"); + + s = word.relation("MetricalTree")->head(); + + // make_super_foot(s, next(s)); + if (siod_get_lval("mettree_debug", NULL) != NIL) + word.save("super_foot.utt", "est"); + + // all_stress(word); +} +*/ + + + + diff --git a/src/modules/UniSyn_phonology/unisyn_phonology.mak b/src/modules/UniSyn_phonology/unisyn_phonology.mak new file mode 100644 index 0000000..19912c1 --- /dev/null +++ b/src/modules/UniSyn_phonology/unisyn_phonology.mak @@ -0,0 +1,50 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1998 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. 
The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## -------------------------------------------------------------------- ## + ## Make definitions to include unisyn_phonology. ## + ## ## + ########################################################################### + +INCLUDE_UNISYN_PHONOLOGY=1 + +MOD_DESC_Unisyn_phonology=Phonology for Unisyn + +ifeq ($(DIRNAME),src/modules) + EXTRA_LIB_BUILD_DIRS := $(EXTRA_LIB_BUILD_DIRS) UniSyn_phonology +endif + + + + diff --git a/src/modules/UniSyn_phonology/unisyn_tilt.cc b/src/modules/UniSyn_phonology/unisyn_tilt.cc new file mode 100644 index 0000000..970fd9a --- /dev/null +++ b/src/modules/UniSyn_phonology/unisyn_tilt.cc @@ -0,0 +1,79 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Paul Taylor */ +/* Date : July 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Tilt synthesis code */ +/* */ +/*=======================================================================*/ + +#include +#include "festival.h" +#include "EST_tilt.h" +#include "../UniSyn/us_features.h" + +void vowel_tilt_to_abs_tilt(EST_Utterance &u) +{ + EST_Item *s, *t; + float pos; + + for (t = u.relation("Intonation")->head(); t; t = t->next()) + { + if (t->as_relation("IntonationSyllable")) + { + s = daughter1(t->as_relation("IntonationSyllable")); + s = s->as_relation("Syllable"); + pos = t->F("tilt:position"); + t->set("position", pos); + // cout << "pos: " << t->fF("position") << " rel:" << + //t->fF("rel_pos") << " vowel start:" << s->fF("vowel_start") + //<< endl; + } + } +} + +void tilt_to_f0(EST_Relation &intonation, EST_Relation &f0) +{ + EST_Track *fz = new EST_Track; + + tilt_synthesis(*fz, intonation, 0.01, 0); + + fz->save("tilt.f0", "est"); + + EST_Item *t = f0.append(); + t->set("name", "f0"); + t->set_val("f0", est_val(fz)); +} + + diff --git a/src/modules/UniSyn_phonology/us_aux.cc b/src/modules/UniSyn_phonology/us_aux.cc new file mode 100644 index 0000000..32c28c5 --- /dev/null +++ b/src/modules/UniSyn_phonology/us_aux.cc @@ -0,0 +1,136 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black and Paul Taylor */ +/* Date : February 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* An implementation of Metrical Tree Phonology */ +/* */ +/*=======================================================================*/ + +#include +#include "festival.h" +#include "us_duration.h" + + + + +int num_daughters(EST_Item *s) +{ + EST_Item *d1, *dn, *d; + + d1 = daughter1(s); + if (d1 == 0) + return 0; + + dn = daughtern(s); + + int n = 1; + for (d = d1; d != dn; d = d->next()) + ++n; + return n; +} + + +void phones_in_word(EST_Item *w, const EST_String &met_name, + const EST_String &ss_name, const EST_String &seg_name, + EST_Item **first_p, EST_Item **last_p) +{ + EST_Item *s, *t; + + cout << "&word: " << w << endl; + cout << "in relation: " << w->in_relation(met_name) << endl; + w = w->as_relation(met_name); + cout << "&word: " << w << endl; + + if (w == 0) + EST_error("Word isn't in metrical tree\n"); + + cout << "word: " << *w << endl; + cout << "d1: " << daughter1(w) << endl; + if (daughter1(w)) + cout << "*d1: " << daughter1(w) << endl; + cout << "d2: " << daughter2(w) << endl; + if (daughter2(w)) + cout << "*d2: " << daughter2(w) << endl; + *first_p = 0; + + for (s = first_leaf_in_tree(w); s != next_leaf(last_leaf_in_tree(w)); + s = next_leaf(s)) + { + cout << "leaf: " << *s << endl; + cout << "in ss relation: " << s->in_relation(ss_name) << endl; +// cout << "relations: " << s->relations() << endl; + + cout << "first leaf: " << *first_leaf_in_tree(s->as_relation(ss_name)) << endl; + cout << "last leaf: " << *last_leaf_in_tree(s->as_relation(ss_name)) + << endl; + for (t = first_leaf_in_tree(s->as_relation(ss_name)); + t !=next_leaf(last_leaf_in_tree(s->as_relation(ss_name))); + t = next_leaf(t)) + { + *last_p = t->as_relation(seg_name); + cout << "phone: " << *t << endl; + if (*first_p == 0) + *first_p = 
t->as_relation(seg_name); + } + } + + cout << "word: " << *w << endl; + cout << "first: " << **first_p << " last " << **last_p << endl; + + cout << "\n\n"; +} + +#if 0 +static EST_Item *onset(EST_Item * ss_root) +{ + return (daughter1(ss_root)->f("sylval") == "Onset")? daughter1(ss_root) : 0; +} + +static EST_Item *rhyme(EST_Item * ss_root) +{ + return (daughter2(ss_root) == 0) ? daughter1(ss_root) : daughter2(ss_root); +} + +static EST_Item *nucleus(EST_Item * ss_root) +{ + return daughter1(rhyme(ss_root)); +} + +static EST_Item *coda(EST_Item * ss_root) +{ + return daughter2(rhyme(ss_root)); +} + +#endif diff --git a/src/modules/UniSyn_phonology/us_duration.cc b/src/modules/UniSyn_phonology/us_duration.cc new file mode 100644 index 0000000..ec377e1 --- /dev/null +++ b/src/modules/UniSyn_phonology/us_duration.cc @@ -0,0 +1,210 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black and Paul Taylor */ +/* Date : February 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* An implementation of Metrical Tree Phonology */ +/* */ +/*=======================================================================*/ + +#include +#include "festival.h" +#include "us_duration.h" + +float phone_z_score(const EST_String &p, float dur) +{ + float mean, sd; + mean = met_duration.val(p).val("mean"); + sd = met_duration.val(p).val("sd"); + return ((dur - mean) / sd); +} + +void clear_feature(EST_Relation &r, const EST_String &name) +{ + for (EST_Item *p = r.head(); p ; p = p->next()) + p->f_remove(name); +} + +// void dur_to_end(EST_Relation &r) +// moved to UniSyn us_diphone.cc + +void end_to_dur(EST_Relation &r) +{ + float prev_end = 0; + + for (EST_Item *p = r.head(); p ; p = p->next()) + { + p->set("dur", p->F("end") - prev_end); + prev_end = p->F("end"); + } +} + +void assign_phone_z_scores(EST_Utterance &u, const EST_String &seg_name) +{ + EST_Item *s; + + end_to_dur(*u.relation(seg_name)); + + for (s = u.relation(seg_name)->head(); s; s = s->next()) + s->set("z_score", phone_z_score(s->f("name"), s->F("dur"))); +} + +void promote_mean_z_score(EST_Utterance &u, const EST_String &st_name, + const EST_String &syl_name) +{ 
+ EST_Item *p, *s, *l; + float z, n; + + + for (s = u.relation(syl_name)->head(); s; s = s->next()) + { + p = s->as_relation(st_name); + z = 0.0; + for (n = 1, l = first_leaf_in_tree(p); l!= last_leaf_in_tree(p); + l = next_leaf(l), n += 1.0) + z += l->F("z_score"); + + z += l->F("z_score"); + z = z / n; + s->set("m_z_score", z); + + +// n = named_daughter(s->as_relation(st_name), "sylval", "Rhyme"); +// n = daughter1(named_daughter(n, "sylval", "Nucleus")); +// s->set("z_score", n->F("z_score")); + + } +} + +void promote_vowel_z_score(EST_Utterance &u, const EST_String &st_name, + const EST_String &syl_name) +{ + EST_Item *n, *s; + + for (s = u.relation(syl_name)->head(); s; s = s->next()) + { + n = named_daughter(s->as_relation(st_name), "sylval", "Rhyme"); + n = daughter1(named_daughter(n, "sylval", "Nucleus")); + s->set("z_score", n->F("z_score")); + } +} + + +// set everything to its phone's mean duration +LISP FT_met_dur_predict_1(LISP lutt, LISP lrel) +{ + EST_Utterance *utt = get_c_utt(lutt); + EST_String rel = get_c_string(lrel); + EST_Item *p; + + clear_feature(*utt->relation(rel), "dur"); + clear_feature(*utt->relation(rel), "end"); + + for (p = utt->relation(rel)->head(); p ; p = p->next()) + p->set("dur", met_duration.val(p->f("name")).F("mean")); + + cout << "dur end\n"; + + dur_to_end(*utt->relation(rel)); + + return lutt; +} + +LISP FT_met_dur_predict_2(LISP lutt, LISP lrel) +{ + EST_Utterance *utt = get_c_utt(lutt); + EST_String rel = get_c_string(lrel); + + clear_feature(*utt->relation(rel), "dur"); + clear_feature(*utt->relation(rel), "end"); + + for (EST_Item *p = utt->relation(rel)->head(); p ; p = p->next()) + p->set("dur", 0.2); + + dur_to_end(*utt->relation(rel)); + + return lutt; +} + +typedef +float (*local_cost_function)(const EST_Item *item1, + const EST_Item *item2); + +float local_cost(const EST_Item *s1, const EST_Item *s2); + +bool dp_match(const EST_Relation &lexical, + const EST_Relation &surface, + EST_Relation &match, + 
local_cost_function lcf, + EST_Item *null_syl); + +void add_times(EST_Relation &lexical, EST_Relation &surface, + EST_Relation &match); + +LISP FT_nat_dur_predict(LISP lutt, LISP lrel_name, LISP llab_file) +{ + EST_Utterance *utt = get_c_utt(lutt); + EST_String rel_name = get_c_string(lrel_name); + EST_String lab_file = get_c_string(llab_file); + EST_Relation lab, *segment, match, *ulab, *umatch; + + utt->create_relation("Match"); + utt->create_relation("Lab"); + + if (utt->relation("Lab")->load(lab_file) != format_ok) + festival_error(); +// if (lab.load(lab_file) != format_ok) +// festival_error(); + + EST_Item xx; + + segment = utt->relation(rel_name); + ulab = utt->relation("Lab"); + umatch = utt->relation("Match"); + + clear_feature(*segment, "dur"); + clear_feature(*segment, "end"); + + dp_match(*segment, *ulab, *umatch, local_cost, &xx); + add_times(*segment, *ulab, match); + + utt->remove_relation("Match"); + utt->remove_relation("Lab"); + +// dp_match(*segment, lab, match, local_cost, &xx); +// add_times(*segment, lab, match); + + return lutt; +} + diff --git a/src/modules/UniSyn_phonology/us_duration.h b/src/modules/UniSyn_phonology/us_duration.h new file mode 100644 index 0000000..3b24c30 --- /dev/null +++ b/src/modules/UniSyn_phonology/us_duration.h @@ -0,0 +1,61 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. 
The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black and Paul Taylor */ +/* Date : February 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* An implementation of Metrical Tree Phonology */ +/* */ +/*=======================================================================*/ + +#ifndef __UNISYN_H__ +#define __UNISYN_H__ + + +#include "festival.h" + +bool operator == (const EST_Features &a, const EST_Features &b); + +typedef EST_TList EST_UttList; +typedef EST_TKVL EST_FeatureList; + +EST_Item *named_daughter(EST_Item *syl, const EST_String &fname, + const EST_String &fval); + +void dur_to_end(EST_Relation &r); +void end_to_dur(EST_Relation &r); +float phone_z_score(const EST_String &p, float dur); + +extern EST_FeatureList met_duration; + +#endif diff --git a/src/modules/VCLocalRules b/src/modules/VCLocalRules new file mode 100755 index 0000000..53a8cea --- /dev/null +++ b/src/modules/VCLocalRules @@ -0,0 
+1,6 @@ +# Special extra rules for this directory + +subtypes.cc : . utilities\find_subtypes.exe + @echo "Making subtypes.cc" + utilities\find_subtypes $(DIRS) - $(ABSTRACT_TYPES) >subtypes.cc + diff --git a/src/modules/base/Makefile b/src/modules/base/Makefile new file mode 100644 index 0000000..086b7d1 --- /dev/null +++ b/src/modules/base/Makefile @@ -0,0 +1,51 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. 
## +## ## +########################################################################### +TOP=../../.. +DIRNAME=src/modules/base +H = +TSRCS = +SRCS = modules.cc module_support.cc parameters.cc ff.cc \ + pos.cc phrasify.cc word.cc postlex.cc phrinfo.cc $(TSRCS) +OBJS = $(SRCS:.cc=.o) + +FILES = Makefile $(SRCS) $(H) + +LOCAL_INCLUDES = -I../include + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/base/ff.cc b/src/modules/base/ff.cc new file mode 100644 index 0000000..5bfa97e --- /dev/null +++ b/src/modules/base/ff.cc @@ -0,0 +1,998 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : May 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Basic builtin features */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "modules.h" + +static EST_String stressname("stress"); +static EST_Val val_string0("0"); +static EST_Val val_string1("1"); +static EST_Val val_int0(0); +static EST_Val val_int1(1); +static EST_Val default_val_float(0.0); + +static EST_Val ff_addr(EST_Item *i) +{ + char a[1024]; + + // The address of the contents so that the same item from different views + // have the same address + sprintf(a,"%p",i->contents()); + return EST_Val(a); +} + +static EST_Val ff_segment_duration(EST_Item *s) +{ + EST_Item *n = as(s,"Segment"); + if (n == 0) + { + cerr << "Asked for segment duration of item not in Segment relation." + << endl; + festival_error(); + } + if (n->prev() == 0) + return EST_Val(s->F("end", 0)); + else + return EST_Val(s->F("end", 0)-(n->prev()->F("end",0))); +} + +static EST_Val ff_syllable_duration(EST_Item *s) +{ + EST_Item *n = as(s,"SylStructure"); + if (n == 0) + { + cerr << "Asked for syllable duration of item not in SylStructure relation." 
+ << endl; + festival_error(); + } + else + { + EST_Item *fd = daughter1(n); + EST_Item *ld = fd->last(); + + if (ld == 0) + return val_int0; + EST_Item *ps = as(fd,"Segment")->prev(); + if (ps == 0) + return ld->F("end",0); + else + return EST_Val(ld->F("end",0)-ps->F("end",0)); + } + + // dummy for stupid VC++ compiler + { + EST_Val junk; + return junk; + } +} + +static EST_Val ff_word_duration(EST_Item *s) +{ + EST_Item *n = as(s,"SylStructure"); + if (n == 0) + { + cerr << "Asked for word duration of item not in SylStructure relation." + << endl; + festival_error(); + } + else + { + EST_Item *fd = daughter1(daughter1(n)); + EST_Item *ld = daughtern(daughtern(n)); + + if (ld == 0) + return val_int0; + EST_Item *ps = as(fd,"Segment")->prev(); + if (ps == 0) + return ld->F("end",0); + else + return EST_Val(ld->F("end",0)-ps->F("end",0)); + } + // dummy for stupid VC++ compiler + { + EST_Val junk; + return junk; + } +} + +static EST_Val ff_seg_start(EST_Item *s) +{ + EST_Item *n = as(s,"Segment"); + if (n->prev() == 0) + return default_val_float; + else + return n->prev()->F("end",0); +} + +static EST_Val ff_syl_start(EST_Item *s) +{ + EST_Item *n = as(s,"SylStructure"); + if (daughter1(n) == 0) + return default_val_float; + else + return ff_seg_start(daughter1(n)); +} + +static EST_Val ff_word_start(EST_Item *s) +{ + EST_Item *n = as(s,"SylStructure"); + if (daughter1(daughter1(n)) == 0) + return default_val_float; + else + return ff_seg_start(daughter1(daughter1(n))); +} + +static EST_Val ff_seg_end(EST_Item *s) +{ + return s->F("end",0); +} + +static EST_Val ff_seg_mid(EST_Item *s) +{ + return EST_Val(((float)ff_seg_start(s)+(float)ff_seg_end(s))/2.0); +} + +static EST_Val ff_syl_end(EST_Item *s) +{ + EST_Item *n = as(s,"SylStructure"); + if (daughtern(n) == 0) + return default_val_float; + else + return ff_seg_end(daughtern(n)); +} + +static EST_Val ff_word_end(EST_Item *s) +{ + EST_Item *n = as(s,"SylStructure"); + if (daughtern(n) == 0) + return 
default_val_float; + else + return ff_syl_end(daughtern(n)); +} + +static EST_Val ff_position_type(EST_Item *s) +{ + /* Position of syllable in this word: single, initial, mid, final */ + EST_Item *nn = as(s,"SylStructure"); + + if (nn == 0) // it's not really a syllable + return EST_Val("single"); + else if (nn->next() == 0) + { + if (nn->prev() == 0) + return EST_Val("single"); + else + return EST_Val("final"); + } + else if (nn->prev() == 0) + return EST_Val("initial"); + else + return EST_Val("mid"); +} + +static EST_Val ff_word_break(EST_Item *w) +{ + /* Break index of word */ + EST_Item *nn = as(w,"Phrase"); + static EST_Val val4 = EST_Val(4); + static EST_Val val3 = EST_Val(3); + static EST_Val val2 = EST_Val(2); + + if ((nn == 0) || (nn->next() != 0)) + return val_int1; + else + { + EST_Item *p = parent(nn); + if (p) + { + if (p->name() == "BB") + return val4; + else if (p->name() == "B") + return val3; + else if (p->name() == "mB") + return val2; + else + return EST_Val(p->name()); + + } + else + return val_int1; + } +} + +static EST_Val ff_syl_break(EST_Item *s) +{ + // 0 internal syl end, 1 word end, 4 phrase end (ToBI 3 and 4) + EST_Item *nn = as(s,"SylStructure"); + + if (nn == 0) + return val_int1; // no sylstructure so maybe it's standalone + else if (nn->next() != 0) // word internal + return val_int0; + else if (parent(nn) == 0) // not in a word -- strange + return val_int1; + else + return ff_word_break(parent(nn)); // take it from the word +} + +static EST_Val ff_old_syl_break(EST_Item *s) +{ + // 0 internal syl end, 1 word end, 4 phrase end (ToBI 3 and 4) + // 2s and 3s are promoted to 4s + EST_Item *nn = as(s,"SylStructure"); + static EST_Val val4 = EST_Val(4); + + if (nn == 0) + return val_int1; // no sylstructure so maybe it's standalone + else if (nn->next() != 0) // word internal + return val_int0; + else if (parent(nn) == 0) // not in a word -- strange + return val_int1; + else + { + EST_Val v = ff_word_break(parent(nn)); + if ((v == 3) || (v == 
2)) + return val4; + else + return v; + } +} + +static EST_Val ff_syl_accented(EST_Item *s) +{ + // t if syllable is accented or not + EST_Item *nn = as(s,"Intonation"); + + if ((nn == 0) || (daughter1(nn) == 0)) + return val_int0; + else + return val_int1; +} + +static EST_Val ff_tobi_accent(EST_Item *s) +{ + // First tobi accent related to syllable + EST_Item *nn = as(s,"Intonation"); + EST_Item *p; + + for (p=daughter1(nn); p; p=p->next()) + if (p->name().contains("*")) + return EST_Val(p->name()); + return EST_Val("NONE"); +} + +static EST_Val ff_tobi_endtone(EST_Item *s) +{ + // First tobi endtone (phrase accent or boundary tone) + EST_Item *nn = as(s,"Intonation"); + EST_Item *p; + + for (p=daughter1(nn); p; p=p->next()) + { + EST_String l = p->name(); + if ((l.contains("%")) || (l.contains("-"))) + return EST_Val(p->name()); + } + + return EST_Val("NONE"); +} + +static EST_Val ff_syl_accent(EST_Item *s) +{ + // (first) accent or NONE on given syllable + EST_Item *nn = as(s,"Intonation"); + + if (daughter2(nn)) + return EST_Val("multi"); + else if (daughter1(nn)) + return EST_Val(daughter1(nn)->name()); + else + return EST_Val("NONE"); +} + +static EST_Val ff_syl_numphones(EST_Item *s) +{ + // Number of phones in syllable + EST_Item *nn = as(s,"SylStructure"); + + return EST_Val(daughter1(nn)->length()); +} + +static EST_Val ff_word_numsyls(EST_Item *s) +{ + // Number of syllable in word + EST_Item *nn = as(s,"SylStructure"); + + return EST_Val(daughter1(nn)->length()); +} + +static EST_Val ff_syl_onsetsize(EST_Item *s) +{ + // number of segments in the onset + EST_Item *nn = as(s,"SylStructure"); + EST_Item *p; + int size; + + for (p=daughter1(nn),size=0; p; p=p->next(),size++) + if (ph_is_vowel(p->name())) + return EST_Val(size); + + return EST_Val(size); + +} + +static EST_Val ff_syl_vowel(EST_Item *s) +{ + // the vowel in the syllable + EST_Item *nn = as(s,"SylStructure"); + EST_Item *p; + int size; + + for (p=daughter1(nn),size=0; p; p=p->next(),size++) 
+ if (ph_is_vowel(p->name())) + return EST_Val(p->name()); + + // no vowel + return EST_Val("novowel"); +} + +static EST_Val ff_syl_codasize(EST_Item *s) +{ + // number of segments in the coda + EST_Item *nn = as(s,"SylStructure"); + EST_Item *p; + int size; + + for (p=daughter1(nn)->last(),size=1; p; p=p->prev(),size++) + if (ph_is_vowel(p->name())) + return EST_Val(size); + + return EST_Val(size); +} + +static EST_Val ff_syl_pc_unvox(EST_Item *s) +{ + // Returns percentage of syllable from start to first voiced phone + EST_Item *nn = as(s,"SylStructure"); + EST_Item *p,*ps; + float unvox,start = 0; + + if (daughter1(nn) == 0) + return val_int0; // no segments in syllable + else if ((ps = as(daughter1(nn),"Segment")->prev()) != 0) + start = ps->F("end",0); + unvox = start; + + for (p=daughter1(nn); p != 0; p=p->next()) + { + if ((ph_is_vowel(p->name())) || + (ph_is_voiced(p->name()))) + break; + unvox = p->F("end",0); + } + + return EST_Val((int)(((unvox-start)*100)/ + (daughtern(nn)->F("end",0)-start))); +} + +static EST_Val ff_syl_vowel_start(EST_Item *s) +{ + // Returns start time of vowel in syllable (or start of syllable) + EST_Item *nn = as(s,"SylStructure"); + EST_Item *p; + + for (p=daughter1(nn); p != 0; p=p->next()) + { + if (ph_is_vowel(p->name())) + return EST_Val(ff_seg_start(p)); + } + // There isn't a vowel, so just take start of syl (p is null here, so use s) + return EST_Val(ff_syl_start(s)); +} + +static EST_Val ff_seg_onsetcoda(EST_Item *s) +{ + // onset if seg in onset, coda otherwise (vowel is in coda) + EST_Item *nn = as(s,"SylStructure"); + EST_Item *p; + + for (p=nn->next(); p; p=p->next()) + if (ph_is_vowel(p->name())) + return EST_Val("onset"); + return EST_Val("coda"); +} + +static EST_Val ff_seg_onset_stop(EST_Item *s) +{ + // 1 if onset of the syllable attached to this segment has a stop + EST_Item *nn = as(s,"SylStructure")->first(); + + for ( ; nn ; nn=nn->next()) + { + if (ph_is_vowel(nn->name())) + return val_string0; + if (ph_is_stop(nn->name())) + return 
val_string1; + } + return val_string0; +} + +static EST_Val ff_seg_coda_fric(EST_Item *s) +{ + // 1 if coda of the syllable attached to this segment has a fricative + EST_Item *nn = as(s,"SylStructure")->last(); + + for ( ; nn ; nn=nn->prev()) + { + if (ph_is_vowel(nn->name())) + return val_string0; + if (ph_is_fricative(nn->name())) + return val_string1; + } + return val_string0; +} + +static EST_Val ff_seg_pos_in_syl(EST_Item *s) +{ + // position of segment in syllable + EST_Item *nn = as(s,"SylStructure"); + EST_Item *p; + int pos=0; + + for (p=nn->first(); p; p=p->next(),pos++) + if (p == nn) + return EST_Val(pos); + // don't think you can get here + return EST_Val(pos); +} + +static EST_Val ff_seg_syl_initial(EST_Item *s) +{ + // 1 if seg is syllable initial, 0 otherwise. + EST_Item *nn = as(s,"SylStructure"); + + if (nn->prev() == 0) + return val_string1; + else + return val_string0; +} + +static EST_Val ff_seg_syl_final(EST_Item *s) +{ + // 1 if seg is syllable final, 0 otherwise. 
+ EST_Item *nn = as(s,"SylStructure"); + + if (nn->next() == 0) + return val_string1; + else + return val_string0; +} + +static EST_Val ff_syl_pos_in_word(EST_Item *s) +{ + // position of syllable in word + EST_Item *nn = as(s,"SylStructure"); + EST_Item *p; + int pos=0; + + for (p=nn->first(); p; p=p->next(),pos++) + if (p == nn) + return EST_Val(pos); + // don't think you can get here + return EST_Val(pos); +} + +static EST_Val ff_pos_in_phrase(EST_Item *s) +{ + // position of word in phrase + EST_Item *nn = as(s,"Phrase"); + EST_Item *p; + int pos=0; + + for (p=nn->first(); p; p=p->next(),pos++) + if (p == nn) + return EST_Val(pos); + // don't think you can get here + return EST_Val(pos); +} + +static EST_Val ff_num_break(EST_Item *s) +{ + // 1 if this word is at the end of a number group and followed by + // a new number group + EST_Item *nn = as(s,"Token"); + + if ((nn->next() == 0) && + (parent(nn)->name().matches(RXdouble)) && + (parent(nn)->next()->name().matches(RXdouble))) + return val_string1; + else + return val_string0; +} + +static EST_Val ff_words_out(EST_Item *s) +{ + return EST_Val(as(s,"Phrase")->length()); +} + +static EST_Val ff_syl_midpitch(EST_Item *s) +{ + // pitch of mid vowel in syllable + EST_Item *nn = as(s,"SylStructure"); + EST_Item *p; + + for (p=daughter1(nn); p; p = p->next()) + { + if (ph_is_vowel(p->name())) + return ffeature(p,"R:Target.daughter1.f0"); + } + // must be a silence or a syllabic consonant + return default_val_float; +} + +static EST_Val ff_syl_startpitch(EST_Item *s) +{ + // pitch at start of syllable + // average of first segment and previous segment target (if exists) + + float pt = ffeature(s,"R:SylStructure.daughter1.R:Segment.p.R:Target.daughter1.f0"); + float tt = ffeature(s,"R:SylStructure.daughter1.R:Segment.R:Target.daughter1.f0"); + + if (pt < 0.1) + return EST_Val(tt); + else if (tt < 0.1) + return EST_Val(pt); + else + return EST_Val((tt+pt)/2.0); +} + +static EST_Val ff_syl_endpitch(EST_Item *s) +{ + // 
pitch at end of syllable + // average of last segment and next segment target (if exists) + + float nt = ffeature(s,"R:SylStructure.daughtern.R:Segment.n.R:Target.daughter1.f0"); + float tt = ffeature(s,"R:SylStructure.daughtern.R:Segment.R:Target.daughter1.f0"); + + if (nt < 0.1) + return EST_Val(tt); + else if (tt < 0.1) + return EST_Val(nt); + else + return EST_Val((tt+nt)/2.0); +} + +static EST_Val ff_seg_pitch(EST_Item *s) +{ + // Return interpolated pitch at mid-point of s + EST_Item *t,*lastt; + float spoint,deltaf0,deltatime; + float smid = ff_seg_mid(s); + EST_Utterance *u = get_utt(s); + + for (lastt=t=u->relation("Target")->first_leaf(); + next_leaf(t) != 0; t=next_leaf(t)) + { + if (smid <= t->F("pos",0)) + break; + lastt=t; + } + + if (lastt == 0) + return EST_Val((float)0.0); + + deltaf0 = t->F("f0",0)-lastt->F("f0",0); + deltatime = t->F("pos",0) - lastt->F("pos",0); + if (deltatime <= 0) + spoint = lastt->F("f0",0); + else + spoint = lastt->F("f0",0) + + (deltaf0*((smid-lastt->F("pos",0))/deltatime)); + + if (spoint > 35) + return EST_Val(spoint); + else + return EST_Val((float)0.0); +} + +static EST_Val ff_syl_in(EST_Item *s) +{ + // Number of syllables since last phrase break + EST_Item *nn = as(s,"Syllable"); + // The first syllable in the phrase + EST_Item *fsyl = + as(daughter1(as(parent(s,"SylStructure"),"Phrase")->first(),"SylStructure"), + "Syllable"); + EST_Item *p; + int count; + + for (count=0,p=nn; p != 0; p=p->prev(),count++) + if (p == fsyl) + return EST_Val(count); + return EST_Val(count); +} + +static EST_Val ff_syl_out(EST_Item *s) +{ + // Number of syllables to next phrase break + EST_Item *nn = as(s,"Syllable"); + // The last syllable in the phrase + EST_Item *lsyl = + as(daughtern(as(parent(s,"SylStructure"),"Phrase")->last(),"SylStructure"), + "Syllable"); + EST_Item *p; + int count; + + for (count=0,p=nn; p != 0; p=p->next(),count++) + if (p == lsyl) + return EST_Val(count); + return EST_Val(count); +} + +static 
EST_Val ff_ssyl_in(EST_Item *s) +{ + // Number of stressed syllables since last phrase break + EST_Item *nn = as(s,"Syllable"); + EST_Item *fsyl = + as(daughter1(as(parent(s,"SylStructure"),"Phrase")->first(),"SylStructure"), + "Syllable"); + EST_Item *p; + int count; + + if (nn == fsyl) return val_int0; + for (count=0,p=nn->prev(); (p != 0) && (p != fsyl); p = p->prev()) + if (p->F(stressname,0) == 1) + count ++; + return EST_Val(count); +} + +static EST_Val ff_ssyl_out(EST_Item *s) +{ + // Number of stressed syllables to next phrase break + EST_Item *nn = as(s,"Syllable"); + // The last syllable in the phrase + EST_Item *lsyl = + as(daughtern(as(parent(s,"SylStructure"),"Phrase")->last(),"SylStructure"), + "Syllable"); + EST_Item *p; + int count; + + if (nn == lsyl) return val_int0; + for (count=0,p=nn->next(); (p != 0); p=p->next()) + { + if (p->F(stressname,0) == 1) + count ++; + if (p == lsyl) break; + } + return EST_Val(count); +} + +static EST_Val ff_asyl_in(EST_Item *s) +{ + // Number of accented syllables since last phrase break + EST_Item *nn = as(s,"Syllable"); + // The first syllable in the phrase + EST_Item *fsyl = + as(daughter1(as(parent(s,"SylStructure"),"Phrase")->first(),"SylStructure"), + "Syllable"); + EST_Item *p; + int count; + + if (nn == fsyl) return val_int0; + for (count=0,p=nn->prev(); (p != 0) && (p != fsyl); p = p->prev()) + if (ff_syl_accented(p) == 1) + count ++; + return EST_Val(count); +} + +static EST_Val ff_asyl_out(EST_Item *s) +{ + // Number of accented syllables to next phrase break + EST_Item *nn = as(s,"Syllable"); + // The last syllable in the phrase + EST_Item *lsyl = + as(daughtern(as(parent(s,"SylStructure"),"Phrase")->last(),"SylStructure"), + "Syllable"); + EST_Item *p; + int count; + + if (nn == lsyl) return val_int0; + for (count=0,p=nn->next(); (p != 0); p=p->next()) + { + if (ff_syl_accented(p) == 1) + count ++; + if (p == lsyl) break; + } + return EST_Val(count); +} + +static EST_Val ff_last_accent(EST_Item *s) +{ 
+ // Number of syllables since last accented syllable + EST_Item *nn = as(s,"Syllable"); + EST_Item *p; + int count; + + for (count=0,p=nn->prev(); p != 0; p=p->prev(),count++) + if (ff_syl_accented(p) == 1) + return EST_Val(count); + + return EST_Val(count); +} + +static EST_Val ff_next_accent(EST_Item *s) +{ + // Number of syllables to next accented syllable + EST_Item *nn = as(s,"Syllable"); + EST_Item *p; + int count; + + for (count=0,p=nn->next(); p != 0; p=p->next(),count++) + if (ff_syl_accented(p) == 1) + return EST_Val(count); + + return EST_Val(count); +} + +static EST_Val ff_sub_phrases(EST_Item *s) +{ + // Number of non-major phrase breaks since last major phrase break + EST_Item *nn = parent(parent(s,"SylStructure"),"Phrase"); + EST_Item *p; + int count; + + for (count=0,p=nn->prev(); p != 0; p=p->prev()) + { + if (p->name() == "BB") + return EST_Val(count); + count ++; + } + + return EST_Val(count); +} + +void festival_ff_init(void) +{ + + festival_def_nff("segment_duration","Segment",ff_segment_duration, + "Segment.segment_duration\n\ + The duration of the given stream item calculated as the end of this\n\ + item minus the end of the previous item in the Segment relation."); + festival_def_nff("syllable_duration","Syllable",ff_syllable_duration, + "Syllable.syllable_duration\n\ + The duration of the given stream item calculated as the end of last\n\ + daughter minus the end of previous item in the Segment relation of the\n\ + first daughter."); + festival_def_nff("word_duration","Word",ff_word_duration, + "Word.word_duration\n\ + The duration of the given stream item. 
This is defined as the end of\n\ + last segment in the last syllable (via the SylStructure relation) minus\n\ + the end of the segment immediately preceding the first segment in the\n\ + first syllable."); + festival_def_nff("segment_start","Segment",ff_seg_start, + "Segment.segment_start\n\ + The start time of the given segment."); + festival_def_nff("segment_mid","Segment",ff_seg_mid, + "Segment.segment_mid\n\ + The middle time of the given segment."); + festival_def_nff("syllable_start","Syllable",ff_syl_start, + "Syllable.syllable_start\n\ + The start time of the given syllable."); + festival_def_nff("word_start","Word",ff_word_start, + "Word.word_start\n\ + The start time of the given word."); + festival_def_nff("segment_end","Segment",ff_seg_end, + "Segment.segment_end\n\ + The end time of the given segment."); + festival_def_nff("syllable_end","Syllable",ff_syl_end, + "Syllable.syllable_end\n\ + The end time of the given syllable."); + festival_def_nff("word_end","Word",ff_word_end, + "Word.word_end\n\ + The end time of the given word."); + festival_def_nff("addr","ANY",ff_addr, + "ANY.addr\n\ + Returned by popular demand, returns the address of given item that\n\ + is guaranteed unique for this session."); + + festival_def_nff("accented","Syllable",ff_syl_accented, + "Syllable.accented\n\ + Returns 1 if syllable is accented, 0 otherwise. A syllable is\n\ + accented if there is at least one IntEvent related to it."); + festival_def_nff("syl_accent","Syllable",ff_syl_accent, + "Syllable.syl_accent\n\ + Returns the name of the accent related to the syllable. NONE is returned\n\ + if there are no accents, and multi is returned if there is more than one."); + festival_def_nff("tobi_accent","Syllable",ff_tobi_accent, + "Syllable.tobi_accent\n\ + Returns the ToBI accent related to syllable. ToBI accents are\n\ + those which contain a *. NONE is returned if there are none. 
If\n\ + there is more than one ToBI accent related to this syllable the\n\ + first one is returned."); + festival_def_nff("tobi_endtone","Syllable",ff_tobi_endtone, + "Syllable.tobi_endtone\n\ + Returns the ToBI endtone related to syllable. ToBI end tones are\n\ + those IntEvent labels which contain a % or a - (i.e. end tones or\n\ + phrase accents). NONE is returned if there are none. If\n\ + there is more than one ToBI end tone related to this syllable the\n\ + first one is returned."); + festival_def_nff("syl_onsetsize","Syllable",ff_syl_onsetsize, + "Syllable.syl_onsetsize\n\ + Returns the number of segments before the vowel in this syllable. If\n\ + there is no vowel in the syllable this will return the total number\n\ + of segments in the syllable."); + festival_def_nff("syl_vowel","Syllable",ff_syl_vowel, + "Syllable.syl_vowel\n\ + Returns the name of the vowel within this syllable. Note this is not\n\ + the general form you probably want. You can't refer to ph_* features \n\ + of this. Returns \"novowel\" if no vowel can be found."); + festival_def_nff("syl_codasize","Syllable",ff_syl_codasize, + "Syllable.syl_codasize\n\ + Returns the number of segments after the vowel in this syllable. If\n\ + there is no vowel in the syllable this will return the total number\n\ + of segments in the syllable." ); + festival_def_nff("seg_onsetcoda","Segment",ff_seg_onsetcoda, + "Segment.seg_onsetcoda\n\ + Returns onset if this segment is before the vowel in the syllable it\n\ + is contained within. Returns coda if it is the vowel or after. 
If\n\ + the segment is not in a syllable it returns onset."); + festival_def_nff("seg_onset_stop","Segment",ff_seg_onset_stop, + "Segment.seg_onset_stop\n\ + Returns 1 if onset of the syllable this segment is in contains a stop.\n\ + 0 otherwise."); + festival_def_nff("seg_coda_fric","Segment",ff_seg_coda_fric, + "Segment.seg_coda_fric\n\ + Returns 1 if coda of the syllable this segment is in contains a fricative.\n\ + 0 otherwise."); + festival_def_nff("syl_numphones","Syllable",ff_syl_numphones, + "Syllable.syl_numphones\n\ + Returns number of phones in syllable."); + festival_def_nff("syl_pc_unvox","Syllable",ff_syl_pc_unvox, + "Syllable.syl_pc_unvox\n\ + Percentage of total duration of unvoiced segments from\n\ + start of syllable. (i.e. percentage to start of first voiced segment)"); + festival_def_nff("syl_vowel_start","Syllable",ff_syl_vowel_start, + "Syllable.syl_vowel_start\n\ + Start position of vowel in syllable. If there is no vowel the start\n\ + position of the syllable is returned."); + festival_def_nff("syl_midpitch","Syllable",ff_syl_midpitch, + "Syllable.syl_midpitch\n\ + Pitch at the mid vowel of this syllable."); + festival_def_nff("syl_startpitch","Syllable",ff_syl_startpitch, + "Syllable.syl_startpitch\n\ + Pitch at the start of this syllable."); + festival_def_nff("syl_endpitch","Syllable",ff_syl_endpitch, + "Syllable.syl_endpitch\n\ + Pitch at the end of this syllable."); + festival_def_nff("seg_pitch","Segment",ff_seg_pitch, + "Segment.seg_pitch\n\ + Pitch at the middle of this segment."); + + festival_def_nff("syl_in","Syllable",ff_syl_in, + "Syllable.syl_in\n\ + Returns number of syllables since last phrase break. This is 0 if\n\ + this syllable is phrase initial."); + festival_def_nff("syl_out","Syllable",ff_syl_out, + "Syllable.syl_out\n\ + Returns number of syllables to next phrase break. 
This is 0 if\n\ + this syllable is phrase final."); + festival_def_nff("ssyl_in","Syllable",ff_ssyl_in, + "Syllable.ssyl_in\n\ + Returns number of stressed syllables since last phrase break, not\n\ + including this one."); + festival_def_nff("ssyl_out","Syllable",ff_ssyl_out, + "Syllable.ssyl_out\n\ + Returns number of stressed syllables to next phrase break, not including\n\ + this one."); + festival_def_nff("asyl_in","Syllable",ff_asyl_in, + "Syllable.asyl_in\n\ + Returns number of accented syllables since last phrase break, not\n\ + including this one. Accentedness is as defined by the accented\n\ + feature."); + festival_def_nff("asyl_out","Syllable",ff_asyl_out, + "Syllable.asyl_out\n\ + Returns number of accented syllables to the next phrase break, not\n\ + including this one. Accentedness is as defined by the accented\n\ + feature."); + festival_def_nff("last_accent","Syllable",ff_last_accent, + "Syllable.last_accent\n\ + Returns the number of syllables since last accented syllable."); + festival_def_nff("next_accent","Syllable",ff_next_accent, + "Syllable.next_accent\n\ + Returns the number of syllables to the next accented syllable."); + festival_def_nff("sub_phrases","Syllable",ff_sub_phrases, + "Syllable.sub_phrases\n\ + Returns the number of non-major phrase breaks since last major\n\ + phrase break. Major phrase breaks are 4, as returned by syl_break,\n\ + minor phrase breaks are 2 and 3."); + festival_def_nff("syl_break","Syllable",ff_syl_break, + "Syllable.syl_break\n\ + The break level after this syllable. Word internal syllables\n\ + return 0, non phrase final words return 1. 
Final syllables in \n\ + phrase final words return the name of the phrase they are related to.\n\ + Note the occasional \"-\" that may appear of phrase names is removed\n\ + so that this feature function returns a number in the range 0,1,2,3,4."); + festival_def_nff("old_syl_break","Syllable",ff_old_syl_break, + "Syllable.old_syl_break\n\ + Like syl_break but 2 and 3 are promoted to 4 (to be compatible with\n\ + some older models."); + festival_def_nff("pos_in_syl","Segment",ff_seg_pos_in_syl, + "Segment.pos_in_syl\n\ + The position of this segment in the syllable it is related to. The index\n\ + counts from 0. If this segment is not related to a syllable this \n\ + returns 0."); + festival_def_nff("syl_initial","Segment",ff_seg_syl_initial, + "Segment.syl_initial\n\ + Returns 1 if this segment is the first segment in the syllable it\n\ + is related to, or if it is not related to any syllable."); + festival_def_nff("syl_final","Segment",ff_seg_syl_final, + "Segment.syl_final\n\ + Returns 1 if this segment is the last segment in the syllable it\n\ + is related to, or if it is not related to any syllable."); + festival_def_nff("pos_in_word","Syllable",ff_syl_pos_in_word, + "Syllable.pos_in_word\n\ + The position of this syllable in the word it is related to. The index\n\ + counts from 0. If this syllable is not related to a word then 0 is\n\ + returned."); + festival_def_nff("word_numsyls","Word",ff_word_numsyls, + "Word.word_numsyls\n\ + Returns number of syllables in a word."); + festival_def_nff("word_break","Word",ff_word_break, + "Word.word_break\n\ + The break level after this word. 
Non-phrase final words return 1\n\ + Phrase final words return the name of the phrase they are in."); + festival_def_nff("pos_in_phrase","Word",ff_pos_in_phrase, + "Word.pos_in_phrase\n\ + The position of this word in the phrase this word is in."); + festival_def_nff("num_break","Word",ff_num_break, + "Word.num_break\n\ + 1 if this is the last word in a numeric token and it is followed by\n\ + a numeric token."); + festival_def_nff("words_out","Word",ff_words_out, + "Word.words_out\n\ + Number of words to end of this phrase."); + festival_def_nff("position_type","Syllable",ff_position_type, + "Syllable.position_type\n\ + The type of syllable with respect to the word it it related to. This\n\ + may be any of: single for single syllable words, initial for word\n\ + initial syllables in a poly-syllabic word, final for word final\n\ + syllables in poly-syllabic words, and mid for syllables within \n\ + poly-syllabic words."); + +} diff --git a/src/modules/base/module_support.cc b/src/modules/base/module_support.cc new file mode 100644 index 0000000..c31acc6 --- /dev/null +++ b/src/modules/base/module_support.cc @@ -0,0 +1,192 @@ + /************************************************************************/ + /* */ + /* Centre for Speech Technology Research */ + /* University of Edinburgh, UK */ + /* Copyright (c) 1996,1997 */ + /* All Rights Reserved. */ + /* */ + /* Permission is hereby granted, free of charge, to use and distribute */ + /* this software and its documentation without restriction, including */ + /* without limitation the rights to use, copy, modify, merge, publish, */ + /* distribute, sublicense, and/or sell copies of this work, and to */ + /* permit persons to whom this work is furnished to do so, subject to */ + /* the following conditions: */ + /* 1. The code must retain the above copyright notice, this list of */ + /* conditions and the following disclaimer. */ + /* 2. Any modifications must be clearly marked as such. */ + /* 3. 
Original authors' names are not deleted. */ + /* 4. The authors' names are not used to endorse or promote products */ + /* derived from this software without specific prior written */ + /* permission. */ + /* */ + /* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ + /* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ + /* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ + /* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ + /* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ + /* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ + /* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ + /* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ + /* THIS SOFTWARE. */ + /* */ + /*************************************************************************/ + /* */ + /* Author: Richard Caley (rjc@cstr.ed.ac.uk) */ + /* Date: Tue Jul 29 1997 */ + /* -------------------------------------------------------------------- */ + /* Some things useful in modules. 
*/ + /* */ + /*************************************************************************/ + +#include "module_support.h" + +#define CAR6(x) CAR(CDR5(x)) +#define CDR6(x) CDR(CDR5(x)) +#define CAR7(x) CAR(CDR6(x)) +#define CDR7(x) CDR(CDR6(x)) + +#define CDR_to1(X) ((X!=NIL)&&CDR1(X)) +#define CDR_to2(X) (CDR_to1(X)&&CDR2(X)) +#define CDR_to3(X) (CDR_to2(X)&&CDR3(X)) +#define CDR_to4(X) (CDR_to3(X)&&CDR4(X)) +#define CDR_to5(X) (CDR_to4(X)&&CDR5(X)) +#define CDR_to6(X) (CDR_to5(X)&&CDR6(X)) +#define CDR_to7(X) (CDR_to6(X)&&CDR7(X)) + + +void unpack_multiple_args(LISP args, LISP &v1, LISP &v2, LISP &v3, LISP &v4) +{ + if (args) + { + v1 = CAR1(args); + if (CDR1(args)) + { + v2 = CAR2(args); + if (CDR2(args)) + { + v3 = CAR3(args); + if (CDR3(args)) + v4 = CAR4(args); + } + } + } +} + +void unpack_multiple_args(LISP args, LISP &v1, LISP &v2, LISP &v3, LISP &v4, LISP &v5) +{ + unpack_multiple_args(args, v1, v2, v3, v4); + + if (CDR4(args)) + v5 = CAR5(args); +} + +void unpack_relation_arg(EST_Utterance *utt, + LISP lrel_name, + EST_String &relation_name, + EST_Relation *&relation, + RelArgType type) +{ + if (lrel_name) + relation_name = get_c_string(lrel_name); + + if(utt->relation(relation_name)) + relation = utt->relation(relation_name); + + if (type==sat_existing) + { + if(!relation) + err("no relation", relation_name); + } + else if (type==sat_new || type==sat_replace) + { + if (relation) + if (type==sat_new) + err("relation exists", relation_name); + utt->create_relation(relation_name); + + relation = &(*(utt->relation(relation_name))); + } +} + +void unpack_module_args(LISP args, EST_Utterance *&utt) +{ + if (args) + { + LISP lutt = CAR1(args); + + utt = get_c_utt(lutt); + return; + } + err("no utterance given", NIL); +} + + +void unpack_module_args(LISP args, + EST_Utterance *&utt, + EST_String &relation1_name, EST_Relation *&relation1, RelArgType type1) +{ + unpack_module_args(args, utt); + + unpack_relation_arg(utt, CDR_to1(args)?CAR2(args):NIL, relation1_name, 
relation1, type1); +} + +void unpack_module_args(LISP args, + EST_Utterance *&utt, + EST_String &relation1_name, EST_Relation *&relation1, RelArgType type1, + EST_String &relation2_name, EST_Relation *&relation2, RelArgType type2 + ) +{ + unpack_module_args(args, utt); + + unpack_relation_arg(utt, CDR_to1(args)?CAR2(args):NIL, relation1_name, relation1, type1); + unpack_relation_arg(utt, CDR_to2(args)?CAR3(args):NIL, relation2_name, relation2, type2); +} + +void unpack_module_args(LISP args, + EST_Utterance *&utt, + EST_String &relation1_name, EST_Relation *&relation1, RelArgType type1, + EST_String &relation2_name, EST_Relation *&relation2, RelArgType type2, + EST_String &relation3_name, EST_Relation *&relation3, RelArgType type3 + ) +{ + unpack_module_args(args, utt); + + unpack_relation_arg(utt, CDR_to1(args)?CAR2(args):NIL, relation1_name, relation1, type1); + unpack_relation_arg(utt, CDR_to2(args)?CAR3(args):NIL, relation2_name, relation2, type2); + unpack_relation_arg(utt, CDR_to3(args)?CAR4(args):NIL, relation3_name, relation3, type3); +} + +void unpack_module_args(LISP args, + EST_Utterance *&utt, + EST_String &relation1_name, EST_Relation *&relation1, RelArgType type1, + EST_String &relation2_name, EST_Relation *&relation2, RelArgType type2, + EST_String &relation3_name, EST_Relation *&relation3, RelArgType type3, + EST_String &relation4_name, EST_Relation *&relation4, RelArgType type4 + ) +{ + unpack_module_args(args, utt); + + unpack_relation_arg(utt, CDR_to1(args)?CAR2(args):NIL, relation1_name, relation1, type1); + unpack_relation_arg(utt, CDR_to2(args)?CAR3(args):NIL, relation2_name, relation2, type2); + unpack_relation_arg(utt, CDR_to3(args)?CAR4(args):NIL, relation3_name, relation3, type3); + unpack_relation_arg(utt, CDR_to4(args)?CAR5(args):NIL, relation4_name, relation4, type4); +} + +void unpack_module_args(LISP args, + EST_Utterance *&utt, + EST_String &relation1_name, EST_Relation *&relation1, RelArgType type1, + EST_String &relation2_name, 
EST_Relation *&relation2, RelArgType type2, + EST_String &relation3_name, EST_Relation *&relation3, RelArgType type3, + EST_String &relation4_name, EST_Relation *&relation4, RelArgType type4, + EST_String &relation5_name, EST_Relation *&relation5, RelArgType type5 + ) +{ + unpack_module_args(args, utt); + + unpack_relation_arg(utt, CDR_to1(args)?CAR2(args):NIL, relation1_name, relation1, type1); + unpack_relation_arg(utt, CDR_to2(args)?CAR3(args):NIL, relation2_name, relation2, type2); + unpack_relation_arg(utt, CDR_to3(args)?CAR4(args):NIL, relation3_name, relation3, type3); + unpack_relation_arg(utt, CDR_to4(args)?CAR5(args):NIL, relation4_name, relation4, type4); + unpack_relation_arg(utt, CDR_to5(args)?CAR6(args):NIL, relation5_name, relation5, type5); +} + diff --git a/src/modules/base/modules.cc b/src/modules/base/modules.cc new file mode 100644 index 0000000..c939638 --- /dev/null +++ b/src/modules/base/modules.cc @@ -0,0 +1,226 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Some basic initialization functions for modules */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "lexicon.h" +#include "modules.h" +#include "intonation.h" + +static void create_words(EST_Utterance *u); +static void create_segments(EST_Utterance *u); +static void create_wave(EST_Utterance *u); +static void create_phones(EST_Utterance *u); + +LISP FT_Initialize_Utt(LISP utt) +{ + // Main utterance intialization routine + // creates appropriate streams and loads them from the input + EST_Utterance *u = get_c_utt(utt); + EST_String type; + + *cdebug << "Initialize module\n"; + + type = utt_type(*u); + + utt_cleanup(*u); // delete all relations + + if (type == "Words") + create_words(u); + else if (type == "Text") + ; + else if (type == "Segments") + create_segments(u); + else if (type == "Phones") + create_phones(u); + else if (type == "Phrase") + create_phraseinput(u); + else if (type == "Wave") + create_wave(u); + else + { + // error + cerr << "Unknown utterance type \"" << type << "\" for initialization " + << endl; + festival_error(); + } + + return utt; +} + +void 
create_words(EST_Utterance *u) +{ + // Add words from IForm + LISP lwords,w; + EST_Item *word; + + u->create_relation("Word"); + lwords = utt_iform(*u); + + for (w=lwords; w != NIL; w=cdr(w)) + { + if (consp(car(w))) // word has features too + { + word = add_word(u,get_c_string(car(car(w)))); + add_item_features(word,car(cdr(car(w)))); + } + else + add_word(u,get_c_string(car(w))); + } + +} + +void create_wave(EST_Utterance *u) +{ + // Get the fname for the wave and load it + EST_Item *item = 0; + LISP lwave; + EST_Wave *wave = new EST_Wave; + + lwave = utt_iform(*u); + + if (wave->load(get_c_string(lwave)) != format_ok) + { + cerr << "Cannot load wavefile: " << get_c_string(lwave) << endl; + festival_error(); + } + + item = u->create_relation("Wave")->append(); + item->set_val("wave",est_val(wave)); + +} + +void create_segments(EST_Utterance *u) +{ + // Add segments from IForm + LISP lsegs,s,targs,t; + EST_String seg; + EST_Item *Seg;; + float start,end,dur,tpos,tval; + u->create_relation("Segment"); + u->create_relation("Target"); + + lsegs = utt_iform(*u); + + end = 0.0; + for (s=lsegs; s != NIL; s=cdr(s)) + { + seg = get_c_string(car(car(s))); + dur = get_c_float(car(cdr(car(s)))); + targs = cdr(cdr(car(s))); + Seg = add_segment(u,seg); + start = end; + end += dur; + Seg->set("end",end); + for (t=targs; t != NIL; t=cdr(t)) + { + tpos = start + (get_c_float(car(car(t)))); + tval = get_c_float(car(cdr(car(t)))); + add_target(u,Seg,tpos,tval); + } + } + +} + +static void create_phones(EST_Utterance *u) +{ + // Add phones from IForm + LISP lsegs,s; + EST_String seg; + + u->create_relation("Segment"); + lsegs = utt_iform(*u); + + for (s=lsegs; s != NIL; s=cdr(s)) + { + seg = get_c_string(car(s)); + add_segment(u,seg); + } +} + +LISP FT_Initialize_Utt(LISP args); +LISP FT_Classic_Phrasify_Utt(LISP args); +LISP FT_Classic_Word_Utt(LISP args); +LISP FT_Unilex_Word_Utt(LISP args); +LISP FT_Classic_POS_Utt(LISP args); +LISP FT_PostLex_Utt(LISP utt); +void 
festival_ff_init(void); + +void festival_base_init(void) +{ + // Thing I haven't put anywhere else yet + + festival_ff_init(); // basic feature functions + // Basic EST_Utterance modules + festival_def_utt_module("Initialize",FT_Initialize_Utt, + "(Initialize UTT)\n\ + This module should be called first on all utterances it does some\n\ + necessary initialization of the utterance and loads the base\n\ + streams with the information from the input form."); + festival_def_utt_module("Classic_Phrasify",FT_Classic_Phrasify_Utt, + "(Classic_Phrasify UTT)\n\ + Creates phrases from words, if pos_supported is non-nil, a more elaborate\n\ + system of prediction is used. Here probability models based on part of\n\ + speech and B/NB distribution are used to predict breaks. This system\n\ + uses standard Viterbi decoding techniques. If pos_supported is nil,\n\ + a simple CART-based prediction model is used. [see Phrase breaks]"); + festival_def_utt_module("Classic_Word",FT_Classic_Word_Utt, + "(Classic_Word UTT)\n\ + Build the syllable/segment/SylStructure from the given words using the\n\ + Lexicon. Uses part of speech information in the lexicon look up if\n\ + present."); + festival_def_utt_module("Unilex_Word",FT_Unilex_Word_Utt, + "(Unilex_Word UTT)\n\ + Build the syllable/segment/SylStructure from the given words using the\n\ + Lexicon. Uses part of speech information in the lexicon look up if\n\ + present."); + festival_def_utt_module("Classic_POS",FT_Classic_POS_Utt, + "(Classic_POS UTT)\n\ + Predict part of speech tags for the existing word stream. If the variable\n\ + pos_lex_name is nil nothing happens, otherwise it is assumed to point to\n\ + a lexicon file giving part of speech distribution for words. An ngram\n\ + model file should be in pos_ngram_name. The system uses standard\n\ + Viterbi decoding techniques. [see POS tagging]"); + festival_def_utt_module("Builtin_PostLex",FT_PostLex_Utt, + "(Builtin_PostLex UTT)\n\ + Post-lexical rules. 
Currently only vowel reduction applied to each\n\ + syllable using postlex_vowel_reduce_cart_tree, and the table of \n\ + vowel reduction pairs in postlex_vowel_reduce_table."); + +} diff --git a/src/modules/base/parameters.cc b/src/modules/base/parameters.cc new file mode 100644 index 0000000..4eb1de7 --- /dev/null +++ b/src/modules/base/parameters.cc @@ -0,0 +1,126 @@ + /************************************************************************/ + /* */ + /* Centre for Speech Technology Research */ + /* University of Edinburgh, UK */ + /* Copyright (c) 1996,1997 */ + /* All Rights Reserved. */ + /* */ + /* Permission is hereby granted, free of charge, to use and distribute */ + /* this software and its documentation without restriction, including */ + /* without limitation the rights to use, copy, modify, merge, publish, */ + /* distribute, sublicense, and/or sell copies of this work, and to */ + /* permit persons to whom this work is furnished to do so, subject to */ + /* the following conditions: */ + /* 1. The code must retain the above copyright notice, this list of */ + /* conditions and the following disclaimer. */ + /* 2. Any modifications must be clearly marked as such. */ + /* 3. Original authors' names are not deleted. */ + /* 4. The authors' names are not used to endorse or promote products */ + /* derived from this software without specific prior written */ + /* permission. 
*/ + /* */ + /* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ + /* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ + /* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ + /* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ + /* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ + /* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ + /* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ + /* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ + /* THIS SOFTWARE. */ + /* */ + /*************************************************************************/ + /* */ + /* Author: Richard Caley (rjc@cstr.ed.ac.uk) */ + /* Date: Tue Aug 26 1997 */ + /* ------------------------------------------------------------------- */ + /* Utility routines for easy access to parameters from C++ modules. */ + /* */ + /*************************************************************************/ + +#include "module_support.h" + +// implemented as a call to scheme so that redefineing how parameters +// are accessed in scheme will affect things as required. 
Of course +// this isn't as efficient as it might be, but we aren't going to be +// doing this inside any loops + +LISP lisp_parameter_get(const EST_String parameter_name) +{ + LISP parameter_get = siod_get_lval("Parameter.get", "Parameter.get not defined"); + LISP parameter = rintern(parameter_name); + LISP sexp = cons(parameter_get, cons(quote(parameter), NIL)); + LISP val=NIL; + + gc_protect(&sexp); + + CATCH_ERRORS() + { + cerr << "error getting parameter " << parameter_name << "\n"; + siod_reset_prompt(); + gc_unprotect(&sexp); + return NIL; + } + val = leval(sexp, NIL); + END_CATCH_ERRORS(); + + gc_unprotect(&sexp); + return val; +} + +int int_parameter_get(const EST_String parameter, int def) +{ + LISP lval = lisp_parameter_get(parameter); + + if (lval == NIL) + return def; + if (!FLONUMP(lval)) + { + cerr << "non numeric value for parameter " << parameter << "\n"; + return 0; + } + + return get_c_int(lval); +} + +float float_parameter_get(const EST_String parameter, float def) +{ + LISP lval = lisp_parameter_get(parameter); + + if (lval == NIL) + return def; + if (!FLONUMP(lval)) + { + cerr << "non numeric value for parameter " << parameter << "\n"; + return 0.0; + } + + return get_c_float(lval); +} + +bool bool_parameter_get(const EST_String parameter) +{ + LISP lval = lisp_parameter_get(parameter); + + return lval != NIL; +} + +EST_String string_parameter_get(const EST_String parameter, EST_String def) +{ + LISP lval = lisp_parameter_get(parameter); + + if (lval == NIL) + return def; + if (!SYMBOLP(lval) && !STRINGP(lval)) + { + cerr << "non string value for parameter " << parameter << "\n"; + return 0; + } + + return get_c_string(lval); +} + + + + + diff --git a/src/modules/base/phrasify.cc b/src/modules/base/phrasify.cc new file mode 100644 index 0000000..16ed9e4 --- /dev/null +++ b/src/modules/base/phrasify.cc @@ -0,0 +1,692 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research 
*/ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : August 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Phrase break prediction */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "modules.h" + +static void phrasing_none(EST_Utterance *u); +static void phrasing_by_cart(EST_Utterance *u); +static void phrasing_by_probmodels(EST_Utterance *u); +static void phrasing_by_fa(EST_Utterance *u); +static EST_VTCandidate *bb_candlist(EST_Item *s,EST_Features &f); +static EST_VTPath *bb_npath(EST_VTPath *p,EST_VTCandidate *c,EST_Features &f); +static double find_b_prob(EST_VTPath *p,int n,int *state); + +// Used in various default value cases +static int B_word = 0; +static int BB_word = 0; +static int NB_word = 0; +static int pos_p_start_tag = 0; +static int pos_pp_start_tag = 0; +static int pos_n_start_tag = 0; + +static double gscale_s = 1.0; +static double gscale_p = 0.0; +static EST_Ngrammar *bb_ngram = 0; +static EST_Ngrammar *bb_pos_ngram = 0; +static LISP bb_tags = NIL; +static LISP pos_map = NIL; +static LISP phrase_type_tree = NIL; + +/* Interslice */ +static EST_Track *bb_track = 0; + +LISP FT_Classic_Phrasify_Utt(LISP utt) +{ + // Predict and add phrasing to an utterance + EST_Utterance *u = get_c_utt(utt); + LISP phrase_method = ft_get_param("Phrase_Method"); + + *cdebug << "Phrasify module\n"; + + if (u->relation_present("Phrase")) + return utt; // already specified + else if (phrase_method == NIL) + phrasing_none(u); // all one phrase + else if (streq("prob_models",get_c_string(phrase_method))) + phrasing_by_probmodels(u); + else if (streq("cart_tree",get_c_string(phrase_method))) + phrasing_by_cart(u); + else if (streq("forced_align",get_c_string(phrase_method))) + phrasing_by_fa(u); + else + { + cerr << "PHRASIFY: unknown phrase method \"" << + 
get_c_string(phrase_method) << endl; + festival_error(); + } + + return utt; +} + +static void phrasing_none(EST_Utterance *u) +{ + // All in a single phrase + EST_Item *w,*phr=0; + + u->create_relation("Phrase"); + + for (w=u->relation("Word")->first(); w != 0; w = w->next()) + { + if (phr == 0) + phr = add_phrase(u); + append_daughter(phr,"Phrase",w); + if (w->next() == 0) + { + w->set("pbreak","B"); + phr->set_name("4"); + phr = 0; + } + } + +} + +static void phrasing_by_cart(EST_Utterance *u) +{ + EST_Item *w,*phr=0; + LISP tree; + EST_Val pbreak; + + u->create_relation("Phrase"); + tree = siod_get_lval("phrase_cart_tree","no phrase cart tree"); + + for (w=u->relation("Word")->first(); w != 0; w = w->next()) + { + if (phr == 0) + phr = add_phrase(u); + append_daughter(phr,"Phrase",w); + pbreak = wagon_predict(w,tree); + w->set("pbreak",pbreak.string()); + if ((pbreak == "B") || (pbreak == "BB")) + { + phr->set_name((EST_String)pbreak); + phr = 0; + } + } + +} + +static void pbyp_get_params(LISP params) +{ + EST_String bb_pos_name,bb_pos_filename,bb_name,bb_break_filename; + EST_String bb_track_name; + LISP l1; + + bb_pos_name = get_param_str("pos_ngram_name",params,""); + bb_pos_filename = get_param_str("pos_ngram_filename",params,""); + if ((bb_pos_ngram = get_ngram(bb_pos_name,bb_pos_filename)) == 0) + { + cerr << "PHRASIFY: no ngram called \"" << + bb_pos_name << "\" defined." << endl; + festival_error(); + } + + gscale_s = get_param_float("gram_scale_s",params,1.0); + gscale_p = get_param_float("gram_scale_p",params,0.0); + pos_map = get_param_lisp("pos_map",params,NIL); + + bb_name = get_param_str("break_ngram_name",params,""); + bb_break_filename = get_param_str("break_ngram_filename",params,""); + if ((bb_ngram = get_ngram(bb_name,bb_break_filename)) == 0) + { + cerr << "PHRASIFY: no ngram called \"" << + bb_name << "\" defined." 
<< endl; + festival_error(); + } + bb_tags = get_param_lisp("break_tags",params,NIL); + phrase_type_tree = get_param_lisp("phrase_type_tree",params,NIL); + + bb_track_name = get_param_str("break_track_name",params,""); + if (bb_track_name != "") + { + if (bb_track) + delete bb_track; + bb_track = new EST_Track; + if (bb_track->load(bb_track_name) != format_ok) + { + delete bb_track; + cerr << "PHRASE: failed to load FA track " << + bb_track_name << endl; + festival_error(); + } + } + + l1 = siod_get_lval("pos_p_start_tag",NULL); + if (l1 != NIL) + pos_p_start_tag = bb_pos_ngram->get_vocab_word(get_c_string(l1)); + l1 = siod_get_lval("pos_pp_start_tag",NULL); + if (l1 != NIL) + pos_pp_start_tag = bb_pos_ngram->get_vocab_word(get_c_string(l1)); + l1 = siod_get_lval("pos_n_start_tag",NULL); + if (l1 != NIL) + pos_n_start_tag = bb_pos_ngram->get_vocab_word(get_c_string(l1)); +} + +static void phrasing_by_probmodels(EST_Utterance *u) +{ + // Predict phrasing using POS and prob models of B distribution + EST_Item *w,*phr=0; + EST_String pbreak; + int num_states; + + pbyp_get_params(siod_get_lval("phr_break_params",NULL)); + gc_protect(&bb_tags); + + for (w=u->relation("Word")->first(); w != 0; w = w->next()) + { // Set up tag index for pos ngram + EST_String lpos = map_pos(pos_map,w->f("pos").string()); + w->set("phr_pos",lpos); + w->set("pos_index",bb_pos_ngram->get_vocab_word(lpos)); + } + B_word = bb_ngram->get_vocab_word("B"); + NB_word = bb_ngram->get_vocab_word("NB"); + BB_word = bb_ngram->get_vocab_word("BB"); + + num_states = bb_ngram->num_states(); + EST_Viterbi_Decoder v(bb_candlist,bb_npath,num_states); + + v.initialise(u->relation("Word")); + v.search(); + v.result("pbreak_index"); + + // Given predicted break, go through and add phrases + u->create_relation("Phrase"); + for (w=u->relation("Word")->first(); w != 0; w = w->next()) + { + w->set("pbreak",bb_ngram-> + get_vocab_word(w->f("pbreak_index").Int())); + if (phr == 0) + phr = add_phrase(u); + 
append_daughter(phr,"Phrase",w); + if (phrase_type_tree != NIL) + { + EST_Val npbreak = wagon_predict(w,phrase_type_tree); + w->set("pbreak",npbreak.string()); // may reset to BB + } + pbreak = (EST_String)w->f("pbreak"); + if (pbreak == "B") + w->set("blevel",3); + else if (pbreak == "mB") + w->set("blevel",2); + if ((pbreak == "B") || (pbreak == "BB") || (pbreak == "mB")) + { + phr->set_name((EST_String)pbreak); + phr = 0; + } + } + + gc_unprotect(&bb_tags); + bb_tags = NIL; +} + +static EST_VTCandidate *bb_candlist(EST_Item *s,EST_Features &f) +{ + // Find candidates with a priori probabilities + EST_IVector window(bb_pos_ngram->order()); + (void)f; + int tag; + + if (bb_pos_ngram->order() == 4) + { + window[1] = s->I("pos_index",0); + if (s->prev() != 0) + window[0] = s->prev()->I("pos_index",0); + else + window[0] = pos_p_start_tag; + if (s->next() != 0) + window[2] = s->next()->I("pos_index",0); + else + window[2] = pos_n_start_tag; + } + else if (bb_pos_ngram->order() == 3) + { + window[0] = s->I("pos_index",0); + if (s->next() != 0) + window[1] = s->next()->I("pos_index",0); + else + window[1] = pos_n_start_tag; + } + else if (bb_pos_ngram->order() == 5) + { // This is specific for some set of pos tagsets + window[2] = s->I("pos_index",0); + if (s->prev() != 0) + { + window[1] = s->prev()->I("pos_index",0); + } + else + { + window[1] = pos_p_start_tag; + } + if (s->next() != 0) + { + window[3] = s->next()->I("pos_index",0); + if (s->next()->next() != 0) + window[0] = s->next()->next()->I("pos_index",0); + else + window[0] = 0; + } + else + { + window[3] = pos_n_start_tag; + window[0] = 0; + } + } + else + { + cerr << "PHRASIFY: can't deal with ngram of size " << + bb_pos_ngram->order() << endl; + festival_error(); + } + double prob=1.0; + EST_VTCandidate *all_c = 0; + EST_Val labelled_brk = ffeature(s,"R:Token.parent.pbreak"); + + if ((labelled_brk != "0") && + (ffeature(s,"R:Token.n.name") == "0")) // last word in token + { // there is a labelled break on 
the token so respect it + EST_VTCandidate *c = new EST_VTCandidate; + c->s = s; + c->name = bb_ngram->get_vocab_word(labelled_brk.string()); + c->score = log(0.95); // very very likely, but not absolute + c->next = all_c; + all_c = c; // but then if you give only one option ... + } + else if (s->next() == 0) // end of utterances so force a break + { + EST_VTCandidate *c = new EST_VTCandidate; + c->s = s; + c->name = B_word; + c->score = log(0.95); // very very likely, but not absolute + c->next = all_c; + all_c = c; + } + else if (s->name() == ".end_utt") + { // This is a quick check to see if forcing "." to B is worth it + EST_VTCandidate *c = new EST_VTCandidate; + c->s = s; + c->name = B_word; + c->score = log(0.95); // very very likely, but not absolute + c->next = all_c; + all_c = c; + } + else if (siod_get_lval("break_non_bayes",NULL) != NIL) + { + /* This uses the "wrong" formula to extract the probability */ + /* Extract P(B | context) rather than P(context | B) as below */ + /* This gives worse results as well as not following Bayes */ + /* equations */ + EST_VTCandidate *c; + LISP l; + for (l=bb_tags; l != 0; l=cdr(l)) + { + c = new EST_VTCandidate; + c->s = s; + tag = bb_ngram->get_vocab_word(get_c_string(car(l))); + c->name = tag; + window[bb_pos_ngram->order()-1] = tag; + const EST_DiscreteProbDistribution &pd = + bb_pos_ngram->prob_dist(window); + if (pd.samples() == 0) + { + if (tag == B_word) + prob = 0.2; + else + prob = 0.8; + } + else + prob = pd.probability(tag); + if (prob == 0) + c->score = log(0.0000001); + else + c->score = log(prob); + c->next = all_c; + all_c = c; + } + } + else + { // Standard Bayes model + EST_VTCandidate *c; + LISP l; + for (l=bb_tags; l != 0; l=cdr(l)) + { + c = new EST_VTCandidate; + c->s = s; + tag = bb_ngram->get_vocab_word(get_c_string(car(l))); + c->name = tag; + window[bb_pos_ngram->order()-1] = tag; + prob = bb_pos_ngram->reverse_probability(window); + + // If this word came from inside a token reduce the + // 
probability of a break + if ((ffeature(s,"R:Token.n.name") != "0") && + ((s->as_relation("Token")->first()->length()) < 7)) + { + float weight = ffeature(s,"pbreak_scale"); + if (weight == 0) weight = 0.5; + if (tag == B_word) + prob *= weight; + else + prob = 1.0-((1.0-prob)*weight); + } + if (prob == 0) + c->score = log(0.0000001); + else + c->score = log(prob); + s->set("phrase_score",c->score); + c->next = all_c; + all_c = c; + } + } + + return all_c; +} + +static EST_VTPath *bb_npath(EST_VTPath *p,EST_VTCandidate *c,EST_Features &f) +{ + EST_VTPath *np = new EST_VTPath; + (void)f; +// static EST_String lscorename("lscore"); + double prob; + double lprob,lang_prob; + + np->c = c; + np->from = p; + int n = c->name.Int(); + prob = find_b_prob(p,n,&np->state); + if (np->state == -1) + prob = find_b_prob(p,n,&np->state); + if (prob == 0) + lprob = log(0.00000001); + else + lprob = log(prob); + + lang_prob = (1.0 * c->score) + gscale_p; + lang_prob = c->score; + +// np->set_feature(lscorename,lang_prob+lprob); + if (p==0) + np->score = (lang_prob+lprob); + else + np->score = (lang_prob+lprob) + p->score; + + return np; +} + +static double find_b_prob(EST_VTPath *p,int n,int *state) +{ + int oldstate=0; + double prob; + + if (p == 0) + { + int order = bb_ngram->order(); + EST_IVector window(order); + int i; + window.a_no_check(order-1) = n; + window.a_no_check(order-2) = B_word; + for (i=order-3; i>=0; i--) + window.a_no_check(i) = NB_word; + oldstate = bb_ngram->find_state_id(window); + } + else + oldstate = p->state; + const EST_DiscreteProbDistribution &pd = bb_ngram->prob_dist(oldstate); + if (pd.samples() == 0) + prob = 0; + else + prob = (double)pd.probability(n); + // This is too specific + if (n == B_word) + prob *= gscale_s; + *state = bb_ngram->find_next_state_id(oldstate,n); + + return prob; + +} + +/* Part of Interslice */ + +static double find_b_faprob(EST_VTPath *p,int n,int *state) +{ + int oldstate=0; + int i,j; + EST_VTPath *d; + double prob; + 
double atime, wtime, wstddev=0, z; + static int ATOTH_BREAK=2; + static int ATOTH_NBREAK=1; + + if (p == 0) + { + oldstate = 0; + } + else + oldstate = p->state; + +/* if (streq("of", + ( p && p->c && p->c->s ? (const char *)ffeature(p->c->s,"name").String() : "null"))) + printf("ding\n"); */ + + // Prob is the prob that time since last break could be + // generated by the duration model + + // Skip over break if we're at one + for (i = oldstate; i < bb_track->num_frames(); i++) + { + if (bb_track->a(i,0) == ATOTH_NBREAK) + break; + } + + // time since last break in words with std + for (wstddev=wtime=0.0,d=p; d ; d=d->from) + { + wtime += ffeature(d->c->s,"word_duration").Float(); + wstddev += ffeature(d->c->s,"lisp_word_stddev").Float(); + if (bb_track->a(d->state,0) == ATOTH_BREAK) + break; + } + + // time since last break in acoustics + for (atime=0.01,j=i; j>0; j--) + { + if (bb_track->a(j,0) == ATOTH_BREAK) + break; + atime += bb_track->t(j) - ( i == 0 ? 0 : bb_track->t(j-1)); + } + + // Find best state for given time + if (wstddev == 0) + i++; + else if (n == B_word) + { /* cost of having a break here */ + /* extend acoustics until next break */ + for (; i < bb_track->num_frames(); i++) + { + if (bb_track->a(i,0) == ATOTH_BREAK) + break; + atime += bb_track->t(i) - ( i == 0 ? 0 : bb_track->t(i-1)); + } + z = fabs((atime-wtime)/wstddev); + } + else + { /* cost of having a non-break here */ + for ( i++,z = fabs((atime-wtime)/wstddev); + (i < bb_track->num_frames()) && (bb_track->a(i,0) == ATOTH_NBREAK); + i++) + { + atime += bb_track->t(i) - ( i == 0 ? 
0 : bb_track->t(i-1)); +/* printf("better atime %f wtime %f wstddev %f z %f new z %f\n", + atime,wtime,wstddev,z, + fabs((atime-wtime)/wstddev)); */ + if (fabs((atime-wtime)/wstddev) > (float)z) + break; + else + z = fabs((atime-wtime)/wstddev); + } + } + + + if (d && d->c && d->c->s && (d->c->s->next()->next()) == NULL) + { /* must be in final state */ + printf("must be in final state\n"); + if (i != bb_track->num_frames()) + z = 0.000001; + } + + if (z == 0) + printf("z == 0"); + + /* number hack */ + prob = (2.0 - z)/2.0; +/* prob = z; */ + if (prob < 0.000001) + prob = 0.000001; + else if (prob > 0.999999) + prob = 0.999999; + + // prob of atime given (wtime,wstddev) + printf("%d %d %f %f %f %f %s %s %f\n",oldstate,i,atime,wtime,wstddev,z, + ( p && p->c && p->c->s ? (const char *)ffeature(p->c->s,"name").String() : "null"), + ( n == B_word ? "B" : "NB"), + prob + ); + if (i >= bb_track->num_frames()) + i = bb_track->num_frames() - 1; + *state = i; + + return prob; + +} + +static EST_VTPath *bb_fapath(EST_VTPath *p,EST_VTCandidate *c,EST_Features &f) +{ + EST_VTPath *np = new EST_VTPath; + (void)f; +// static EST_String lscorename("lscore"); + double prob; + double lprob,lang_prob; + + np->c = c; + np->from = p; + int n = c->name.Int(); + prob = find_b_faprob(p,n,&np->state); + if (prob == 0) + lprob = log(0.00000001); + else + lprob = log(prob); + + lang_prob = (1.0 * c->score) + gscale_p; + lang_prob = c->score; + +// np->set_feature(lscorename,lang_prob+lprob); + if (p==0) + np->score = (lang_prob+lprob); + else + np->score = (lang_prob+lprob) + p->score; + + return np; +} + +static void phrasing_by_fa(EST_Utterance *u) +{ + // Predict phrasing using POS and prob models of B distribution + EST_Item *w,*phr=0; + EST_String pbreak; + int num_states; + + pbyp_get_params(siod_get_lval("phr_break_params",NULL)); + gc_protect(&bb_tags); + + for (w=u->relation("Word")->first(); w != 0; w = w->next()) + { // Set up tag index for pos ngram + EST_String lpos = 
map_pos(pos_map,w->f("pos").string()); + w->set("phr_pos",lpos); + w->set("pos_index",bb_pos_ngram->get_vocab_word(lpos)); + } + + num_states = bb_track->num_frames(); + + EST_Viterbi_Decoder v(bb_candlist,bb_fapath,num_states); + + v.initialise(u->relation("Word")); + v.search(); + v.result("pbreak_index"); + + // Given predicted break, go through and add phrases + u->create_relation("Phrase"); + for (w=u->relation("Word")->first(); w != 0; w = w->next()) + { + w->set("pbreak",bb_ngram-> + get_vocab_word(w->f("pbreak_index").Int())); + if (phr == 0) + phr = add_phrase(u); + append_daughter(phr,"Phrase",w); + if (phrase_type_tree != NIL) + { + EST_Val npbreak = wagon_predict(w,phrase_type_tree); + w->set("pbreak",npbreak.string()); // may reset to BB + } + pbreak = (EST_String)w->f("pbreak"); + if (pbreak == "B") + w->set("blevel",3); + else if (pbreak == "mB") + w->set("blevel",2); + if ((pbreak == "B") || (pbreak == "BB") || (pbreak == "mB")) + { + phr->set_name((EST_String)pbreak); + phr = 0; + } + } + + gc_unprotect(&bb_tags); + bb_tags = NIL; +} + + +EST_Item *add_phrase(EST_Utterance *u) +{ + EST_Item *item = u->relation("Phrase")->append(); + + item->set_name("phrase"); + + return item; + +} + diff --git a/src/modules/base/phrinfo.cc b/src/modules/base/phrinfo.cc new file mode 100644 index 0000000..20c51b6 --- /dev/null +++ b/src/modules/base/phrinfo.cc @@ -0,0 +1,120 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : June 1997 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* An input method for much more general info on tokens */ +/* */ +/*=======================================================================*/ + +#include +#include "festival.h" +#include "modules.h" +#include "text.h" + +static EST_Item *make_phrase_node(EST_Utterance *u, + const EST_String &name, + LISP feats); +static EST_Item *make_token_node(EST_Utterance *u, + const EST_String &name, + LISP feats); + +void create_phraseinput(EST_Utterance *u) +{ + // Build from phrase input form (phrase, tokens, segs) + LISP l,ptree,t; + EST_Item *phrase,*token; + + ptree = utt_iform(*u); + + u->create_relation("Phrase"); + u->create_relation("Token"); + + for (l=ptree; l != NIL; l=cdr(l)) + { + if (streq("Phrase",get_c_string(car(car(l))))) + { + phrase = make_phrase_node(u,"Phrase",car(cdr(car(l)))); + for (t=cdr(cdr(car(l))); t != NIL; t=cdr(t)) + { + if (consp(car(t))) + token = make_token_node(u,get_c_string(car(car(t))), + car(cdr(car(t)))); + else + token = make_token_node(u,get_c_string(car(t)),NIL); + append_daughter(phrase,token); + } + } + else // no explicit phrase marker + { + cerr << "PhrInfo: malformed input form." 
<< endl; + festival_error(); + } + } +} + +static EST_Item *make_phrase_node(EST_Utterance *u, + const EST_String &name, + LISP feats) +{ + // Create a phrase node with name and features + EST_Item *p; + + p = add_phrase(u); + p->set_name(name); + add_item_features(p,feats); + return p; +} + +static EST_Item *make_token_node(EST_Utterance *u, + const EST_String &name, + LISP feats) +{ + // Create a token node with name and features + EST_Token t = name; + EST_Item *li = add_token(u,t); + LISP f; + + for (f=feats; f != NIL; f=cdr(f)) + { + const char *nname = get_c_string(car(car(f))); + if (streq(nname,"punctuation")) + li->set("punc",get_c_string(car(cdr(car(f))))); + else + li->set(nname,get_c_string(car(cdr(car(f))))); + } + + return li; +} + diff --git a/src/modules/base/pos.cc b/src/modules/base/pos.cc new file mode 100644 index 0000000..7bc3288 --- /dev/null +++ b/src/modules/base/pos.cc @@ -0,0 +1,213 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : August 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Various part-of-speech predicting modules */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "lexicon.h" + +static EST_VTCandidate *pos_candlist(EST_Item *s,EST_Features &f); +static EST_VTPath *pos_npath(EST_VTPath *p,EST_VTCandidate *c,EST_Features &f); +static double find_np_prob(EST_VTPath *p,int n,int *state); + +static EST_Ngrammar *pos_ngram = 0; + +static EST_String zeroString("0"); +static int p_word = 0; // arbitrary numbers +static int n_word = 1; + +LISP FT_Classic_POS_Utt(LISP utt) +{ + // Predict part of speech for word stream + EST_Utterance *u = get_c_utt(utt); + LISP pos_lex_name, pos_ngram_name; + LISP lastlex, pos_p_start_tag, pos_pp_start_tag; + + *cdebug << "Classic POS module\n"; + + pos_lex_name = siod_get_lval("pos_lex_name",NULL); + if (pos_lex_name == NIL) + return utt; // not set so ignore it + pos_ngram_name = siod_get_lval("pos_ngram_name","no pos ngram name"); + pos_p_start_tag = siod_get_lval("pos_p_start_tag","no prev start tag"); + pos_pp_start_tag = siod_get_lval("pos_pp_start_tag","no prev prev start tag"); + + lastlex = 
lex_select_lex(pos_lex_name); + + if ((pos_ngram = get_ngram(get_c_string(pos_ngram_name))) == 0) + { + cerr << "POS: no ngram called \"" << + get_c_string(pos_ngram_name) << "\" defined" << endl; + festival_error(); + } + + p_word = pos_ngram->get_vocab_word(get_c_string(pos_p_start_tag)); + n_word = pos_ngram->get_vocab_word(get_c_string(pos_pp_start_tag)); + + EST_Viterbi_Decoder v(pos_candlist,pos_npath,pos_ngram->num_states()); + + v.initialise(u->relation("Word")); + v.search(); + v.result("pos_index"); + + lex_select_lex(lastlex); + + EST_Item *w; + EST_String pos; + LISP l; + // Map pos tagset to desired set + LISP pos_map = siod_get_lval("pos_map",NULL); + for (w=u->relation("Word")->first(); w != 0; w = w->next()) + { + // convert pos index into string value + pos = pos_ngram->get_vocab_word(w->f("pos_index").Int()); + w->set("pos",pos); + for (l=pos_map; l != NIL; l=cdr(l)) + if (siod_member_str(pos,car(car(l))) != NIL) + { + w->set("pos",get_c_string(car(cdr(car(l))))); + break; + } + } + + return utt; +} + +static EST_VTCandidate *pos_candlist(EST_Item *s,EST_Features &f) +{ + // Return list of possible pos based on a priori probabilities + LISP pd,l; + EST_Item *token; + EST_VTCandidate *c; + EST_VTCandidate *all_c = 0; + EST_String actual_pos; + (void)f; + + if (((actual_pos = s->S("pos","0")) != "0") || + (((token = parent(s,"Token")) != 0) && + ((actual_pos = token->S("pos","0")) != "0"))) + { + // There is an explicit pos specified, so respect it + pd = cons(make_param_float(actual_pos,1.0),NIL); + c = new EST_VTCandidate; + c->name = pos_ngram->get_vocab_word(actual_pos); + c->score = 1.0; + c->s = s; + c->next = 0; + return c; + } + + LISP e = lex_lookup_word(s->name(),NIL); + pd = car(cdr(e)); + + if (pd == NIL) + { + const char *chr = s->name(); + if (strchr("0123456789",chr[0]) != NULL) + e = lex_lookup_word("_number_",NIL); // I *know* there is an entry + else + e = lex_lookup_word("_OOV_",NIL); // I *know* there is an entry + pd = 
car(cdr(e)); + } + + // Build a candidate for each entry in prob distribution + for (l=pd; l != NIL; l=cdr(l)) + { + c = new EST_VTCandidate; + c->name = pos_ngram->get_vocab_word(get_c_string(car(car(l)))); + c->score = get_c_float(car(cdr(car(l)))); + c->s = s; + c->next = all_c; + all_c = c; + } + + return all_c; +} + +static EST_VTPath *pos_npath(EST_VTPath *p,EST_VTCandidate *c,EST_Features &f) +{ + // Build a potential new path from previous path and this candidate + EST_VTPath *np = new EST_VTPath; +// static EST_String lscorename("lscore"); + double prob; + double lprob; + (void)f; + + np->c = c; + np->from = p; + int n = c->name.Int(); + prob = find_np_prob(p,n,&np->state); + if (prob == 0) + lprob = log(0.00000001); + else + lprob = log(prob); + +// np->set_feature(lscorename,(c->score+lprob)); + if (p==0) + np->score = (c->score+lprob); + else + np->score = (c->score+lprob) + p->score; + + return np; +} + +static double find_np_prob(EST_VTPath *p,int n,int *state) +{ + int oldstate=0; + + if (p==0) + { // This could be done once before the search is called + int order = pos_ngram->order(); + EST_IVector window(order); + int i; + + window.a_no_check(order-1) = n; + window.a_no_check(order-2) = p_word; + for (i = order-3; i>=0; i--) + window.a_no_check(i) = n_word; + oldstate = pos_ngram->find_state_id(window); + } + else + oldstate = p->state; + *state = pos_ngram->find_next_state_id(oldstate,n); + const EST_DiscreteProbDistribution &pd = pos_ngram->prob_dist(oldstate); + if (pd.samples() == 0) + return 0; + else + return (double)pd.probability(n); +} diff --git a/src/modules/base/postlex.cc b/src/modules/base/postlex.cc new file mode 100644 index 0000000..76e4ea3 --- /dev/null +++ b/src/modules/base/postlex.cc @@ -0,0 +1,123 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : February 1997 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Post-lexical rules: vowel reduction and contraction, R deletion */ +/* */ +/* All this is far too specific, and should be parameterized better */ +/* -- and gradually it is ... 
*/ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "modules.h" + +static void vowel_reduction(EST_Utterance *u); +static void r_reduction(EST_Utterance *u); +static void vowel_reduce(EST_Item *syl,LISP vow_table); + +LISP FT_PostLex_Utt(LISP utt) +{ + // Do vowel reduction, destructively changes vowel segment values + EST_Utterance *u = get_c_utt(utt); + + vowel_reduction(u); + r_reduction(u); + + return utt; +} + +static void r_reduction(EST_Utterance *u) +{ + // R reduction for mrpa (British English) + EST_Item *s,*t; + LISP r_red_tree; + + if (!streq(get_c_string(ft_get_param("PhoneSet")),"mrpa")) + return; + + r_red_tree = siod_get_lval("postlex_mrpa_r_cart_tree",NULL); + if (r_red_tree == NIL) + return; + + for (s=u->relation("Segment")->first(); s != 0; s = t) + { + t = s->next(); + if (wagon_predict(s,r_red_tree) == "delete") + s->unref_all(); + } +} + +static void vowel_reduction(EST_Utterance *u) +{ + EST_Item *s; + LISP red_tree, full_vow_table, vow_table=NIL; + + red_tree = siod_get_lval("postlex_vowel_reduce_cart_tree", NULL); + full_vow_table = siod_get_lval("postlex_vowel_reduce_table",NULL); + vow_table = + car(cdr(siod_assoc_str(get_c_string(ft_get_param("PhoneSet")), + full_vow_table))); + if ((vow_table == NIL) || (red_tree == NIL)) + return; // ain't anything to do + + for (s=u->relation("Syllable")->first(); s != 0; s = s->next()) + { + if (wagon_predict(s,red_tree) == "1") + vowel_reduce(s,vow_table); + } +} + +static void vowel_reduce(EST_Item *syl,LISP vow_table) +{ + // Reduce vowel in syl by looking it up in vow_table for + // appropriate vowel mapping + EST_Item *seg; + LISP vreduce=NIL; + + for (seg=daughter1(syl,"SylStructure"); seg; seg=seg->next()) + { + if (ph_is_vowel(seg->name())) + { + vreduce = siod_assoc_str(seg->name(),vow_table); + if (vreduce != NIL) + seg->set_name(get_c_string(car(cdr(vreduce)))); + return; + // ignore any secondary vowels in 
syllable (should only be one) + } + } +} + + diff --git a/src/modules/base/word.cc b/src/modules/base/word.cc new file mode 100644 index 0000000..1503555 --- /dev/null +++ b/src/modules/base/word.cc @@ -0,0 +1,270 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* From words to syllables and segments using the lexicon */ +/* */ +/*=======================================================================*/ + +#include +#include "festival.h" +#include "lexicon.h" +#include "modules.h" + +static EST_Item *add_syllable(EST_Utterance *u, int stress); +static LISP specified_word_pronunciation(EST_Item *w, LISP lpos); + +LISP FT_Classic_Word_Utt(LISP utt) +{ + // Look up words in lexicon and create syllable and segment streams + EST_Utterance *u = get_c_utt(utt); + EST_Item *w; + LISP entry,s,p,lpos; + EST_String pos; + EST_Item *syl,*seg; + EST_Relation *SylStructure; + + *cdebug << "Word module\n"; + + u->create_relation("Syllable"); + u->create_relation("Segment"); + SylStructure = u->create_relation("SylStructure"); + + for (w=u->relation("Word")->first(); w != 0; w = w->next()) + { + lpos = NIL; + pos = (EST_String)ffeature(w,"hg_pos"); + // explicit homograph pos disambiguation + if (pos == "0") + pos = (EST_String)ffeature(w,"pos"); + if (pos != "0") + lpos = rintern(pos); + + // Check if there is an explicitly given pronunciation before + // going to the lexicon + if ((entry = specified_word_pronunciation(w,lpos)) == NIL) + entry = lex_lookup_word(w->name(),lpos); + if (lpos == NIL) + w->set("pos",get_c_string(car(cdr(entry)))); + SylStructure->append(w); + for (s=car(cdr(cdr(entry))); s != NIL; s=cdr(s)) + { + syl = add_syllable(u,get_c_int(car(cdr(car(s))))); + append_daughter(w,"SylStructure",syl); + for (p=car(car(s)); p != NIL; p=cdr(p)) + { + seg = add_segment(u,get_c_string(car(p))); + append_daughter(syl,"SylStructure",seg); + } + } + } + + return utt; +} + +LISP FT_Unilex_Word_Utt(LISP utt) +{ + // This tries to be a bit cleverer than Classic_Word in dealing with full and reduced forms of 
words. + // Look up words in lexicon and create syllable and segment streams + EST_Utterance *u = get_c_utt(utt); + EST_Item *w; + LISP entry,entry2,s,p,s2,p2,lpos,lexpos; + EST_String pos, vowel_form,sname,s2name; + EST_Item *syl,*seg; + EST_Relation *SylStructure; + + *cdebug << "Word module\n"; + + u->create_relation("Syllable"); + u->create_relation("Segment"); + SylStructure = u->create_relation("SylStructure"); + + for (w=u->relation("Word")->first(); w != 0; w = w->next()) + { + lpos = NIL; + pos = EST_String(ffeature(w,"hg_pos")); + // explicit homograph pos disambiguation + if (pos == "0") + pos = EST_String(ffeature(w,"pos")); + if (pos != "0") + lpos = rintern(pos); + + // Check if there is an explicitly given pronunciation before + // going to the lexicon + if ((entry = specified_word_pronunciation(w,lpos)) == NIL) + entry = lex_lookup_word(w->name(),lpos); + lexpos = car(cdr(entry)); + // deal with full/reduced specification in pos as a list. + entry2 = NIL; + if (! atomp(lexpos)) + { + if ( (vowel_form = get_c_string(car(cdr(lexpos)))) == "full") + { + entry2 = lex_lookup_word(w->name(),cons(rintern("reduced"),NIL)); + if (lpos == NIL) + w->set("pos",get_c_string(car(lexpos))); + } + } + else if (lpos == NIL) + w->set("pos",get_c_string(lexpos)); + SylStructure->append(w); + if (entry2) // compare full and reduced form entries + for (s=car(cdr(cdr(entry))),s2=car(cdr(cdr(entry2))) ; s != NIL; s=cdr(s)) + { + syl = add_syllable(u,get_c_int(car(cdr(car(s))))); + append_daughter(w,"SylStructure",syl); + for (p=car(car(s)),p2=car(car(s2)); p != NIL; p=cdr(p)) + { + seg = add_segment(u,get_c_string(car(p))); + append_daughter(syl,"SylStructure",seg); + + if(p2 != NIL) + { + sname = get_c_string(car(p)); + s2name = get_c_string(car(p2)); + if (sname != s2name) + { + seg->set("reducable",1); + seg->set("fullform",sname); + seg->set("reducedform",s2name); + } + p2=cdr(p2); + } + } + if(s2 != NIL) + s2 = cdr(s2); + } + else + for (s=car(cdr(cdr(entry))); s != 
NIL; s=cdr(s)) + { + syl = add_syllable(u,get_c_int(car(cdr(car(s))))); + append_daughter(w,"SylStructure",syl); + for (p=car(car(s)); p != NIL; p=cdr(p)) + { + seg = add_segment(u,get_c_string(car(p))); + append_daughter(syl,"SylStructure",seg); + } + } + } + + return utt; +} + + +static LISP specified_word_pronunciation(EST_Item *w, LISP lpos) +{ + // If there is a phoneme feature on w or the Token related to + // w use that as the pronunciation. Note the value will be a string + // from which a list can be read. + EST_String p; + + if (((p = (EST_String)ffeature(w,"phonemes")) != "0") || + ((p = (EST_String)ffeature(w,"R:Token.parent.phonemes")) != "0")) + { + LISP phones = read_from_lstring(strintern(p)); + + return cons(strintern(w->name()), + cons(lpos, + cons(lex_syllabify(phones),NIL))); + } + else + return NIL; + +} + +EST_Item *add_word(EST_Utterance *u, const EST_String &name) +{ + EST_Item *item = u->relation("Word")->append(); + + item->set_name(name); + + return item; +} + +EST_Item *add_word(EST_Utterance *u, LISP word) +{ + // Build a Word Ling_Item from the Lisp description, which may + // contain other features + LISP f; + EST_Item *si_word; + int has_name = FALSE; + + if (consp(word)) + { + // feature form + si_word = add_word(u,""); + for (f=word; f != NIL; f=cdr(f)) + { + if (streq("name",get_c_string(car(car(f))))) + { + has_name = TRUE; + si_word->set_name(get_c_string(car(cdr(car(f))))); + } + else + si_word->set(get_c_string(car(car(f))), + get_c_string(car(cdr(car(f))))); + } + if (!has_name) + { + cerr << "add_word: word has description but no name" << endl; + cerr << " " << siod_sprint(word) << endl; + festival_error(); + } + } + else // just the name + si_word = add_word(u,get_c_string(word)); + + return si_word; +} + +static EST_Item *add_syllable(EST_Utterance *u, int stress) +{ + EST_Item *item = u->relation("Syllable")->append(); + + item->set_name("syl"); + item->set("stress",stress); + + return item; +} + +EST_Item 
*add_segment(EST_Utterance *u, const EST_String &s) +{ + EST_Item *item = u->relation("Segment")->append(); + + item->set_name(s); + + return item; +} + diff --git a/src/modules/clunits/Makefile b/src/modules/clunits/Makefile new file mode 100644 index 0000000..f808649 --- /dev/null +++ b/src/modules/clunits/Makefile @@ -0,0 +1,71 @@ +########################################################################### +## ## +## Carnegie Mellon University and ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1998-2010 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH, CARNEGIE MELLON UNIVERSITY AND THE ## +## CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO ## +## THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY ## +## AND FITNESS, IN NO EVENT SHALL THE UNIVERSITY OF EDINBURGH, CARNEGIE ## +## MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, ## +## INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ## +## RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION ## +## OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF ## +## OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ## +## ## +########################################################################### +## ## +## Author: Alan W Black ## +## Date: April 1998 ## +## with substantial tidyup Spring 2000 ## +########################################################################### +## ## +## Implementation of Black, A. and Taylor, P. (1997). Automatically ## +## clustering similar units for unit selection in speech synthesis ## +## Proceedings of Eurospeech 97, vol2 pp 601-604, Rhodes, Greece. ## +## ## +## postscript: http://www.cs.cmu.edu/~awb/papers/ES97units.ps ## +## http://www.cs.cmu.edu/~awb/papers/ES97units/ES97units.html ## +## ## +## This also serves as an example unit selection algorithm within the ## +## the new festival unit selection/signal processing architecture ## +## ## +########################################################################### +TOP=../../.. 
+DIRNAME=src/modules/clunits +SCMS = acost.scm +H = clunits.h +TSRCS = +SRCSCXX = acost.cc clunits.cc cldb.cc cljoin.cc +SRCS = $(SRCSCXX) $(TSRCS) + +OBJS = $(SRCSCXX:.cc=.o) + +FILES=Makefile $(SRCS) $(H) $(SCMS) $(WORKINGON) + +LOCAL_INCLUDES = -I../include -I../UniSyn + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/clunits/acost.cc b/src/modules/clunits/acost.cc new file mode 100644 index 0000000..96176f0 --- /dev/null +++ b/src/modules/clunits/acost.cc @@ -0,0 +1,380 @@ +/*************************************************************************/ +/* */ +/* Carnegie Mellon University and */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998-2001 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH, CARNEGIE MELLON UNIVERSITY AND THE */ +/* CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO */ +/* THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY */ +/* AND FITNESS, IN NO EVENT SHALL THE UNIVERSITY OF EDINBURGH, CARNEGIE */ +/* MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, */ +/* INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER */ +/* RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION */ +/* OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF */ +/* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Find the "acoustic" distance between two units. There are a number */ +/* of various params to this but it falls downs to a weighted summed */ +/* difference of parameters for the two units. */ +/* */ +/* A linear interpolation of the smallest to largest is done, and an */ +/* optional penality is factored. 
*/ +/* */ +/* This is independent of any particular Unit Database */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "clunits.h" + +static void find_unit_distances(LISP units, const EST_String &fname); + +static float duration_penalty_weight=1.0; +static float f0_penalty_weight=0.0; +static EST_FVector weights; +static LISP get_stds_per_unit = NIL; +static EST_String directory = ""; + +LISP acost_utt_load_coeffs(LISP utt, LISP params) +{ + // Load all the coefficient file and create sub_track for each + // segment + EST_Utterance *u = get_c_utt(utt); + EST_Track *track = new EST_Track; + EST_String coefffilename = + EST_String(get_param_str("db_dir",params,"./"))+ + get_param_str("coeffs_dir",params,"coeffs/")+ + u->f("fileid").string()+ + get_param_str("coeffs_ext",params,".coeffs"); + float ac_left_context = get_param_float("ac_left_context",params,0.0); + EST_String segrelation = + EST_String(get_param_str("clunit_relation",params,"Segment")); + + if (track->load(coefffilename) != format_ok) + { + cerr << "ACOST: failed to read track from \"" << + coefffilename << "\"" << endl; + festival_error(); + } + cl_maybe_fix_pitch_c0(track); + // Add the whole track to a new relation + EST_Item *c_si = u->create_relation("Acoustic_Coeffs")->append(); + c_si->set_val("Acoustic_Coeffs", est_val(track)); + + // Now add subtracks for each segment + for (EST_Item *s=u->relation(segrelation)->first(); s != 0; s=s->next()) + { + EST_Track *st = new EST_Track; + float start = ffeature(s,"segment_start"); + float end = ffeature(s,"segment_end"); + if (s->prev()) + start -= ac_left_context* + ffeature(s,"p.segment_duration").Float(); + int startf = track->index(start); + int nframes = track->index(end)-startf; + if (startf >= track->num_frames()) + { + cerr << "ACOST: utterances longer than coeffs file \n " << + coefffilename << endl; + festival_error(); + } + else if ((startf + nframes) > 
track->num_frames()) + nframes = track->num_frames() - startf; + track->sub_track(*st,startf,nframes,0); + s->set_val("Acoustic_Coeffs",est_val(st)); + } + return utt; +} + +LISP make_unit_distance_tables(LISP unittypes, LISP params) +{ + // Build distance tables for given lists of lists of items + // Each item should have a coeffs feature (a track). Also + // the order the items are give is the order they will + // be put into the saved distance matrix. Care should be + // taken to ensure you save related features in the same order + LISP ut; + + for (ut=unittypes; ut != NIL; ut = cdr(ut)) + { + acost_dt_params(params); + + EST_String unit_name = get_c_string(car(car(ut))); + EST_String fname = + EST_String(get_param_str("db_dir",params,"./"))+ + get_param_str("disttabs_dir",params,"disttabs/")+ + unit_name+".disttab"; + cout << "Making unit distance table for " << unit_name << + " (" << siod_llength(cdr(car(ut))) << ")" << endl; + find_unit_distances(cdr(car(ut)),fname); + } + + return NIL; +} + +LISP ac_distance_tracks(LISP filename1, LISP filename2, LISP lweights) +{ + // Return Unit type distance between two full files. + EST_Track a,b; + + if (a.load(get_c_string(filename1)) != format_ok) + { + cerr << "CLUNITS: distance tracks: \"" << + get_c_string(filename1) << "\" unloadable." << endl; + festival_error(); + } + if (b.load(get_c_string(filename2)) != format_ok) + { + cerr << "CLUNITS: distance tracks: \"" << + get_c_string(filename2) << "\" unloadable." 
+ << endl; + festival_error(); + } + + LISP l; + int i; + float dist; + + duration_penalty_weight = get_c_float(car(lweights)); + + EST_FVector tweights(siod_llength(cdr(lweights))); + for (l=cdr(lweights),i=0; l!=NIL;l=cdr(l),i++) + tweights[i] = get_c_float(car(l)); + + dist = ac_unit_distance(a,b,tweights); + + return flocons(dist); +} + +void acost_dt_params(LISP params) +{ + // Extract parameters from description + LISP lweights,l; + int i; + + directory = get_param_str("disttab_dir",params,"disttabs"); + lweights = get_param_lisp("ac_weights",params,NIL); + + weights.resize(siod_llength(lweights)); + for (l=lweights,i=0; l != NIL; l=cdr(l),i++) + weights[i] = get_c_float(car(l)); + duration_penalty_weight = get_param_float("dur_pen_weight",params,1.0); + f0_penalty_weight = get_param_float("f0_pen_weight",params,0.0); + get_stds_per_unit = get_param_lisp("get_stds_per_unit",params,NIL); + +} + +static void cumulate_ss_frames(EST_Track *a,EST_SuffStats *ss_frames) +{ + // Gather sufficient statistics on the parameters in each frame + int i,j; + double p; + + for (i=0; i < a->num_frames(); i++) + for (j=0; j < a->num_channels(); j++) + { + p = a->a_no_check(i,j); + if (!finite(p)) + { + p = 1.0e5; + a->a_no_check(i,j) = p; + } + ss_frames[j] += p; + } +} + +static EST_Track *acost_get_coefficients(EST_Item *si) +{ + EST_Val c = si->f("Acoustic_Coeffs"); + + if (c == 0) + { + cerr << "ACOST: failed to find coefficients on items\n"; + festival_error(); + } + return track(c); +} + +static void find_unit_distances(LISP units, const EST_String &fname) +{ + // Find all distances between units of type. 
+ int i,j; + LISP u,v; + EST_FMatrix dist(siod_llength(units),siod_llength(units)); + EST_SuffStats *ss_frames = new EST_SuffStats[weights.length()]; + + // Find stddev for this unit + for (i=0,u=units; u != 0; u=cdr(u),i++) + { + dist.a_no_check(0,i) = 0; + if (get_stds_per_unit != NIL) // need to calculate stds for this unit + { + EST_Item *si = get_c_item(car(u)); + EST_Track *a=acost_get_coefficients(si); + if (a->num_channels() != weights.length()) + { + cerr << "ACOST: number of weights " << + weights.length() << " does not match mcep param width " + << a->num_channels() << endl; + festival_error(); + } + cumulate_ss_frames(a,ss_frames); + } + } + + if (get_stds_per_unit != NIL) // modify weights with stds + for (i=0; i < weights.length(); i++) + weights[i] /= (ss_frames[i].stddev() * ss_frames[i].stddev()); + + for (i=1,u=cdr(units); u != 0; u=cdr(u),i++) + { + EST_Track *a=acost_get_coefficients(get_c_item(car(u))); + // Only fill lower half of matrix + for (v=units,j=0; j < i; j++,v=cdr(v)) + { + EST_Track *b=acost_get_coefficients(get_c_item(car(v))); + dist.a_no_check(i,j) = ac_unit_distance(*a,*b,weights); + } + for ( ; j < dist.num_rows(); j++) + dist.a_no_check(i,j) = 0.0; + } + + delete [] ss_frames; + + if (dist.save(fname,"est_ascii") != write_ok) + { + cerr << "ACOST: failed to save distance data in \"" << + fname << endl; + festival_error(); + } +} + +float ac_unit_distance(const EST_Track &unit1, + const EST_Track &unit2, + const EST_FVector wghts) +{ + // Find distance between two units, unit1 will be smaller than unit2 + float distance = 0.0; + int i,j,k; + float fj,incr,dur_penalty; + float cost,diff; + float score; + int nc = unit1.num_channels(); + + if (unit1.end() > unit2.end()) + return ac_unit_distance(unit2,unit1,wghts); // unit1 is smaller + if (unit1.num_frames() == 0) +// return 1.0e20; // HUGE_VAL is too HUGE + return 100; + + if ((unit1.num_channels() != unit2.num_channels()) || + (unit1.num_channels() != wghts.length())) + { + 
cerr << "ac_unit_distance: unit1 (" << unit1.num_channels() << + "), unit2 (" << unit2.num_channels() << ") and wghts (" << + wghts.length() << ") are of different size" << endl; + festival_error(); + } + +// printf("unit1 nf %d end %f unit2 nf %d end %f \n", +// unit1.num_frames(), +// unit1.end(), +// unit2.num_frames(), +// unit2.end()); + incr = unit1.end()/ unit2.end(); + for (fj=0.0,j=i=0; i < unit2.num_frames(); i++,fj+=incr) + { + while ((j < unit1.num_frames()-1) && (unit1.t(j) < unit2.t(i)*incr) ) + j++; +// printf("unit1 j %d %f unit2 %d %f %f\n", +// j,unit1.t(j), +// i,unit2.t(i), +// unit2.t(i)*incr); + cost = f0_penalty_weight * + fabs((unit1.t(j)-((j > 0) ? unit1.t(j-1) : 0))- + (unit2.t(i)-((i > 0) ? unit2.t(i-1) : 0))); + + for (k=0; k < nc; k++) + if (wghts.a_no_check(k) != 0.0) + { + diff = unit2.a_no_check(i,k)-unit1.a_no_check(j,k); + diff *= diff; + cost += diff*wghts.a_no_check(k); + } + distance += cost; + } + + dur_penalty = (float)unit2.end()/(float)unit1.end(); + + score = (distance/(float)i)+(dur_penalty*duration_penalty_weight); + + return score; +} + +// Maybe can reduce this to an FVector pulled out of the track +float frame_distance(const EST_Track &a, int ai, + const EST_Track &b, int bi, + const EST_FVector &wghts, + float f0_weight) +{ + float cost = 0.0,diff; + + if ((a.num_channels() != b.num_channels()) || + (a.num_channels() != wghts.length())) + { + cerr << "frame_distance: unit1, unit2 and wghts" << + " are of different size" << endl; + festival_error(); + } + + if ((ai < 0) || + (ai >= a.num_frames()) || + (bi < 0) || + (bi >= b.num_frames())) + { + cerr << "frame_distance: frames out of range" << endl; + festival_error(); + } + + if (f0_weight > 0) + { + cost = f0_weight * + fabs((a.t(ai)-((ai > 0) ? a.t(ai-1) : 0))- + (b.t(bi)-((bi > 0) ? 
b.t(bi-1) : 0))); + } + + for (int k=0; k < a.num_channels(); k++) + { + if (wghts.a_no_check(k) != 0.0) + { + diff = a.a_no_check(ai,k)-b.a_no_check(bi,k); + diff *= wghts.a_no_check(k); + cost += diff*diff; + } + } + + return sqrt(cost); +} + diff --git a/src/modules/clunits/acost.scm b/src/modules/clunits/acost.scm new file mode 100644 index 0000000..601d1a6 --- /dev/null +++ b/src/modules/clunits/acost.scm @@ -0,0 +1,58 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Carnegie Mellon University and ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1998-2001 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH, CARNEGIE MELLON UNIVERSITY AND THE ;; +;;; CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO ;; +;;; THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY ;; +;;; AND FITNESS, IN NO EVENT SHALL THE UNIVERSITY OF EDINBURGH, CARNEGIE ;; +;;; MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, ;; +;;; INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ;; +;;; RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION ;; +;;; OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF ;; +;;; OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; Finding features and acoustic distance measture for a set of +;;; segments in a database of utterances +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; +;;; This is primarily implement for the cluster unit selection method +;;; but may be use in other unit selection schemes. +;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; There are five stages to this +;;; Load in all utterances +;;; Load in their coefficients +;;; Collect together the units of the same type +;;; build distance tables from them +;;; dump features for them +;;; + +;; Everything is moved to lib/clunits_build.scm and +;; lib/clunits.scm + +(require_module 'clunits) +(require 'clunits_build) + +(provide 'acost) diff --git a/src/modules/clunits/cldb.cc b/src/modules/clunits/cldb.cc new file mode 100644 index 0000000..ab723bf --- /dev/null +++ b/src/modules/clunits/cldb.cc @@ -0,0 +1,421 @@ +/*************************************************************************/ +/* */ +/* Carnegie Mellon University and */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998-2001 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH, CARNEGIE MELLON UNIVERSITY AND THE */ +/* CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO */ +/* THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY */ +/* AND FITNESS, IN NO EVENT SHALL THE UNIVERSITY OF EDINBURGH, CARNEGIE */ +/* MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, */ +/* INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER */ +/* RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION */ +/* OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF */ +/* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* A quick database structure */ +/* */ +/*=======================================================================*/ +#include +#include +#include "festival.h" +#include "EST_FileType.h" +#include "clunits.h" + +VAL_REGISTER_CLASS(clunitsdb,CLDB) +SIOD_REGISTER_CLASS(clunitsdb,CLDB) +static void cl_load_catalogue(CLDB *cldb,EST_String &indexfile); + +static LISP CLDB_list = NIL; +static CLDB *current_cldb = 0; + +static void cldb_add(const EST_String &name, CLDB *cldb) +{ + // Add lexicon to list of lexicons + LISP lpair; + + lpair = siod_assoc_str(name,CLDB_list); + + if (CLDB_list == NIL) + gc_protect(&CLDB_list); + + if (lpair == NIL) + { + CLDB_list = cons(cons(strintern(name), + cons(siod(cldb),NIL)), + CLDB_list); + } + else + { + cwarn << "CLDB " << name << " recreated" << endl; + // old one will be garbage collected + setcar(cdr(lpair),siod(cldb)); + } + + return; +} + + +LISP cl_load_db(LISP params) +{ + EST_String indexfile; + int i; + LISP w; + CLDB *cldb = new CLDB; + + cldb->params = params; + + indexfile = EST_String("") + + get_param_str("db_dir",params,"./")+ + get_param_str("catalogue_dir",params,"./")+ + get_param_str("index_name",params,"catalogue")+ + ".catalogue"; + + cl_load_catalogue(cldb,indexfile); + + cldb->cweights.resize(siod_llength(get_param_lisp("join_weights",params,NIL))); + for (i=0,w=get_param_lisp("join_weights",params,NIL); w; w=cdr(w),i++) + cldb->cweights[i] = get_c_float(car(w)); + + cldb_add(get_param_str("index_name",params,"catalogue"),cldb); + + current_cldb = cldb; + + return NIL; +} + +static void cl_load_catalogue(CLDB *cldb,EST_String &indexfile) +{ + EST_TokenStream ts; + EST_EstFileType t; + EST_Option hinfo; + EST_String v; + bool ascii; + EST_read_status r; + + if (((indexfile == "-") ? 
ts.open(cin) : ts.open(indexfile)) != 0) + { + cerr << "CLUNITS: Can't open catalogue file " << indexfile << endl; + festival_error(); + } + + if (((r = read_est_header(ts, hinfo, ascii, t)) != format_ok) || + (t != est_file_index)) + { + cerr << "CLUNITS: " << indexfile << " is not an indexfile" << endl; + festival_error(); + } + + CLunit *ls = 0; + while(!ts.eof()) + { + CLunit *s = new CLunit; + s->name = ts.get().string(); + s->base_name = s->name.before("_"); + s->fileid = ts.get().string(); + s->start = atof(ts.get().string()); + s->mid = atof(ts.get().string()); + s->end = atof(ts.get().string()); + + if ((ls != 0) && + (ls->fileid == s->fileid) && + (ls->end == s->start)) + { + s->prev_unit = ls; + ls->next_unit = s; + } + cldb->index.add(s->name,s); + ls = s; + } +} + +CLDB *check_cldb() +{ + if (current_cldb == 0) + { + cerr << "CLDB: no database loaded\n"; + festival_error(); + } + return current_cldb; +} + +void cl_maybe_fix_pitch_c0(EST_Track *c) +{ + // If its pitch synchronous, trash the first coefficient with + // the pitch value, there should be a cleaner way to do this + int i; + float ltime = 0; + + if (!c->equal_space()) + { + for (i=0; i < c->num_frames(); i++) + { + c->a_no_check(i,0) = 1/(c->t(i)-ltime); + ltime = c->t(i); + } + } +} + +void CLDB::load_join_coefs(CLunit *unit) +{ + // Load in the coefficients and signal for this unit. 
+ CLfile *fileitem; + EST_Track *join_coeffs; + + if (unit->join_coeffs != 0) + return; + + fileitem = get_file_join_coefs(unit->fileid); + + EST_Track *unit_join_coeffs = new EST_Track; + join_coeffs = fileitem->join_coeffs; + + int pm_start = join_coeffs->index(unit->start); + int pm_end = join_coeffs->index(unit->end); + + join_coeffs->sub_track(*unit_join_coeffs, pm_start, pm_end-pm_start+1,0); + unit->join_coeffs = unit_join_coeffs; +} + +CLfile *CLDB::get_file_join_coefs(const EST_String &fileid) +{ + CLfile *fileitem; + + fileitem = get_fileitem(fileid); + + if (fileitem == 0) + { // even the file isn't here + fileitem = new CLfile; + fileindex.add(fileid,fileitem); + } + if (fileitem->join_coeffs == 0) + { + EST_Track *join_coeffs = new EST_Track; + EST_String jc_filename = + EST_String("") + + get_param_str("db_dir",params,"./") + + get_param_str("coeffs_dir",params,"wav/") + + fileid+ + get_param_str("coeffs_ext",params,".dcoeffs"); + if (join_coeffs->load(jc_filename) != format_ok) + { + delete join_coeffs; + cerr << "CLUNITS: failed to load join coeffs file " << + jc_filename << endl; + festival_error(); + } +// cl_maybe_fix_pitch_c0(join_coeffs); + fileitem->join_coeffs = join_coeffs; + } + + return fileitem; +} + +CLfile *CLDB::get_file_coefs_sig(const EST_String &fileid) +{ + CLfile *fileitem = get_fileitem(fileid); + + if (fileitem == 0) + { // even the file isn't here + fileitem = new CLfile; + fileindex.add(fileid,fileitem); + } + if (fileitem->sig == 0) + { + EST_Track *track = new EST_Track; + EST_String coef_filename = + EST_String("") + + get_param_str("db_dir",params,"./") + + get_param_str("pm_coeffs_dir",params,"pm/") + + fileid+ + get_param_str("pm_coeffs_ext",params,".pm"); + if (track->load(coef_filename) != format_ok) + { + delete track; + cerr << "CLUNITS: failed to load coeffs file " << + coef_filename << endl; + festival_error(); + } + fileitem->coefs = track; + + EST_Wave *sig = new EST_Wave; + EST_String sig_filename = + 
EST_String("") + + get_param_str("db_dir",params,"./") + + get_param_str("sig_dir",params,"wav/") + + fileid+ + get_param_str("sig_ext",params,".wav"); + if (sig->load(sig_filename) != format_ok) + { + delete sig; + cerr << "CLUNITS: failed to load signal file " << + sig_filename << endl; + festival_error(); + } + fileitem->sig = sig; + } + return fileitem; +} + +void CLDB::load_coefs_sig(EST_Item *unit) +{ + // Load in the coefficients and signal for this unit. + EST_String fileid = unit->f("fileid"); + CLfile *fileitem; + + fileitem = get_file_coefs_sig(fileid); + + EST_Track *coeffs = fileitem->coefs; + EST_Wave *sig = fileitem->sig; + EST_Track u1; + EST_Wave *unit_sig = new EST_Wave; + + int pm_start = coeffs->index(unit->F("start")); + int pm_middle = coeffs->index(unit->F("middle")); + int pm_end = coeffs->index(unit->F("end")); + +// coeffs->sub_track(u1,Gof((pm_start-1),0), pm_end - pm_start + 1); + coeffs->sub_track(u1,pm_start, pm_end - pm_start + 1,0); + EST_Track *unit_coeffs = new EST_Track(u1); + for (int j = 0; j < u1.num_frames(); ++j) + unit_coeffs->t(j) = u1.t(j) - coeffs->t(Gof((pm_start - 1), 0)); + +/* printf("coefs %s: pm_start %d pm_end %d pm_length %d\n", + (const char *)fileid, + pm_start, pm_end, pm_end - pm_start + 1); */ + unit->set_val("coefs",est_val(unit_coeffs)); + + if ((pm_middle-pm_start-1) < 1) + unit->set("middle_frame", 1); + else + unit->set("middle_frame", pm_middle - pm_start -1); + int samp_start = (int)(coeffs->t(Gof((pm_start - 1), 0)) + * (float)sig->sample_rate()); + int samp_end; + if ((pm_end + 1) < coeffs->num_frames()) + samp_end = (int)(coeffs->t(pm_end + 1) * (float)sig->sample_rate()); + else + samp_end = (int)(coeffs->t(pm_end) * (float)sig->sample_rate()); + int real_samp_start = (int)(unit->F("start") * (float)sig->sample_rate()); + int real_samp_end = (int)(unit->F("end") * (float)sig->sample_rate()); + if (samp_end-samp_start < 1) + sig->sub_wave(*unit_sig,samp_start, 1); + else + 
sig->sub_wave(*unit_sig,samp_start, samp_end-samp_start); + if (real_samp_start-samp_start<0) + unit->set("samp_start",0); + else + unit->set("samp_start",real_samp_start-samp_start); + unit->set("samp_end",real_samp_end-samp_start); + /* Need to preserve where the phone boundary is (which may actually */ + /* be past the end of this unit */ + unit->set("samp_seg_start", + (int)(unit->F("seg_start") * + (float)sig->sample_rate())-samp_start); + unit->set_val("sig",est_val(unit_sig)); +} + +CLunit::CLunit() +{ + start=0; + mid=0; + end=0; + prev_unit = 0; + next_unit = 0; + samp_start = 0; + samp_end = 0; + join_coeffs = 0; + coefs = 0; + sig = 0; +} + +CLunit::~CLunit() +{ + delete join_coeffs; + delete coefs; + delete sig; +} + +CLfile::CLfile() +{ + join_coeffs = 0; + coefs = 0; + sig = 0; +} + +CLfile::~CLfile() +{ + delete join_coeffs; + delete coefs; + delete sig; +} + +CLDB::CLDB() +{ + gc_protect(&params); +} + +static void del_clunit(void *s) { delete (CLunit *)s; } +static void del_clfile(void *s) { delete (CLfile *)s; } +CLDB::~CLDB() +{ + index.clear(del_clunit); + fileindex.clear(del_clfile); + gc_unprotect(&params); +} + +LISP cldb_list(void) +{ + // List names of all current defined cluster dbs + LISP d = NIL; + LISP l; + + for (l=CLDB_list; l != NIL; l=cdr(l)) + d = cons(car(car(l)),d); + + return d; +} + +LISP cldb_select(LISP dbname) +{ + // Select named cldb and make it current + EST_String name = get_c_string(dbname); + LISP lpair; + + lpair = siod_assoc_str(name,CLDB_list); + + if (lpair == NIL) + { + cerr << "CLDB " << name << " not defined" << endl; + festival_error(); + } + else + current_cldb = clunitsdb(car(cdr(lpair))); + + return dbname; +} + + + diff --git a/src/modules/clunits/cljoin.cc b/src/modules/clunits/cljoin.cc new file mode 100644 index 0000000..7777297 --- /dev/null +++ b/src/modules/clunits/cljoin.cc @@ -0,0 +1,204 @@ +/*************************************************************************/ +/* */ +/* Language Technologies 
Institute */ +/* Carnegie Mellon University */ +/* Copyright (c) 1999-2004 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author: Alan W Black */ +/* Date: July 2001 */ +/* --------------------------------------------------------------------- */ +/* Some more interesting join methods */ +/* based on the mapping code in UniSyn */ +/* */ +/*************************************************************************/ + +#include "EST_error.h" +#include "us_synthesis.h" +#include "festival.h" + +#if 0 +static int awb_voiced(EST_Track &pm,int i) +{ + // Hacky guess at voicedness + return TRUE; + if ((i < 2) || (i+3 > pm.num_frames())) + return FALSE; + else if (((pm.t(i) - pm.t(i-1)) == (pm.t(i+1) - pm.t(i))) && + ((pm.t(i-1) - pm.t(i-2)) == (pm.t(i) - pm.t(i-1)))) + return FALSE; + else + return TRUE; +} + +static float frame_duration(EST_Track &pm, int i) +{ + if (i <= 0) + return frame_duration(pm,i+1); + else if (i >= pm.num_frames()) + return frame_duration(pm,pm.num_frames()-1); + else + return pm.t(i)-pm.t(i-1); +} + +static float awb_smoothed(EST_Track &pm, int i) +{ + // Returned smoothed pitch period at i + + return (frame_duration(pm,i-4)+ + frame_duration(pm,i-3)+ + frame_duration(pm,i-2)+ + frame_duration(pm,i-1)+ + frame_duration(pm,i)+ + frame_duration(pm,i+1)+ + frame_duration(pm,i+2))/7.0; +} +#endif + + +static void make_segment_varied_mapping(EST_Relation &source_lab, + EST_Track &source_pm, + EST_Track &target_pm, + EST_IVector &map, + float dur_impose_factor, + float f0_impose_factor) +{ + int n_i, s_i, u_i, u_frames; + int spp; + float stime, ttime, ltime, dratio, n_frames; + int max_frames; + EST_Item *u; + EST_Track ntarget_pm; + + ntarget_pm = target_pm; + if (target_pm.num_frames() > source_pm.num_frames()) + max_frames = target_pm.num_frames()+100; + else + max_frames = source_pm.num_frames()+100; + + ntarget_pm.resize(max_frames,target_pm.num_channels()); + map.resize(max_frames); + +// printf("source_lab relations is %s\n",(const char *)source_lab.name()); + + if 
(target_pm.t(target_pm.num_frames() - 1) < + source_lab.tail()->F("end",0)) + { + EST_warning("Target pitchmarks end before end of target segment " + "timings (%f vs %f). Expect a truncated utterance\n", + target_pm.t(target_pm.num_frames() - 1), + source_lab.tail()->F("end",0.0)); + } + + n_i = 0; + s_i = 0; + ltime = 0; + for (u = source_lab.head(); u; u = u->next()) + { + u_frames = u->I("num_frames"); +// stime = source_pm.t(s_i+u_frames-1) - source_pm.t(s_i); + stime = u->F("unit_duration"); + ttime = ffeature(u,"segment_duration"); + if (streq("+",(EST_String)ffeature(u,"ph_vc"))) + dratio = stime / (stime + ((ttime-stime)*dur_impose_factor)); + else + dratio = 1; + n_frames = (float)u_frames / dratio; +/* printf("unit %s dratio %f %d %f time %f, stime %f\n", + (const char *)u->name(),dratio,u_frames,n_frames, + ttime,stime); */ + + for (u_i = 0; u_i < n_frames; u_i++,n_i++) + { + spp = (int)((float)u_i*dratio); + + if (s_i + spp == 0) + ntarget_pm.t(n_i) = ltime; + else + ntarget_pm.t(n_i) = ltime + source_pm.t(s_i+spp)- + source_pm.t(s_i+spp-1); + map[n_i] = s_i+spp; + ltime = ntarget_pm.t(n_i); + if (n_i+1 == ntarget_pm.num_frames()) + break; + } + s_i += u_frames; + + } + + ntarget_pm.resize(n_i,ntarget_pm.num_channels()); + +/* printf("target_pm.end() = %f ntarget_pm.end() = %f\n", + target_pm.end(), ntarget_pm.end()); */ + target_pm = ntarget_pm; +/* printf("target_pm.end() = %f ntarget_pm.end() = %f\n", + target_pm.end(), ntarget_pm.end()); */ + if (n_i == 0) + map.resize(0); // nothing to synthesize + else + map.resize(n_i - 1); +} + +void cl_mapping(EST_Utterance &utt, LISP params) +{ + EST_Relation *target_lab; + EST_IVector *map; + EST_Track *source_coef=0, *target_coef=0; + float dur_impose_factor, f0_impose_factor; + + source_coef = track(utt.relation("SourceCoef")->head()->f("coefs")); + target_coef = track(utt.relation("TargetCoef")->head()->f("coefs")); + target_lab = utt.relation("Segment"); + + map = new EST_IVector; + + dur_impose_factor = 
get_param_float("dur_impose_factor",params,0.0); + f0_impose_factor = get_param_float("f0_impose_factor",params,0.0); + + make_segment_varied_mapping(*target_lab, *source_coef, + *target_coef, *map, + dur_impose_factor, + f0_impose_factor); + + utt.create_relation("US_map"); + EST_Item *item = utt.relation("US_map")->append(); + item->set_val("map", est_val(map)); + +} + +LISP l_cl_mapping(LISP utt, LISP params) +{ + EST_Utterance *u = get_c_utt(utt); + + cl_mapping(*u,params); + + return utt; +} + + + diff --git a/src/modules/clunits/clunits.cc b/src/modules/clunits/clunits.cc new file mode 100644 index 0000000..c76131f --- /dev/null +++ b/src/modules/clunits/clunits.cc @@ -0,0 +1,816 @@ +/*************************************************************************/ +/* */ +/* Carnegie Mellon University and */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998-2001 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH, CARNEGIE MELLON UNIVERSITY AND THE */ +/* CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO */ +/* THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY */ +/* AND FITNESS, IN NO EVENT SHALL THE UNIVERSITY OF EDINBURGH, CARNEGIE */ +/* MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, */ +/* INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER */ +/* RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION */ +/* OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF */ +/* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : April 1998 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Yet another unit selection method. */ +/* */ +/* Using an acoustic measure find the distance between all units in the */ +/* db. Try to minimise the mean difference between units in a cluster */ +/* using CART technology, based on features like phonetic and prosodic */ +/* context. This gives a bunch of CARTs for each unit type in the db */ +/* which are acoustically close. Use these as candidates and optimise */ +/* a path through them minimising join using a viterbi search. */ +/* */ +/* Advantages: */ +/* requires little or no measurements at selection time */ +/* allows for clear method of pruning */ +/* no weights need to be generated (well, except where they do) */ +/* will optimise appropriately with varying numbers of example units */ +/* */ +/* Disadvantages: */ +/* Units can't cross between clusters */ +/* */ +/* Implementation of Black, A. and Taylor, P. (1997). Automatically */ +/* clustering similar units for unit selection in speech synthesis */ +/* Proceedings of Eurospeech 97, vol2 pp 601-604, Rhodes, Greece. 
*/ +/* */ +/* postscript: http://www.cs.cmu.edu/~awb/papers/ES97units.ps */ +/* http://www.cs.cmu.edu/~awb/papers/ES97units/ES97units.html */ +/* */ +/* Comments: */ +/* */ +/* This is a new implementation using the newer unit selection/signal */ +/* processing archtecture in festival */ +/* */ +/* This is still in development but become more stable. It is robust */ +/* for many cases, though a lot depends on the db and parameters */ +/* you use */ +/* */ +/* This had significant new work (and bug fixes) done on it when awb */ +/* moved to CMU */ +/* */ +/*=======================================================================*/ +#include +#include "EST_math.h" +#include "festival.h" +#include "clunits.h" + +static EST_String static_unit_prev_move = "unit_prev_move"; +static EST_String static_unit_this_move = "unit_this_move"; +static EST_String static_jscore = "local_join_cost"; +static EST_String static_tscore = "local_target_cost"; +static EST_String static_cscore = "cummulative_unit_score"; + +static void setup_clunits_params(); +static EST_VTCandidate *TS_candlist(EST_Item *s,EST_Features &f); +static EST_VTPath *TS_npath(EST_VTPath *p,EST_VTCandidate *c,EST_Features &f); +static float naive_join_cost(CLunit *unit0, CLunit *unit1, + EST_Item *s, + float &u0_move, + float &u1_move); +static float optimal_couple(CLunit *u0, + CLunit *u1, + float &u0_move, + float &u1_move, + int type, + float different_prev_pen, + float non_consecutive_pen); +static void cl_parse_diphone_times(EST_Relation &diphone_stream, + EST_Relation &source_lab); + +VAL_REGISTER_CLASS_NODEL(vtcand,EST_VTCandidate); +VAL_REGISTER_CLASS_NODEL(clunit,CLunit); + +LISP selection_trees = NIL; +LISP clunits_params = NIL; +static int optimal_coupling = 0; +static int extend_selections = 0; +static int clunits_debug = 0; +static int clunits_log_scores = 0; +static int clunits_smooth_frames = 0; +float continuity_weight = 1; +float f0_join_weight = 0.0; +float different_prev_pen = 1000.0; +float 
non_consecutive_pen = 100.0; +static EST_String clunit_name_feat = "name"; + +static CLDB *cldb; + +static LISP clunits_select(LISP utt) +{ + // Select units from db using CARTs to index into clustered unit groups + EST_Utterance *u = get_c_utt(utt); + EST_Item *s, *f; + + cldb = check_cldb(); // make sure there is one loaded + setup_clunits_params(); + + f = u->relation("Segment")->head(); + for (s=f; s; s=s->next()) + s->set_val("clunit_name",ffeature(s,clunit_name_feat)); + + if (f) + { + EST_Viterbi_Decoder v(TS_candlist,TS_npath,-1); + v.set_big_is_good(FALSE); // big is bad + + v.initialise(u->relation("Segment")); + v.search(); + if (!v.result("unit_id")) + { + cerr << "CLUNIT: failed to find path\n"; + return utt; + } + v.copy_feature(static_unit_this_move); + v.copy_feature(static_unit_prev_move); + v.copy_feature(static_jscore); + v.copy_feature(static_tscore); + v.copy_feature(static_cscore); + } + + return utt; +} + +static LISP clunits_get_units(LISP utt) +{ + // Create unit stream and loading params + EST_Utterance *u = get_c_utt(utt); + EST_Relation *units,*ss; + EST_Item *s; + + cldb = check_cldb(); // make sure there is one loaded + + units = u->create_relation("Unit"); + for (s=u->relation("Segment")->head(); s != 0; s=s->next()) + { + EST_Item *unit = units->append(); + CLunit *db_unit = clunit(s->f("unit_id")); + float st,e; + unit->set_name(db_unit->name); + unit->set("fileid",db_unit->fileid); + // These should be modified from the optimal coupling + if ((s->prev()) && (s->f_present("unit_this_move"))) + st = s->F("unit_this_move"); + else + st = db_unit->start; + if (s->next() && (s->next()->f_present("unit_prev_move"))) + e = s->next()->F("unit_prev_move"); + else + e = db_unit->end; + if ((e-st) < 0.011) + e = st + 0.011; + unit->set("start",st); + unit->set("middle",db_unit->start); + unit->set("end",e); + unit->set("unit_start",st); + unit->set("unit_middle",db_unit->start); + unit->set("unit_end",e); + 
unit->set("seg_start",db_unit->start); + unit->set("seg_end",db_unit->end); + cldb->load_coefs_sig(unit); + if (clunits_debug) + printf("unit: %s fileid %s start %f end %f\n", + (const char *)db_unit->name, + (const char *)db_unit->fileid, + st,e); + } + + // Make it look as much like the diphones as possible for + // the rest of the code + ss = u->create_relation("SourceSegments"); + for (s = u->relation("Segment")->head(); s != 0 ; s = s->next()) + { + EST_Item *d = ss->append(); + d->set_name(ffeature(s,"clunit_name")); + } + + cl_parse_diphone_times(*units,*ss); + + return utt; +} + +static void cl_parse_diphone_times(EST_Relation &diphone_stream, + EST_Relation &source_lab) +{ + EST_Item *s, *u; + EST_Track *pm; + int e_frame, m_frame = 0; + float dur_1 = 0.0, dur_2 = 0.0, p_time; + float t_time = 0.0, end; + p_time = 0.0; + + for (s = source_lab.head(), u = diphone_stream.head(); u; u = u->next(), + s = s->next()) + { + pm = track(u->f("coefs")); + if (pm == 0) + { + cerr << "CLUNIT: couldn't get pitchmarks for " << u->name() << endl; + festival_error(); + } + + e_frame = pm->num_frames() - 1; + m_frame = u->I("middle_frame"); + + dur_1 = pm->t(m_frame); + dur_2 = pm->t(e_frame) - dur_1; + + s->set("end", (dur_1 + p_time)); + p_time = s->F("end") + dur_2; + + end = dur_1 + dur_2 + t_time; + t_time = end; + u->set("end", t_time); + } + if (s) + s->set("end", (dur_2 + p_time)); +} + +static LISP clunits_simple_wave(LISP utt) +{ + // Naive joining of waveforms + EST_Utterance *u = get_c_utt(utt); + EST_Wave *w = new EST_Wave; + EST_Wave *w1 = 0; + EST_Item *witem = 0; + EST_Item *s; + int size,i,k,c; + + for (size=0,s=u->relation("Unit")->head(); s != 0; s = s->next()) + size += wave(s->f("sig"))->num_samples(); + + if (u->relation("Unit")->head()) + { // This will copy the necessary wave features across + s = u->relation("Unit")->head(); + *w = *(wave(s->f("sig"))); + } + i = w->num_samples(); + w->resize(size); // its maximum size + for 
(s=u->relation("Unit")->head()->next(); s; s=s->next()) + { + w1 = wave(s->f("sig")); + // Find last zero crossing + for (c=0; ((i > 0) && (c < 40)); c++,i--) + if (((w->a_no_check(i) < 0) && (w->a_no_check(i-1) >= 0)) || + ((w->a_no_check(i) >= 0) && (w->a_no_check(i-1) < 0))) + break; + if (c == 40) i += 40; + // Find next zero crossing + for (c=0,k=1; ((k < w1->num_samples()) && (c < 40)); k++,i++) + if (((w1->a_no_check(k) < 0) && (w1->a_no_check(k-1) >= 0)) || + ((w1->a_no_check(k) >= 0) && (w1->a_no_check(k-1) < 0))) + break; + if (c == 40) k -= 40; + for (; k < w1->num_samples(); k++,i++) + w->a_no_check(i) = w1->a_no_check(k); + } + w->resize(i); + + witem = u->create_relation("Wave")->append(); + witem->set_val("wave",est_val(w)); + + return utt; +} + +static LISP clunits_windowed_wave(LISP utt) +{ + // windowed join, no prosodic modification + EST_Utterance *u = get_c_utt(utt); + EST_Wave *w = new EST_Wave; + EST_Wave *w1 = 0; + EST_Track *t1 = 0; + EST_Item *witem = 0; + EST_Item *s; + int size,i,k,wi,samp_idx, l_samp_idx; + int width, lwidth; + EST_Wave *www=0; + + for (size=0,s=u->relation("Unit")->head(); s != 0; s = s->next()) + size += wave(s->f("sig"))->num_samples(); + + if (u->relation("Unit")->head()) + { // This will copy the necessary wave features across + s = u->relation("Unit")->head(); + www = wave(s->f("sig")); + *w = *www; + } + w->resize(size); // its maximum size + wi=0; + lwidth = width = 0; + for (s=u->relation("Unit")->head(); s; s=s->next()) + { + w1 = wave(s->f("sig")); + t1 = track(s->f("coefs")); + + l_samp_idx = 0; + for (i=0; i < t1->num_frames()-1; i++) + { + samp_idx = (int)(t1->t(i)*w->sample_rate()); + width = samp_idx - l_samp_idx; + if (clunits_smooth_frames && (i==0) && (lwidth != 0)) + width = (width+lwidth)/2; // not sure if this is worth it + wi += width; + for (k=-width; ((k<width)&&((samp_idx+k)<w1->num_samples())) ;k++) + w->a(wi+k) += + (int)(0.5*(1+cos((PI/(double)(width))*(double)k))* + w1->a(samp_idx+k)); + l_samp_idx = samp_idx; + } + 
lwidth = width; + } + w->resize(wi); + + witem = u->create_relation("Wave")->append(); + witem->set_val("wave",est_val(w)); + + return utt; +} + +static LISP clunits_smoothedjoin_wave(LISP utt) +{ + // Actually not very smoothed yet, just joined + EST_Utterance *u = get_c_utt(utt); + EST_Wave *w = new EST_Wave; + EST_Wave *w1 = 0; + EST_Track *t1 = 0; + EST_Item *witem = 0; + EST_Item *s; + int size,i,wi; + int samp_end, samp_start; + EST_Wave *www=0; + + for (size=0,s=u->relation("Unit")->head(); s != 0; s = s->next()) + { + samp_end = s->I("samp_end"); + samp_start = s->I("samp_start"); + size += samp_end-samp_start; + } + + if (u->relation("Unit")->head()) + { // This will copy the necessary wave features across + s = u->relation("Unit")->head(); + www = wave(s->f("sig")); + *w = *www; + } + w->resize(size); // its maximum size + wi=0; + for (s=u->relation("Unit")->head(); s; s=s->next()) + { + samp_end = s->I("samp_end"); + samp_start = s->I("samp_start"); + w1 = wave(s->f("sig")); +/* printf("%s %s %f %f %d %d\n", + (const char *)s->S("name"), + (const char *)s->S("fileid"), + (float)samp_start/(float)w->sample_rate(), + (float)samp_end/(float)w->sample_rate(), + w1->num_samples(), + samp_end); */ + t1 = track(s->f("coefs")); + for (i=samp_start; i<samp_end; i++,wi++) + w->a_no_check(wi) = w1->a_no_check(i); +/* printf("%d %f\n",wi,(float)wi/(float)w->sample_rate()); */ + } + w->resize(wi); + + witem = u->create_relation("Wave")->append(); + witem->set_val("wave",est_val(w)); + + return utt; +} + +static void setup_clunits_params() +{ + // Set up params + clunits_params = siod_get_lval("clunits_params", + "CLUNITS: no parameters set for module"); + optimal_coupling = get_param_int("optimal_coupling",clunits_params,0); + different_prev_pen = get_param_float("different_prev_pen",clunits_params,1000.0); + non_consecutive_pen = get_param_float("non_consectutive_pen",clunits_params,100.0); + extend_selections = get_param_int("extend_selections",clunits_params,0); + continuity_weight = 
get_param_float("continuity_weight",clunits_params,1); + f0_join_weight = get_param_float("f0_join_weight",clunits_params,0.0); + clunits_debug = get_param_int("clunits_debug",clunits_params,0); + clunits_log_scores = get_param_int("log_scores",clunits_params,0); + clunits_smooth_frames = get_param_int("smooth_frames",clunits_params,0); + clunit_name_feat = get_param_str("clunit_name_feat",clunits_params,"name"); + selection_trees = + siod_get_lval("clunits_selection_trees", + "CLUNITS: clunits_selection_trees unbound"); +} + +static EST_VTCandidate *TS_candlist(EST_Item *s,EST_Features &f) +{ + // Return a list of candidate units for target s + // Use the appropriate CART to select a small group of candidates + EST_VTCandidate *all_cands = 0; + EST_VTCandidate *c, *gt; + LISP tree,group,l,pd,cc,ls; + EST_String name; + EST_String lookingfor; + CLunit *u; + int bbb,ccc; + float cluster_mean; + (void)f; + bbb=ccc=0; + + lookingfor = s->S("clunit_name"); + ls = siod(s); + + cc = siod_get_lval("clunits_cand_hooks",NULL); + if (cc) + pd = apply_hooks(siod_get_lval("clunits_cand_hooks",NULL), + ls); + else + { + tree = car(cdr(siod_assoc_str(lookingfor,selection_trees))); + pd = wagon_pd(s,tree); + } + if (pd == NIL) + { + cerr << "CLUNITS: no predicted class for " << + s->S("clunit_name") << endl; + festival_error(); + } + group = car(pd); + cluster_mean = get_c_float(car(cdr(pd))); + + for (bbb=0,l=group; l != NIL; l=cdr(l),bbb++) + { + c = new EST_VTCandidate; + name = s->S("clunit_name")+"_"+get_c_string(car(car(l))); + u = cldb->get_unit(name); + if (u == 0) + { + cerr << "CLUNITS: failed to find unit " << name << + " in index" << endl; + festival_error(); + } + cldb->load_join_coefs(u); + c->name = est_val(u); + c->s = s; + // Mean distance from others in cluster (could be precalculated) + c->score = get_c_float(car(cdr(car(l))))-cluster_mean; + c->score *= c->score; + // Maybe this should be divided by overall mean of set + // to normalise this figure (?) 
+ + c->next = all_cands; + all_cands = c; + } + + if (extend_selections) + { + // An experiment, for all candidates of the previous + // item whose following is of this phone type, include + // them as a candidate + EST_Item *ppp = s->prev(); + if (ppp) + { + EST_VTCandidate *lc = vtcand(ppp->f("unit_cands")); + for (ccc=0 ; lc && (ccc < extend_selections); lc = lc->next) + { + CLunit *unit = clunit(lc->name); + CLunit *next_unit; + + if (unit->next_unit) + next_unit = unit->next_unit; + else + continue; + EST_String ss; + ss = next_unit->name.before("_"); + if (ss.matches(".*_.*_.*")) + { + ss += "_"; + ss += next_unit->name.after("_").before("_"); + } +/* printf("%s %s\n",(const char *)ss, (const char *)lookingfor); */ + for (gt=all_cands; gt; gt=gt->next) + if (clunit(gt->name)->name == next_unit->name) + break; /* got this one already */ + if ((ss == lookingfor) && (gt == 0)) + { // its the right type so add it + c = new EST_VTCandidate; + c->name = est_val(next_unit); + cldb->load_join_coefs(next_unit); + c->s = s; + c->score = 0; + c->next = all_cands; + all_cands = c; + bbb++; + ccc++; + } + } + } + + s->set_val("unit_cands",est_val(all_cands)); + } + if (clunits_debug) + printf("cands %d (extends %d) %s\n",bbb,ccc,(const char *)lookingfor); + return all_cands; +} + +static EST_VTPath *TS_npath(EST_VTPath *p,EST_VTCandidate *c,EST_Features &f) +{ + // Combine candidate c with previous path updating score + // with join cost + float cost; + EST_VTPath *np = new EST_VTPath; + CLunit *u0, *u1; + float u0_move=0.0, u1_move=0.0; + (void)f; + + np->c = c; + np->from = p; + if ((p == 0) || (p->c == 0)) + cost = 0; // nothing previous to join to + else + { + u0 = clunit(p->c->name); + u1 = clunit(c->name); +// printf("u0 %s u1 %s\n", +// (const char *)u0->name, +// (const char *)u1->name); + if (optimal_coupling) + cost = optimal_couple(u0,u1,u0_move,u1_move, + optimal_coupling, + different_prev_pen, + non_consecutive_pen); + else // naive measure + cost = 
naive_join_cost(u0,u1,c->s,u0_move,u1_move); + // When optimal_coupling == 2 the moves will be 0, just the scores + // are relevant + if (optimal_coupling == 1) + { + np->f.set(static_unit_prev_move,u0_move); // new (prev) end + np->f.set(static_unit_this_move,u1_move); // new start + } + } +// printf("cost %f continuity_weight %f\n", cost, continuity_weight); + cost *= continuity_weight; + np->state = c->pos; // "state" is candidate number + if (clunits_log_scores && (cost != 0)) + cost = log(cost); + + np->f.set(static_jscore,cost); + np->f.set(static_tscore,c->score); + if (p==0) + np->score = (c->score+cost); + else + np->score = (c->score+cost) + p->score; + np->f.set(static_cscore,np->score); + + if (clunits_debug > 1) + printf("joining cost %f\n",np->score); + return np; +} + +static float optimal_couple(CLunit *u0, + CLunit *u1, + float &u0_move, + float &u1_move, + int type, + float different_prev_pen, + float non_consecutive_pen + ) +{ + // Find combination cost of u0 to u1, checking for best + // frame up to n frames back in u0 and u1. 
+ // Note this checks the u0 with u1's predecessor, which may or may not + // be of the same type + // There is some optimisation here in unit coeff access + EST_Track *u0_cep, *u1_p_cep; + float dist, best_val; + int i,eee; + int u0_st, u0_end; + int u1_p_st, u1_p_end; + int best_u0, best_u1; + CLunit *u1_p; + float f; + + u1_p = u1->prev_unit; + + u0_move = u0->end; + if (u1_p == 0) + u1_move = 0; + else + u1_move = u1_p->end; + + if (u1_p == u0) // they are consecutive + return 0.0; + if (u1_p == 0) // hacky condition, when there is no previous we'll + return 0.0; // assume a good join (should be silence there) + + if (u1_p->join_coeffs == 0) + cldb->load_join_coefs(u1_p); + // Get indexes into full cep for utterances rather than sub ceps + u0_cep = u0->join_coeffs; + u1_p_cep = u1_p->join_coeffs; + + u0_end = u0_cep->num_frames(); + u1_p_end = u1_p_cep->num_frames(); + + if (!streq(u1_p->base_name,u0->base_name)) + { /* prev(u1) is a different phone from u0 so don't slide */ + f = different_prev_pen; + u0_st = u0_cep->num_frames()-1; + u1_p_st = u1_p_cep->num_frames()-1; + } + else if (type == 2) + { /* we'll only check the edge for the join */ + u0_st = u0_cep->num_frames()-1; + u1_p_st = u1_p_cep->num_frames()-1; + f = 1; + } + else + { + u0_st = (int)(u0_cep->num_frames() * 0.33); + u1_p_st = (int)(u1_p_cep->num_frames() * 0.33); + f = 1; + } + + best_u0=u0_end; + best_u1=u1_p_end; + best_val = HUGE_VAL; + + // Here we look for the best join without sliding the windows + if ((u0_end-u0_st) < (u1_p_end-u1_p_st)) + eee = u0_end-u0_st; + else + eee = u1_p_end-u1_p_st; + for (i=0; i < eee; i++) + { + dist = frame_distance(*u0_cep,i+u0_st, + *u1_p_cep,i+u1_p_st, + cldb->cweights, + f0_join_weight); + if (dist < best_val) + { + best_val = dist; + best_u0 = i+u0_st; + best_u1 = i+u1_p_st; + } + } +#if 0 + // This tries *all* possible matches in the pair, its slow + // and has a tendency to shorten things more than you'd like + // so we just use the more simple test 
above. + int j; + for (i=u0_st; i < u0_end; i++) + { + for (j=u1_p_st; j < u1_p_end; j++) + { + dist = frame_distance(*u0_cep,i, + *u1_p_cep,j, + cldb->cweights); + if (dist < best_val) + { + best_val = dist; + best_u0 = i; + best_u1 = j; + } + } + } +#endif + + if (type == 1) + { + u0_move = u0_cep->t(best_u0); + u1_move = u1_p_cep->t(best_u1); + } + + return non_consecutive_pen+(best_val*f); +} + +static float naive_join_cost(CLunit *unit0, CLunit *unit1, + EST_Item *s, + float &u0_move, + float &u1_move) +{ + // A naive join cost, because I haven't ported the info yet + + u0_move = unit0->end; + u1_move = unit1->start; + + if (unit0 == unit1) + return 0; + else if (unit1->prev_unit->name == unit0->name) + return 0; + else if (ph_is_silence(s->name())) + return 0; + else if (ph_is_stop(s->name())) + return 0.2; + else if (ph_is_fricative(s->name())) + return 0.3; + else + return 1.0; +} + +static LISP cldb_load_all_coeffs(LISP filelist) +{ + LISP f; + + cldb = check_cldb(); + for (f=filelist; f; f=cdr(f)) + { + cldb->get_file_coefs_sig(get_c_string(car(f))); + cldb->get_file_join_coefs(get_c_string(car(f))); + } + + return NIL; +} + +void festival_clunits_init(void) +{ + // Initialization for clunits selection + + proclaim_module("clunits", + "Copyright (C) University of Edinburgh and CMU 1997-2010\n"); + + gc_protect(&clunits_params); + gc_protect(&selection_trees); + + festival_def_utt_module("Clunits_Select",clunits_select, + "(Clunits_Select UTT)\n\ + Select units from current databases using cluster selection method."); + + festival_def_utt_module("Clunits_Get_Units",clunits_get_units, + "(Clunits_Get_Units UTT)\n\ + Construct Unit relation from the selected units in Segment and extract\n\ + their parameters from the clunit db."); + + festival_def_utt_module("Clunits_Simple_Wave",clunits_simple_wave, + "(Clunits_Simple_Wave UTT)\n\ + Naively concatenate signals together into a single wave (for debugging)."); + + 
festival_def_utt_module("Clunits_Windowed_Wave",clunits_windowed_wave, + "(Clunits_Windowed_Wave UTT)\n\ + Use hamming window over edges of units to join them, no prosodic \n\ + modification though."); + + festival_def_utt_module("Clunits_SmoothedJoin_Wave",clunits_smoothedjoin_wave, + "(Clunits_SmoothedJoin_Wave UTT)\n\ + smoothed join."); + + init_subr_1("clunits:load_db",cl_load_db, + "(clunits:load_db PARAMS)\n\ + Load index file for cluster database and set up params, and select it."); + + init_subr_1("clunits:select",cldb_select, + "(clunits:select NAME)\n\ + Select a previously loaded cluster database."); + + init_subr_1("clunits:load_all_coefs",cldb_load_all_coeffs, + "(clunits:load_all_coefs FILEIDLIST)\n\ + Load in coefficients, signal and join coefficients for each named\n\ + fileid. This is can be called at startup to to reduce the load time\n\ + during synthesis (though may make the image large)."); + + init_subr_0("clunits:list",cldb_list, + "(clunits:list)\n\ + List names of currently loaded cluster databases."); + + init_subr_2("acost:build_disttabs",make_unit_distance_tables, + "(acost:build_disttabs UTTTYPES PARAMS)\n\ + Built matrices of distances between each ling_item in each each list\n\ + of ling_items in uttypes. 
Uses acoustic weights in PARAMS and save\n\ + the result as a matrix for later use."); + + init_subr_2("acost:utt.load_coeffs",acost_utt_load_coeffs, + "(acost:utt.load_coeffs UTT PARAMS)\n\ + Load in the acoustic coefficients into UTT and set the Acoustic_Coeffs\n\ + feature for each segment in UTT."); + + init_subr_3("acost:file_difference",ac_distance_tracks, + "(acost:file_difference FILENAME1 FILENAME2 PARAMS)\n\ + Load in the two named tracks and find the acoustic difference over all\n\ + based on the weights in PARAMS."); + + init_subr_2("cl_mapping", l_cl_mapping, + "(cl_mapping UTT PARAMS)\n\ + Impose prosody upto some percentage, and not absolutely."); + +} diff --git a/src/modules/clunits/clunits.h b/src/modules/clunits/clunits.h new file mode 100644 index 0000000..525d33f --- /dev/null +++ b/src/modules/clunits/clunits.h @@ -0,0 +1,111 @@ +/*************************************************************************/ +/* */ +/* Carnegie Mellon University and */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1998-2000 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH, CARNEGIE MELLON UNIVERSITY AND THE */ +/* CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO */ +/* THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY */ +/* AND FITNESS, IN NO EVENT SHALL THE UNIVERSITY OF EDINBURGH, CARNEGIE */ +/* MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, */ +/* INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER */ +/* RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION */ +/* OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF */ +/* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ +/* */ +/*************************************************************************/ + +#ifndef __CLUNITS_H__ +#define __CLUNITS_H__ + +#include "EST_StringTrie.h" + +class CLunit { + public: + CLunit(); + ~CLunit(); + + EST_String fileid; + EST_String name; + EST_String base_name; + float start; + float mid; + float end; + class CLunit *prev_unit; + class CLunit *next_unit; + int samp_start; + int samp_end; + int middle_frame; + EST_Track *join_coeffs; + EST_Track *coefs; + EST_Wave *sig; +}; + +class CLfile { + public: + CLfile(); + ~CLfile(); + + EST_Track *join_coeffs; + EST_Track *coefs; + EST_Wave *sig; +}; + +class CLDB { + public: + CLDB(); + ~CLDB(); + + LISP params; + EST_StringTrie index; + EST_StringTrie fileindex; + EST_FVector cweights; + + CLunit *get_unit(const EST_String &name) + { return (CLunit *)index.lookup(name); } + CLfile *get_fileitem(const EST_String &name) + { return (CLfile *)fileindex.lookup(name); } + void load_coefs_sig(EST_Item *unit); + CLfile *get_file_coefs_sig(const EST_String &fileid); + void load_join_coefs(CLunit *unit); + CLfile *get_file_join_coefs(const EST_String &fileid); +}; + + +LISP cl_load_db(LISP params); +LISP acost_utt_load_coeffs(LISP utt, LISP params); +LISP make_unit_distance_tables(LISP unittypes, LISP params); +LISP ac_distance_tracks(LISP filename1, LISP filename2, 
LISP lweights); +void acost_dt_params(LISP params); +float ac_unit_distance(const EST_Track &unit1, + const EST_Track &unit2, + const EST_FVector wghts); +float frame_distance(const EST_Track &a, int ai, + const EST_Track &b, int bi, + const EST_FVector &wghts, + float f0_weight); +void cl_maybe_fix_pitch_c0(EST_Track *c); + +CLDB *check_cldb(); +LISP cldb_list(void); +LISP cldb_select(LISP dbname); +LISP l_cl_mapping(LISP utt, LISP method); + +#endif diff --git a/src/modules/clustergen/Makefile b/src/modules/clustergen/Makefile new file mode 100644 index 0000000..751beca --- /dev/null +++ b/src/modules/clustergen/Makefile @@ -0,0 +1,67 @@ +########################################################################### +## ## +## --------------------------------------------------------------- ## +## The HMM-Based Speech Synthesis System (HTS): version 1.1b ## +## HTS Working Group ## +## ## +## Department of Computer Science ## +## Nagoya Institute of Technology ## +## and ## +## Interdisciplinary Graduate School of Science and Engineering ## +## Tokyo Institute of Technology ## +## Copyright (c) 2001-2003 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and ## +## distribute this software and its documentation without ## +## restriction, including without limitation the rights to use, ## +## copy, modify, merge, publish, distribute, sublicense, and/or ## +## sell copies of this work, and to permit persons to whom this ## +## work is furnished to do so, subject to the following conditions: ## +## ## +## 1. The code must retain the above copyright notice, this list ## +## of conditions and the following disclaimer. ## +## ## +## 2. Any modifications must be clearly marked as such. 
## +## ## +## NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSITITUTE OF TECHNOLOGY, ## +## HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM ## +## ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL ## +## IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSITITUTE OF ## +## TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY ## +## DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, ## +## WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTUOUS ## +## ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR ## +## PERFORMANCE OF THIS SOFTWARE. ## +## ## +########################################################################### +## This directory contains some low level support for statitsical ## +## parametric speech synthesis, mostly take from Nagoya Institutes of ## +## Technologies HTS Engine (though often with significant modification ## +## ## +## These are specifically designed to support the Clustergen synthesis ## +## technique. ## +## ## +## Alan W Black (awb@cs.cmu.edu) ## +########################################################################### +TOP=../../.. 
+DIRNAME=src/modules/clustergen + +H = mlsa_resynthesis.h vc.h simple_mlpg.h +CPPSRCS = clustergen.cc mlsa_resynthesis.cc vc.cc simple_mlpg.cc me_mlsa.cc +SRCS = $(CPPSRCS) +OBJS = $(CPPSRCS:.cc=.o) + +FILES=Makefile $(SRCS) $(H) + +LOCAL_INCLUDES = -I../include -DFESTIVAL + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/clustergen/clustergen.cc b/src/modules/clustergen/clustergen.cc new file mode 100644 index 0000000..e4e50ff --- /dev/null +++ b/src/modules/clustergen/clustergen.cc @@ -0,0 +1,68 @@ +/*************************************************************************/ +/* */ +/* Language Technologies Institute */ +/* Carnegie Mellon University */ +/* Copyright (c) 2005-2010 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* CARNEGIE MELLON UNIVERSITY AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL CARNEGIE MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* */ +/* Most of the key functions in this directory are derived from HTS */ +/* and are only provided with Scheme wraparounds for Clustergen */ +/* */ +/*************************************************************************/ + +/* Standard C Libraries */ +#include +#include +#include +#include +#include "festival.h" + +LISP mlsa_resynthesis(LISP ltrack, LISP strtrack); +LISP mlpg(LISP ltrack); +LISP me_mlsa_resynthesis(LISP ltrack, LISP strack); + +void festival_clustergen_init(void) +{ + proclaim_module("clustergen_engine", + "Copyright (C) CMU 2005-2010\n"); + + init_subr_2("mlsa_resynthesis",mlsa_resynthesis, + "(mlsa_resynthesis TRACK STRTRACK)\n\ + Return a WAVE synthesized from the F0/MCEP TRACK, STRTRACK is non-nil, use mixed excitation."); + + init_subr_1("mlpg",mlpg, + "(mlpg TRACK)\n\ + Return a track suitable for mlsa from a TRACK with dynamics in it."); + + init_subr_2("me_mlsa",me_mlsa_resynthesis, + "(me_mlsa TRACK STRTRACK)\n\ + Return a WAVE resynthesized using Mixed Excitation MLSA."); + +} + diff --git a/src/modules/clustergen/me_mlsa.cc b/src/modules/clustergen/me_mlsa.cc new file mode 100644 index 0000000..5139722 --- /dev/null +++ b/src/modules/clustergen/me_mlsa.cc @@ -0,0 +1,1355 @@ +/** +* The HMM-Based Speech Synthesis System (HTS) +* HTS Working Group +* +* 
Department of Computer Science +* Nagoya Institute of Technology +* and +* Interdisciplinary Graduate School of Science and Engineering +* Tokyo Institute of Technology +* +* Portions Copyright (c) 2001-2006 +* All Rights Reserved. +* +* Portions Copyright 2000-2007 DFKI GmbH. +* All Rights Reserved. +* +* Permission is hereby granted, free of charge, to use and +* distribute this software and its documentation without +* restriction, including without limitation the rights to use, +* copy, modify, merge, publish, distribute, sublicense, and/or +* sell copies of this work, and to permit persons to whom this +* work is furnished to do so, subject to the following conditions: +* +* 1. The source code must retain the above copyright notice, +* this list of conditions and the following disclaimer. +* +* 2. Any modifications to the source code must be clearly +* marked as such. +* +* 3. Redistributions in binary form must reproduce the above +* copyright notice, this list of conditions and the +* following disclaimer in the documentation and/or other +* materials provided with the distribution. Otherwise, one +* must contact the HTS working group. +* +* NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF TECHNOLOGY, +* HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM +* ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL +* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT +* SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF +* TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE +* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY +* DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, +* WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTUOUS +* ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR +* PERFORMANCE OF THIS SOFTWARE. 
+* +* +* This software was translated to C for use within Festival to offer +* multi-excitation MLSA +* Alan W Black (awb@cs.cmu.edu) 3rd April 2009 +* +*/ + +#include +#include +#include +#include +#include +#include "festival.h" + +#include "mlsa_resynthesis.h" + +/** + * Synthesis of speech out of speech parameters. + * Mixed excitation MLSA vocoder. + * + * Java port and extension of HTS engine version 2.0 + * Extension: mixed excitation + * @author Marcela Charfuelan + * And ported to C by Alan W Black (awb@cs.cmu.edu) + */ + +#define boolean int +#define true 1 +#define false 0 + +typedef struct HTSData_struct { + + int rate; + int fperiod; + double rhos; + + int stage; + double alpha; + double beta; + boolean useLogGain; + double uf; + boolean algnst; /* use state level alignment for duration */ + boolean algnph; /* use phoneme level alignment for duration */ + boolean useMixExc; /* use Mixed Excitation */ + boolean useFourierMag; /* use Fourier magnitudes for pulse generation */ + boolean useGV; /* use global variance in parameter generation */ + boolean useGmmGV; /* use global variance as a Gaussian Mixture Model */ + boolean useUnitDurationContinuousFeature; /* for using external duration, so it will not be generated from HMMs*/ + boolean useUnitLogF0ContinuousFeature; /* for using external f0, so it will not be generated from HMMs*/ + + /** variables for controling generation of speech in the vocoder + * these variables have default values but can be fixed and read from the + * audio effects component. 
[Default][min--max] */ + double length; /* total number of frame for generated speech */ + /* length of generated speech (in seconds) [N/A][0.0--30.0] */ + double durationScale; /* less than 1.0 is faster and more than 1.0 is slower, min=0.1 max=3.0 */ + + boolean LogGain; + char *PdfStrFile, *PdfMagFile; + + int NumFilters, OrderFilters; + double **MixFilters; + double F0Std; + double F0Mean; + +} HTSData; + +#if 0 +typedef struct HTSData_struct { + + int rate = 16000; + int fperiod = 80; + double rhos = 0.0; + + int stage = 0; + double alpha = 0.42; + boolean useLogGain = false; + double uf = 0.5; + boolean algnst = false; /* use state level alignment for duration */ + boolean algnph = false; /* use phoneme level alignment for duration */ + boolean useMixExc = true; /* use Mixed Excitation */ + boolean useFourierMag = false; /* use Fourier magnitudes for pulse generation */ + boolean useGV = false; /* use global variance in parameter generation */ + boolean useGmmGV = false; /* use global variance as a Gaussian Mixture Model */ + boolean useUnitDurationContinuousFeature = false; /* for using external duration, so it will not be generated from HMMs*/ + boolean useUnitLogF0ContinuousFeature = false; /* for using external f0, so it will not be generated from HMMs*/ + + /** variables for controling generation of speech in the vocoder + * these variables have default values but can be fixed and read from the + * audio effects component. 
[Default][min--max] */ + double f0Std = 1.0; /* variable for f0 control, multiply f0 [1.0][0.0--5.0] */ + double f0Mean = 0.0; /* variable for f0 control, add f0 [0.0][0.0--100.0] */ + double length = 0.0; /* total number of frame for generated speech */ + /* length of generated speech (in seconds) [N/A][0.0--30.0] */ + double durationScale = 1.0; /* less than 1.0 is faster and more than 1.0 is slower, min=0.1 max=3.0 */ + +} HTSData; +#endif + +static int IPERIOD = 1; +static boolean GAUSS = true; +static int PADEORDER = 5; /* pade order for MLSA filter */ +static int IRLENG = 96; /* length of impulse response */ + +/* for MGLSA filter (mel-generalised log spectrum approximation filter) */ +static boolean NORMFLG1 = true; +static boolean NORMFLG2 = false; +static boolean MULGFLG1 = true; +static boolean MULGFLG2 = false; +static boolean NGAIN = false; + +static double ZERO = 1.0e-10; /* ~(0) */ +static double LZERO = (-1.0e+10); /* ~log(0) */ + +static int stage; /* Gamma=-1/stage : if stage=0 then Gamma=0 */ +static double xgamma; /* Gamma */ +static boolean use_log_gain; /* log gain flag (for LSP) */ +static int fprd; /* frame shift */ +static int iprd; /* interpolation period */ +static boolean gauss; /* flag to use Gaussian noise */ +static double p1; /* used in excitation generation */ +static double pc; /* used in excitation generation */ +static double *pade; /* used in mlsadf */ +static int ppade; /* offset for vector ppade */ + +static double *C; /* used in the MLSA/MGLSA filter */ +static double *CC; /* used in the MLSA/MGLSA filter */ +static double *CINC; /* used in the MLSA/MGLSA filter */ +static double *D1; /* used in the MLSA/MGLSA filter */ +static int CINC_length, CC_length, C_length, D1_length; + +static double rate; +static int pt1; /* used in mlsadf1 */ +static int pt2; /* used in mlsadf2 */ +static int *pt3; /* used in mlsadf2 */ + +/* mixed excitation variables */ +static int numM; /* Number of bandpass filters for mixed excitation */ 
+static int orderM; /* Order of filters for mixed excitation */ +static double **h; /* filters for mixed excitation */ +static double *xpulseSignal; /* the size of this should be orderM */ +static double *xnoiseSignal; /* the size of this should be orderM */ +static boolean mixedExcitation = false; +static boolean fourierMagnitudes = false; + +static boolean lpcVocoder = false; /* true if lpc vocoder is used, then the input should be lsp parameters */ + +void initVocoder(int mcep_order, int mcep_vsize, HTSData *htsData); +int htsMLSAVocoder(EST_Track *lf0Pst, + EST_Track *mcepPst, + EST_Track *strPst, + EST_Track *magPst, + int *voiced, + HTSData *htsData, + EST_Wave *wave); + + +LISP me_mlsa_resynthesis(LISP ltrack, LISP strack) +{ + /* Resynthesizes a wave from given track with mixed excitation*/ + EST_Track *t; + EST_Track *str_track; + EST_Wave *wave = 0; + EST_Track *mcep; + EST_Track *f0v; + EST_Track *str; + EST_Track *mag; + int *voiced; + int sr = 16000; + int i,j; + double shift; + HTSData htsData; + + htsData.alpha = 0.42; + htsData.beta = 0.0; + + if ((ltrack == NULL) || + (TYPEP(ltrack,tc_string) && + (streq(get_c_string(ltrack),"nil")))) + return siod(new EST_Wave(0,1,sr)); + + t = track(ltrack); + str_track = track(strack); + + f0v = new EST_Track(t->num_frames(),1); + mcep = new EST_Track(t->num_frames(),25); + str = new EST_Track(t->num_frames(),5); + mag = new EST_Track(t->num_frames(),10); + voiced = walloc(int,t->num_frames()); + + for (i=0; i<t->num_frames(); i++) + { + f0v->a(i) = t->a(i,0); + if (f0v->a(i) > 0) + voiced[i] = 1; + else + voiced[i] = 0; + for (j=1; j<26; j++) + mcep->a(i,j-1) = t->a(i,j); + + for (j=0; j<5; j++) + { + str->a(i,j) = str_track->a(i,j); + } + /* printf("awb_debug str %d 0 %f 1 %f 2 %f 3 %f 4 %f\n", + i,str->a(i,0),str->a(i,1),str->a(i,2),str->a(i,3),str->a(i,4));*/ +#if 0 + for (j=57; j<66; j++) + mag->a(i,j-57) = t->a(i,j); +#endif + } + + if (t->num_frames() > 1) + shift = 1000.0*(t->t(1)-t->t(0)); + else + shift = 
5.0; + + htsData.alpha = FLONM(siod_get_lval("mlsa_alpha_param", + "mlsa: mlsa_alpha_param not set")); + htsData.beta = FLONM(siod_get_lval("mlsa_beta_param", + "mlsa: mlsa_beta_param not set")); + htsData.stage = 0; + htsData.LogGain = false; + htsData.fperiod = 80; + htsData.rate = 16000; + htsData.rhos = 0.0; + + htsData.uf = 0.5; + htsData.algnst = false; /* use state level alignment for duration */ + htsData.algnph = false; /* use phoneme level alignment for duration */ + htsData.useMixExc = true; /* use Mixed Excitation */ + htsData.useFourierMag = false; /* use Fourier magnitudes for pulse generation */ + htsData.useGV = false; /* use global variance in parameter generation */ + htsData.useGmmGV = false; /* use global variance as a Gaussian Mixture Model */ + htsData.useUnitDurationContinuousFeature = false; /* for using external duration, so it will not be generated from HMMs*/ + htsData.useUnitLogF0ContinuousFeature = false; /* for using external f0, so it will not be generated from HMMs*/ + + /** variables for controling generation of speech in the vocoder + * these variables have default values but can be fixed and read from the + * audio effects component. 
[Default][min--max] */ + htsData.F0Std = 1.0; /* variable for f0 control, multiply f0 [1.0][0.0--5.0] */ + htsData.F0Mean = 0.0; /* variable for f0 control, add f0 [0.0][0.0--100.0] */ + htsData.length = 0.0; /* total number of frame for generated speech */ + /* length of generated speech (in seconds) [N/A][0.0--30.0] */ + htsData.durationScale = 1.0; /* less than 1.0 is faster and more than 1.0 is slower, min=0.1 max=3.0 */ + + LISP filters = siod_get_lval("me_mix_filters", + "mlsa: me_mix_filters not set"); + LISP f; + int fl; + htsData.NumFilters = 5; + for (fl=0,f=filters; f; fl++) + f=cdr(f); + htsData.OrderFilters = fl/htsData.NumFilters; + htsData.MixFilters = walloc(double *,htsData.NumFilters); + for (i=0; i < htsData.NumFilters; i++) + { + htsData.MixFilters[i] = walloc(double,htsData.OrderFilters); + for (j=0; jnum_frames() > 0) + /* mcep_order and number of deltas */ + htsMLSAVocoder(f0v,mcep,str,mag,voiced,&htsData,wave); + + delete f0v; + delete mcep; + delete str; + delete mag; + delete voiced; + + return siod(wave); +} + +/** The initialisation of VocoderSetup should be done when there is already + * information about the number of feature vectors to be processed, + * size of the mcep vector file, etc. 
*/ +void initVocoder(int mcep_order, int mcep_vsize, HTSData *htsData) +{ + int vector_size; + double xrand; + + stage = htsData->stage; + if(stage != 0) + xgamma = -1.0 / stage; + else + xgamma = 0.0; + use_log_gain = htsData->LogGain; + + fprd = htsData->fperiod; + rate = htsData->rate; + iprd = IPERIOD; + gauss = GAUSS; + + /* XXX */ + xrand = rand(); + + if(stage == 0 ){ /* for MCP */ + + /* mcep_order=74 and pd=PADEORDER=5 (if no HTS_EMBEDDED is used) */ + vector_size = (mcep_vsize * ( 3 + PADEORDER) + 5 * PADEORDER + 6) - (3 * (mcep_order+1)); + CINC_length = CC_length = C_length = mcep_order+1; + D1_length = vector_size; + C = walloc(double,C_length); + CC = walloc(double,CC_length); + CINC = walloc(double,CINC_length); + D1 = walloc(double,D1_length); + + vector_size=21; + pade = walloc(double,vector_size); + /* ppade is a copy of pade in mlsadf() function : ppade = &( pade[pd*(pd+1)/2] ); */ + ppade = PADEORDER*(PADEORDER+1)/2; /* offset for vector pade */ + pade[0] = 1.0; + pade[1] = 1.0; + pade[2] = 0.0; + pade[3] = 1.0; + pade[4] = 0.0; + pade[5] = 0.0; + pade[6] = 1.0; + pade[7] = 0.0; + pade[8] = 0.0; + pade[9] = 0.0; + pade[10] = 1.0; + pade[11] = 0.4999273; + pade[12] = 0.1067005; + pade[13] = 0.01170221; + pade[14] = 0.0005656279; + pade[15] = 1.0; + pade[16] = 0.4999391; + pade[17] = 0.1107098; + pade[18] = 0.01369984; + pade[19] = 0.0009564853; + pade[20] = 0.00003041721; + + pt1 = PADEORDER+1; + pt2 = ( 2 * (PADEORDER+1)) + (PADEORDER * (mcep_order+2)); + pt3 = new int[PADEORDER+1]; + for(int i=PADEORDER; i>=1; i--) + pt3[i] = ( 2 * (PADEORDER+1)) + ((i-1)*(mcep_order+2)); + + } else { /* for LSP */ + vector_size = ((mcep_vsize+1) * (stage+3)) - ( 3 * (mcep_order+1)); + CINC_length = CC_length = C_length = mcep_order+1; + D1_length = vector_size; + C = walloc(double,C_length); + CC = walloc(double,CC_length); + CINC = walloc(double,CINC_length); + D1 = walloc(double,D1_length); + } + + /* excitation initialisation */ + p1 = -1; + pc = 0.0; + +} 
/* method initVocoder */ + + + +/** + * HTS_MLSA_Vocoder: Synthesis of speech out of mel-cepstral coefficients. + * This procedure uses the parameters generated in pdf2par stored in: + * PStream mceppst: Mel-cepstral coefficients + * PStream strpst : Filter bank stregths for mixed excitation + * PStream magpst : Fourier magnitudes ( OJO!! this is not used yet) + * PStream lf0pst : Log F0 + */ +#if 0 +AudioInputStream htsMLSAVocoder(HTSParameterGeneration pdf2par, HMMData htsData) +{ + float sampleRate = 16000.0F; //8000,11025,16000,22050,44100 + int sampleSizeInBits = 16; //8,16 + int channels = 1; //1,2 + boolean signed = true; //true,false + boolean bigEndian = false; //true,false + AudioFormat af = new AudioFormat( + sampleRate, + sampleSizeInBits, + channels, + signed, + bigEndian); + double [] audio_double = NULL; + + audio_double = htsMLSAVocoder(pdf2par.getlf0Pst(), pdf2par.getMcepPst(), pdf2par.getStrPst(), pdf2par.getMagPst(), + pdf2par.getVoicedArray(), htsData); + + long lengthInSamples = (audio_double.length * 2 ) / (sampleSizeInBits/8); + logger.info("length in samples=" + lengthInSamples ); + + /* Normalise the signal before return, this will normalise between 1 and -1 */ + double MaxSample = MathUtils.getAbsMax(audio_double); + for (int i=0; i1; i--){ + d[_pt3+i] = d[_pt3+i-1]; + } + + return(y); +} + +/** mlsdaf1: sub functions for MLSA filter */ +static double mlsadf1(double x, double *b, int m, double a, double aa, double *d) +{ + double v; + double out = 0.0; + int i; + //pt1 --> pt = &d1[pd+1] + + for(i=PADEORDER; i>=1; i--) { + d[i] = aa * d[pt1+i-1] + a * d[i]; + d[pt1+i] = d[i] * b[1]; + v = d[pt1+i] * pade[ppade+i]; + + //x += (1 & i) ? 
v : -v; + if(i == 1 || i == 3 || i == 5) + x += v; + else + x += -v; + out += v; + } + d[pt1+0] = x; + out += x; + + return(out); + +} + +/** mlsadf2: sub functions for MLSA filter */ +static double mlsadf2(double x, double *b, int m, double a, double aa, double *d) +{ + double v; + double out = 0.0; + int i; + // pt2 --> pt = &d1[pd * (m+2)] + // pt3 --> pt = &d1[ 2*(pd+1) ] + + for(i=PADEORDER; i>=1; i--) { + d[pt2+i] = mlsafir(d[(pt2+i)-1], b, m, a, aa, d, pt3[i]); + v = d[pt2+i] * pade[ppade+i]; + + if(i == 1 || i == 3 || i == 5) + x += v; + else + x += -v; + out += v; + + } + d[pt2+0] = x; + out += x; + + return out; +} + +/** mlsadf: HTS Mel Log Spectrum Approximation filter */ +static double mlsadf(double x, double *b, int m, double a, double aa, double *d) +{ + + x = mlsadf1(x, b, m, a, aa, d); + x = mlsadf2(x, b, m-1, a, aa, d); + + return x; +} + + +/** uniformRand: return 1.0 or -1.0 with equal probability */ +static double uniformRand() +{ + double x; + + x = rand(); /* rand() returns an integer in [0, RAND_MAX], stored here as a double */ + if(x >= RAND_MAX/2.0) + return 1.0; + else + return -1.0; +} + +/** mc2b: transform mel-cepstrum to MLSA digital filter coefficients */ +static void mc2b(double *mc, double *b, int m, double a ) +{ + + b[m] = mc[m]; + for(m--; m>=0; m--) { + b[m] = mc[m] - a * b[m+1]; + } +} + +/** b2mc: transform MLSA digital filter coefficients to mel-cepstrum */ +static void b2mc(double *b, double *mc, int m, double a) +{ + double d, o; + int i; + d = mc[m] = b[m]; + for(i=m--; i>=0; i--) { + o = b[i] + (a * d); + d = b[i]; + mc[i] = o; + } +} + + +/** freqt: frequency transformation */ +//private void freqt(double c1[], int m1, int cepIndex, int m2, double a){ +static void freqt(double *c1, int m1, double *c2, int m2, double a) +{ + double *freqt_buff=NULL; /* used in freqt */ + int freqt_size=0; /* buffer size for freqt */ + int i, j; + double b = 1 - a * a; + int g; /* offset of freqt_buff */ + + if(m2 > freqt_size) { 
+ freqt_buff = walloc(double,m2 + m2 + 2); + freqt_size = m2; + } + g = freqt_size +1; + + for(i = 0; i < m2+1; i++) + freqt_buff[g+i] = 0.0; + + for(i = -m1; i <= 0; i++){ + if(0 <= m2 ) + freqt_buff[g+0] = c1[-i] + a * (freqt_buff[0] = freqt_buff[g+0]); + if(1 <= m2) + freqt_buff[g+1] = b * freqt_buff[0] + a * (freqt_buff[1] = freqt_buff[g+1]); + + for(j=2; j<=m2; j++) + freqt_buff[g+j] = freqt_buff[j-1] + a * ( (freqt_buff[j] = freqt_buff[g+j]) - freqt_buff[g+j-1]); + + } + + /* move memory */ + for(i=0; i= nc) ? nc - 1 : n; + for(k = 1; k <= upl; k++ ) + d += k * c[k] * hh[n - k]; + hh[n] = d / n; + } +} + +/** b2en: functions for postfiltering */ +static double b2en(double *b, int m, double a) +{ + double *spectrum2en_buff=NULL; /* used in spectrum2en */ + int spectrum2en_size=0; /* buffer size for spectrum2en */ + double en = 0.0; + int i; + double *cep, *ir; + + if(spectrum2en_size < m) { + spectrum2en_buff = walloc(double,(m+1) + 2 * IRLENG); + spectrum2en_size = m; + } + cep = walloc(double,(m+1) + 2 * IRLENG); /* CHECK! these sizes!!! */ + ir = walloc(double,(m+1) + 2 * IRLENG); + + b2mc(b, spectrum2en_buff, m, a); + /* freqt(vs->mc, m, vs->cep, vs->irleng - 1, -a);*/ + freqt(spectrum2en_buff, m, cep, IRLENG-1, -a); + /* HTS_c2ir(vs->cep, vs->irleng, vs->ir, vs->irleng); */ + c2ir(cep, IRLENG, ir, IRLENG); + en = 0.0; + + for(i = 0; i < IRLENG; i++) + en += ir[i] * ir[i]; + + if (spectrum2en_buff) + wfree(spectrum2en_buff); + wfree(cep); + wfree(ir); + + return(en); +} + +/** ignorm: inverse gain normalization */ +static void ignorm(double *c1, double *c2, int m, double ng) +{ + double k; + int i; + if(ng != 0.0 ) { + k = pow(c1[0], ng); + for(i=m; i>=1; i--) + c2[i] = k * c1[i]; + c2[0] = (k - 1.0) / ng; + } else { + /* movem */ + for(i=1; i=1; m--) + c2[m] = c1[m] / k; + c2[0] = pow(k, 1.0 / g); + } else { + /* movem */ + for(i=1; i<=m; i++) + c2[i] = c1[i]; + c2[0] = exp(c1[0]); + } + +} + +/** lsp2lpc: transform LSP to LPC. 
lsp[1..m] --> a=lpc[0..m] a[0]=1.0 */ +static void lsp2lpc(double *lsp, double *a, int m) +{ + double *lsp2lpc_buff=NULL; /* used in lsp2lpc */ + int lsp2lpc_size=0; /* buffer size of lsp2lpc */ + int i, k, mh1, mh2, flag_odd; + double xx, xf, xff; + int p, q; /* offsets of lsp2lpc_buff */ + int a0, a1, a2, b0, b1, b2; /* offsets of lsp2lpc_buff */ + + flag_odd = 0; + if(m % 2 == 0) + mh1 = mh2 = m / 2; + else { + mh1 = (m+1) / 2; + mh2 = (m-1) / 2; + flag_odd = 1; + } + + if(m > lsp2lpc_size){ + lsp2lpc_buff = walloc(double,5 * m + 6); + lsp2lpc_size = m; + } + + /* offsets of lsp2lpcbuff */ + p = m; + q = p + mh1; + a0 = q + mh2; + a1 = a0 + (mh1 +1); + a2 = a1 + (mh1 +1); + b0 = a2 + (mh1 +1); + b1 = b0 + (mh2 +1); + b2 = b1 + (mh2 +1); + + /* move lsp -> lsp2lpc_buff */ + for(i=0; i= 0; i--) + a[i + 1] = -a[i]; + a[0] = 1.0; + + if (lsp2lpc_buff) + wfree(lsp2lpc_buff); +} + +/** gc2gc: generalized cepstral transformation */ +static void gc2gc(double *c1, int m1, double g1, double *c2, int m2, double g2) +{ + double *gc2gc_buff=NULL; /* used in gc2gc */ + int gc2gc_size=0; /* buffer size for gc2gc */ + int i, min, k, mk; + double ss1, ss2, cc; + + if( m1 > gc2gc_size ) { + gc2gc_buff = walloc(double,m1 + 1); /* check if these buffers should be created all the time */ + gc2gc_size = m1; + } + + /* movem*/ + for(i=0; i<(m1+1); i++) + gc2gc_buff[i] = c1[i]; + + c2[0] = gc2gc_buff[0]; + + for( i=1; i<=m2; i++){ + ss1 = ss2 = 0.0; + min = m1 < i ? 
m1 : i - 1; + for(k=1; k<=min; k++){ + mk = i - k; + cc = gc2gc_buff[k] * c2[mk]; + ss2 += k * cc; + ss1 += mk * cc; + } + + if(i <= m1) + c2[i] = gc2gc_buff[i] + (g2 * ss2 - g1 * ss1) / i; + else + c2[i] = (g2 * ss2 - g1 * ss1) / i; + } + + if (gc2gc_buff) + wfree(gc2gc_buff); +} + + /** mgc2mgc: frequency and generalized cepstral transformation */ +static void mgc2mgc(double *c1, int m1, double a1, double g1, double *c2, int m2, double a2, double g2) +{ + double a; + + if(a1 == a2){ + gnorm(c1, c1, m1, g1); + gc2gc(c1, m1, g1, c2, m2, g2); + ignorm(c2, c2, m2, g2); + } else { + a = (a2 -a1) / (1 - a1 * a2); + freqt(c1, m1, c2, m2, a); + gnorm(c2, c2, m2, g1); + gc2gc(c2, m2, g1, c2, m2, g2); + ignorm(c2, c2, m2, g2); + + } +} + +/** lsp2mgc: transform LSP to MGC. lsp=C[0..m] mgc=C[0..m] */ +static void lsp2mgc(double *lsp, double *mgc, int m, double alpha) +{ + int i; + /* lsp2lpc */ + lsp2lpc(lsp, mgc, m); /* lsp starts in 1! lsp[1..m] --> mgc[0..m] */ + if(use_log_gain) + mgc[0] = exp(lsp[0]); + else + mgc[0] = lsp[0]; + + /* mgc2mgc*/ + if(NORMFLG1) + ignorm(mgc, mgc, m, xgamma); + else if(MULGFLG1) + mgc[0] = (1.0 - mgc[0]) * stage; + + if(MULGFLG1) + for(i=m; i>=1; i--) + mgc[i] *= -stage; + + mgc2mgc(mgc, m, alpha, xgamma, mgc, m, alpha, xgamma); /* input and output is in mgc=C */ + + if(NORMFLG2) + gnorm(mgc, mgc, m, xgamma); + else if(MULGFLG2) + mgc[0] = mgc[0] * xgamma + 1.0; + + if(MULGFLG2) + for(i=m; i>=1; i--) + mgc[i] *= xgamma; + +} + +/** mglsadf: sub functions for MGLSA filter */ +static double mglsadff(double x, double *b, int m, double a, double *d, int d_offset) +{ + int i; + double y; + y = d[d_offset+0] * b[1]; + + for(i=1; i0; i--) + d[d_offset+i] = d[d_offset+i-1]; + d[d_offset+0] = a * d[d_offset+0] + (1 - a * a) * x; + + return x; +} + +static double mglsadf(double x, double *b, int m, double a, int n, double *d) +{ + int i; + for(i=0; i 0.0 && m > 1){ + if(postfilter_size < m){ + postfilter_buff = walloc(double,m+1); + postfilter_size 
= m; + } + mc2b(mcp, postfilter_buff, m, alpha); + e1 = b2en(postfilter_buff, m, alpha); + + postfilter_buff[1] -= beta * alpha * mcp[2]; + for(k = 2; k < m; k++) + postfilter_buff[k] *= (1.0 +beta); + e2 = b2en(postfilter_buff, m, alpha); + postfilter_buff[0] += log(e1/e2) / 2; + b2mc(postfilter_buff, mcp, m, alpha); + + } + + if (postfilter_buff) + wfree(postfilter_buff); + +} + +static int modShift(int n, int N) +{ + if( n < 0 ) + while( n < 0 ) + n = n + N; + else + while( n >= N ) + n = n - N; + return n; +} + +/** Generate one pitch period from Fourier magnitudes */ +static double *genPulseFromFourierMag(EST_Track *mag, int n, double f0, boolean aperiodicFlag) +{ + + int numHarm = mag->num_channels(); + int i; + int currentF0 = (int)round(f0); + int T, T2; + double *pulse = NULL; + + if(currentF0 < 512) + T = 512; + else + T = 1024; + T2 = 2*T; + + /* since is FFT2 no aperiodicFlag or jitter of 25% is applied */ + + /* get the pulse */ + pulse = walloc(double,T); + EST_FVector real(T2); + EST_FVector imag(T2); + + /* copy Fourier magnitudes (Wai C. Chu "Speech Coding algorithms foundation and evolution of standardized coders" pg. 
460) */ + real[0] = real[T] = 0.0; /* DC component set to zero */ + for(i=1; i<=numHarm; i++){ + real[i] = real[T-i] = real[T+i] = real[T2-i] = mag->a(n, i-1); /* Symetric extension */ + imag[i] = imag[T-i] = imag[T+i] = imag[T2-i] = 0.0; + } + for(i=(numHarm+1); i<(T-numHarm); i++){ /* Default components set to 1.0 */ + real[i] = real[T-i] = real[T+i] = real[T2-i] = 1.0; + imag[i] = imag[T-i] = imag[T+i] = imag[T2-i] = 0.0; + } + + /* Calculate inverse Fourier transform */ + IFFT(real, imag); + + /* circular shift and normalise multiplying by sqrt(F0) */ + double sqrt_f0 = sqrt((float)currentF0); + for(i=0; ialpha; + double beta = htsData->beta; + double aa = 1-alpha*alpha; + int audio_size; /* audio size in samples, calculated as num frames * frame period */ + double *audio_double = NULL; + double *magPulse = NULL; /* pulse generated from Fourier magnitudes */ + int magSample, magPulseSize; + boolean aperiodicFlag = false; + + double *d; /* used in the lpc vocoder */ + + double f0, f0Std, f0Shift, f0MeanOri; + double *mc = NULL; /* feature vector for a particular frame */ + double *hp = NULL; /* pulse shaping filter, initialised once it is known orderM */ + double *hn = NULL; /* noise shaping filter, initialised once it is known orderM */ + + /* Initialise vocoder and mixed excitation, once initialised it is known the order + * of the filters so the shaping filters hp and hn can be initialised. 
*/ + m = mcepPst->num_channels(); + mc = walloc(double,m); + + initVocoder(m-1, mcepPst->num_frames(), htsData); + + d = walloc(double,m); + if (lpcVocoder) + { + /* printf("Using LPC vocoder\n"); */ + for(i=0; iuseMixExc; + fourierMagnitudes = htsData->useFourierMag; + + if ( mixedExcitation ) + { + numM = htsData->NumFilters; + orderM = htsData->OrderFilters; + + xpulseSignal = walloc(double,orderM); + xnoiseSignal = walloc(double,orderM); + /* initialise xp_sig and xn_sig */ + for(i=0; iMixFilters; + hp = walloc(double,orderM); + hn = walloc(double,orderM); + + //Check if the number of filters is equal to the order of strpst + //i.e. the number of filters is equal to the number of generated strengths per frame. +#if 0 + if(numM != strPst->num_channels()) { + printf("htsMLSAVocoder: error num mix-excitation filters = %d " + " in configuration file is different from generated str order= %d\n", + numM, strPst->num_channels()); + } + printf("HMM speech generation with mixed-excitation.\n"); +#endif + } +#if 0 + else + printf("HMM speech generation without mixed-excitation.\n"); + + if( fourierMagnitudes && htsData->PdfMagFile != NULL) + printf("Pulse generated with Fourier Magnitudes.\n"); + else + printf("Pulse generated as a unit pulse.\n"); + + if(beta != 0.0) + printf("Postfiltering applied with beta=%f",(float)beta); + else + printf("No postfiltering applied.\n"); +#endif + + /* Clear content of c, should be done if this function is + called more than once with a new set of generated parameters. 
*/ + for(i=0; i< C_length; i++) + C[i] = CC[i] = CINC[i] = 0.0; + for(i=0; i< D1_length; i++) + D1[i]=0.0; + + f0Std = htsData->F0Std; + f0Shift = htsData->F0Mean; + f0MeanOri = 0.0; + + /* XXX */ + for (mcepframe=0,lf0frame=0; mcepframenum_frames(); mcepframe++) + { + if(voiced[mcepframe]) + { /* WAS WRONG */ + f0MeanOri = f0MeanOri + lf0Pst->a(mcepframe, 0); + lf0frame++; + } + } + f0MeanOri = f0MeanOri/lf0frame; + + /* ____________________Synthesize speech waveforms_____________________ */ + /* generate Nperiod samples per mcepframe */ + s = 0; /* number of samples */ + s_double = 0; + audio_size = mcepPst->num_frames() * (fprd); + audio_double = walloc(double,audio_size); /* initialise buffer for audio */ + magSample = 1; + magPulseSize = 0; + + for(mcepframe=0,lf0frame=0; mcepframenum_frames(); mcepframe++) + { + /* get current feature vector mcp */ + for(i=0; ia(mcepframe, i); + + /* f0 modification through the MARY audio effects */ + if(voiced[mcepframe]){ + f0 = f0Std * lf0Pst->a(mcepframe, 0) + (1-f0Std) * f0MeanOri + f0Shift; + lf0frame++; + if(f0 < 0.0) + f0 = 0.0; + } + else{ + f0 = 0.0; + } + + /* if mixed excitation get shaping filters for this frame */ + if (mixedExcitation) + { + for(j=0; ja(mcepframe, i) * h[i][j]; + hn[j] += ( 1 - strPst->a(mcepframe, i) ) * h[i][j]; + } + } + } + + /* f0->pitch, in original code here it is used p, so f0=p in the c code */ + if(f0 != 0.0) + f0 = rate/f0; + + /* p1 is initialised in -1, so this will be done just for the first frame */ + if( p1 < 0 ) { + p1 = f0; + pc = p1; + /* for LSP */ + if(stage != 0){ + if( use_log_gain) + C[0] = LZERO; + else + C[0] = ZERO; + for(i=0; i MGC */ + lsp2mgc(C, C, (m-1), alpha); + mc2b(C, C, (m-1), alpha); + gnorm(C, C, (m-1), xgamma); + for(i=1; i0.0 */ + postfilter_mcp(mc, (m-1), alpha, beta); + /* mc2b: transform mel-cepstrum to MLSA digital filter coefficients */ + mc2b(mc, CC, (m-1), alpha); + for(i=0; i=0; j--) { + if(p1 == 0.0) { + if(gauss) + x = 0 /* rand.nextGaussian() 
*/; /* XXX returns double, gaussian distribution mean=0.0 and var=1.0 */ + else + x = uniformRand(); /* returns 1.0 or -1.0 uniformly distributed */ + + if(mixedExcitation) { + xn = x; + xp = 0.0; + } + } else { + if( (pc += 1.0) >= p1 ){ + if(fourierMagnitudes){ + /* jitter is applied just in voiced frames when the stregth of the first band is < 0.5*/ + /* this will work just if Radix FFT is used */ + /*if(strPst.getPar(mcepframe, 0) < 0.5) + aperiodicFlag = true; + else + aperiodicFlag = false; + magPulse = genPulseFromFourierMagRadix(magPst, mcepframe, p1, aperiodicFlag); + */ + + magPulse = genPulseFromFourierMag(magPst, mcepframe, p1, aperiodicFlag); + magSample = 0; + magPulseSize = -27 /* magPulse.length*/; /** XXX **/ + x = magPulse[magSample]; + magSample++; + } else + x = sqrt(p1); + + pc = pc - p1; + } else { + + if(fourierMagnitudes){ + if(magSample >= magPulseSize ){ + x = 0.0; + } + else + x = magPulse[magSample]; + magSample++; + } else + x = 0.0; + } + + if(mixedExcitation) { + xp = x; + if(gauss) + xn = 0 /* rand.nextGaussian() */ ; /* XXX */ + else + xn = uniformRand(); + } + } + + /* apply the shaping filters to the pulse and noise samples */ + /* i need memory of at least for M samples in both signals */ + if(mixedExcitation) { + fxp = 0.0; + fxn = 0.0; + for(k=orderM-1; k>0; k--) { + fxp += hp[k] * xpulseSignal[k]; + fxn += hn[k] * xnoiseSignal[k]; + xpulseSignal[k] = xpulseSignal[k-1]; + xnoiseSignal[k] = xnoiseSignal[k-1]; + } + fxp += hp[0] * xp; + fxn += hn[0] * xn; + xpulseSignal[0] = xp; + xnoiseSignal[0] = xn; + + /* x is a pulse noise excitation and mix is mixed excitation */ + mix = fxp+fxn; + + /* comment this line if no mixed excitation, just pulse and noise */ + x = mix; /* excitation sample */ + /* printf("awb_debug me %d %f\n",(int)(s_double),(float)x); */ + } + + if(lpcVocoder){ + // LPC filter C[k=0] = gain is not used! 
+ if(!NGAIN) + x *= C[0]; + for(k=(m-1); k>1; k--){ + x = x - (C[k] * d[k]); + d[k] = d[k-1]; + } + x = x - (C[1] * d[1]); + d[1] = x; + + } else if(stage == 0 ){ + if(x != 0.0 ) + x *= exp(C[0]); + x = mlsadf(x, C, m, alpha, aa, D1); + + } else { + if(!NGAIN) + x *= C[0]; + x = mglsadf(x, C, (m-1), alpha, stage, D1); + } + + audio_double[s_double] = x; + s_double++; + + if((--i) == 0 ) { + p1 += inc; + for(k=0; kcc, v->c, m + 1); */ + for(i=0; iresize(audio_size,1); + for (i=0; ia(i) = (short)audio_double[i]; + + return 0; + +} /* method htsMLSAVocoder() */ + + diff --git a/src/modules/clustergen/mlsa_resynthesis.cc b/src/modules/clustergen/mlsa_resynthesis.cc new file mode 100644 index 0000000..b1e998e --- /dev/null +++ b/src/modules/clustergen/mlsa_resynthesis.cc @@ -0,0 +1,942 @@ +/* --------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS): version 1.1b */ +/* HTS Working Group */ +/* */ +/* Department of Computer Science */ +/* Nagoya Institute of Technology */ +/* and */ +/* Interdisciplinary Graduate School of Science and Engineering */ +/* Tokyo Institute of Technology */ +/* Copyright (c) 2001-2003 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and */ +/* distribute this software and its documentation without */ +/* restriction, including without limitation the rights to use, */ +/* copy, modify, merge, publish, distribute, sublicense, and/or */ +/* sell copies of this work, and to permit persons to whom this */ +/* work is furnished to do so, subject to the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list */ +/* of conditions and the following disclaimer. */ +/* */ +/* 2. Any modifications must be clearly marked as such. 
*/ +/* */ +/* NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSITITUTE OF TECHNOLOGY, */ +/* HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM */ +/* ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL */ +/* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSITITUTE OF */ +/* TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY */ +/* DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, */ +/* WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTUOUS */ +/* ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR */ +/* PERFORMANCE OF THIS SOFTWARE. */ +/* */ +/* --------------------------------------------------------------- */ +/* This is Zen's MLSA filter as ported by Toda to festvox vc */ +/* and back ported into hts/festival so we can do MLSA filtering */ +/* If I took more time I could probably make this use the same as */ +/* as the other code in this directory -- awb@cs.cmu.edu 03JAN06 */ +/* --------------------------------------------------------------- */ + +/*********************************************************************/ +/* */ +/* Mel-cepstral vocoder (pulse/noise excitation & MLSA filter) */ +/* 2003/12/26 by Heiga Zen */ +/* */ +/* Extracted from HTS and slightly modified */ +/* by Tomoki Toda (tomoki@ics.nitech.ac.jp) */ +/* June 2004 */ +/* Integrate as a Voice Conversion module */ +/* */ +/*-------------------------------------------------------------------*/ + +#include +#include +#include +#include +#include +#include "festival.h" + +#include "mlsa_resynthesis.h" + +static void wavecompressor(DVECTOR wav); +static DVECTOR xdvalloc(long length); +static DVECTOR xdvcut(DVECTOR x, long offset, long length); +static void xdvfree(DVECTOR vector); +static double dvmax(DVECTOR x, long *index); +static double dvmin(DVECTOR x, long *index); +static DMATRIX xdmalloc(long row, long 
col); +static void xdmfree(DMATRIX matrix); + +static void waveampcheck(DVECTOR wav, XBOOL msg_flag); + +static void init_vocoder(double fs, int framel, int m, VocoderSetup *vs); +static void vocoder(double p, double *mc, EST_Track *str, + int t, + int m, double a, double beta, + VocoderSetup *vs, double *wav, long *pos); +static double mlsadf(double x, double *b, int m, double a, int pd, double *d, + VocoderSetup *vs); +static double mlsadf1(double x, double *b, int m, double a, int pd, double *d, + VocoderSetup *vs); +static double mlsadf2(double x, double *b, int m, double a, int pd, double *d, + VocoderSetup *vs); +static double mlsafir (double x, double *b, int m, double a, double *d); +static double nrandom (VocoderSetup *vs); +static double rnd (unsigned long *next); +static unsigned long srnd (unsigned long seed); +static int mseq (VocoderSetup *vs); +static void mc2b (double *mc, double *b, int m, double a); +static double b2en (double *b, int m, double a, VocoderSetup *vs); +static void b2mc (double *b, double *mc, int m, double a); +static void freqt (double *c1, int m1, double *c2, int m2, double a, + VocoderSetup *vs); +static void c2ir (double *c, int nc, double *h, int leng); + + +#if 0 +static DVECTOR get_dpowvec(DMATRIX rmcep, DMATRIX cmcep); +static double get_dpow(double *rmcep, double *cmcep, int m, double a, + VocoderSetup *vs); +#endif +static void free_vocoder(VocoderSetup *vs); + +LISP mlsa_resynthesis(LISP ltrack, LISP strtrack) +{ + /* Resynthesizes a wave from given track */ + EST_Track *t; + EST_Track *str = 0; + EST_Wave *wave = 0; + DVECTOR w; + DMATRIX mcep; + DVECTOR f0v; + int sr = 16000; + int i,j; + double shift; + double ALPHA = 0.42; + double BETA = 0.0; + + if ((ltrack == NULL) || + (TYPEP(ltrack,tc_string) && + (streq(get_c_string(ltrack),"nil")))) + return siod(new EST_Wave(0,1,sr)); + + t = track(ltrack); + + if (strtrack != NULL) + { /* We have to do mixed-excitation */ + str = track(strtrack); + } + + f0v = 
xdvalloc(t->num_frames()); + mcep = xdmalloc(t->num_frames(),t->num_channels()-1); + + for (i=0; i<t->num_frames(); i++) + { + f0v->data[i] = t->a(i,0); + for (j=1; j<t->num_channels(); j++) + mcep->data[i][j-1] = t->a(i,j); + } + + if (t->num_frames() > 1) + shift = 1000.0*(t->t(1)-t->t(0)); + else + shift = 5.0; + + ALPHA = FLONM(siod_get_lval("mlsa_alpha_param", + "mlsa: mlsa_alpha_param not set")); + BETA = FLONM(siod_get_lval("mlsa_beta_param", + "mlsa: mlsa_beta_param not set")); + + w = synthesis_body(mcep,f0v,str,sr,shift,ALPHA,BETA); + + wave = new EST_Wave(w->length,1,sr); + + for (i=0; i<w->length; i++) + wave->a(i) = (short)w->data[i]; + + xdmfree(mcep); + xdvfree(f0v); + xdvfree(w); + + return siod(wave); +} + + +DVECTOR synthesis_body(DMATRIX mcep, // input mel-cep sequence + DVECTOR f0v, // input F0 sequence + EST_Track *str, // str for mixed excitation + double fs, // sampling frequency (Hz) + double framem, // frame shift (ms) + double alpha, + double beta) +{ + long t, pos; + int framel; + double f0; + VocoderSetup vs; + DVECTOR xd = NODATA; + DVECTOR syn = NODATA; + int i,j; + + framel = (int)(framem * fs / 1000.0); + init_vocoder(fs, framel, mcep->col - 1, &vs); + + if (str != NULL) + { + /* Mixed excitation filters */ + LISP filters = siod_get_lval("me_mix_filters", + "mlsa: me_mix_filters not set"); + LISP f; + int fl; + for (fl=0,f=filters; f; fl++) + f=cdr(f); + for (fl=0,f=filters; f; fl++) + f=cdr(f); + vs.ME_num = 5; + vs.ME_order = fl/vs.ME_num; + + for (i=0; i < vs.ME_num; i++) + { + for (j=0; jrow * (framel + 2)); + for (t = 0, pos = 0; t < mcep->row; t++) { + if (t >= f0v->length) f0 = 0.0; + else f0 = f0v->data[t]; + + vocoder(f0, mcep->data[t], + str, t, + mcep->col - 1, + alpha, beta, &vs, + xd->data, &pos); + + } + syn = xdvcut(xd, 0, pos); + + // normalized amplitude + /* waveampcheck(syn, XFALSE); */ + wavecompressor(syn); + + // memory free + xdvfree(xd); + free_vocoder(&vs); + + return syn; +} + +static void wavecompressor(DVECTOR wav) +{ + /*
a somewhat over specific compressor */ + int i; + double maxvalue, absv, d; + int sign; + + maxvalue = MAX(FABS(dvmax(wav, NULL)), FABS(dvmin(wav, NULL))); + for (i=0; i < wav->length; i++) + { + sign = ( wav->data[i] < 0 ) ? -1 : 1; + absv = FABS(wav->data[i]); + if (absv > 15000) + { + d = absv - 15000; + d /= maxvalue-15000; + wav->data[i] = sign*(12500+(5000*d)); + } + else if (absv > 10000) + wav->data[i] = sign*(10000+((absv-10000)/2.0)); + } + +} + +static void waveampcheck(DVECTOR wav, XBOOL msg_flag) +{ + double value; + int k; + + value = MAX(FABS(dvmax(wav, NULL)), FABS(dvmin(wav, NULL))); + if (value >= 32000.0) { + if (msg_flag == XTRUE) { + fprintf(stderr, "amplitude is too big: %f\n", value); + fprintf(stderr, "execute normalization\n"); + } + /* was dvscoper(wav, "*", 32000.0 / value); */ + for (k = 0; k < wav->length; k++) { + wav->data[k] = wav->data[k] * (32000.0/value); + if (wav->imag != NULL) { + wav->imag[k] = wav->imag[k] * (32000.0/value); + } + } + } + + return; +} + +static void init_vocoder(double fs, int framel, int m, VocoderSetup *vs) +{ + // initialize global parameter + int i; + + vs->fprd = framel; + vs->iprd = 1; + vs->seed = 1; + vs->pd = 5; + + vs->next =1; + vs->gauss = MTRUE; + + vs->pade[ 0]=1.0; + vs->pade[ 1]=1.0; vs->pade[ 2]=0.0; + vs->pade[ 3]=1.0; vs->pade[ 4]=0.0; vs->pade[ 5]=0.0; + vs->pade[ 6]=1.0; vs->pade[ 7]=0.0; vs->pade[ 8]=0.0; vs->pade[ 9]=0.0; + vs->pade[10]=1.0; vs->pade[11]=0.4999273; vs->pade[12]=0.1067005; vs->pade[13]=0.01170221; vs->pade[14]=0.0005656279; + vs->pade[15]=1.0; vs->pade[16]=0.4999391; vs->pade[17]=0.1107098; vs->pade[18]=0.01369984; vs->pade[19]=0.0009564853; + vs->pade[20]=0.00003041721; + + vs->rate = fs; + vs->c = wcalloc(double,3 * (m + 1) + 3 * (vs->pd + 1) + vs->pd * (m + 2)); + + vs->p1 = -1; + vs->sw = 0; + vs->x = 0x55555555; + + // for postfiltering + vs->mc = NULL; + vs->o = 0; + vs->d = NULL; + vs->irleng= 64; + + // for MIXED EXCITATION + vs->ME_order = 48; + vs->ME_num = 5; 
+ vs->hpulse = walloc(double,vs->ME_order); + vs->hnoise = walloc(double,vs->ME_order); + vs->xpulsesig = walloc(double,vs->ME_order); + vs->xnoisesig = walloc(double,vs->ME_order); + vs->h = walloc(double *,vs->ME_num); + for (i=0; i< vs->ME_num; i++) + vs->h[i] = walloc(double,vs->ME_order); + + return; +} + +static double plus_or_minus_one() +{ + /* Randomly return 1 or -1 */ + if (rand() > RAND_MAX/2.0) + return 1.0; + else + return -1.0; +} + +static void vocoder(double p, double *mc, + EST_Track *str, int t, + int m, double a, double beta, + VocoderSetup *vs, double *wav, long *pos) +{ + double inc, x, e1, e2; + int i, j, k; + double xpulse, xnoise; + double fxpulse, fxnoise; + + if (str != NULL) /* MIXED-EXCITATION */ + { + /* Copy in str's and build hpulse and hnoise for this frame */ + for (i=0; i<vs->ME_order; i++) + { + vs->hpulse[i] = vs->hnoise[i] = 0.0; + for (j=0; j<vs->ME_num; j++) + { + vs->hpulse[i] += str->a(t,j) * vs->h[j][i]; + vs->hnoise[i] += (1 - str->a(t,j)) * vs->h[j][i]; + } + } + printf("awb_debug str %f %f %f %f %f\n", + str->a(t,0), + str->a(t,1), + str->a(t,2), + str->a(t,3), + str->a(t,4)); + } + + if (p != 0.0) + p = vs->rate / p; // f0 -> pitch + + if (vs->p1 < 0) { + if (vs->gauss & (vs->seed != 1)) + vs->next = srnd((unsigned)vs->seed); + + vs->p1 = p; + vs->pc = vs->p1; + vs->cc = vs->c + m + 1; + vs->cinc = vs->cc + m + 1; + vs->d1 = vs->cinc + m + 1; + + mc2b(mc, vs->c, m, a); + + if (beta > 0.0 && m > 1) { + e1 = b2en(vs->c, m, a, vs); + vs->c[1] -= beta * a * mc[2]; + for (k=2;k<=m;k++) + vs->c[k] *= (1.0 + beta); + e2 = b2en(vs->c, m, a, vs); + vs->c[0] += log(e1/e2)/2; + } + + return; + } + + mc2b(mc, vs->cc, m, a); + if (beta>0.0 && m > 1) { + e1 = b2en(vs->cc, m, a, vs); + vs->cc[1] -= beta * a * mc[2]; + for (k = 2; k <= m; k++) + vs->cc[k] *= (1.0 + beta); + e2 = b2en(vs->cc, m, a, vs); + vs->cc[0] += log(e1 / e2) / 2.0; + } + + for (k=0; k<=m; k++) + vs->cinc[k] = (vs->cc[k] - vs->c[k]) * + (double)vs->iprd / (double)vs->fprd;
+ + if (vs->p1!=0.0 && p!=0.0) { + inc = (p - vs->p1) * (double)vs->iprd / (double)vs->fprd; + } else { + inc = 0.0; + vs->pc = p; + vs->p1 = 0.0; + } + + for (j = vs->fprd, i = (vs->iprd + 1) / 2; j--;) { + if (vs->p1 == 0.0) { + if (vs->gauss) + x = (double) nrandom(vs); + else + x = plus_or_minus_one(); + + if (str != NULL) /* MIXED EXCITATION */ + { + xnoise = x; + xpulse = 0.0; + } + } else { + if ((vs->pc += 1.0) >= vs->p1) + { + x = sqrt (vs->p1); + vs->pc = vs->pc - vs->p1; + } + else + x = 0.0; + + if (str != NULL) /* MIXED EXCITATION */ + { + xpulse = x; + xnoise = plus_or_minus_one(); + } + } + + /* MIXED EXCITATION */ + /* The real work -- apply shaping filters to pulse and noise */ + if (str != NULL) + { + fxpulse = fxnoise = 0.0; + for (k=vs->ME_order-1; k>0; k--) + { + fxpulse += vs->hpulse[k] * vs->xpulsesig[k]; + fxnoise += vs->hnoise[k] * vs->xnoisesig[k]; + + vs->xpulsesig[k] = vs->xpulsesig[k-1]; + vs->xnoisesig[k] = vs->xnoisesig[k-1]; + } + fxpulse += vs->hpulse[0] * xpulse; + fxnoise += vs->hnoise[0] * xnoise; + vs->xpulsesig[0] = xpulse; + vs->xnoisesig[0] = xnoise; + + x = fxpulse + fxnoise; /* excitation is pulse plus noise */ + printf("awb_debug %f\n",(float)x); + } + + x *= exp(vs->c[0]); + + x = mlsadf(x, vs->c, m, a, vs->pd, vs->d1, vs); + + wav[*pos] = x; + *pos += 1; + + if (!--i) { + vs->p1 += inc; + for (k = 0; k <= m; k++) vs->c[k] += vs->cinc[k]; + i = vs->iprd; + } + } + + vs->p1 = p; + memmove(vs->c,vs->cc,sizeof(double)*(m+1)); + + return; +} + +static double mlsadf(double x, double *b, int m, double a, int pd, double *d, VocoderSetup *vs) +{ + + vs->ppade = &(vs->pade[pd*(pd+1)/2]); + + x = mlsadf1 (x, b, m, a, pd, d, vs); + x = mlsadf2 (x, b, m, a, pd, &d[2*(pd+1)], vs); + + return(x); +} + +static double mlsadf1(double x, double *b, int m, double a, int pd, double *d, VocoderSetup *vs) +{ + double v, out = 0.0, *pt, aa; + register int i; + + aa = 1 - a*a; + pt = &d[pd+1]; + + for (i=pd; i>=1; i--) { + d[i] = aa*pt[i-1] + 
a*d[i]; + pt[i] = d[i] * b[1]; + v = pt[i] * vs->ppade[i]; + x += (1 & i) ? v : -v; + out += v; + } + + pt[0] = x; + out += x; + + return(out); +} + +static double mlsadf2 (double x, double *b, int m, double a, int pd, double *d, VocoderSetup *vs) +{ + double v, out = 0.0, *pt, aa; + register int i; + + aa = 1 - a*a; + pt = &d[pd * (m+2)]; + + for (i=pd; i>=1; i--) { + pt[i] = mlsafir (pt[i-1], b, m, a, &d[(i-1)*(m+2)]); + v = pt[i] * vs->ppade[i]; + + x += (1&i) ? v : -v; + out += v; + } + + pt[0] = x; + out += x; + + return(out); +} + +static double mlsafir (double x, double *b, int m, double a, double *d) +{ + double y = 0.0; + double aa; + register int i; + + aa = 1 - a*a; + + d[0] = x; + d[1] = aa*d[0] + a*d[1]; + + for (i=2; i<=m; i++) { + d[i] = d[i] + a*(d[i+1]-d[i-1]); + y += d[i]*b[i]; + } + + for (i=m+1; i>1; i--) + d[i] = d[i-1]; + + return(y); +} + +static double nrandom (VocoderSetup *vs) +{ + if (vs->sw == 0) { + vs->sw = 1; + do { + vs->r1 = 2.0 * rnd(&vs->next) - 1.0; + vs->r2 = 2.0 * rnd(&vs->next) - 1.0; + vs->s = vs->r1 * vs->r1 + vs->r2 * vs->r2; + } while (vs->s > 1 || vs->s == 0); + + vs->s = sqrt (-2 * log(vs->s) / vs->s); + + return(vs->r1*vs->s); + } + else { + vs->sw = 0; + + return (vs->r2*vs->s); + } +} + +static double rnd (unsigned long *next) +{ + double r; + + *next = *next * 1103515245L + 12345; + r = (*next / 65536L) % 32768L; + + return(r/RANDMAX); +} + +static unsigned long srnd ( unsigned long seed ) +{ + return(seed); +} + +static int mseq (VocoderSetup *vs) +{ + register int x0, x28; + + vs->x >>= 1; + + if (vs->x & B0) + x0 = 1; + else + x0 = -1; + + if (vs->x & B28) + x28 = 1; + else + x28 = -1; + + if (x0 + x28) + vs->x &= B31_; + else + vs->x |= B31; + + return(x0); +} + +// mc2b : transform mel-cepstrum to MLSA digital filter coefficients +static void mc2b (double *mc, double *b, int m, double a) +{ + b[m] = mc[m]; + + for (m--; m>=0; m--) + b[m] = mc[m] - a * b[m+1]; + + return; +} + + +static double b2en (double *b,
int m, double a, VocoderSetup *vs) +{ + double en; + int k; + + if (vs->o<m) { + if (vs->mc != NULL) + wfree(vs->mc); + + vs->mc = wcalloc(double,(m + 1) + 2 * vs->irleng); + vs->cep = vs->mc + m+1; + vs->ir = vs->cep + vs->irleng; + } + + b2mc(b, vs->mc, m, a); + freqt(vs->mc, m, vs->cep, vs->irleng-1, -a, vs); + c2ir(vs->cep, vs->irleng, vs->ir, vs->irleng); + en = 0.0; + + for (k=0;k<vs->irleng;k++) + en += vs->ir[k] * vs->ir[k]; + + return(en); +} + + +// b2mc : transform MLSA digital filter coefficients to mel-cepstrum +static void b2mc (double *b, double *mc, int m, double a) +{ + double d, o; + + d = mc[m] = b[m]; + for (m--; m>=0; m--) { + o = b[m] + a * d; + d = b[m]; + mc[m] = o; + } + + return; +} + +// freqt : frequency transformation +static void freqt (double *c1, int m1, double *c2, int m2, double a, VocoderSetup *vs) +{ + register int i, j; + double b; + + if (vs->d==NULL) { + vs->size = m2; + vs->d = wcalloc(double,vs->size + vs->size + 2); + vs->g = vs->d+vs->size+1; + } + + if (m2>vs->size) { + wfree(vs->d); + vs->size = m2; + vs->d = wcalloc(double,vs->size + vs->size + 2); + vs->g = vs->d+vs->size+1; + } + + b = 1-a*a; + for (i=0; i<m2+1; i++) + vs->g[i] = 0.0; + + for (i=-m1; i<=0; i++) { + if (0 <= m2) + vs->g[0] = c1[-i]+a*(vs->d[0]=vs->g[0]); + if (1 <= m2) + vs->g[1] = b*vs->d[0]+a*(vs->d[1]=vs->g[1]); + for (j=2; j<=m2; j++) + vs->g[j] = vs->d[j-1]+a*((vs->d[j]=vs->g[j])-vs->g[j-1]); + } + + memmove(c2,vs->g,sizeof(double)*(m2+1)); + + return; +} + +// c2ir : The minimum phase impulse response is evaluated from the minimum phase cepstrum +static void c2ir (double *c, int nc, double *h, int leng) +{ + register int n, k, upl; + double d; + + h[0] = exp(c[0]); + for (n=1; n<leng; n++) { + d = 0; + upl = (n>=nc) ?
nc-1 : n; + for (k=1; k<=upl; k++) + d += k*c[k]*h[n-k]; + h[n] = d/n; + } + + return; +} + +#if 0 +static double get_dpow(double *rmcep, double *cmcep, int m, double a, + VocoderSetup *vs) +{ + double e1, e2, dpow; + + if (vs->p1 < 0) { + vs->p1 = 1; + vs->cc = vs->c + m + 1; + vs->cinc = vs->cc + m + 1; + vs->d1 = vs->cinc + m + 1; + } + + mc2b(rmcep, vs->c, m, a); + e1 = b2en(vs->c, m, a, vs); + + mc2b(cmcep, vs->cc, m, a); + e2 = b2en(vs->cc, m, a, vs); + + dpow = log(e1 / e2) / 2.0; + + return dpow; +} +#endif + +static void free_vocoder(VocoderSetup *vs) +{ + int i; + + wfree(vs->c); + wfree(vs->mc); + wfree(vs->d); + + vs->c = NULL; + vs->mc = NULL; + vs->d = NULL; + vs->ppade = NULL; + vs->cc = NULL; + vs->cinc = NULL; + vs->d1 = NULL; + vs->g = NULL; + vs->cep = NULL; + vs->ir = NULL; + + wfree(vs->hpulse); + wfree(vs->hnoise); + wfree(vs->xpulsesig); + wfree(vs->xnoisesig); + for (i=0; i<vs->ME_num; i++) + wfree(vs->h[i]); + wfree(vs->h); + + return; +} + +/* from vector.cc */ + +static DVECTOR xdvalloc(long length) +{ + DVECTOR x; + + length = MAX(length, 0); + x = wcalloc(struct DVECTOR_STRUCT,1); + x->data = wcalloc(double,MAX(length, 1)); + x->imag = NULL; + x->length = length; + + return x; +} + +static void xdvfree(DVECTOR x) +{ + if (x != NULL) { + if (x->data != NULL) { + wfree(x->data); + } + if (x->imag != NULL) { + wfree(x->imag); + } + wfree(x); + } + + return; +} + +static void dvialloc(DVECTOR x) +{ + if (x->imag != NULL) { + wfree(x->imag); + } + x->imag = wcalloc(double,x->length); + + return; +} + +static DVECTOR xdvcut(DVECTOR x, long offset, long length) +{ + long k; + long pos; + DVECTOR y; + + y = xdvalloc(length); + if (x->imag != NULL) { + dvialloc(y); + } + + for (k = 0; k < y->length; k++) { + pos = k + offset; + if (pos >= 0 && pos < x->length) { + y->data[k] = x->data[pos]; + if (y->imag != NULL) { + y->imag[k] = x->imag[pos]; + } + } else { + y->data[k] = 0.0; + if (y->imag != NULL) { + y->imag[k] = 0.0; + } + } + } + + return y; +}
+ +static DMATRIX xdmalloc(long row, long col) +{ + DMATRIX matrix; + int i; + + matrix = wcalloc(struct DMATRIX_STRUCT,1); + matrix->data = wcalloc(double *,row); + for (i=0; i<row; i++) + matrix->data[i] = wcalloc(double,col); + matrix->imag = NULL; + matrix->row = row; + matrix->col = col; + + return matrix; +} + +void xdmfree(DMATRIX matrix) +{ + int i; + + if (matrix != NULL) { + if (matrix->data != NULL) { + for (i=0; i<matrix->row; i++) + wfree(matrix->data[i]); + wfree(matrix->data); + } + if (matrix->imag != NULL) { + for (i=0; i<matrix->row; i++) + wfree(matrix->imag[i]); + wfree(matrix->imag); + } + wfree(matrix); + } + + return; +} + + +/* from voperate.cc */ +static double dvmax(DVECTOR x, long *index) +{ + long k; + long ind; + double max; + + ind = 0; + max = x->data[ind]; + for (k = 1; k < x->length; k++) { + if (max < x->data[k]) { + ind = k; + max = x->data[k]; + } + } + + if (index != NULL) { + *index = ind; + } + + return max; +} + +static double dvmin(DVECTOR x, long *index) +{ + long k; + long ind; + double min; + + ind = 0; + min = x->data[ind]; + for (k = 1; k < x->length; k++) { + if (min > x->data[k]) { + ind = k; + min = x->data[k]; + } + } + + if (index != NULL) { + *index = ind; + } + + return min; +} diff --git a/src/modules/clustergen/mlsa_resynthesis.h b/src/modules/clustergen/mlsa_resynthesis.h new file mode 100644 index 0000000..5868bb0 --- /dev/null +++ b/src/modules/clustergen/mlsa_resynthesis.h @@ -0,0 +1,129 @@ +/*********************************************************************/ +/* */ +/* Nagoya Institute of Technology, Aichi, Japan, */ +/* Nara Institute of Science and Technology, Nara, Japan */ +/* and */ +/* Carnegie Mellon University, Pittsburgh, PA */ +/* Copyright (c) 2003-2004 */ +/* All Rights Reserved.
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and */ +/* distribute this software and its documentation without */ +/* restriction, including without limitation the rights to use, */ +/* copy, modify, merge, publish, distribute, sublicense, and/or */ +/* sell copies of this work, and to permit persons to whom this */ +/* work is furnished to do so, subject to the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list */ +/* of conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* */ +/* NAGOYA INSTITUTE OF TECHNOLOGY, NARA INSTITUTE OF SCIENCE AND */ +/* TECHNOLOGY, CARNEGIE MELLON UNIVERSITY, AND THE CONTRIBUTORS TO */ +/* THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, */ +/* INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, */ +/* IN NO EVENT SHALL NAGOYA INSTITUTE OF TECHNOLOGY, NARA */ +/* INSTITUTE OF SCIENCE AND TECHNOLOGY, CARNEGIE MELLON UNIVERSITY, */ +/* NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, INDIRECT OR */ +/* CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM */ +/* LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, */ +/* NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN */ +/* CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
*/ +/* */ +/*********************************************************************/ +/* */ +/* Author : Tomoki Toda (tomoki@ics.nitech.ac.jp) */ +/* Date : June 2004 */ +/* */ +/* Modified by Alan W Black (awb@cs.cmu.edu) Jan 2006 */ +/* taken from festvox/src/vc/ back into Festival */ +/*-------------------------------------------------------------------*/ +/* */ +/* Subroutine for Speech Synthesis */ +/* */ +/*-------------------------------------------------------------------*/ + +#ifndef __MLSA_RESYNTHESIS_H +#define __MLSA_RESYNTHESIS_H + +typedef struct DVECTOR_STRUCT { + long length; + double *data; + double *imag; +} *DVECTOR; + +typedef struct DMATRIX_STRUCT { + long row; + long col; + double **data; + double **imag; +} *DMATRIX; + +#define XBOOL int +#define XTRUE 1 +#define XFALSE 0 + +#define NODATA NULL + +#define FABS(x) ((x) >= 0.0 ? (x) : -(x)) +#define MAX(a, b) ((a) > (b) ? (a) : (b)) + +DVECTOR synthesis_body(DMATRIX mcep, DVECTOR f0v, + EST_Track *str, + double fs, double framem, + double alpha, double beta); +#define RANDMAX 32767 +#define B0 0x00000001 +#define B28 0x10000000 +#define B31 0x80000000 +#define B31_ 0x7fffffff +#define Z 0x00000000 + +typedef enum {MFALSE, MTRUE} Boolean; + +typedef struct _VocoderSetup { + + int fprd; + int iprd; + int seed; + int pd; + unsigned long next; + Boolean gauss; + double p1; + double pc; + double pj; + double pade[21]; + double *ppade; + double *c, *cc, *cinc, *d1; + double rate; + + int sw; + double r1, r2, s; + + int x; + + /* for postfiltering */ + int size; + double *d; + double *g; + double *mc; + double *cep; + double *ir; + int o; + int irleng; + + /* for MIXED EXCITATION */ + int ME_order; + int ME_num; + double *hpulse; + double *hnoise; + + double *xpulsesig; + double *xnoisesig; + + double **h; + +} VocoderSetup; + +#endif /* __RESYNTHESIS_SUB_H */ diff --git a/src/modules/clustergen/simple_mlpg.cc b/src/modules/clustergen/simple_mlpg.cc new file mode 100644 index 0000000..aca0b5f --- 
/dev/null +++ b/src/modules/clustergen/simple_mlpg.cc @@ -0,0 +1,1007 @@ +/* --------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS): version 1.1.1 */ +/* HTS Working Group */ +/* */ +/* Department of Computer Science */ +/* Nagoya Institute of Technology */ +/* and */ +/* Interdisciplinary Graduate School of Science and Engineering */ +/* Tokyo Institute of Technology */ +/* Copyright (c) 2001-2003 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and */ +/* distribute this software and its documentation without */ +/* restriction, including without limitation the rights to use, */ +/* copy, modify, merge, publish, distribute, sublicense, and/or */ +/* sell copies of this work, and to permit persons to whom this */ +/* work is furnished to do so, subject to the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list */ +/* of conditions and the following disclaimer. */ +/* */ +/* 2. Any modifications must be clearly marked as such. */ +/* */ +/* NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSITITUTE OF TECHNOLOGY, */ +/* HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM */ +/* ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL */ +/* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSITITUTE OF */ +/* TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY */ +/* DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, */ +/* WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS */ +/* ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR */ +/* PERFORMANCE OF THIS SOFTWARE. 
*/ +/* */ +/* --------------------------------------------------------------- */ +/* mlpg.c : speech parameter generation from pdf sequence */ +/* */ +/* 2003/12/26 by Heiga Zen */ +/* --------------------------------------------------------------- */ +/*********************************************************************/ +/* */ +/* Nagoya Institute of Technology, Aichi, Japan, */ +/* and */ +/* Carnegie Mellon University, Pittsburgh, PA */ +/* Copyright (c) 2003-2004,2008 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and */ +/* distribute this software and its documentation without */ +/* restriction, including without limitation the rights to use, */ +/* copy, modify, merge, publish, distribute, sublicense, and/or */ +/* sell copies of this work, and to permit persons to whom this */ +/* work is furnished to do so, subject to the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list */ +/* of conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* */ +/* NAGOYA INSTITUTE OF TECHNOLOGY, CARNEGIE MELLON UNIVERSITY, AND */ +/* THE CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH */ +/* REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL NAGOYA INSTITUTE */ +/* OF TECHNOLOGY, CARNEGIE MELLON UNIVERSITY, NOR THE CONTRIBUTORS */ +/* BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR */ +/* ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR */ +/* PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER */ +/* TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE */ +/* OR PERFORMANCE OF THIS SOFTWARE. 
*/ +/* */ +/*********************************************************************/ +/* */ +/* Author : Tomoki Toda (tomoki@ics.nitech.ac.jp) */ +/* Date : June 2004 */ +/* */ +/* Modified as a single file for inclusion in festival/flite */ +/* May 2008 awb@cs.cmu.edu */ +/*-------------------------------------------------------------------*/ +/* */ +/* ML-Based Parameter Generation */ +/* */ +/*-------------------------------------------------------------------*/ + +#include "festival.h" +#include "simple_mlpg.h" +#define mlpg_alloc(X,Y) (walloc(Y,X)) +#define mlpg_free wfree +#define cst_errmsg EST_error +#define cst_error() + +#ifdef CMUFLITE +#include "cst_alloc.h" +#include "cst_string.h" +#include "cst_math.h" +#include "cst_vc.h" +#include "cst_track.h" +#include "cst_wave.h" +#include "cst_mlpg.h" + +#define mlpg_alloc(X,Y) (cst_alloc(Y,X)) +#define mlpg_free cst_free +#endif + +static MLPGPARA xmlpgpara_init(int dim, int dim2, int dnum, int clsnum); +static void xmlpgparafree(MLPGPARA param); +static double get_like_pdfseq_vit(int dim, int dim2, int dnum, int clsnum, + MLPGPARA param, + EST_Track *model, + XBOOL dia_flag); +static void get_dltmat(DMATRIX mat, MLPG_DWin *dw, int dno, DMATRIX dmat); + +static double *dcalloc(int x, int xoff); +static double **ddcalloc(int x, int y, int xoff, int yoff); + +/***********************************/ +/* ML using Choleski decomposition */ +/***********************************/ +/* Diagonal Covariance Version */ +static void InitDWin(PStreamChol *pst, const float *dynwin, int fsize); +static void InitPStreamChol(PStreamChol *pst, const float *dynwin, int fsize, + int order, int T); +static void mlgparaChol(DMATRIX pdf, PStreamChol *pst, DMATRIX mlgp); +static void mlpgChol(PStreamChol *pst); +static void calc_R_and_r(PStreamChol *pst, const int m); +static void Choleski(PStreamChol *pst); +static void Choleski_forward(PStreamChol *pst); +static void Choleski_backward(PStreamChol *pst, const int m); +static double 
get_gauss_full(long clsidx, + DVECTOR vec, // [dim] + DVECTOR detvec, // [clsnum] + DMATRIX weightmat, // [clsnum][1] + DMATRIX meanvec, // [clsnum][dim] + DMATRIX invcovmat); // [clsnum * dim][dim] +static double get_gauss_dia(long clsidx, + DVECTOR vec, // [dim] + DVECTOR detvec, // [clsnum] + DMATRIX weightmat, // [clsnum][1] + DMATRIX meanmat, // [clsnum][dim] + DMATRIX invcovmat); // [clsnum][dim] +static double cal_xmcxmc(long clsidx, + DVECTOR x, + DMATRIX mm, // [num class][dim] + DMATRIX cm); // [num class * dim][dim] + +const float mlpg_dynwin[] = { -0.5, 0.0, 0.5 }; +#define mlpg_dynwinsize 3 + +static MLPGPARA xmlpgpara_init(int dim, int dim2, int dnum, + int clsnum) +{ + MLPGPARA param; + + // memory allocation + param = mlpg_alloc(1,struct MLPGPARA_STRUCT); + param->ov = xdvalloc(dim); + param->iuv = NODATA; + param->iumv = NODATA; + param->flkv = xdvalloc(dnum); + param->stm = NODATA; + param->dltm = xdmalloc(dnum, dim2); + param->pdf = NODATA; + param->detvec = NODATA; + param->wght = xdmalloc(clsnum, 1); + param->mean = xdmalloc(clsnum, dim); + param->cov = NODATA; + param->clsidxv = NODATA; + /* dia_flag */ + param->clsdetv = xdvalloc(1); + param->clscov = xdmalloc(1, dim); + + param->vdet = 1.0; + param->vm = NODATA; + param->vv = NODATA; + param->var = NODATA; + + return param; +} + +static void xmlpgparafree(MLPGPARA param) +{ + if (param != NODATA) { + if (param->ov != NODATA) xdvfree(param->ov); + if (param->iuv != NODATA) xdvfree(param->iuv); + if (param->iumv != NODATA) xdvfree(param->iumv); + if (param->flkv != NODATA) xdvfree(param->flkv); + if (param->stm != NODATA) xdmfree(param->stm); + if (param->dltm != NODATA) xdmfree(param->dltm); + if (param->pdf != NODATA) xdmfree(param->pdf); + if (param->detvec != NODATA) xdvfree(param->detvec); + if (param->wght != NODATA) xdmfree(param->wght); + if (param->mean != NODATA) xdmfree(param->mean); + if (param->cov != NODATA) xdmfree(param->cov); + if (param->clsidxv != NODATA) 
xlvfree(param->clsidxv); + if (param->clsdetv != NODATA) xdvfree(param->clsdetv); + if (param->clscov != NODATA) xdmfree(param->clscov); + if (param->vm != NODATA) xdvfree(param->vm); + if (param->vv != NODATA) xdvfree(param->vv); + if (param->var != NODATA) xdvfree(param->var); + mlpg_free(param); + } + + return; +} + +static double get_like_pdfseq_vit(int dim, int dim2, int dnum, int clsnum, + MLPGPARA param, + EST_Track *model, + XBOOL dia_flag) +{ + long d, c, k, l, j; + double sumgauss; + double like = 0.0; + + for (d = 0, like = 0.0; d < dnum; d++) { + // read weight and mean sequences + param->wght->data[0][0] = 0.9; /* FIXME weights */ + for (j=0; j<dim; j++) + param->mean->data[0][j] = model->a((int)d,(int)((j+1)*2)); + + // observation vector + for (k = 0; k < dim2; k++) { + param->ov->data[k] = param->stm->data[d][k]; + param->ov->data[k + dim2] = param->dltm->data[d][k]; + } + + // mixture index + c = d; + param->clsdetv->data[0] = param->detvec->data[c]; + + // calculating likelihood + if (dia_flag == XTRUE) { + for (k = 0; k < param->clscov->col; k++) + param->clscov->data[0][k] = param->cov->data[c][k]; + sumgauss = get_gauss_dia(0, param->ov, param->clsdetv, + param->wght, param->mean, param->clscov); + } else { + for (k = 0; k < param->clscov->row; k++) + for (l = 0; l < param->clscov->col; l++) + param->clscov->data[k][l] = + param->cov->data[k + param->clscov->row * c][l]; + sumgauss = get_gauss_full(0, param->ov, param->clsdetv, + param->wght, param->mean, param->clscov); + } + if (sumgauss <= 0.0) param->flkv->data[d] = -1.0 * INFTY2; + else param->flkv->data[d] = log(sumgauss); + like += param->flkv->data[d]; + + // estimating U', U'*M + if (dia_flag == XTRUE) { + // PDF [U'*M U'] + for (k = 0; k < dim; k++) { + param->pdf->data[d][k] = + param->clscov->data[0][k] * param->mean->data[0][k]; + param->pdf->data[d][k + dim] = param->clscov->data[0][k]; + } + } else { + // PDF [U'*M U'] + for (k = 0; k < dim; k++) { + param->pdf->data[d][k] = 0.0; + for (l = 0; l <
dim; l++) { + param->pdf->data[d][k * dim + dim + l] = + param->clscov->data[k][l]; + param->pdf->data[d][k] += + param->clscov->data[k][l] * param->mean->data[0][l]; + } + } + } + } + + like /= (double)dnum; + + return like; +} + +#if 0 +static double get_like_gv(long dim2, long dnum, MLPGPARA param) +{ + long k; + double av = 0.0, dif = 0.0; + double vlike = -INFTY; + + if (param->vm != NODATA && param->vv != NODATA) { + for (k = 0; k < dim2; k++) + calc_varstats(param->stm->data, k, dnum, &av, + &(param->var->data[k]), &dif); + vlike = log(get_gauss_dia5(param->vdet, 1.0, param->var, + param->vm, param->vv)); + } + + return vlike; +} + +static void sm_mvav(DMATRIX mat, long hlen) +{ + long k, l, m, p; + double d, sd; + DVECTOR vec = NODATA; + DVECTOR win = NODATA; + + vec = xdvalloc(mat->row); + + // smoothing window + win = xdvalloc(hlen * 2 + 1); + for (k = 0, d = 1.0, sd = 0.0; k < hlen; k++, d += 1.0) { + win->data[k] = d; win->data[win->length - k - 1] = d; + sd += d + d; + } + win->data[k] = d; sd += d; + for (k = 0; k < win->length; k++) win->data[k] /= sd; + + for (l = 0; l < mat->col; l++) { + for (k = 0; k < mat->row; k++) { + for (m = 0, vec->data[k] = 0.0; m < win->length; m++) { + p = k - hlen + m; + if (p >= 0 && p < mat->row) + vec->data[k] += mat->data[p][l] * win->data[m]; + } + } + for (k = 0; k < mat->row; k++) mat->data[k][l] = vec->data[k]; + } + + xdvfree(win); + xdvfree(vec); + + return; +} +#endif + +static void get_dltmat(DMATRIX mat, MLPG_DWin *dw, int dno, DMATRIX dmat) +{ + int i, j, k, tmpnum; + + tmpnum = (int)mat->row - dw->width[dno][WRIGHT]; + for (k = dw->width[dno][WRIGHT]; k < tmpnum; k++) // time index + for (i = 0; i < (int)mat->col; i++) // dimension index + for (j = dw->width[dno][WLEFT], dmat->data[k][i] = 0.0; + j <= dw->width[dno][WRIGHT]; j++) + dmat->data[k][i] += mat->data[k + j][i] * dw->coef[dno][j]; + + for (i = 0; i < (int)mat->col; i++) { // dimension index + for (k = 0; k < dw->width[dno][WRIGHT]; k++) // time 
index + for (j = dw->width[dno][WLEFT], dmat->data[k][i] = 0.0; + j <= dw->width[dno][WRIGHT]; j++) + if (k + j >= 0) + dmat->data[k][i] += mat->data[k + j][i] * dw->coef[dno][j]; + else + dmat->data[k][i] += (2.0 * mat->data[0][i] - mat->data[-k - j][i]) * dw->coef[dno][j]; + for (k = tmpnum; k < (int)mat->row; k++) // time index + for (j = dw->width[dno][WLEFT], dmat->data[k][i] = 0.0; + j <= dw->width[dno][WRIGHT]; j++) + if (k + j < (int)mat->row) + dmat->data[k][i] += mat->data[k + j][i] * dw->coef[dno][j]; + else + dmat->data[k][i] += (2.0 * mat->data[mat->row - 1][i] - mat->data[mat->row - k - j + mat->row - 2][i]) * dw->coef[dno][j]; + } + + return; +} + + +static double *dcalloc(int x, int xoff) +{ + double *ptr; + + ptr = mlpg_alloc(x,double); + /* ptr += xoff; */ /* Just not going to allow this */ + return(ptr); +} + +static double **ddcalloc(int x, int y, int xoff, int yoff) +{ + double **ptr; + register int i; + + ptr = mlpg_alloc(x,double *); + for (i = 0; i < x; i++) ptr[i] = dcalloc(y, yoff); + /* ptr += xoff; */ /* Just not going to allow this */ + return(ptr); +} + +///////////////////////////////////// +// ML using Choleski decomposition // +///////////////////////////////////// +static void InitDWin(PStreamChol *pst, const float *dynwin, int fsize) +{ + int i,j; + int leng; + + pst->dw.num = 1; // only static + if (dynwin) { + pst->dw.num = 2; // static + dyn + } + // memory allocation + pst->dw.width = mlpg_alloc(pst->dw.num,int *); + for (i = 0; i < pst->dw.num; i++) + pst->dw.width[i] = mlpg_alloc(2,int); + + pst->dw.coef = mlpg_alloc(pst->dw.num, double *); + pst->dw.coef_ptrs = mlpg_alloc(pst->dw.num, double *); + // window for static parameter WLEFT = 0, WRIGHT = 1 + pst->dw.width[0][WLEFT] = pst->dw.width[0][WRIGHT] = 0; + pst->dw.coef_ptrs[0] = mlpg_alloc(1,double); + pst->dw.coef[0] = pst->dw.coef_ptrs[0]; + pst->dw.coef[0][0] = 1.0; + + // set delta coefficients + for (i = 1; i < pst->dw.num; i++) { + pst->dw.coef_ptrs[i] = 
mlpg_alloc(fsize, double); + pst->dw.coef[i] = pst->dw.coef_ptrs[i]; + for (j=0; j<fsize; j++) pst->dw.coef[i][j] = (double)dynwin[j]; + // set pointer + leng = fsize / 2; // L (fsize = 2 * L + 1) + pst->dw.coef[i] += leng; // [L] -> [0] center + pst->dw.width[i][WLEFT] = -leng; // -L left + pst->dw.width[i][WRIGHT] = leng; // L right + if (fsize % 2 == 0) pst->dw.width[i][WRIGHT]--; + } + + pst->dw.maxw[WLEFT] = pst->dw.maxw[WRIGHT] = 0; + for (i = 0; i < pst->dw.num; i++) { + if (pst->dw.maxw[WLEFT] > pst->dw.width[i][WLEFT]) + pst->dw.maxw[WLEFT] = pst->dw.width[i][WLEFT]; + if (pst->dw.maxw[WRIGHT] < pst->dw.width[i][WRIGHT]) + pst->dw.maxw[WRIGHT] = pst->dw.width[i][WRIGHT]; + } + + return; +} + +static void InitPStreamChol(PStreamChol *pst, const float *dynwin, int fsize, + int order, int T) +{ + // order of cepstrum + pst->order = order; + + // windows for dynamic feature + InitDWin(pst, dynwin, fsize); + + // dimension of observed vector + pst->vSize = (pst->order + 1) * pst->dw.num; // odim = dim * (1--3) + + // memory allocation + pst->T = T; // number of frames + pst->width = pst->dw.maxw[WRIGHT] * 2 + 1; // width of R + pst->mseq = ddcalloc(T, pst->vSize, 0, 0); // [T][odim] + pst->ivseq = ddcalloc(T, pst->vSize, 0, 0); // [T][odim] + pst->R = ddcalloc(T, pst->width, 0, 0); // [T][width] + pst->r = dcalloc(T, 0); // [T] + pst->g = dcalloc(T, 0); // [T] + pst->c = ddcalloc(T, pst->order + 1, 0, 0); // [T][dim] + + return; +} + +static void mlgparaChol(DMATRIX pdf, PStreamChol *pst, DMATRIX mlgp) +{ + int t, d; + + // error check + if (pst->vSize * 2 != pdf->col || pst->order + 1 != mlgp->col) { + cst_errmsg("Error mlgparaChol: Different dimension\n"); + cst_error(); + } + + // mseq: U^{-1}*M, ifvseq: U^{-1} + for (t = 0; t < pst->T; t++) { + for (d = 0; d < pst->vSize; d++) { + pst->mseq[t][d] = pdf->data[t][d]; + pst->ivseq[t][d] = pdf->data[t][pst->vSize + d]; + } + } + + // ML parameter generation + mlpgChol(pst); + + // extracting parameters + for (t = 0; t < pst->T; 
t++) + for (d = 0; d <= pst->order; d++) + mlgp->data[t][d] = pst->c[t][d]; + + return; +} + +// generate parameter sequence from pdf sequence using Choleski decomposition +static void mlpgChol(PStreamChol *pst) +{ + register int m; + + // generating parameter in each dimension + for (m = 0; m <= pst->order; m++) { + calc_R_and_r(pst, m); + Choleski(pst); + Choleski_forward(pst); + Choleski_backward(pst, m); + } + + return; +} + +//------ parameter generation functions +// calc_R_and_r: calculate R = W'U^{-1}W and r = W'U^{-1}M +static void calc_R_and_r(PStreamChol *pst, const int m) +{ + register int i, j, k, l, n; + double wu; + + for (i = 0; i < pst->T; i++) { + pst->r[i] = pst->mseq[i][m]; + pst->R[i][0] = pst->ivseq[i][m]; + + for (j = 1; j < pst->width; j++) pst->R[i][j] = 0.0; + + for (j = 1; j < pst->dw.num; j++) { + for (k = pst->dw.width[j][0]; k <= pst->dw.width[j][1]; k++) { + n = i + k; + if (n >= 0 && n < pst->T && pst->dw.coef[j][-k] != 0.0) { + l = j * (pst->order + 1) + m; + pst->r[i] += pst->dw.coef[j][-k] * pst->mseq[n][l]; + wu = pst->dw.coef[j][-k] * pst->ivseq[n][l]; + + for (l = 0; l < pst->width; l++) { + n = l-k; + if (n <= pst->dw.width[j][1] && i + l < pst->T && + pst->dw.coef[j][n] != 0.0) + pst->R[i][l] += wu * pst->dw.coef[j][n]; + } + } + } + } + } + + return; +} + +// Choleski: Choleski factorization of Matrix R +static void Choleski(PStreamChol *pst) +{ + register int t, j, k; + + pst->R[0][0] = sqrt(pst->R[0][0]); + + for (j = 1; j < pst->width; j++) pst->R[0][j] /= pst->R[0][0]; + + for (t = 1; t < pst->T; t++) { + for (j = 1; j < pst->width; j++) + if (t - j >= 0) + pst->R[t][0] -= pst->R[t - j][j] * pst->R[t - j][j]; + + pst->R[t][0] = sqrt(pst->R[t][0]); + + for (j = 1; j < pst->width; j++) { + for (k = 0; k < pst->dw.maxw[WRIGHT]; k++) + if (j != pst->width - 1) + pst->R[t][j] -= + pst->R[t - k - 1][j - k] * pst->R[t - k - 1][j + 1]; + + pst->R[t][j] /= pst->R[t][0]; + } + } + + return; +} + +// Choleski_forward: forward 
substitution to solve linear equations +static void Choleski_forward(PStreamChol *pst) +{ + register int t, j; + double hold; + + pst->g[0] = pst->r[0] / pst->R[0][0]; + + for (t=1; t < pst->T; t++) { + hold = 0.0; + for (j = 1; j < pst->width; j++) + if (t - j >= 0 && pst->R[t - j][j] != 0.0) + hold += pst->R[t - j][j] * pst->g[t - j]; + pst->g[t] = (pst->r[t] - hold) / pst->R[t][0]; + } + + return; +} + +// Choleski_backward: backward substitution to solve linear equations +static void Choleski_backward(PStreamChol *pst, const int m) +{ + register int t, j; + double hold; + + pst->c[pst->T - 1][m] = pst->g[pst->T - 1] / pst->R[pst->T - 1][0]; + + for (t = pst->T - 2; t >= 0; t--) { + hold = 0.0; + for (j = 1; j < pst->width; j++) + if (t + j < pst->T && pst->R[t][j] != 0.0) + hold += pst->R[t][j] * pst->c[t + j][m]; + pst->c[t][m] = (pst->g[t] - hold) / pst->R[t][0]; + } + + return; +} + +//////////////////////////////////// +// ML Considering Global Variance // +//////////////////////////////////// +#if 0 +static void varconv(double **c, const int m, const int T, const double var) +{ + register int n; + double sd, osd; + double oav = 0.0, ovar = 0.0, odif = 0.0; + + calc_varstats(c, m, T, &oav, &ovar, &odif); + osd = sqrt(ovar); sd = sqrt(var); + for (n = 0; n < T; n++) + c[n][m] = (c[n][m] - oav) / osd * sd + oav; + + return; +} + +static void calc_varstats(double **c, const int m, const int T, + double *av, double *var, double *dif) +{ + register int i; + register double d; + + *av = 0.0; + *var = 0.0; + *dif = 0.0; + for (i = 0; i < T; i++) *av += c[i][m]; + *av /= (double)T; + for (i = 0; i < T; i++) { + d = c[i][m] - *av; + *var += d * d; *dif += d; + } + *var /= (double)T; + + return; +} + +// Diagonal Covariance Version +static void mlgparaGrad(DMATRIX pdf, PStreamChol *pst, DMATRIX mlgp, const int max, + double th, double e, double alpha, DVECTOR vm, DVECTOR vv, + XBOOL nrmflag, XBOOL extvflag) +{ + int t, d; + + // error check + if (pst->vSize * 2 != 
pdf->col || pst->order + 1 != mlgp->col) { + cst_errmsg("Error mlgparaChol: Different dimension\n"); + cst_error(); + } + + // mseq: U^{-1}*M, ifvseq: U^{-1} + for (t = 0; t < pst->T; t++) { + for (d = 0; d < pst->vSize; d++) { + pst->mseq[t][d] = pdf->data[t][d]; + pst->ivseq[t][d] = pdf->data[t][pst->vSize + d]; + } + } + + // ML parameter generation + mlpgChol(pst); + + // extend variance + if (extvflag == XTRUE) + for (d = 0; d <= pst->order; d++) + varconv(pst->c, d, pst->T, vm->data[d]); + + // estimating parameters + mlpgGrad(pst, max, th, e, alpha, vm, vv, nrmflag); + + // extracting parameters + for (t = 0; t < pst->T; t++) + for (d = 0; d <= pst->order; d++) + mlgp->data[t][d] = pst->c[t][d]; + + return; +} + +// generate parameter sequence from pdf sequence using gradient +static void mlpgGrad(PStreamChol *pst, const int max, double th, double e, + double alpha, DVECTOR vm, DVECTOR vv, XBOOL nrmflag) +{ + register int m, i, t; + double diff, n, dth; + + if (nrmflag == XTRUE) + n = (double)(pst->T * pst->vSize) / (double)(vm->length); + else n = 1.0; + + // generating parameter in each dimension + for (m = 0; m <= pst->order; m++) { + calc_R_and_r(pst, m); + dth = th * sqrt(vm->data[m]); + for (i = 0; i < max; i++) { + calc_grad(pst, m); + if (vm != NODATA && vv != NODATA) + calc_vargrad(pst, m, alpha, n, vm->data[m], vv->data[m]); + for (t = 0, diff = 0.0; t < pst->T; t++) { + diff += pst->g[t] * pst->g[t]; + pst->c[t][m] += e * pst->g[t]; + } + diff = sqrt(diff / (double)pst->T); + if (diff < dth || diff == 0.0) break; + } + } + + return; +} + +// calc_grad: calculate -RX + r = -W'U^{-1}W * X + W'U^{-1}M +void calc_grad(PStreamChol *pst, const int m) +{ + register int i, j; + + for (i = 0; i < pst->T; i++) { + pst->g[i] = pst->r[i] - pst->c[i][m] * pst->R[i][0]; + for (j = 1; j < pst->width; j++) { + if (i + j < pst->T) pst->g[i] -= pst->c[i + j][m] * pst->R[i][j]; + if (i - j >= 0) pst->g[i] -= pst->c[i - j][m] * pst->R[i - j][j]; + } + } + + return; 
+} + +static void calc_vargrad(PStreamChol *pst, const int m, double alpha, double n, + double vm, double vv) +{ + register int i; + double vg, w1, w2; + double av = 0.0, var = 0.0, dif = 0.0; + + if (alpha > 1.0 || alpha < 0.0) { + w1 = 1.0; w2 = 1.0; + } else { + w1 = alpha; w2 = 1.0 - alpha; + } + + calc_varstats(pst->c, m, pst->T, &av, &var, &dif); + + for (i = 0; i < pst->T; i++) { + vg = -(var - vm) * (pst->c[i][m] - av) * vv * 2.0 / (double)pst->T; + pst->g[i] = w1 * pst->g[i] / n + w2 * vg; + } + + return; +} +#endif + +// diagonal covariance +static DVECTOR xget_detvec_diamat2inv(DMATRIX covmat) // [num class][dim] +{ + long dim, clsnum; + long i, j; + double det; + DVECTOR detvec = NODATA; + + clsnum = covmat->row; + dim = covmat->col; + // memory allocation + detvec = xdvalloc(clsnum); + for (i = 0; i < clsnum; i++) { + for (j = 0, det = 1.0; j < dim; j++) { + det *= covmat->data[i][j]; + if (det > 0.0) { + covmat->data[i][j] = 1.0 / covmat->data[i][j]; + } else { + cst_errmsg("error:(class %ld) determinant <= 0, det = %f\n", i, det); + xdvfree(detvec); + return NODATA; + } + } + detvec->data[i] = det; + } + + return detvec; +} + +static double get_gauss_full(long clsidx, + DVECTOR vec, // [dim] + DVECTOR detvec, // [clsnum] + DMATRIX weightmat, // [clsnum][1] + DMATRIX meanvec, // [clsnum][dim] + DMATRIX invcovmat) // [clsnum * dim][dim] +{ + double gauss; + + if (detvec->data[clsidx] <= 0.0) { + cst_errmsg("#error: det <= 0.0\n"); + cst_error(); + } + + gauss = weightmat->data[clsidx][0] + / sqrt(pow(2.0 * PI, (double)vec->length) * detvec->data[clsidx]) + * exp(-1.0 * cal_xmcxmc(clsidx, vec, meanvec, invcovmat) / 2.0); + + return gauss; +} + +static double cal_xmcxmc(long clsidx, + DVECTOR x, + DMATRIX mm, // [num class][dim] + DMATRIX cm) // [num class * dim][dim] +{ + long clsnum, k, l, b, dim; + double *vec = NULL; + double td, d; + + dim = x->length; + clsnum = mm->row; + b = clsidx * dim; + if (mm->col != dim || cm->col != dim || clsnum * dim != 
cm->row) { + cst_errmsg("Error cal_xmcxmc: different dimension\n"); + cst_error(); + } + + // memory allocation + vec = mlpg_alloc((int)dim, double); + for (k = 0; k < dim; k++) vec[k] = x->data[k] - mm->data[clsidx][k]; + for (k = 0, d = 0.0; k < dim; k++) { + for (l = 0, td = 0.0; l < dim; l++) td += vec[l] * cm->data[l + b][k]; + d += td * vec[k]; + } + // memory free + mlpg_free(vec); vec = NULL; + + return d; +} + +#if 0 +// diagonal covariance +static double get_gauss_dia5(double det, + double weight, + DVECTOR vec, // dim + DVECTOR meanvec, // dim + DVECTOR invcovvec) // dim +{ + double gauss, sb; + long k; + + if (det <= 0.0) { + cst_errmsg("#error: det <= 0.0\n"); + cst_error(); + } + + for (k = 0, gauss = 0.0; k < vec->length; k++) { + sb = vec->data[k] - meanvec->data[k]; + gauss += sb * invcovvec->data[k] * sb; + } + + gauss = weight / sqrt(pow(2.0 * PI, (double)vec->length) * det) + * exp(-gauss / 2.0); + + return gauss; +} +#endif + +static double get_gauss_dia(long clsidx, + DVECTOR vec, // [dim] + DVECTOR detvec, // [clsnum] + DMATRIX weightmat, // [clsnum][1] + DMATRIX meanmat, // [clsnum][dim] + DMATRIX invcovmat) // [clsnum][dim] +{ + double gauss, sb; + long k; + + if (detvec->data[clsidx] <= 0.0) { + cst_errmsg("#error: det <= 0.0\n"); + cst_error(); + } + + for (k = 0, gauss = 0.0; k < vec->length; k++) { + sb = vec->data[k] - meanmat->data[clsidx][k]; + gauss += sb * invcovmat->data[clsidx][k] * sb; + } + + gauss = weightmat->data[clsidx][0] + / sqrt(pow(2.0 * PI, (double)vec->length) * detvec->data[clsidx]) + * exp(-gauss / 2.0); + + return gauss; +} + +static void pst_free(PStreamChol *pst) +{ + int i; + + for (i=0; i<pst->dw.num; i++) + mlpg_free(pst->dw.width[i]); + mlpg_free(pst->dw.width); pst->dw.width = NULL; + for (i=0; i<pst->dw.num; i++) + mlpg_free(pst->dw.coef_ptrs[i]); + mlpg_free(pst->dw.coef); pst->dw.coef = NULL; + mlpg_free(pst->dw.coef_ptrs); pst->dw.coef_ptrs = NULL; + + for (i=0; i<pst->T; i++) + mlpg_free(pst->mseq[i]); + 
mlpg_free(pst->mseq); + for (i=0; i<pst->T; i++) + mlpg_free(pst->ivseq[i]); + mlpg_free(pst->ivseq); + for (i=0; i<pst->T; i++) + mlpg_free(pst->R[i]); + mlpg_free(pst->R); + mlpg_free(pst->r); + mlpg_free(pst->g); + for (i=0; i<pst->T; i++) + mlpg_free(pst->c[i]); + mlpg_free(pst->c); + + return; +} + +LISP mlpg(LISP ltrack) +{ + /* Generate an (mcep) track using Maximum Likelihood Parameter Generation */ + MLPGPARA param = NODATA; + EST_Track *param_track, *out; + int dim, dim_st; + float like; + int i,j; + int nframes; + PStreamChol pst; + + if ((ltrack == NULL) || + (TYPEP(ltrack,tc_string) && + (streq(get_c_string(ltrack),"nil")))) + return NIL; + + param_track = track(ltrack); + + nframes = param_track->num_frames(); + dim = (param_track->num_channels()/2)-1; + dim_st = dim/2; /* dim2 in original code */ + out = new EST_Track(); + out->copy_setup(*param_track); + out->resize(nframes,dim_st+1); + + param = xmlpgpara_init(dim,dim_st,nframes,nframes); + + // mixture-index sequence + param->clsidxv = xlvalloc(nframes); + for (i=0; i<nframes; i++) param->clsidxv->data[i] = i; + + // initial static feature sequence + param->stm = xdmalloc(nframes,dim_st); + for (i=0; i<nframes; i++) + { + for (j=0; j<dim_st; j++) + param->stm->data[i][j] = param_track->a(i,(j+1)*2); + } + + /* Load cluster means */ + for (i=0; i<nframes; i++) + for (j=0; j<dim_st; j++) + param->mean->data[i][j] = param_track->a(i,(j+1)*2); + + /* GMM parameters diagonal covariance */ + InitPStreamChol(&pst, mlpg_dynwin, mlpg_dynwinsize, dim_st-1, nframes); + param->pdf = xdmalloc(nframes,dim*2); + param->cov = xdmalloc(nframes,dim); + for (i=0; i<nframes; i++) + for (j=0; j<dim; j++) + param->cov->data[i][j] = + param_track->a(i,(j+1)*2+1) * + param_track->a(i,(j+1)*2+1); + param->detvec = xget_detvec_diamat2inv(param->cov); + + /* global variance parameters */ + /* TBD get_gv_mlpgpara(param, vmfile, vvfile, dim2, msg_flag); */ + + if (nframes > 0) + { + get_dltmat(param->stm, &pst.dw, 1, param->dltm); + + like = get_like_pdfseq_vit(dim, dim_st, nframes, nframes, param, + param_track, XTRUE); + + /* vlike = get_like_gv(dim2, dnum, param); */ + + mlgparaChol(param->pdf, &pst, param->stm); 
+ } + + /* Put the answer back into the output track */ + for (i=0; i<nframes; i++) + { + out->t(i) = param_track->t(i); + out->a(i,0) = param_track->a(i,0); /* F0 */ + for (j=0; j<dim_st; j++) + { + out->a(i,j+1) = param->stm->data[i][j]; + } + } + + // memory free + xmlpgparafree(param); + pst_free(&pst); + + return siod(out); +} + diff --git a/src/modules/clustergen/simple_mlpg.h b/src/modules/clustergen/simple_mlpg.h new file mode 100644 index 0000000..a1faece --- /dev/null +++ b/src/modules/clustergen/simple_mlpg.h @@ -0,0 +1,204 @@ +/* --------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS): version 1.1.1 */ +/* HTS Working Group */ +/* */ +/* Department of Computer Science */ +/* Nagoya Institute of Technology */ +/* and */ +/* Interdisciplinary Graduate School of Science and Engineering */ +/* Tokyo Institute of Technology */ +/* Copyright (c) 2001-2003 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and */ +/* distribute this software and its documentation without */ +/* restriction, including without limitation the rights to use, */ +/* copy, modify, merge, publish, distribute, sublicense, and/or */ +/* sell copies of this work, and to permit persons to whom this */ +/* work is furnished to do so, subject to the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list */ +/* of conditions and the following disclaimer. */ +/* */ +/* 2. Any modifications must be clearly marked as such. 
*/ +/* */ +/* NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSITITUTE OF TECHNOLOGY, */ +/* HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM */ +/* ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL */ +/* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSITITUTE OF */ +/* TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY */ +/* DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, */ +/* WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS */ +/* ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR */ +/* PERFORMANCE OF THIS SOFTWARE. */ +/* */ +/* --------------------------------------------------------------- */ +/* mlpg.c : speech parameter generation from pdf sequence */ +/* */ +/* 2003/12/26 by Heiga Zen */ +/* --------------------------------------------------------------- */ + +/*********************************************************************/ +/* */ +/* Nagoya Institute of Technology, Aichi, Japan, */ +/* and */ +/* Carnegie Mellon University, Pittsburgh, PA */ +/* Copyright (c) 2003-2004 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and */ +/* distribute this software and its documentation without */ +/* restriction, including without limitation the rights to use, */ +/* copy, modify, merge, publish, distribute, sublicense, and/or */ +/* sell copies of this work, and to permit persons to whom this */ +/* work is furnished to do so, subject to the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list */ +/* of conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. 
*/ +/* */ +/* NAGOYA INSTITUTE OF TECHNOLOGY, CARNEGIE MELLON UNIVERSITY, AND */ +/* THE CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH */ +/* REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL NAGOYA INSTITUTE */ +/* OF TECHNOLOGY, CARNEGIE MELLON UNIVERSITY, NOR THE CONTRIBUTORS */ +/* BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR */ +/* ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR */ +/* PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER */ +/* TORTUOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE */ +/* OR PERFORMANCE OF THIS SOFTWARE. */ +/* */ +/*********************************************************************/ +/* */ +/* ML-Based Parameter Generation */ +/* 2003/12/26 by Heiga Zen */ +/* */ +/* Basic functions are extracted from HTS and */ +/* modified by Tomoki Toda (tomoki@ics.nitech.ac.jp) */ +/* June 2004 */ +/* Integrate as a Voice Conversion module */ +/* */ +/*-------------------------------------------------------------------*/ +/* */ +/* Author : Tomoki Toda (tomoki@ics.nitech.ac.jp) */ +/* Date : June 2004 */ +/* */ +/*-------------------------------------------------------------------*/ +/* Integrated into a single C file for festival/flite integration */ +/* May 2008 awb@cs.cmu.edu */ +/*-------------------------------------------------------------------*/ + +#ifndef _MLPG_H +#define _MLPG_H + +#include "vc.h" + +#define INFTY ((double) 1.0e+38) +#define INFTY2 ((double) 1.0e+19) +#define INVINF ((double) 1.0e-38) +#define INVINF2 ((double) 1.0e-19) + +#ifdef PI +/* ok */ +#elif defined(M_PI) +#define PI M_PI +#else +#define PI 3.1415926535897932385 +#endif + +#define WLEFT 0 +#define WRIGHT 1 + +typedef struct _DWin { + int num; /* number of static + deltas */ + int **width; /* width [0..num-1][0(left) 1(right)] */ + double **coef; /* coefficient [0..num-1][length[0]..length[1]] */ + double **coef_ptrs; /* keeps the 
pointers so we can free them */ + int maxw[2]; /* max width [0(left) 1(right)] */ +} MLPG_DWin; + +typedef struct _PStreamChol { + int vSize; // size of observed vector + int order; // order of cepstrum + int T; // number of frames + int width; // width of WSW + MLPG_DWin dw; + double **mseq; // sequence of mean vector + double **ivseq; // sequence of inverse covariance vector + double ***ifvseq; // sequence of inverse full covariance vector + double **R; // WSW[T][range] + double *r; // WSM [T] + double *g; // g [T] + double **c; // parameter c +} PStreamChol; + +typedef struct MLPGPARA_STRUCT { + DVECTOR ov; + DVECTOR iuv; + DVECTOR iumv; + DVECTOR flkv; + DMATRIX stm; + DMATRIX dltm; + DMATRIX pdf; + DVECTOR detvec; + DMATRIX wght; + DMATRIX mean; + DMATRIX cov; + LVECTOR clsidxv; + DVECTOR clsdetv; + DMATRIX clscov; + double vdet; + DVECTOR vm; + DVECTOR vv; + DVECTOR var; +} *MLPGPARA; + +#if 0 +static double get_like_gv(long dim2, long dnum, MLPGPARA param); +static void sm_mvav(DMATRIX mat, long hlen); +#endif +#if 0 +/* Full Covariance Version */ +static void InitPStreamCholFC(PStreamChol *pst, char *dynwinf, char *accwinf, + int order, int T); +static void mlgparaCholFC(DMATRIX pdf, PStreamChol *pst, DMATRIX mlgp); +static void mlpgCholFC(PStreamChol *pst); +static void calc_R_and_r_FC(PStreamChol *pst); +static void CholeskiFC(PStreamChol *pst); +static void Choleski_forwardFC(PStreamChol *pst); +static void Choleski_backwardFC(PStreamChol *pst); +#endif + +/**********************************/ +/* ML Considering Global Variance */ +/**********************************/ +#if 0 +static void varconv(double **c, const int m, const int T, const double var); +static void calc_varstats(double **c, const int m, const int T, + double *av, double *var, double *dif); +/* Diagonal Covariance Version */ +static void mlgparaGrad(DMATRIX pdf, PStreamChol *pst, DMATRIX mlgp, + const int max, double th, double e, double alpha, + DVECTOR vm, DVECTOR vv, XBOOL nrmflag, 
XBOOL extvflag); +static void mlpgGrad(PStreamChol *pst, const int max, double th, double e, + double alpha, DVECTOR vm, DVECTOR vv, XBOOL nrmflag); +static void calc_grad(PStreamChol *pst, const int m); +static void calc_vargrad(PStreamChol *pst, const int m, double alpha, double n, + double vm, double vv); +static double get_gauss_dia5(double det, + double weight, + DVECTOR vec, // dim + DVECTOR meanvec, // dim + DVECTOR invcovvec); // dim +#endif + + +#if 0 +static void get_gv_mlpgpara(MLPGPARA param, char *vmfile, char *vvfile, + long dim2, XBOOL msg_flag); +#endif + +#endif /* _MLPG_H */ diff --git a/src/modules/clustergen/vc.cc b/src/modules/clustergen/vc.cc new file mode 100644 index 0000000..cde0f49 --- /dev/null +++ b/src/modules/clustergen/vc.cc @@ -0,0 +1,276 @@ +/* --------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS): version 1.1b */ +/* HTS Working Group */ +/* */ +/* Department of Computer Science */ +/* Nagoya Institute of Technology */ +/* and */ +/* Interdisciplinary Graduate School of Science and Engineering */ +/* Tokyo Institute of Technology */ +/* Copyright (c) 2001-2003 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and */ +/* distribute this software and its documentation without */ +/* restriction, including without limitation the rights to use, */ +/* copy, modify, merge, publish, distribute, sublicense, and/or */ +/* sell copies of this work, and to permit persons to whom this */ +/* work is furnished to do so, subject to the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list */ +/* of conditions and the following disclaimer. */ +/* */ +/* 2. Any modifications must be clearly marked as such. 
*/ +/* */ +/* NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSITITUTE OF TECHNOLOGY, */ +/* HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM */ +/* ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL */ +/* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSITITUTE OF */ +/* TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY */ +/* DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, */ +/* WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTUOUS */ +/* ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR */ +/* PERFORMANCE OF THIS SOFTWARE. */ +/* */ +/* --------------------------------------------------------------- */ +/* This is Zen's MLSA filter as ported by Toda to festvox vc */ +/* and back ported into hts/festival so we can do MLSA filtering */ +/* If I took more time I could probably make this use the same as */ +/* as the other code in this directory -- awb@cs.cmu.edu 03JAN06 */ +/* --------------------------------------------------------------- */ +/* and then ported into Flite (November 2007 awb@cs.cmu.edu) */ + +/*********************************************************************/ +/* */ +/* vector (etc) code common to mlpg and mlsa */ +/*-------------------------------------------------------------------*/ + +#include "festival.h" +#include "vc.h" + +/* from vector.cc */ + +#define cst_free wfree +#define cst_alloc walloc + +LVECTOR xlvalloc(long length) +{ + LVECTOR x; + + length = MAX(length, 0); + x = cst_alloc(struct LVECTOR_STRUCT,1); + x->data = cst_alloc(long,MAX(length, 1)); + x->imag = NULL; + x->length = length; + + return x; +} + +void xlvfree(LVECTOR x) +{ + if (x != NULL) { + if (x->data != NULL) { + cst_free(x->data); + } + if (x->imag != NULL) { + cst_free(x->imag); + } + cst_free(x); + } + + return; +} + +DVECTOR xdvalloc(long length) +{ + DVECTOR x; + + length 
= MAX(length, 0); + x = cst_alloc(struct DVECTOR_STRUCT,1); + x->data = cst_alloc(double,MAX(length, 1)); + x->imag = NULL; + x->length = length; + + return x; +} + +void xdvfree(DVECTOR x) +{ + if (x != NULL) { + if (x->data != NULL) { + cst_free(x->data); + } + if (x->imag != NULL) { + cst_free(x->imag); + } + cst_free(x); + } + + return; +} + +void dvialloc(DVECTOR x) +{ + if (x->imag != NULL) { + cst_free(x->imag); + } + x->imag = cst_alloc(double,x->length); + + return; +} + +DVECTOR xdvcut(DVECTOR x, long offset, long length) +{ + long k; + long pos; + DVECTOR y; + + y = xdvalloc(length); + if (x->imag != NULL) { + dvialloc(y); + } + + for (k = 0; k < y->length; k++) { + pos = k + offset; + if (pos >= 0 && pos < x->length) { + y->data[k] = x->data[pos]; + if (y->imag != NULL) { + y->imag[k] = x->imag[pos]; + } + } else { + y->data[k] = 0.0; + if (y->imag != NULL) { + y->imag[k] = 0.0; + } + } + } + + return y; +} + +DMATRIX xdmalloc(long row, long col) +{ + DMATRIX matrix; + int i; + + matrix = cst_alloc(struct DMATRIX_STRUCT,1); + matrix->data = cst_alloc(double *,row); + for (i=0; i<row; i++) matrix->data[i] = cst_alloc(double,col); + matrix->imag = NULL; + matrix->row = row; + matrix->col = col; + + return matrix; +} + +void xdmfree(DMATRIX matrix) +{ + int i; + + if (matrix != NULL) { + if (matrix->data != NULL) { + for (i=0; i<matrix->row; i++) + cst_free(matrix->data[i]); + cst_free(matrix->data); + } + if (matrix->imag != NULL) { + for (i=0; i<matrix->row; i++) + cst_free(matrix->imag[i]); + cst_free(matrix->imag); + } + cst_free(matrix); + } + + return; +} + +DVECTOR xdvinit(double j, double incr, double n) +{ + long k; + long num; + DVECTOR x; + + if ((incr > 0.0 && j > n) || (incr < 0.0 && j < n)) { + x = xdvnull(); + return x; + } + if (incr == 0.0) { + num = (long)n; + if (num <= 0) { + x = xdvnull(); + return x; + } + } else { + num = LABS((long)((n - j) / incr)) + 1; + } + + /* memory allocate */ + x = xdvalloc(num); + + /* initialize data */ + for (k = 0; k < x->length; k++) { + 
x->data[k] = j + (k * incr); + } + + return x; +} + +/* from voperate.cc */ +double dvmax(DVECTOR x, long *index) +{ + long k; + long ind; + double max; + + ind = 0; + max = x->data[ind]; + for (k = 1; k < x->length; k++) { + if (max < x->data[k]) { + ind = k; + max = x->data[k]; + } + } + + if (index != NULL) { + *index = ind; + } + + return max; +} + +double dvmin(DVECTOR x, long *index) +{ + long k; + long ind; + double min; + + ind = 0; + min = x->data[ind]; + for (k = 1; k < x->length; k++) { + if (min > x->data[k]) { + ind = k; + min = x->data[k]; + } + } + + if (index != NULL) { + *index = ind; + } + + return min; +} + +double dvsum(DVECTOR x) +{ + long k; + double sum; + + for (k = 0, sum = 0.0; k < x->length; k++) { + sum += x->data[k]; + } + + return sum; +} diff --git a/src/modules/clustergen/vc.h b/src/modules/clustergen/vc.h new file mode 100644 index 0000000..4796427 --- /dev/null +++ b/src/modules/clustergen/vc.h @@ -0,0 +1,101 @@ +/*********************************************************************/ +/* */ +/* Nagoya Institute of Technology, Aichi, Japan, */ +/* Nara Institute of Science and Technology, Nara, Japan */ +/* and */ +/* Carnegie Mellon University, Pittsburgh, PA */ +/* Copyright (c) 2003-2004 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and */ +/* distribute this software and its documentation without */ +/* restriction, including without limitation the rights to use, */ +/* copy, modify, merge, publish, distribute, sublicense, and/or */ +/* sell copies of this work, and to permit persons to whom this */ +/* work is furnished to do so, subject to the following conditions: */ +/* */ +/* 1. The code must retain the above copyright notice, this list */ +/* of conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. 
*/ +/* */ +/* NAGOYA INSTITUTE OF TECHNOLOGY, NARA INSTITUTE OF SCIENCE AND */ +/* TECHNOLOGY, CARNEGIE MELLON UNIVERSITY, AND THE CONTRIBUTORS TO */ +/* THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, */ +/* INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, */ +/* IN NO EVENT SHALL NAGOYA INSTITUTE OF TECHNOLOGY, NARA */ +/* INSTITUTE OF SCIENCE AND TECHNOLOGY, CARNEGIE MELLON UNIVERSITY, */ +/* NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, INDIRECT OR */ +/* CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM */ +/* LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, */ +/* NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN */ +/* CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ +/* */ +/*********************************************************************/ +/* */ +/* Author : Tomoki Toda (tomoki@ics.nitech.ac.jp) */ +/* Date : June 2004 */ +/* */ +/* Functions shared between mlpg and mlsa */ +/*-------------------------------------------------------------------*/ + +#ifndef __CST_VC_H +#define __CST_VC_H + +typedef struct LVECTOR_STRUCT { + long length; + long *data; + long *imag; +} *LVECTOR; + +typedef struct DVECTOR_STRUCT { + long length; + double *data; + double *imag; +} *DVECTOR; + +typedef struct DMATRIX_STRUCT { + long row; + long col; + double **data; + double **imag; +} *DMATRIX; + +#define XBOOL int +#define XTRUE 1 +#define XFALSE 0 + +#define NODATA NULL + +#define FABS(x) ((x) >= 0.0 ? (x) : -(x)) +#define LABS(x) ((x) >= 0 ? (x) : -(x)) +#define MAX(a, b) ((a) > (b) ? 
(a) : (b)) + +#define xdvnull() xdvalloc(0) + +#define xdvnums(length, value) xdvinit((double)(value), 0.0, (double)(length)) +#define xdvzeros(length) xdvnums(length, 0.0) + +LVECTOR xlvalloc(long length); +void xlvfree(LVECTOR x); +DVECTOR xdvalloc(long length); +DVECTOR xdvcut(DVECTOR x, long offset, long length); +void xdvfree(DVECTOR vector); +double dvmax(DVECTOR x, long *index); +double dvmin(DVECTOR x, long *index); +DMATRIX xdmalloc(long row, long col); +void xdmfree(DMATRIX matrix); +DVECTOR xdvinit(double j, double incr, double n); + +double dvsum(DVECTOR x); + +#define RANDMAX 32767 +#define B0 0x00000001 +#define B28 0x10000000 +#define B31 0x80000000 +#define B31_ 0x7fffffff +#define Z 0x00000000 + +typedef enum {MFALSE, MTRUE} Boolean; + +#endif /* __CST_VC_H */ diff --git a/src/modules/diphone/Makefile b/src/modules/diphone/Makefile new file mode 100644 index 0000000..964297a --- /dev/null +++ b/src/modules/diphone/Makefile @@ -0,0 +1,58 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## A new clean implementation of diphone selection and concatenation ## +## code supporting the CSTR standard diphone db format ## +## Mostly written by Alistair Conkie ## +########################################################################### +TOP=../../.. +DIRNAME=src/modules/diphone +H = diphone.h +#EXTRASRCS = di_psolaTM.cc +SRCS = diphone.cc di_reslpc.cc di_psola.cc di_select.cc \ + di_pitch.cc di_io.cc oc.cc +OBJS = $(SRCS:.cc=.o) + +FILES=Makefile $(SRCS) $(H) $(EXTRASRCS) + +LOCAL_INCLUDES = -I../include +LOCAL_DEFINES = $(MODULE_DIPHONE_DEFINES) + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + + + diff --git a/src/modules/diphone/di_io.cc b/src/modules/diphone/di_io.cc new file mode 100644 index 0000000..8b53ccd --- /dev/null +++ b/src/modules/diphone/di_io.cc @@ -0,0 +1,1097 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alistair Conkie */ +/* Date : August 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* The groupfile stuff is all awb's fault. 
*/ +/* */ +/*************************************************************************/ +#include +#include "EST_unix.h" +#include +#include "festival.h" +#include "diphone.h" + +static unsigned int DIPHONE_MAGIC=0x46544449; /* FTDI */ + +static void load_index(DIPHONE_DATABASE *database); +static void load_diphs(DIPHONE_DATABASE *database); +static void load_lpc_file(DIPHONE_DATABASE *db,int diph,int mode); +static void lpc2ref(const float *lpc, float *rfc, int order); +static void extract_lpc_frames(DIPHONE_DATABASE *db, int diph, EST_Track &lpc); +static void load_signal_file(DIPHONE_DATABASE *db, int i, int mode); +static void database_malloc(int ndiphs, DIPHONE_DATABASE *database); +static void di_group_load_signal(DIPHONE_DATABASE *db); +static void di_group_load_lpc_params(DIPHONE_DATABASE *db); +static void di_group_load_pm(DIPHONE_DATABASE *db); + +void di_load_database(DIPHONE_DATABASE *database) +{ + // Load the ungrouped form + database_malloc(database->ndiphs,database); + + load_index(database); + load_diphs(database); + +} + +static void database_malloc(int ndiphs, DIPHONE_DATABASE *database) +{ + // So why am I not using all those cute C++ classes ? + // well I suppose I just don't know enough about binary loading + // and saving to trust them, but that's a poor excuse. 
+ int i; + + database->nindex = 0; + database->zone = 0; + + database->indx = walloc(DI_INDEX *,ndiphs); + database->vox = walloc(DI_VOX *,ndiphs); + database->pm = walloc(DI_PM *,ndiphs); + database->lpc = walloc(DI_LPC *,ndiphs); + + for(i=0;i<ndiphs;i++) + { + database->indx[i] = walloc(DI_INDEX,1); + database->vox[i] = walloc(DI_VOX,1); + database->vox[i]->signal = 0; + database->pm[i] = walloc(DI_PM,1); + database->pm[i]->mark = 0; + database->lpc[i] = walloc(DI_LPC,1); + database->lpc[i]->f = 0; + } + +} + +static void load_index(DIPHONE_DATABASE *database) +{ + EST_TokenStream ts; + int i; + EST_String line; + + if (ts.open(database->index_file) == -1) + { + cerr << "Diphone: Can't open file " << database->index_file << endl; + festival_error(); + } + + for (i=0; (!ts.eof()) && (i<database->ndiphs);) + { + line = ts.get_upto_eoln(); + if ((line.length() > 0) && (line[0] != ';')) + { + EST_TokenStream ls; + ls.open_string(line); + database->indx[i]->diph = wstrdup(ls.get().string()); + database->indx[i]->file = wstrdup(ls.get().string()); + database->indx[i]->beg = atof(ls.get().string()); + database->indx[i]->mid = atof(ls.get().string()); + database->indx[i]->end = atof(ls.get().string()); + ls.close(); + i++; + } + } + + if (i == database->ndiphs) + { + cerr << "Diphone: too many diphones in DB" << endl; + festival_error(); + } + + database->nindex = i; + database->ndiphs = i; + + ts.close(); +} + +static void load_diphs(DIPHONE_DATABASE *database) +{ + int i; + + for(i=0;i<database->nindex;i++) + { + load_signal_file(database,i,database->sig_access_type); +// if (database->type = di_lpc) +// load_lpc_file(database,i,database->sig_access_type); + load_pitch_file(database,i,database->sig_access_type); + } +} + +static void load_lpc_file(DIPHONE_DATABASE *db,int diph,int mode) +{ + // Load LPC coefficients + EST_String lpc_file; + EST_Track lpc; + + if (db->lpc[diph]->f != 0) + return; // already loaded + + if (mode == di_direct) + { + lpc_file = EST_String(db->lpc_dir) + + db->indx[diph]->file +
db->lpc_ext; + + if (lpc.load(lpc_file) != format_ok) + { + cerr << "Diphone: failed to read lpc file " << + lpc_file << endl; + festival_error(); + } + if (lpc.num_channels() != db->lpc_order) + { + cerr << "Diphone: lpc file " << + lpc_file << " has order " << lpc.num_channels() << + " while database has " << db->lpc_order << endl; + festival_error(); + } + // Extract frames (pitch synchronously) + extract_lpc_frames(db,diph,lpc); + } + + return; +} + +static void ref2lpc(const float *rfc, float *lpc, int order) +{ + // Here we use Christopher Longet Higgin's algorithm converted to + // an equivalent by awb. Its doesn't have hte reverse order or + // negation requirement. + float a,b; + int n,k; + + for (n=0; n < order; n++) + { + lpc[n] = rfc[n]; + for (k=0; 2*(k+1) <= n+1; k++) + { + a = lpc[k]; + b = lpc[n-(k+1)]; + lpc[k] = a-b*lpc[n]; + lpc[n-(k+1)] = b-a*lpc[n]; + } + } +} + +static void extract_lpc_frames(DIPHONE_DATABASE *db, int diph, EST_Track &lpc) +{ + // Extract LPC frames from lpc, one for each pitch mark + int frame_num; + float pos,factor; + float ps_pos; + int i,j,k; + + db->lpc[diph]->f = walloc(float *,db->pm[diph]->nmark); + float *lpcs = walloc(float,lpc.num_channels()); + for (i=0; i < db->pm[diph]->nmark; i++) + { + if (db->lpc_pitch_synch) + { + db->lpc[diph]->f[i] = walloc(float,lpc.num_channels()); + pos = (((float)db->pm[diph]->mark[i]-db->sig_band)/ + (float)db->samp_freq) + + (db->indx[diph]->beg/1000.0); + for (j=1,ps_pos=0; jlpc[diph]->f[i][0] = lpcs[0]; + lpc2ref(&lpcs[1],&db->lpc[diph]->f[i][1], + lpc.num_channels()-1); + break; + } + } + if (j==lpc.num_frames()) + { + cerr << "Diphone: lpc access, failed to find lpc coeffs" + << endl; + festival_error(); + } + } + else + { // Not pitch synchronous so find closest frames and + // interpolate between them + db->lpc[diph]->f[i] = walloc(float,lpc.num_channels()); + // position of current mark in seconds + pos = (((float)db->pm[diph]->mark[i]-db->sig_band)/ + (float)db->samp_freq) + 
+ (db->indx[diph]->beg/1000.0); + // Convert to frames, rounding and subtracting start offset + frame_num = (int)((pos/db->lpc_frame_shift)) + - db->lpc_frame_offset; + if (frame_num+1 < lpc.num_frames()) + { // Interpolate between them + factor = (pos - ((1+frame_num)*db->lpc_frame_shift))/ + db->lpc_frame_shift; + for (j=0; j < lpc.num_channels(); j++) + { + db->lpc[diph]->f[i][j] = + lpc(frame_num,j) + + (factor * (lpc(frame_num+1,j)- + lpc(frame_num,j))); + } + } + if (frame_num >= lpc.num_frames()) + { + cerr << "Diphone: LPC frame past end of file \"" << + db->indx[diph]->file << "\"" << endl; + memset(db->lpc[diph]->f[i],0,sizeof(float)*lpc.num_channels()); + } + else // Last one so just take it as is + { + lpc.copy_frame_out(frame_num, db->lpc[diph]->f[i], + 0, lpc.num_channels()); + } + } + } + db->lpc[diph]->nframes = db->pm[diph]->nmark; + wfree(lpcs); +// db->lpc_order = lpc.num_channels(); + +} + +static void lpc2ref(const float *lpc, float *rfc, int order) +{ + // LPC to reflection coefficients + // from code from Borja Etxebarria + int i,j; + float f,ai; + float *vo,*vx; + float *vn = new float[order]; + + i = order-1; + rfc[i] = ai = lpc[i]; + f = 1-ai*ai; + i--; + + for (j=0; j<=i; j++) + rfc[j] = (lpc[j]+((ai*lpc[i-j])))/f; + + /* vn=vtmp in previous #define */ + vo=rfc; + + for ( ;i>0; ) + { + ai=vo[i]; + f = 1-ai*ai; + i--; + for (j=0; j<=i; j++) + vn[j] = (vo[j]+((ai*vo[i-j])))/f; + + rfc[i]=vn[i]; + + vx = vn; + vn = vo; + vo = vx; + } + + delete [] vn; +} + +static void load_signal_file(DIPHONE_DATABASE *db, int i, int mode) +{ + // Load signal (or lpc residual) file) + int beg_samp,end_samp,zone,nsamples; + EST_String signal_file; + EST_String sig_type; + int offset,error; + beg_samp = 0; + zone = 0; + nsamples = 0; + + if (db->gtype == di_ungrouped) + { + beg_samp = (int)((db->indx[i]->beg)/1000.0*db->samp_freq); + end_samp = (int)((db->indx[i]->end)/1000.0*db->samp_freq); + + nsamples = end_samp - beg_samp; + + zone = db->sig_band; + 
db->zone = zone; + db->vox[i]->nsamples = nsamples+(2*zone); + db->vox[i]->signal = 0; + } + + if (mode == di_direct) + { + if (db->gtype == di_ungrouped) + { + EST_Wave w; + if (db->type == di_lpc) + { + signal_file = EST_String(db->lpc_dir) + + EST_String(db->indx[i]->file) + + EST_String(db->lpc_res_ext); + sig_type = db->lpc_res_type; + // Different LPC techniques will leave various offsets + // in the residule, you have to specify this explicitly + beg_samp -= (int)(db->lpc_res_offset * db->samp_freq); + } + else + { + signal_file = EST_String(db->signal_dir) + + EST_String(db->indx[i]->file) + + EST_String(db->signal_ext); + sig_type = db->signal_type; + } + offset = beg_samp-zone; + if (offset < 0) + offset = 0; + if (w.load_file(signal_file,sig_type, + db->samp_freq, "short", EST_NATIVE_BO, + 1, offset, nsamples+2*zone) != format_ok) + { + cerr << "Diphone: failed to read " << sig_type + << " format signal file " << signal_file << endl; + festival_error(); + } + db->vox[i]->signal = walloc(short,w.num_samples()); + if (beg_samp-zone < 0) // wasn't enough space at beginning + error = abs(beg_samp-zone); + else + error = 0; + memset(db->vox[i]->signal,0,error*sizeof(short)); + for (int j=0; j < w.num_samples()-error; j++) + db->vox[i]->signal[error+j] = w(j); + db->vox[i]->nsamples = w.num_samples()-error; + } + else // grouped so have to access the group file + { + if (db->gfd == NULL) + { + cerr << "Diphone: can no longer access the group file" << endl; + festival_error(); + } + if (db->group_encoding == di_raw) + { + db->vox[i]->signal = walloc(short,db->vox[i]->nsamples); + fseek(db->gfd,db->gsignalbase+(db->offsets[i]*2),SEEK_SET); + fread(db->vox[i]->signal,sizeof(short), + db->vox[i]->nsamples,db->gfd); + if (db->swap) + swap_bytes_short(db->vox[i]->signal,db->vox[i]->nsamples); + } + else if (db->group_encoding == di_ulaw) + { + unsigned char *ulaw = + walloc(unsigned char,db->vox[i]->nsamples); + db->vox[i]->signal = 
walloc(short,db->vox[i]->nsamples); + fseek(db->gfd,db->gsignalbase+(db->offsets[i]),SEEK_SET); + fread(ulaw,sizeof(unsigned char),db->vox[i]->nsamples,db->gfd); + ulaw_to_short(ulaw,db->vox[i]->signal,db->vox[i]->nsamples); + wfree(ulaw); + } + else + { + cerr << "Diphone: unknown group type" << endl; + festival_error(); + } + } + } + +} + +void load_pitch_file(DIPHONE_DATABASE *database, int i, int mode) +{ + // load files from newer Track format + int mark[5000]; + EST_String pitch_file; + EST_Track pms; + float fnum; + int k,k1,k2,m,zone,beg_samp,p; + + if ((database->pm[i]->mark != 0) || + (mode != di_direct)) + return; + + pitch_file = EST_String(database->pitch_dir)+database->indx[i]->file+ + database->pitch_ext; + if (pms.load(pitch_file) != format_ok) + { + cerr << "Diphone: Can't open pitch file " << pitch_file << endl; + festival_error(); + } + /* assumptions.. only those within the limits of the diphone */ + + beg_samp = (int)((database->indx[i]->beg)/1000.0*database->samp_freq); + + zone = database->sig_band; + + k = 0; + k1 = 0; + k2 = 0; + for (p=0; p < pms.num_frames(); p++) + { + fnum = pms.t(p)*1000.0; + if((fnum>database->indx[i]->beg) && (fnum<database->indx[i]->mid)) + { + mark[k] = (int)(fnum/1000.0*database->samp_freq - beg_samp + zone); + if ((mark[k] >= database->vox[i]->nsamples+zone) || + (mark[k] > 64534)) + { + fprintf(stderr,"Diphone: Mark out of range -- too large %s\n", + (const char *)pitch_file); + k--; k1--; + } + if(mark[k] < zone) + { + fprintf(stderr,"Diphone: Mark out of range -- too small %s\n", + (const char *)pitch_file); + k--; k1--; + } + k++; + k1++; + } + else if((fnum>=database->indx[i]->mid) && + (fnum<database->indx[i]->end)) + { + mark[k] = (int)(fnum/1000.0*database->samp_freq - beg_samp + zone); + if ((mark[k] >= database->vox[i]->nsamples+zone) || + (mark[k] > 64534)) + { + fprintf(stderr,"Diphone: Mark out of range -- too large %s\n", + (const char *)pitch_file); + k--; k2--; + } + if(mark[k] < zone) + { + fprintf(stderr,"Diphone: 
Mark out of range -- too small %s\n", + (const char *)pitch_file); + k--; k2--; + } + k++; + k2++; + } + } + database->pm[i]->mark = walloc(unsigned short,k); + for(m=0;m<k;m++) + database->pm[i]->mark[m] = (unsigned short)mark[m]; + + database->pm[i]->nmark = (unsigned short)k; + database->pm[i]->lmark = (unsigned short)k1; + database->pm[i]->rmark = (unsigned short)k2; + if (database->pm[i]->rmark == 0) + { + *cdebug << "Diphone: modifying edge pms for " + << database->indx[i]->diph << endl; + database->pm[i]->rmark = 1; + database->pm[i]->lmark -= 1; + } + if (database->pm[i]->nmark <= 0) + { + cerr << "Diphone: diphone " << database->indx[i]->diph << + " has 0 pitchmarks" << endl; + festival_error(); + } + +} + +#if 0 +void load_pitch_file(DIPHONE_DATABASE *database, int i, int mode) +{ + char s[100]; + int mark[5000]; + EST_String pitch_file; + FILE *pfd; + float fnum; + int k,k1,k2,m,zone,beg_samp; + + if ((database->pm[i]->mark != 0) || + (mode != di_direct)) + return; + + pitch_file = EST_String(database->pitch_dir)+database->indx[i]->file+ + database->pitch_ext; + if((pfd=fopen(pitch_file,"rb")) == NULL) + { + cerr << "Diphone: Can't open pitch file " << pitch_file << endl; + festival_error(); + } + /* assumptions.. 
only those within the limits of the diphone */ + + beg_samp = (int)((database->indx[i]->beg)/1000.0*database->samp_freq); + + zone = database->sig_band; + + k = 0; + k1 = 0; + k2 = 0; + while(fgets(s,100,pfd) != NULL) + { + sscanf(s,"%f",&fnum); + if((fnum>database->indx[i]->beg) && (fnum<database->indx[i]->mid)) + { + mark[k] = (int)(fnum/1000.0*database->samp_freq - beg_samp + zone); + if ((mark[k] >= database->vox[i]->nsamples+zone) || + (mark[k] > 64534)) + { + fprintf(stderr,"Diphone: Mark out of range -- too large %s\n", + (const char *)pitch_file); + k--; k1--; + } + if(mark[k] < zone) + { + fprintf(stderr,"Diphone: Mark out of range -- too small %s\n", + (const char *)pitch_file); + k--; k1--; + } + k++; + k1++; + } + else if((fnum>=database->indx[i]->mid) && + (fnum<database->indx[i]->end)) + { + mark[k] = (int)(fnum/1000.0*database->samp_freq - beg_samp + zone); + if ((mark[k] >= database->vox[i]->nsamples+zone) || + (mark[k] > 64534)) + { + fprintf(stderr,"Diphone: Mark out of range -- too large %s\n", + (const char *)pitch_file); + k--; k2--; + } + if(mark[k] < zone) + { + fprintf(stderr,"Diphone: Mark out of range -- too small %s\n", + (const char *)pitch_file); + k--; k2--; + } + k++; + k2++; + } + } + database->pm[i]->mark = walloc(unsigned short,k); + for(m=0;m<k;m++) + database->pm[i]->mark[m] = (unsigned short)mark[m]; + + database->pm[i]->nmark = (unsigned short)k; + database->pm[i]->lmark = (unsigned short)k1; + database->pm[i]->rmark = (unsigned short)k2; + if (database->pm[i]->rmark == 0) + { + *cdebug << "Diphone: modifying edge pms for " + << database->indx[i]->diph << endl; + database->pm[i]->rmark = 1; + database->pm[i]->lmark -= 1; + } + if (database->pm[i]->nmark <= 0) + { + cerr << "Diphone: diphone " << database->indx[i]->diph << + " has 0 pitchmarks" << endl; + festival_error(); + } + + fclose(pfd); +} +#endif + +/* Buffer to hold current diphone signal when using ondemand access */ +/* method. 
It remembers the last phone accessed as typical access is */ +/* the same one for a few times */ +static short *diph_buffer = 0; +static int diph_max_size = 0; +static int last_diph = -1; +static DIPHONE_DATABASE *last_db = 0; + +short *di_get_diph_signal(int diph,DIPHONE_DATABASE *db) +{ + // Get the diphone signal (or residual) from wherever + + if (db->sig_access_type == di_direct) // all pre-loaded + return db->vox[diph]->signal; + else if (db->sig_access_type == di_dynamic) // Load and keep + { + if (db->vox[diph]->signal == 0) + load_signal_file(db,diph,di_direct); + return db->vox[diph]->signal; + } + else if (db->sig_access_type == di_ondemand) // Load and free afterwards + { // Loads it into a common buffer, over written each time + if ((diph == last_diph) && + (db == last_db)) // ensure db hasn't changed + return diph_buffer; + load_signal_file(db,diph,di_direct); + if (diph_max_size < db->vox[diph]->nsamples) + { + wfree(diph_buffer); + diph_buffer = walloc(short,db->vox[diph]->nsamples); + diph_max_size = db->vox[diph]->nsamples; + } + memmove(diph_buffer,db->vox[diph]->signal, + db->vox[diph]->nsamples*sizeof(short)); + wfree(db->vox[diph]->signal); + db->vox[diph]->signal = 0; + last_db = db; last_diph = diph; + return diph_buffer; + } + else + { + cerr << "Diphone: unknown diphone signal access strategy" << endl; + festival_error(); + } + return NULL; +} + +/* The buffer used to hold the requested frame */ +static float frame_buff[128]; + +float *di_get_diph_lpc_mark(int diph,int mark,DIPHONE_DATABASE *db) +{ + // Get the coeff frame for diph at mark + + load_lpc_file(db,diph,di_direct); + + memmove(frame_buff, + db->lpc[diph]->f[mark], + sizeof(float)*db->lpc_order); + + return frame_buff; +} + +short *di_get_diph_res_mark(int diph,int mark,int size,DIPHONE_DATABASE *db) +{ + // Get the residual for diph at mark, use the signal field + // to hold it as they are so similar. 
+ short *residual; + + residual = di_get_diph_signal(diph,db); + + // Take residual around this midpoint + + int pos_samp = db->pm[diph]->mark[mark] - size/2; + + if (pos_samp < 0) + { + pos_samp = 0; + *cdebug << "DIPHONE: sig_band too short to the left" << endl; + } + if (pos_samp+size >= db->vox[diph]->nsamples) + { + pos_samp = db->vox[diph]->nsamples - size; + *cdebug << "DIPHONE: sig_band too short to the right" << endl; + } + + return &residual[pos_samp]; +} + +void di_load_grouped_db(const EST_Pathname &filename, DIPHONE_DATABASE *db, + LISP global_params) +{ + // Get index file and saved data from grouped file + int i,j; + unsigned int magic; + int strsize; + char *diphnames; + LISP params; + + if ((db->gfd=fopen(filename,"rb")) == NULL) + { + cerr << "Diphone: cannot open group file " << + filename << " for reading" << endl; + festival_error(); + } + + fread(&magic,sizeof(int),1,db->gfd); + if (magic == SWAPINT(DIPHONE_MAGIC)) + db->swap = TRUE; + else if (magic != DIPHONE_MAGIC) + { + cerr << "Diphone: " << filename << " not a group file" << endl; + festival_error(); + } + + params = lreadf(db->gfd); // read the parameters in LISP form + + di_general_parameters(db,params); // some may be reset later + di_fixed_parameters(db,params); + di_general_parameters(db,global_params); // reset some params + + database_malloc(db->ndiphs,db); + db->nindex = db->ndiphs; // we can trust that number this time + + fread(&strsize,sizeof(int),1,db->gfd); // number of chars in diph names + if (db->swap) + strsize = SWAPINT(strsize); + diphnames = walloc(char,strsize); + fread(diphnames,sizeof(char),strsize,db->gfd); + for (j=i=0;i<db->nindex;i++) + { + db->indx[i]->diph = &diphnames[j]; + db->indx[i]->file = 0; + for ( ; diphnames[j] != '\0'; j++) // skip to next diphname + if (j > strsize) + { + cerr << "Diphone: group file diphone name table corrupted" + << endl; + festival_error(); + } + j++; + } + + // Diphone signals + di_group_load_signal(db); + // Diphone LPC parameters 
+ if (db->type == di_lpc) + di_group_load_lpc_params(db); + // Diphone Pitch marks + di_group_load_pm(db); + + if (db->sig_access_type == di_direct) + { + fclose(db->gfd); // read eveything + db->gfd = 0; + } + +} + +static void di_group_load_signal(DIPHONE_DATABASE *db) +{ + int i; + unsigned short *samp_counts; + int sample_offset,totsamples; + + samp_counts = walloc(unsigned short,db->nindex); + fread(samp_counts,sizeof(unsigned short),db->nindex,db->gfd); + if (db->swap) swap_bytes_ushort(samp_counts,db->nindex); + fread(&totsamples,sizeof(int),1,db->gfd); + if (db->swap) + totsamples = SWAPINT(totsamples); + if (db->sig_access_type == di_direct) + { + if (db->group_encoding == di_raw) + { + db->allsignal = walloc(short,totsamples); + fread(db->allsignal,sizeof(short),totsamples,db->gfd); + if (db->swap) + swap_bytes_short(db->allsignal,totsamples); + } + else if (db->group_encoding == di_ulaw) + { + db->allulawsignal = walloc(unsigned char,totsamples); + fread(db->allulawsignal,sizeof(unsigned char),totsamples,db->gfd); + } + } + else + { + db->gsignalbase = ftell(db->gfd); + db->offsets = walloc(int,db->nindex); + } + + sample_offset = 0; + for (i=0; i < db->nindex; i++) + { + db->vox[i]->nsamples = samp_counts[i]; + if (db->sig_access_type == di_direct) + { + if (db->group_encoding == di_raw) + db->vox[i]->signal = &db->allsignal[sample_offset]; + else if (db->group_encoding == di_ulaw) + { + db->vox[i]->signal = walloc(short,samp_counts[i]); + ulaw_to_short(&db->allulawsignal[sample_offset], + db->vox[i]->signal,samp_counts[i]); + } + else + { + cerr << "Diphone: unknown group type to unpack" << endl; + festival_error(); + } + } + else + { + db->offsets[i] = sample_offset; + db->vox[i]->signal = 0; + } + sample_offset += samp_counts[i]; + } + if (db->sig_access_type != di_direct) + if (db->group_encoding == di_ulaw) + fseek(db->gfd,(long)sample_offset,SEEK_CUR); + else + fseek(db->gfd,(long)sample_offset*sizeof(short),SEEK_CUR); + wfree(samp_counts); +} + 
+static void di_group_load_lpc_params(DIPHONE_DATABASE *db) +{ + // LPC params are always fully loaded + int totframes; + int i,j,k; + unsigned short *frame_counts; + int frame_offset; + int this_frame; + + frame_counts = walloc(unsigned short, db->nindex); + fread(frame_counts,sizeof(unsigned short),db->nindex,db->gfd); + if (db->swap) swap_bytes_ushort(frame_counts,db->nindex); + fread(&totframes,sizeof(int),1,db->gfd); + if (db->swap) totframes = SWAPINT(totframes); + if (db->group_encoding == di_raw) // its as floats + { + db->allframes = walloc(float,totframes*db->lpc_order); + fread(db->allframes,sizeof(float), + totframes*db->lpc_order,db->gfd); + if (db->swap) + swap_bytes_float(db->allframes,totframes*db->lpc_order); + } + else if (db->group_encoding == di_ulaw) // its as shorts + { + db->allframesshort = walloc(short,totframes*db->lpc_order); + fread(db->allframesshort,sizeof(short), + totframes*db->lpc_order,db->gfd); + if (db->swap) + swap_bytes_short(db->allframesshort, + totframes*db->lpc_order); + } + frame_offset = 0; + for (i=0; i < db->nindex; i++) + { + db->lpc[i]->nframes = frame_counts[i]; + db->lpc[i]->f = walloc(float *,frame_counts[i]); + if (db->group_encoding == di_raw) + for (j=0;j<db->lpc[i]->nframes;j++) + db->lpc[i]->f[j] = + &db->allframes[(frame_offset+j)*db->lpc_order]; + else if (db->group_encoding == di_ulaw) + { + int fixedpoint = FALSE; + if (siod_get_lval("lpc_fixedpoint",NULL) != NIL) + fixedpoint = TRUE; + for (j=0;j<db->lpc[i]->nframes;j++) + { + db->lpc[i]->f[j] = walloc(float,db->lpc_order); + this_frame = (frame_offset+j)*db->lpc_order; + if (fixedpoint) + for (k=0;k<db->lpc_order;k++) + db->lpc[i]->f[j][k] = + (float)db->allframesshort[this_frame+k]; + else + for (k=0;k<db->lpc_order;k++) + db->lpc[i]->f[j][k] = + (float)db->allframesshort[this_frame+k]/32766.0; + } + } + else + { + cerr << "Diphone: unknown group type to unpack" << endl; + festival_error(); + } + frame_offset += frame_counts[i]; + } + wfree(db->allframesshort); + 
db->allframesshort = 0; // don't really need this any more + wfree(frame_counts); +} + +static void di_group_load_pm(DIPHONE_DATABASE *db) +{ + unsigned short *pm_info; + int i,j; + + pm_info = walloc(unsigned short,db->nindex*3); + if (fread(pm_info,sizeof(unsigned short),db->nindex*3,db->gfd) != + (unsigned int)(db->nindex*3)) + { + cerr << "DIPHONE: short group file, can't read pm\n"; + festival_error(); + } + if (db->swap) + for (i=0; i < db->nindex*3; i++) + pm_info[i] = SWAPSHORT(pm_info[i]); + for (i=0; i < db->nindex; i++) + { + db->pm[i]->mark = walloc(unsigned short,pm_info[i*3]); + db->pm[i]->nmark = pm_info[i*3]; + db->pm[i]->lmark = pm_info[(i*3)+1]; + db->pm[i]->rmark = pm_info[(i*3)+2]; + fread(db->pm[i]->mark,sizeof(unsigned short),db->pm[i]->nmark,db->gfd); + if (db->swap) + for (j=0; j < db->pm[i]->nmark; j++) + db->pm[i]->mark[j] = SWAPSHORT(db->pm[i]->mark[j]); + } +} + +static LISP di_enlispen_params(DIPHONE_DATABASE *db) +{ + // Return lisp representation of the parameters in db + + return cons(make_param_str("name",db->name), + cons(make_param_str("type",db->type_str), + cons(make_param_str("index_file",db->index_file), + cons(make_param_str("signal_dir",db->signal_dir), + cons(make_param_str("signal_ext",db->signal_ext), + cons(make_param_str("signal_type",db->signal_type), + cons(make_param_str("pitch_dir",db->pitch_dir), + cons(make_param_str("pitch_ext",db->pitch_ext), + cons(make_param_str("lpc_dir",db->lpc_dir), + cons(make_param_str("lpc_ext",db->lpc_ext), + cons(make_param_str("lpc_res_ext",db->lpc_res_ext), + cons(make_param_str("lpc_type",db->lpc_type), + cons(make_param_str("lpc_res_type",db->lpc_res_type), + cons(make_param_float("lpc_res_offset",db->lpc_res_offset), + cons(make_param_int("lpc_frame_offset",db->lpc_frame_offset), + cons(make_param_int("lpc_order",db->lpc_order), + cons(make_param_float("lpc_frame_shift",db->lpc_frame_shift), + cons(make_param_int("samp_freq",db->samp_freq), + 
cons(make_param_str("phoneset",db->phoneset), + cons(make_param_str("access_type",db->sig_access_type_str), + cons(make_param_str("group_encoding",db->group_encoding_str), + cons(make_param_int("num_diphones",db->nindex), + cons(make_param_int("sig_band",db->sig_band), + cons(make_param_int("def_f0",db->def_f0), + cons(make_param_str("default_diphone",db->default_diphone), + cons(make_param_lisp("alternates_before",db->alternates_before), + cons(make_param_lisp("alternates_after",db->alternates_after), + NIL))))))))))))))))))))))))))); +} + +void di_save_grouped_db(const EST_Pathname &filename, DIPHONE_DATABASE *db) +{ + // Get index file and saved data from grouped file + FILE *fd; + LISP params; + int strsize,totsamples,totframes; + int i,j,k; + + if ((fd=fopen(filename,"wb")) == NULL) + { + cerr << "Diphone: cannot open group file " << + filename << " for saving" << endl; + festival_error(); + } + + fwrite(&DIPHONE_MAGIC,sizeof(int),1,fd); + params = di_enlispen_params(db); // get lisp representation of parameters + lprin1f(params,fd); + + // Only need to dump the diphone names, not the rest of the indx info + strsize = 0; + for (i=0;i<db->nindex;i++) + strsize += strlen(db->indx[i]->diph)+1; + fwrite(&strsize,sizeof(int),1,fd); + for (i=0;i<db->nindex;i++) + fwrite(db->indx[i]->diph,sizeof(char),strlen(db->indx[i]->diph)+1,fd); + + // Diphone Signals + // Dump the signal sizes first to make reading easier + totsamples = 0; + for (i=0;i<db->nindex;i++) + { + if (db->vox[i]->signal == 0) // in case it isn't loaded yet + { + load_pitch_file(db,i,di_direct); + load_signal_file(db,i,di_direct); + } + fwrite(&db->vox[i]->nsamples,sizeof(unsigned short),1,fd); + totsamples += db->vox[i]->nsamples; + } + fwrite(&totsamples,sizeof(int),1,fd); + // Dump signals (compressed if necessary) + for (i=0;i<db->nindex;i++) + { + if (db->group_encoding == di_raw) + fwrite(db->vox[i]->signal,sizeof(short),db->vox[i]->nsamples,fd); + else if (db->group_encoding == di_ulaw) + { + unsigned char *ulaw = 
walloc(unsigned char,db->vox[i]->nsamples); + short_to_ulaw(db->vox[i]->signal,ulaw,db->vox[i]->nsamples); + fwrite(ulaw,sizeof(unsigned char),db->vox[i]->nsamples,fd); + wfree(ulaw); + } + else + { + cerr << "Diphone: unknown group type for dumping" << endl; + festival_error(); + } + + } + + // Diphone LPC parameters + if (db->type == di_lpc) + { + for (i=0;i<db->nindex;i++) // ensure they are all loaded + load_lpc_file(db,i,di_direct); + totframes = 0; + for (i=0;i<db->nindex;i++) + { + fwrite(&db->lpc[i]->nframes,sizeof(unsigned short),1,fd); + totframes += db->lpc[i]->nframes; + } + fwrite(&totframes,sizeof(int),1,fd); + for (i=0;i<db->nindex;i++) + { + if (db->group_encoding == di_raw) // saved as floats + { + for (j=0; j<db->lpc[i]->nframes; j++) + fwrite(db->lpc[i]->f[j],sizeof(float),db->lpc_order,fd); + } + else if (db->group_encoding == di_ulaw) // saved as shorts + { + short *sh = new short[db->lpc_order]; + + for (j=0; j<db->lpc[i]->nframes; j++) + { + for (k=0; k<db->lpc_order; k++) + sh[k] = (short)(db->lpc[i]->f[j][k]*32766.0); + fwrite(sh,sizeof(short),db->lpc_order,fd); + } + delete [] sh; + } + else + { + cerr << "Diphone: unknown group type for dumping" << endl; + festival_error(); + } + } + } + + // Diphone Pitch Marks + for (i=0;i<db->nindex;i++) + { + fwrite(&db->pm[i]->nmark,sizeof(unsigned short),1,fd); + fwrite(&db->pm[i]->lmark,sizeof(unsigned short),1,fd); + fwrite(&db->pm[i]->rmark,sizeof(unsigned short),1,fd); + } + for (i=0;i<db->nindex;i++) + { + fwrite(db->pm[i]->mark,sizeof(unsigned short),db->pm[i]->nmark,fd); + i=i; + } + + fclose(fd); + +} + diff --git a/src/modules/diphone/di_pitch.cc b/src/modules/diphone/di_pitch.cc new file mode 100644 index 0000000..3c169f7 --- /dev/null +++ b/src/modules/diphone/di_pitch.cc @@ -0,0 +1,117 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alistair Conkie */ +/* Date : August 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* A free version of diphone selection and concatenation supported the */ +/* standard CSTR diphone dbs */ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" +#include "diphone.h" + +static int interpolated_freq(int k, DIPHONE_SPN *ps,int def_f0); +static int interpolate(int a,int b,int c,int d,int e); + +void di_calc_pitch(DIPHONE_DATABASE *db, DIPHONE_SPN *ps, DIPHONE_ACOUSTIC *as) +{ + int j,k; + int y; + int l = 0; + int k_old = 0; + int k_fine = 0; + int x = 0; + + for(j=0;j<ps->t_sz;j++) + ps->abs_targ[j] = (int)(ps->cum_dur[ps->targ_phon[j]] + + ps->pc_targs[j]*ps->duration[ps->targ_phon[j]]/100.0); + + as->cum_pitch[0] = 0; + for(k=0;k<ps->cum_dur[ps->p_sz];k+=100) + { + y = interpolated_freq(k,ps,db->def_f0); + x += 100*y; + while(x>db->samp_freq) + { + k_fine = k + interpolate(x-100*y,0,x,100,db->samp_freq); + x -= db->samp_freq; + as->pitch[l] = k_fine-k_old; + as->cum_pitch[l+1] = as->pitch[l] + as->cum_pitch[l]; + l++; + if (l == as->p_max) + { + cerr << "Diphone: too many pitch marks\n"; + festival_error(); + } + k_old = k_fine; + } + } + as->p_sz = l; + + return; +} + +static int interpolated_freq(int k, DIPHONE_SPN *ps,int def_f0) +{ + int i; + int freq; + + if(!ps->t_sz) + return(def_f0); + else if(k<ps->abs_targ[0]) + return(ps->targ_freq[0]); + else if(k>=ps->abs_targ[ps->t_sz-1]) + return(ps->targ_freq[ps->t_sz-1]); + for(i=1;i<ps->t_sz;i++) { + if((k<ps->abs_targ[i]) && (k>=ps->abs_targ[i-1])) + { + freq = interpolate(ps->abs_targ[i-1], + ps->targ_freq[i-1], + ps->abs_targ[i], + ps->targ_freq[i],k); + return(freq); + } + } + return(-1); /* should never arrive here */ +} + +static int interpolate(int a,int b,int c,int d,int e) +{ + int f; + + f = (c*b + d*e - e*b -a*d)/(c-a); + + 
return(f); +} diff --git a/src/modules/diphone/di_psola.cc b/src/modules/diphone/di_psola.cc new file mode 100644 index 0000000..87fb6ff --- /dev/null +++ b/src/modules/diphone/di_psola.cc @@ -0,0 +1,74 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alistair Conkie */ +/* Date : August 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* As CNET (France Telecom) claims a patent on the PSOLA/TD(*) algorithm */ +/* we have decided not to distribute PSOLA code as part of the basic */ +/* system. */ +/* */ +/* We have, of course, implemented PSOLA as part of our own research. */ +/* Depending on the installation, our PSOLA implementation may or may */ +/* not be included with your distribution. If you do have it then */ +/* define HAVE_DI_PSOLA_TM (in config/config_make_rules). If defined */ +/* the file di_psolaTM.C is included and PSOLA is available otherwise */ +/* an error function is compiled. */ +/* */ +/* if PSOLA is included "di_psolaTM" is added to the proclaimed module */ +/* list so that its existence may be tested in the LISP domain, as used */ +/* in lib/gsw_diphone.scm */ +/* */ +/* (*) PSOLA/TD is a Trade Mark of France Telecom */ +/* */ +/*=======================================================================*/ +#include +#include +#include +#include +#include "festival.h" +#include "diphone.h" + +#ifdef SUPPORT_PSOLA_TM +#include "di_psolaTM.cc" +#else +void di_psola_tm(DIPHONE_DATABASE *db, DIPHONE_ACOUSTIC *as, DIPHONE_OUTPUT *output) +{ + (void)db; + (void)as; + (void)output; + + cerr << "Diphone: di_psola is not available in this installation" << endl; + festival_error(); +} +#endif diff --git a/src/modules/diphone/di_reslpc.cc b/src/modules/diphone/di_reslpc.cc new file mode 100644 index 0000000..d354c2d --- /dev/null +++ b/src/modules/diphone/di_reslpc.cc @@ -0,0 +1,466 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black with all the magic stuff from */ +/* Alistair Conkie, Steve Isard and Paul Taylor */ +/* Date : October 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Residual excited LPC synthesis */ +/* */ +/* From "Issues in High Quality LPC Analysis and Synthesis" */ +/* Melvyn Hunt, Dariusz Zweirzynski and Raymond Carr, Eurospeech 89 */ +/* vol 2 pp 348-351, Paris 1989 */ +/* */ +/*=======================================================================*/ +#include +#include +#include +#include +#include "festival.h" +#include "diphone.h" + +static void add_lpc_coeff(int nth,float *p_coeffs,EST_Track &lpc_coeff); +static void add_residual(int insize, int outsize, int position, + short *p_residual, + short *residual); +static void lpc_resynth(EST_Track &lpc_coeffs, + short *residual, + DIPHONE_ACOUSTIC *as, + DIPHONE_OUTPUT *output); +static void lpc_resynth_fixp(EST_Track &lpc_coeffs, + short *residual, + DIPHONE_ACOUSTIC *as, + DIPHONE_OUTPUT *output); +static int min(int a, int b); +static void ref2lpc_fixp(const int *rfc, int *lpc, int order); + +void reslpc_resynth(const EST_String &file, + DIPHONE_DATABASE *db, + DIPHONE_ACOUSTIC *as, + DIPHONE_OUTPUT *output) +{ + // Resynthesize a named file from the current + // This is not fully general, just for my testing + EST_Track lpc_coeffs; + EST_Wave residual; + int i,frame_len; + + if (lpc_coeffs.load(EST_String(db->lpc_dir)+file+EST_String(db->lpc_ext)) + != format_ok) + { + cerr << "Diphone: lpc resynthesis, failed to read lpc file" + << endl; + festival_error(); + } + if (residual.load(EST_String(db->lpc_dir)+file+EST_String(db->lpc_res_ext)) != + format_ok) + { + cerr << "Diphone: lpc resynthesis, failed to read residual file" + << endl; + festival_error(); + } + if (db->lpc_pitch_synch) + { + as->outwidth = walloc(int,lpc_coeffs.num_frames()); + output->o_max = 
0; + for (i=0; ioutwidth[i] = frame_len; + output->o_max += frame_len; + + // Get rid of extra first coeff (time position) +// for (int j=0; jo_max *= 2; + output->track = walloc(short,output->o_max); +// lpc_coeffs.set_num_channels(lpc_coeffs.num_channels()-1); + } + else + { + as->outwidth = walloc(int,lpc_coeffs.num_frames()); + frame_len = (int)(db->lpc_frame_shift*db->samp_freq); + for (i=0; ioutwidth[i] = frame_len; + output->o_max = i*frame_len*2; + output->track = walloc(short,output->o_max); + } + + short *shresidual = walloc(short,residual.num_samples()); + for (i=0; i < residual.num_samples(); i++) + shresidual[i] = residual.a_no_check(i); + + // Reconstruct the waveform + if (siod_get_lval("debug_reslpc",NULL) != NIL) + { + residual.save("res.wav","nist"); + lpc_coeffs.save("lpc.ascii","ascii"); + } + if (siod_get_lval("lpc_fixedpoint",NULL) != NIL) + lpc_resynth_fixp(lpc_coeffs,shresidual,as,output); + else + lpc_resynth(lpc_coeffs,shresidual,as,output); + + wfree(shresidual); +} + +void di_reslpc(DIPHONE_DATABASE *db, + DIPHONE_ACOUSTIC *as, + DIPHONE_OUTPUT *output) +{ + int i; + int curr_diph; + int curr_mark; + int max_marks; + EST_Track lpc_coeffs(as->p_sz-1,db->lpc_order); + short *residual; + + residual = walloc(short,as->cum_pitch[as->p_sz-1]+db->sig_band); + memset(residual,0,sizeof(short)*(as->cum_pitch[as->p_sz-1]+db->sig_band)); + + /* original widths */ + for(i=0;i<as->p_sz-1;i++) + { + curr_diph = as->diph_ref[i]; + curr_mark = as->diph_mark[i]; + max_marks = db->pm[curr_diph]->nmark; + if (curr_mark == 0) + { + as->inwidth[i] = + db->pm[curr_diph]->mark[1]-db->pm[curr_diph]->mark[0]; + } + else if (curr_mark == max_marks-1) + { + as->inwidth[i] = db->pm[curr_diph]->mark[curr_mark] + -db->pm[curr_diph]->mark[curr_mark-1]; + } + else + { + as->inwidth[i] = min(db->pm[curr_diph]->mark[curr_mark+1] + -db->pm[curr_diph]->mark[curr_mark], + db->pm[curr_diph]->mark[curr_mark] + -db->pm[curr_diph]->mark[curr_mark-1]); + } + } + + /* new widths */ + 
as->outwidth[0] = as->pitch[0]; + for(i=1;i<as->p_sz;i++) + as->outwidth[i] = min(as->pitch[i],as->pitch[i-1]); + + int res_pos=0; + /* For each pitch period, copy the lpc coeffs and construct a residual */ + for(i=0;i<as->p_sz-1;i++) + { + curr_diph = as->diph_ref[i]; + curr_mark = as->diph_mark[i]; + + if (as->outwidth[i] <= 5) + continue; // too wee to worry about + + /* get coeffs for this pitch period */ + add_lpc_coeff(i, + di_get_diph_lpc_mark(curr_diph,curr_mark,db), + lpc_coeffs); + + /* get residual for this pitch period (and slice it) */ + add_residual(as->inwidth[i], + as->outwidth[i], + res_pos, + di_get_diph_res_mark(curr_diph,curr_mark,as->inwidth[i],db), + residual); + res_pos += as->outwidth[i]; + } + if (siod_get_lval("debug_reslpc",NULL) != NIL) + { +// residual.save("res.wav","nist"); + lpc_coeffs.save("lpc.ascii","ascii"); + } + // Reconstruct the waveform + if (siod_get_lval("lpc_fixedpoint",NULL) != NIL) + lpc_resynth_fixp(lpc_coeffs,residual,as,output); + else + lpc_resynth(lpc_coeffs,residual,as,output); + + wfree(residual); +} + +static void add_lpc_coeff(int nth,float *p_coeffs,EST_Track &lpc_coeff) +{ + // Add coeffs frame to lpc_coeff at position + + lpc_coeff.copy_frame_in(nth, p_coeffs); + +} + +static void add_residual(int insize, int outsize, int position, + short *p_residual, + short *residual) +{ + // Slice/pad this pitch's residual to outsize and add to full residual + // at position. 
The mid points should align + int i; + + int ibp = insize/2; + int obp = outsize/2; + if (insize < outsize) + { // it's too short so place it with 50% points aligned + memmove(&residual[position+(obp-ibp)], + p_residual, + insize*sizeof(short)); + } + else // insize >= outsize + { // It's too long so only take part of it + memmove(&residual[position], + &p_residual[ibp-obp], + outsize*sizeof(short)); + // Do power reduction + float factor = (float)outsize/(float)insize; + for (i=0; i < outsize; i++) + residual[position+i] = (int)((float)residual[position+i]*factor); + } + +#if 0 + + // Windowing (see if it makes a difference) +#define DI_PI 3.14159265358979323846 + float *window = walloc(float,outsize); + int i; + for (i=0; i < outsize/5; i++) + window[i] = (0.5-(0.5*cos((double)(DI_PI*2*i)/(double)((2*outsize)/5-1)))); + for (i=outsize/5; i < outsize; i++) + window[i] = (0.5-(0.5*cos((double)(DI_PI*2*(i+(outsize/5)*3))/ + (double)(8*outsize/5-1)))); +#if 0 + for (i=0,f=0; i < outsize/2+1; i++,f+=incr) + window[i] = f; + incr = 2.0/((outsize/2)); + for (i=0,f=0; i < (outsize/2)+1; i++,f+=incr) + window[outsize-1-i] = f; +#endif + + for (i=0; i < outsize; i++) + { + residual.data()[position+i] = + (short)(window[i]*(float)residual.data()[position+i]); + } + wfree(window); +#endif + +} +void ref2lpc(const float *rfc, float *lpc, int order) +{ + // Here we use Christopher Longuet-Higgins' algorithm converted to + // an equivalent by awb. It doesn't have the reverse order or + // negation requirement. 
+ float a,b; + int n,k; + + for (n=0; n < order; n++) + { + lpc[n] = rfc[n]; + for (k=0; 2*(k+1) <= n+1; k++) + { + a = lpc[k]; + b = lpc[n-(k+1)]; + lpc[k] = a-b*lpc[n]; + lpc[n-(k+1)] = b-a*lpc[n]; + } + } +} + +static void lpc_resynth(EST_Track &lpc_coeffs, + short *residual, + DIPHONE_ACOUSTIC *as, + DIPHONE_OUTPUT *output) +{ + // Reconstruct the waveform from the coefficients and residual + // This is copied from Steve's donovan code with minor changes + float *ptbuf; + float *rbuf; + int i,j,ci,cz; + int zz,r; + + int NCOEFFS = lpc_coeffs.num_channels()-1; // lpc_power is 0 + rbuf = walloc(float,NCOEFFS); + + ptbuf = walloc(float,output->o_max); + memset(ptbuf,0,sizeof(float)*NCOEFFS); + zz=NCOEFFS; + r=0; + + float *tmm_coefs = new float[NCOEFFS+1]; + + for(i=0;ioutwidth[i] >= output->o_max) + { + cerr << "Diphone: output buffer overflow " << endl; + break; + } + for(j=0 ;j < as->outwidth[i];j++,zz++,r++) + { + ptbuf[zz] = (float)residual[r]; + + // This loop is where all the time is spent + for (ci=0,cz=zz-1; ci < NCOEFFS; ci++,cz--) + ptbuf[zz] += rbuf[ci] * ptbuf[cz]; + } + } + // Clipping is required because of some overflows which really points + // to a lower level problem. Probably windowing the residual is what + // I should really do. + for (i=NCOEFFS,r=0; i < zz; i++,r++) + if (ptbuf[i] > 32766) + output->track[r] = (short)32766; + else if (ptbuf[i] < -32766) + output->track[r] = (short)-32766; + else + output->track[r] = (short)ptbuf[i]; + output->o_sz = r; + wfree(ptbuf); + wfree(rbuf); + delete [] tmm_coefs; +} + +static void lpc_resynth_fixp(EST_Track &lpc_coeffs, + short *residual, + DIPHONE_ACOUSTIC *as, + DIPHONE_OUTPUT *output) +{ + // Reconstruct the waveform from the coefficients and residual + // This is copied from Steve's donovan code with minor changes + // This is the fixed point (sort of integer) version of the above + // The coeffs are in the range -32768 to +32767 and where possible + // integer calculations are done. 
This is primarily to allow + // this to work on machines without a floating point processor + // (i.e. awb's palmtop) + int *ptbuf; + int *rbuf; + int i,j,ci,cz; + int zz,r; + + int NCOEFFS = lpc_coeffs.num_channels()-1; // lpc_power is 0 + rbuf = walloc(int,NCOEFFS); + + ptbuf = walloc(int,output->o_max); + memset(ptbuf,0,sizeof(int)*NCOEFFS); + zz=NCOEFFS; + r=0; + + int *tmm_coefs = new int[NCOEFFS+1]; + + for(i=0;ioutwidth[i] >= output->o_max) + { + cerr << "Diphone: output buffer overflow " << endl; + break; + } + for(j=0 ;j < as->outwidth[i];j++,zz++,r++) + { + ptbuf[zz] = ((int)residual[r]); + + // This loop is where all the time is spent + for (ci=0,cz=zz-1; ci < NCOEFFS; ci++,cz--) + ptbuf[zz] += (rbuf[ci] * ptbuf[cz]) / 32768; + } + } + // Clipping is required because of some overflows which really points + // to a lower level problem. Probably windowing the residual is what + // I should really do. + for (i=NCOEFFS,r=0; i < zz; i++,r++) + { + if (ptbuf[i] > 32766) + output->track[r] = (short)32766; + else if (ptbuf[i] < -32766) + output->track[r] = (short)-32766; + else + output->track[r] = (short)ptbuf[i]; + } + output->o_sz = r; + wfree(ptbuf); + wfree(rbuf); + delete [] tmm_coefs; +} + +static int min(int a, int b) +{ + return((a +#include +#include "festival.h" +#include "diphone.h" + +static void bresenham_mod(DIPHONE_SPN *ps, DIPHONE_ACOUSTIC *as); +static void nochange_mod(DIPHONE_SPN *ps, DIPHONE_ACOUSTIC *as); + +void di_frame_select(DIPHONE_DATABASE *database, DIPHONE_SPN *ps, DIPHONE_ACOUSTIC *as) +{ + /* resist the temptation to call this select :-) */ + int i, j; + int bias = 0; + + for(i=0;i<ps->p_sz;i++) + { + ps->pm_req[i] = 0; + ps->pm_giv[i] = 0; + } + + /* count the needed pitch marks for each phoneme (or diph) -> ps */ + + j = 0; /* can j get too big ? 
*/ + for(i=0;i<as->p_sz;i++) + { + if((as->cum_pitch[i]>=ps->cum_dur[j]) && + (as->cum_pitch[i]<ps->cum_dur[j+1])) + ps->pm_req[j] += 1; + else if((as->cum_pitch[i]>=ps->cum_dur[j+1]) && + (as->cum_pitch[i]<ps->cum_dur[j+2])) + ps->pm_req[++j] += 1; + else + { + *cdebug << "Diphone: some sort of pitch error" << endl; + ps->pm_req[++j] += 1; + } + } + + + /* count the given pitch marks for each phoneme (or diph) -> ps */ + + ps->pm_chg_diph[0] = 0; + for(i=0;i<ps->p_sz-1;i++) + { + ps->pm_giv[i] += database->pm[ps->ref[i]]->lmark; + ps->pm_chg_diph[i+1] = database->pm[ps->ref[i]]->rmark; + ps->pm_giv[i+1] += database->pm[ps->ref[i]]->rmark; + } + + if (siod_get_lval("diphone_no_mods",NULL) != NIL) + nochange_mod(ps,as); + else + bresenham_mod(ps,as); + + /* there is something still to do in terms of selecting frames */ + /* diph_mark */ + + bias = 0; + for(i=0;i<as->p_sz-1;i++) + { + if ((as->phon_mark[i] + bias) < 0) + as->diph_mark[i] = as->phon_mark[i]; + else if ((as->phon_mark[i] + bias) < + database->pm[as->diph_ref[i]]->nmark) + as->diph_mark[i] = as->phon_mark[i] + bias; + else if (as->phon_mark[i] < + database->pm[as->diph_ref[i]]->nmark) + as->diph_mark[i] = as->phon_mark[i]; + else // It needs less than one pitch mark; + as->diph_mark[i] = database->pm[as->diph_ref[i]]->nmark-1; + if ((as->diph_ref[i] != as->diph_ref[i+1]) && + (i+1 != as->p_sz-1)) + bias = - database->pm[as->diph_ref[i]]->rmark; + if ((as->phon_ref[i] != as->phon_ref[i+1]) && + (i+1 != as->p_sz-1)) + bias = database->pm[as->diph_ref[i+1]]->lmark; + } +} + +static void nochange_mod(DIPHONE_SPN *ps, DIPHONE_ACOUSTIC *as) +{ + // Does no duration or pitch modifications, useful in testing + int ph,i,j; + + j=0; + for(ph=0;ph<ps->p_sz;ph++) + { + for (i=0; i < ps->pm_giv[ph]; i++,j++) + { + as->phon_ref[j] = ph; + as->phon_mark[j] = i; + if (i >= ps->pm_chg_diph[ph]) + as->diph_ref[j] = ps->ref[ph]; + else + as->diph_ref[j] = ps->ref[ph-1]; + } + } + as->p_sz = j; +} + +static void bresenham_mod(DIPHONE_SPN *ps, 
DIPHONE_ACOUSTIC *as) +{ + int error = 0; + int i,x,y,deltax,deltay, ph; + + as->p_tmp = 0; + + for(ph=0;ph<ps->p_sz;ph++) + { + error = 0; + deltax = ps->pm_req[ph]; + deltay = ps->pm_giv[ph]; + x = 0; + y = 0; + if (deltax < deltay) + { + for (i=0;i=deltay) + { + as->phon_ref[as->p_tmp] = ph; + as->phon_mark[as->p_tmp] = i-1; + if ((i-1) >= ps->pm_chg_diph[ph]) + as->diph_ref[as->p_tmp] = ps->ref[ph]; + else + as->diph_ref[as->p_tmp] = ps->ref[ph-1]; + as->p_tmp++; + /* printf("phon [%d], corr[%d] = %d\n",ph,x,i-1); */ + x++; + error -= deltay; + } + } + } else + { + for(i=0;iphon_ref[as->p_tmp] = ph; + as->phon_mark[as->p_tmp] = y; + if(y >= ps->pm_chg_diph[ph]) + as-> diph_ref[as->p_tmp] = ps->ref[ph]; + else + as-> diph_ref[as->p_tmp] = ps->ref[ph-1]; + as->p_tmp++; + /* printf("phon [%d], corr[%d] = %d\n",ph,i,y); */ + error += deltay; + if(error>deltax) + { + y++; + error -= deltax; + } + } + } + } +} + diff --git a/src/modules/diphone/diphone.cc b/src/modules/diphone/diphone.cc new file mode 100644 index 0000000..f6b6768 --- /dev/null +++ b/src/modules/diphone/diphone.cc @@ -0,0 +1,837 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. 
The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alistair Conkie */ +/* Date : August 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* A free version of diphone selection and concatenation supported the */ +/* standard CSTR diphone dbs */ +/* */ +/*=======================================================================*/ +#include +#include +#include "festival.h" +#include "ModuleDescription.h" +#include "diphone.h" + +static ModuleDescription di_description = +{ + "diphone", 1.0, + "CSTR", + "Alistair Conkie, Alan Black ", + { + "A free version of diphone selection and concatenation supported the", + "standard CSTR diphone dbs ", + NULL + }, + { { "Segment", "Segments of the utterance." 
}, + { "Target", "Intonation targets" }, + { NULL, NULL } }, + { { NULL,NULL} }, + { { "Wave", "Synthesized waveform" }, + { NULL,NULL} }, + { {NULL,NULL,NULL,NULL} } +}; + +static int lookupd(DIPHONE_DATABASE *database,char *p1, char *p2); +static void di_diphones(DIPHONE_SPN *ps, DIPHONE_DATABASE *database); +static void make_silence(DIPHONE_DATABASE *db,DIPHONE_SPN *ps,DIPHONE_OUTPUT *o); +static void merge_silences(EST_Utterance &u); + +static void delete_diphone_db(DIPHONE_DATABASE *config); +static DIPHONE_SPN *make_spn(EST_Utterance &u); +static void delete_spn(DIPHONE_SPN *ps); +static DIPHONE_ACOUSTIC *make_as(DIPHONE_SPN *ps); +static void delete_as(DIPHONE_ACOUSTIC *ss); +static DIPHONE_OUTPUT *make_output(DIPHONE_DATABASE *database,int size); +static void delete_output(DIPHONE_OUTPUT *o); + +static LISP diphone_dbs = NIL; +static DIPHONE_DATABASE *di_db=0; +static EST_Regex RXdi_ud("[$_]"); + +VAL_REGISTER_TYPE(diphone_db,DD_STRUCT) +SIOD_REGISTER_CLASS(diphone_db,DD_STRUCT) + +LISP FT_Diphone_Load_Diphones(LISP params) +{ + DIPHONE_DATABASE *db; + EST_Pathname groupfile; + + db = make_diphone_db(); + + groupfile = get_param_str("group_file",params,""); + if (!streq(groupfile,"")) + { + db->gtype = di_grouped; + di_load_grouped_db(groupfile,db,params); + } + else + { + db->gtype = di_ungrouped; + di_general_parameters(db,params); + di_fixed_parameters(db,params); + di_load_database(db); // load in the diphones/index + } + + di_add_diphonedb(db); // add to db list, selects it too + + return NIL; +} + +void di_general_parameters(DIPHONE_DATABASE *db,LISP params) +{ + // These may be reset from those set in group file + + db->name = wstrdup(get_param_str("name",params,"none")); + db->default_diphone = wstrdup(get_param_str("default_diphone",params,"")); + db->alternates_before = get_param_lisp("alternates_before",params,NIL); + gc_protect(&db->alternates_before); + db->alternates_after = get_param_lisp("alternates_after",params,NIL); + 
gc_protect(&db->alternates_after); + db->sig_access_type_str = + wstrdup(get_param_str("access_type",params,"direct")); + if (streq(db->sig_access_type_str,"direct")) + db->sig_access_type = di_direct; + else if (streq(db->sig_access_type_str,"dynamic")) + db->sig_access_type = di_dynamic; + else if (streq(db->sig_access_type_str,"ondemand")) + db->sig_access_type = di_ondemand; + else + { + cerr << "Diphone: unknown access method " << + db->sig_access_type_str << endl; + festival_error(); + } + +} + +void di_fixed_parameters(DIPHONE_DATABASE *db,LISP params) +{ + // These cannot be changed in a compiled group file + int nphones; + + db->index_file = wstrdup(get_param_str("index_file",params,"index")); + db->type_str = wstrdup(get_param_str("type",params,"pcm")); + if (streq(db->type_str,"pcm")) + db->type = di_pcm; + else if (streq(db->type_str,"lpc")) + db->type = di_lpc; + else + { + cerr << "Diphone: unknown database type \"" << + db->type_str << "\"" << endl; + festival_error(); + } + db->signal_dir = wstrdup(get_param_str("signal_dir",params,"signal/")); + db->signal_ext = wstrdup(get_param_str("signal_ext",params,".vox")); + db->signal_type = wstrdup(get_param_str("signal_type",params,"vox")); + db->lpc_dir = wstrdup(get_param_str("lpc_dir",params,"lpc/")); + db->lpc_ext = wstrdup(get_param_str("lpc_ext",params,".lpc")); + db->lpc_res_ext = wstrdup(get_param_str("lpc_res_ext",params,".res")); + db->lpc_res_type = wstrdup(get_param_str("lpc_res_type",params,"esps")); + db->lpc_type = wstrdup(get_param_str("lpc_type",params,"htk")); + db->lpc_order = get_param_int("lpc_order",params,18); + db->lpc_frame_shift = get_param_float("lpc_frame_shift",params,0.010); + db->lpc_res_offset = get_param_float("lpc_res_offset",params, + db->lpc_frame_shift); + db->lpc_frame_offset = get_param_int("lpc_frame_offset",params,1); + if (streq("t",get_param_str("lpc_pitch_synch",params,"f"))) + db->lpc_pitch_synch = TRUE; + db->pitch_dir = 
wstrdup(get_param_str("pitch_dir",params,"pitch/")); + db->pitch_ext = wstrdup(get_param_str("pitch_ext",params,".pm")); + db->samp_freq = get_param_int("samp_freq",params,16000); + // old name was "group_type", new name "group_encoding" + db->group_encoding_str=wstrdup(get_param_str("group_type",params,"raw")); + db->group_encoding_str = wstrdup(get_param_str("group_encoding",params, + db->group_encoding_str)); + db->def_f0 = get_param_int("def_f0",params,125); + if (streq(db->group_encoding_str,"raw")) + db->group_encoding = di_raw; + else if (streq(db->group_encoding_str,"ulaw")) + db->group_encoding = di_ulaw; + else + { + cerr << "Diphone: unknown group encoding" << endl; + festival_error(); + } + db->phoneset = wstrdup(get_param_str("phoneset",params,"none")); + nphones = phoneset_name_to_set(db->phoneset)->num_phones(); + db->ndiphs = nphones*nphones; // max limit of diphones + db->ndiphs = get_param_int("num_diphones",params,db->ndiphs); + db->nindex = db->ndiphs; + db->sig_band = get_param_int("sig_band",params,400); +} + +static void delete_diphone_db(DIPHONE_DATABASE *db) +{ + int i,j; + + wfree(db->type_str); + wfree(db->name); + wfree(db->index_file); + wfree(db->signal_dir); + wfree(db->signal_ext); + wfree(db->signal_type); + wfree(db->lpc_dir); + wfree(db->lpc_ext); + wfree(db->lpc_res_ext); + wfree(db->pitch_dir); + wfree(db->pitch_ext); + wfree(db->phoneset); + wfree(db->group_encoding_str); + wfree(db->sig_access_type_str); + gc_unprotect(&db->alternates_before); + gc_unprotect(&db->alternates_after); + + if (db->gtype == di_grouped) + { + wfree(db->indx[0]->diph); // ptr to the whole diphname table + wfree(db->allsignal); + wfree(db->allulawsignal); + wfree(db->allframes); + } + for (i=0; i < db->nindex; i++) + { + if (db->gtype == di_ungrouped) + { + wfree(db->indx[i]->diph); + wfree(db->indx[i]->file); + } + wfree(db->indx[i]); + if ((db->gtype == di_ungrouped) || + (db->group_encoding != di_raw)) + wfree(db->vox[i]->signal); + 
wfree(db->vox[i]); + wfree(db->pm[i]->mark); + wfree(db->pm[i]); + if (db->lpc[i]->f != 0) + { + if ((db->gtype == di_ungrouped) || // when they are not ptrs + (db->group_encoding != di_raw)) // into db->allframes + for (j=0; j < db->lpc[i]->nframes; j++) + wfree(db->lpc[i]->f[j]); + wfree(db->lpc[i]->f); + } + wfree(db->lpc[i]); + } + wfree(db->indx); + wfree(db->vox); + wfree(db->pm); + wfree(db->lpc); + wfree(db->offsets); + if (db->gfd != NULL) + fclose(db->gfd); + +} + +DIPHONE_DATABASE *make_diphone_db(void) +{ + // Allocate a new database structure + DIPHONE_DATABASE *db = walloc(DIPHONE_DATABASE,1); + memset(db,0,sizeof(DIPHONE_DATABASE)); // all zeroed + + db->type = di_pcm; + db->gtype = di_ungrouped; + db->swap = FALSE; + + db->index_file = 0; + db->signal_dir = 0; + db->pitch_dir = 0; + db->xfd = 0; + db->samp_freq = 0; + db->sig_band = 0; + db->phoneset = 0; + db->alternates_before = NIL; + db->alternates_after = NIL; + db->allsignal = 0; + db->allulawsignal = 0; + db->offsets = 0; + db->gfd = 0; + db->default_diphone = 0; + db->lpc_pitch_synch = FALSE; + + db->nindex = 0; + + return db; +} + +LISP FT_Diphone_Synthesize_Utt(LISP utt) +{ + EST_Utterance *u = get_c_utt(utt); + DIPHONE_SPN *ps; + DIPHONE_ACOUSTIC *as; + DIPHONE_OUTPUT *output; + EST_Item *item=0; + + *cdebug << "Diphone module" << endl; + + if (di_db == 0) + { + cerr << "Diphone: no diphone database loaded" << endl; + festival_error(); + } + + // Find diphone names (_ and $ etc) if specified for this db + apply_hooks(siod_get_lval("diphone_module_hooks",NULL),utt); + + /* Build structure */ + merge_silences(*u); + ps = make_spn(*u); // get info from utterance structure + + output = make_output(di_db,(ps->p_sz != 0 ? 
ps->cum_dur[ps->p_sz-1] : 0)); + + if (ps->p_sz < 2) + { // just build a silence of the specified time + make_silence(di_db,ps,output); + } + else + { + as = make_as(ps); + di_diphones(ps,di_db); + di_calc_pitch(di_db,ps,as); + di_frame_select(di_db,ps,as); + if (di_db->type == di_lpc) + di_reslpc(di_db,as,output); // residual excited LPC + else if (di_db->type == di_pcm) + di_psola_tm(di_db,as,output); // can't be distributed + else + { + cerr << "Diphone: unsupported database form\n"; + festival_error(); + } + delete_as(as); + } + + delete_spn(ps); + + // Add wave into utterance + EST_Wave *w = new EST_Wave; + w->resize(output->o_sz,1,0); + w->set_sample_rate(di_db->samp_freq); + for (int i=0; i<w->length(); i++) + w->a_no_check(i) = output->track[i]; + + item = u->create_relation("Wave")->append(); + item->set_val("wave",est_val(w)); + + delete_output(output); + + return utt; +} + +LISP FT_reslpc_resynth(LISP file) +{ + EST_Utterance *u = new EST_Utterance; + EST_Item *item = 0; + DIPHONE_ACOUSTIC *as; + DIPHONE_OUTPUT *output; + + if (di_db == 0) + { + cerr << "Diphone: no diphone database loaded" << endl; + festival_error(); + } + + as = walloc(DIPHONE_ACOUSTIC,1); + memset(as,0,sizeof(DIPHONE_ACOUSTIC)); + output = make_output(di_db,0); + + reslpc_resynth(get_c_string(file),di_db,as,output); + + wfree(as); + + // Add wave into utterance + EST_Wave *w = new EST_Wave; + w->resize(output->o_sz-(2*di_db->sig_band),1); + for (int i=0; i<w->length(); i++) + w->a_no_check(i) = output->track[di_db->sig_band+i]; + w->set_sample_rate(di_db->samp_freq); + delete_output(output); + + item = u->create_relation("Wave")->append(); + item->set_val("wave",est_val(w)); + + return siod(u); +} + +static void make_silence(DIPHONE_DATABASE *db,DIPHONE_SPN *ps,DIPHONE_OUTPUT *o) +{ + // Make the output directly as silence + int sil_size = (int)((float)db->samp_freq * 0.020); + + if ((ps->p_sz > 0) && + (ps->cum_dur[ps->p_sz-1] > sil_size)) + sil_size = ps->cum_dur[ps->p_sz-1]; + + if 
(o->o_max < (sil_size+2*db->sig_band)) + { + wfree(o->track); + o->track = walloc(short,(sil_size+2*db->sig_band)); + o->o_max = (sil_size+2*db->sig_band); + } + + memset(o->track,0,o->o_max*sizeof(short)); + o->o_sz = o->o_max; + +} + +static void merge_silences(EST_Utterance &u) +{ + // Remove multiple silences + EST_Item *s, *ns; + + for (s=u.relation("Segment")->first(); s != 0; s=ns) + { + ns = s->next(); + if ((ns != 0) && + (ph_is_silence(s->name())) && + (s->name() == ns->name())) // same *type* of silence + { + // delete this segment + remove_item(s,"Segment"); + } + } +} + +static void di_diphones(DIPHONE_SPN *ps, DIPHONE_DATABASE *database) +{ + int ph=0; + + ps->cum_dur[0] = 0; + + for(ph=0;ph<ps->p_sz;ph++) + { + ps->cum_dur[ph+1] = ps->duration[ph] + ps->cum_dur[ph]; + } + + for(ph=0;ph<ps->p_sz-1;ph++) + { + sprintf(ps->diphs[ph],"%s-%s",ps->phons[ph],ps->phons[ph+1]); + ps->ref[ph] = lookupd(database,ps->phons[ph],ps->phons[ph+1]); + *cdebug << ps->diphs[ph] << " " << + ((database->indx[ps->ref[ph]]->file == 0) ? 
"grouped" : + database->indx[ps->ref[ph]]->file) << endl; + load_pitch_file(database,ps->ref[ph],di_direct); // ensure its loaded + } +} + +static int lookupd(DIPHONE_DATABASE *database,char *p1, char *p2) +{ + // This should be made much faster + int i; + char diphone[20]; + LISP alt1, alt2; + + sprintf(diphone,"%s-%s",p1,p2); + for(i=0;i<database->nindex;i++) + { + if(!strcmp(database->indx[i]->diph,diphone)) + return i; + } + + // Try alternates + alt2 = siod_assoc_str(p2,database->alternates_after); + if (alt2 != NIL) + { + sprintf(diphone,"%s-%s",p1,get_c_string(car(cdr(alt2)))); + for(i=0;i<database->nindex;i++) + if(!strcmp(database->indx[i]->diph,diphone)) + { + *cdebug << "Diphone alternate: " << diphone + << " substituted for " << p1 << "-" << p2 << endl; + return i; + } + } + alt1 = siod_assoc_str(p1,database->alternates_before); + if (alt1 != NIL) + { + sprintf(diphone,"%s-%s",get_c_string(car(cdr(alt1))),p2); + for(i=0;i<database->nindex;i++) + if(!strcmp(database->indx[i]->diph,diphone)) + { + *cdebug << "Diphone alternate: " << diphone + << " substituted for " << p1 << "-" << p2 << endl; + return i; + } + } + + if ((alt2 != NIL) && (alt1 != NIL)) + { + sprintf(diphone,"%s-%s", + get_c_string(car(cdr(alt1))), + get_c_string(car(cdr(alt2)))); + for(i=0;i<database->nindex;i++) + if(!strcmp(database->indx[i]->diph,diphone)) + { + *cdebug << "Diphone alternate: " << diphone + << " substituted for " << p1 << "-" << p2 << endl; + return i; + } + } + + // If underscores and dollars exist do it again without them + if ((EST_String(p1).contains(RXdi_ud)) || + (EST_String(p2).contains(RXdi_ud))) + { + EST_String dp1=p1, dp2=p2; + // fprintf(stddebug,"Diphone: removing _ and $\n"); + dp1.gsub(RXdi_ud,""); + dp2.gsub(RXdi_ud,""); + return lookupd(database,dp1,dp2); + } + + if ((database->default_diphone != NULL) && + (!streq(database->default_diphone,""))) + { + fprintf(stderr,"Diphone: using default diphone %s for %s-%s\n", + database->default_diphone,p1,p2); + for(i=0;i<database->nindex;i++) + 
if(!strcmp(database->indx[i]->diph,database->default_diphone)) + { + *cdebug << "Diphone alternate: " << diphone + << " substituted for " << p1 << "-" << p2 << endl; + return i; + } + } + + fprintf(stderr,"Diphone: diphone (or alternates) not found: %s\n",diphone); + festival_error(); + return 0; +} + +static DIPHONE_OUTPUT *make_output(DIPHONE_DATABASE *database,int samps) +{ + // Alloc the output buffer + int nos; + DIPHONE_OUTPUT *o = walloc(DIPHONE_OUTPUT,1); + + // estimate the number of samples + nos = (int)((float)samps*1.1)+(2*database->sig_band); + o->o_sz = 0; + o->o_max = nos; + o->track = walloc(short,nos); + + return o; +} + +static void delete_output(DIPHONE_OUTPUT *o) +{ + + wfree(o->track); + wfree(o); + +} + +static DIPHONE_SPN *make_spn(EST_Utterance &u) +{ + // Build initial structure for donovan code + DIPHONE_SPN *ps = walloc(DIPHONE_SPN,1); + EST_Relation *seg = u.relation("Segment"); + EST_Relation *targ = u.relation("Target"); + EST_Item *s; + LISP cps; + const char *ph_name; + EST_Item *rt; + int i; + float pos,seg_start,seg_dur,seg_end; + + ps->p_sz = seg->length(); + ps->p_max = ps->p_sz+1; + ps->t_sz = 0; + ps->phons = walloc(char *,ps->p_max); + ps->duration = walloc(int,ps->p_max); + ps->cum_dur = walloc(int,ps->p_max); + ps->diphs = walloc(char *,ps->p_max); + ps->ref = walloc(int,ps->p_max); + ps->pm_req = walloc(int,ps->p_max); + ps->pm_giv = walloc(int,ps->p_max); + ps->pm_chg_diph = walloc(int,ps->p_max); + for (i=0; i<ps->p_sz; i++) + ps->diphs[i] = walloc(char,20); + + ps->t_max = num_leaves(targ->head())+4; + ps->pc_targs = walloc(int,ps->t_max); + ps->targ_phon = walloc(int,ps->t_max); + ps->targ_freq = walloc(int,ps->t_max); + ps->abs_targ = walloc(int,ps->t_max); + + // Ensure there is a target at the start + if ((targ->length() == 0) || (targ->first_leaf()->F("pos") != 0)) + { + ps->targ_phon[0] = 0; + if (targ->length() == 0) + ps->targ_freq[0] = di_db->def_f0; + else + ps->targ_freq[0] = targ->first()->I("f0"); + 
ps->pc_targs[0] = 0; + ps->t_sz++; + } + seg_end = 0; + for (i=0,s=seg->first(); s != 0; s=s->next(),i++) + { + seg_start = seg_end; + seg_end = s->F("end"); + seg_dur = seg_end - seg_start; + EST_String pph_name; // possibly calculated phone name + if ((pph_name=s->f("diphone_phone_name").string()) != "0") + ph_name = pph_name; + else if (((cps=ft_get_param("PhoneSet")) == NIL) || + ((streq(get_c_string(cps),di_db->phoneset)))) + ph_name = s->name(); + else + ph_name = map_phone(s->name(),get_c_string(cps),di_db->phoneset); + ps->phons[i] = wstrdup(ph_name); + ps->duration[i] = (int)(seg_dur * di_db->samp_freq); /* in frames */ + if (i>0) + ps->cum_dur[i] = ps->cum_dur[i-1]; + else + ps->cum_dur[i] = 0; + ps->cum_dur[i] += ps->duration[i]; + for (rt = daughter1(s,"Target"); + rt != 0; + rt = rt->next(),ps->t_sz++) + { + ps->targ_phon[ps->t_sz] = i; + ps->targ_freq[ps->t_sz] = rt->I("f0"); + pos = ((rt->F("pos") - seg_start) / seg_dur) * 99.9; + ps->pc_targs[ps->t_sz] = (int)pos; + } + } + // Ensure there is a target at the end + if ((targ->length() == 0) || + (targ->last_leaf()->F("pos") != seg->last_leaf()->F("end"))) + { + ps->targ_phon[ps->t_sz] = i-1; + if (targ->length() == 0) + ps->targ_freq[ps->t_sz] = di_db->def_f0; + else + ps->targ_freq[ps->t_sz] = targ->last_leaf()->I("f0"); + ps->pc_targs[ps->t_sz] = 100; + ps->t_sz++; + } + + + return ps; + +} + +static void delete_spn(DIPHONE_SPN *ps) +{ + // claim back the space from ps + int i; + + if (ps == NULL) + return; + + for (i=0; i<ps->p_sz; i++) + { + wfree(ps->diphs[i]); + wfree(ps->phons[i]); + } + wfree(ps->diphs); + wfree(ps->phons); + wfree(ps->duration); + wfree(ps->cum_dur); + wfree(ps->ref); + wfree(ps->pm_req); + wfree(ps->pm_giv); + wfree(ps->pm_chg_diph); + + wfree(ps->pc_targs); + wfree(ps->targ_phon); + wfree(ps->targ_freq); + wfree(ps->abs_targ); + + wfree(ps); + + return; +} + +static DIPHONE_ACOUSTIC *make_as(DIPHONE_SPN *ps) +{ + DIPHONE_ACOUSTIC *as = walloc(DIPHONE_ACOUSTIC,1); + int npp = 
(int)((float)ps->cum_dur[ps->p_sz-1]/ + ((float)di_db->samp_freq/1000.0)); + + as->p_sz = 0; + as->p_max = npp; + + as->pitch = walloc(int,npp); + as->cum_pitch = walloc(int,npp+1); + as->inwidth = walloc(int,npp+1); + as->outwidth = walloc(int,npp+1); + as->phon_ref = walloc(int,npp); + as->phon_mark = walloc(int,npp); + as->diph_ref = walloc(int,npp); + as->diph_mark = walloc(int,npp); + + return as; +} + +static void delete_as(DIPHONE_ACOUSTIC *as) +{ + + if (as == NULL) + return; + wfree(as->pitch); + wfree(as->cum_pitch); + wfree(as->inwidth); + wfree(as->outwidth); + wfree(as->phon_ref); + wfree(as->phon_mark); + wfree(as->diph_ref); + wfree(as->diph_mark); + wfree(as); + + return; +} + +void di_add_diphonedb(DIPHONE_DATABASE *db) +{ + // Add this to list of loaded diphone dbs and select it + LISP lpair; + DIPHONE_DATABASE *ddb; + + if (diphone_dbs == NIL) + gc_protect(&diphone_dbs); + + lpair = siod_assoc_str(db->name,diphone_dbs); + + if (lpair == NIL) + { // new diphone db of this name + diphone_dbs = cons(cons(rintern(db->name), + cons(siod(db),NIL)), + diphone_dbs); + } + else + { // already one of this name, don't know howto free it + cerr << "Diphone: warning redefining diphone database " + << db->name << endl; + ddb = diphone_db(car(cdr(lpair))); + delete_diphone_db(ddb); + setcar(cdr(lpair),siod(db)); + } + + di_db = db; + +} + +LISP FT_Diphone_select(LISP name) +{ + // Select diphone set + LISP lpair; + + lpair = siod_assoc_str(get_c_string(name),diphone_dbs); + + if (lpair == NIL) + { + cerr << "Diphone: no diphone database named " << get_c_string(name) + << " defined\n"; + festival_error(); + } + else + { + DIPHONE_DATABASE *db = diphone_db(car(cdr(lpair))); + di_db = db; + } + + return name; +} + +LISP FT_Diphone_group(LISP name,LISP lfilename) +{ + + FT_Diphone_select(name); + di_save_grouped_db(get_c_string(lfilename),di_db); + + return NIL; +} + +static LISP FT_Diphone_list_dbs(void) +{ + // List names of currently loaded dbs + LISP names,n; + 
+ for (names=NIL,n=diphone_dbs; n != NIL; n=cdr(n)) + names = cons(car(car(n)),names); + return reverse(names); +} + +void festival_diphone_init(void) +{ + + // New diphone synthesizer + proclaim_module("diphone", &di_description); +#ifdef SUPPORT_PSOLA_TM + // Only available for research + proclaim_module("di_psolaTM"); +#endif + + init_subr_1("Diphone_Init",FT_Diphone_Load_Diphones, + "(Diphone_Init PARAMS)\n\ + Initialize a general diphone database. PARAMS are an assoc list\n\ + of parameter name and value. [see Diphone_Init]"); + festival_def_utt_module("Diphone_Synthesize",FT_Diphone_Synthesize_Utt, + "(Diphone_Synthesize UTT)\n\ + Synthesize a waveform using the currently selected diphone database.\n\ + This is called from Synthesize when the Synth_Method Parameter has the\n\ + value Diphone. [see Diphone synthesizer]"); +// proclaim_module("Diphone_Synthesize",&di_description); +// init_module_subr("Diphone_Synthesize",FT_Diphone_Synthesize_Utt, +// &di_description); + init_subr_1("Diphone.select",FT_Diphone_select, + "(Diphone.select DB_NAME)\n\ + Select a preloaded diphone set named DB_NAME, diphone sets are\n\ + identified by their name parameter."); + init_subr_2("Diphone.group",FT_Diphone_group, + "(Diphone.group DB_NAME GROUPFILE)\n\ + Create a group file for DB_NAME in GROUPFILE. A group file is a saved\n\ + image of the database containing only the necessary information.\n\ + It is an efficient way for loading and using diphone databases\n\ + and is the recommended format for diphone databases for normal\n\ + use. [see Group files]"); + init_subr_1("reslpc_resynth",FT_reslpc_resynth, + "(reslpc_resynth FILENAME)\n\ + Resynthesize FILENAME using LPC plus residual. 
The current diphone\n\ + database must have definitions for LPC for this to work."); + init_subr_0("Diphone.list",FT_Diphone_list_dbs, + "(Diphone.list)\n\ + List the names of the currently loaded diphone databases."); + init_subr_2("diphone.oc",find_optimal_coupling, + "(diphone.oc TABLE WEIGHTS)\n\ + For a table of units (diphones) find the optimal coupling points. WEIGHTS\n\ + is a list of weights to apply to the vector coefficients used in measuring\n\ + the goodness of the joins."); + +} diff --git a/src/modules/diphone/diphone.h b/src/modules/diphone/diphone.h new file mode 100644 index 0000000..737c9a9 --- /dev/null +++ b/src/modules/diphone/diphone.h @@ -0,0 +1,202 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alistair Conkie */ +/* Date : August 1996 */ +/*-----------------------------------------------------------------------*/ + +#ifndef __DIPHONE_H_ +#define __DIPHONE_H_ + +#include "EST_Pathname.h" + +enum di_sigaccess_t {di_direct, di_dynamic, di_ondemand}; +enum di_db_type_t {di_pcm, di_lpc}; +enum di_group_encode_t {di_raw, di_ulaw }; +enum di_group_t {di_grouped, di_ungrouped}; + +typedef struct { + char *diph; + char *file; + float beg; + float mid; + float end; +} DI_INDEX; + +typedef struct { + unsigned short nsamples; + short *signal; +} DI_VOX; + +typedef struct { + unsigned short nmark; + unsigned short lmark; + unsigned short rmark; + unsigned short *mark; +} DI_PM; + +typedef struct { + unsigned short nframes; + float **f; +} DI_LPC; + +typedef struct DD_STRUCT { + char *name; + char *type_str; + enum di_db_type_t type; /* pcm or lpc */ + enum di_group_t gtype; /* grouped or ungrouped */ + char *index_file; + char *signal_dir; + char *signal_ext; + char *signal_type; + char *lpc_dir; + char *lpc_ext; + char *lpc_res_ext; + char *lpc_res_type; + float lpc_res_offset; + int lpc_frame_offset; + EST_String lpc_type; + int lpc_order; + float lpc_frame_shift; /* in seconds */ + char *pitch_dir; + char *pitch_ext; + char *sig_access_type_str; + enum di_sigaccess_t 
sig_access_type; + char *group_encoding_str; + enum di_group_encode_t group_encoding; + LISP alternates_before; + LISP alternates_after; + int def_f0; + FILE *xfd; + FILE *gfd; /* the group file */ + int gsignalbase; /* offset in group file where signals start */ + int gframebase; /* offset in group file where frames start */ + int samp_freq; + int sig_band; /* the safety zone in samples round each signal */ + char *phoneset; + int ndiphs; + int swap; + char *default_diphone; /* when all else fails */ + int lpc_pitch_synch; /* True if lpc frames are pitch synchronous */ + + short *allsignal; /* used in group files */ + unsigned char *allulawsignal; + float *allframes; + short *allframesshort; + + int nindex; + int zone; /* bytes */ + int *offsets; /* used for indexing signals in group files */ + int *frameoffsets; /* used for indexing frames in group files */ + DI_INDEX **indx; + DI_VOX **vox; + DI_PM **pm; + DI_LPC **lpc; + +} DIPHONE_DATABASE; + +typedef struct { + int p_sz; + int p_max; + int p_tmp; + int *pitch; + int *cum_pitch; + int *phon_ref; + int *phon_mark; + int *diph_ref; + int *diph_mark; + int *inwidth; + int *outwidth; +} DIPHONE_ACOUSTIC; + +typedef struct { + int p_sz; + int p_max; + char **phons; + int *duration; + int *cum_dur; + int *ref; + int *pm_req; + int *pm_giv; + int *pm_chg_diph; /* phoneme-based info */ + + char **diphs; + + int t_sz; + int t_max; + int *pc_targs; + int *targ_phon; + int *targ_freq; + int *abs_targ; /* maybe in samples */ + +} DIPHONE_SPN; + +typedef struct { + int o_sz; + int o_max; + short *track; +} DIPHONE_OUTPUT; + +void di_load_database(DIPHONE_DATABASE *database); +void di_calc_pitch(DIPHONE_DATABASE *config, + DIPHONE_SPN *ps, + DIPHONE_ACOUSTIC *as); +void di_psola_tm(DIPHONE_DATABASE *db, + DIPHONE_ACOUSTIC *as, + DIPHONE_OUTPUT *output); +void di_reslpc(DIPHONE_DATABASE *db, + DIPHONE_ACOUSTIC *as, + DIPHONE_OUTPUT *output); +void reslpc_resynth(const EST_String &file, + DIPHONE_DATABASE *db, + 
DIPHONE_ACOUSTIC *as, + DIPHONE_OUTPUT *output); +void di_frame_select(DIPHONE_DATABASE *db, + DIPHONE_SPN *ps, + DIPHONE_ACOUSTIC *as); +short *di_get_diph_signal(int diph,DIPHONE_DATABASE *db); +float *di_get_diph_lpc_mark(int diph,int mark,DIPHONE_DATABASE *db); +//short *di_get_diph_res_mark(int diph,int mark,DIPHONE_DATABASE *db); +short *di_get_diph_res_mark(int diph,int mark,int size,DIPHONE_DATABASE *db); + +void di_save_grouped_db(const EST_Pathname &filename, DIPHONE_DATABASE *db); +void di_load_grouped_db(const EST_Pathname &filename, DIPHONE_DATABASE *db, + LISP global_params); +void di_fixed_parameters(DIPHONE_DATABASE *db,LISP params); +void di_general_parameters(DIPHONE_DATABASE *db,LISP params); +void load_pitch_file(DIPHONE_DATABASE *database, int i, int mode); +LISP find_optimal_coupling(LISP table, LISP weights); + +void di_add_diphonedb(DIPHONE_DATABASE *db); +DIPHONE_DATABASE *make_diphone_db(void); + +#endif /* __DIPHONE_H__ */ diff --git a/src/modules/diphone/oc.cc b/src/modules/diphone/oc.cc new file mode 100644 index 0000000..a2446fe --- /dev/null +++ b/src/modules/diphone/oc.cc @@ -0,0 +1,304 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. 
*/ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : December 1996 */ +/*-----------------------------------------------------------------------*/ +/* Optimal Coupling for two pieces of speech */ +/* */ +/* Given two Tracks find the best point (minimised weighted euclidean */ +/* distance between two vectors). 
*/ +/* */ +/*=======================================================================*/ +#include +#include "festival.h" + +#define LEFT_PHONE(x) (car(x)) +#define RIGHT_PHONE(x) (car(cdr(x))) +#define FILEID(x) (car(cdr(cdr(x)))) +#define START(x) (car(cdr(cdr(cdr(x))))) +#define MID(x) (car(cdr(cdr(cdr(cdr(x)))))) +#define END(x) (car(cdr(cdr(cdr(cdr(cdr(x))))))) + +static int get_track_for_phone(EST_Track &a,const EST_String &fileid,float st,float end); +static LISP before_ds(LISP d1, LISP ds); +static LISP after_ds(LISP d1, LISP ds); +static float find_best_left(LISP d,LISP ds,LISP weights); +static float find_best_right(LISP d,LISP ds,LISP weights); +static float frametoms(int frame,float frame_shift); +static int mstoframe(float ms,float frame_shift); +static float frame_distance(EST_Track &a, int fa, + EST_Track &b, int fb, + int size, double *weights); + +static EST_String coeffs_dir = "coeffs/"; +static EST_String coeffs_ext = ".cep"; + +LISP find_optimal_coupling(LISP table, LISP weights) +{ + // For each diphone description in table find the best overall + // join point between it and all other diphone sthat can join with it + LISP d,newtab,newent; + float best_left,best_right; + + newtab = NIL; + gc_protect(&newtab); + coeffs_dir = get_c_string(siod_get_lval("oc_coeffs_dir","no coeffs dir")); + coeffs_ext = get_c_string(siod_get_lval("oc_coeffs_ext","no coeffs ext")); + + for (d=table; d != NIL; d=cdr(d)) + { + pprint(car(d)); + if (ph_is_silence(get_c_string(LEFT_PHONE(car(d))))) + best_left = get_c_float(START(car(d))); + else + best_left=find_best_left(car(d),before_ds(car(d),table),weights); + if (ph_is_silence(get_c_string(RIGHT_PHONE(car(d))))) + best_right = get_c_float(END(car(d))); + else + best_right=find_best_right(car(d),after_ds(car(d),table),weights); + newent = cons(LEFT_PHONE(car(d)), // left phone + cons(RIGHT_PHONE(car(d)), // right phone + cons(FILEID(car(d)), // file_id + cons(flocons(best_left), // left cut point + cons(MID(car(d)), 
// mid point + cons(flocons(best_right),NIL)))))); // right cut point + newtab = cons(newent,newtab); + } + newtab = reverse(newtab); + gc_unprotect(&newtab); + return newtab; +} + +static int get_track_for_phone(EST_Track &a, const EST_String &fileid, float st, float end) +{ + // Load EST_Track from fileid from st to end (in ms) + EST_Track whole; + int start_frame, end_frame, i; + + if (whole.load(coeffs_dir+fileid+coeffs_ext) != 0) + return -1; + start_frame = mstoframe(st-12.8,whole.shift()); + end_frame = mstoframe(end-12.8,whole.shift())+1; + a.resize(end_frame-start_frame,whole.num_channels()); + for (i=start_frame; i < end_frame; i++) + a.copy_frame_in(i-start_frame, + whole, i, 0, + 0, whole.num_channels()); + a.fill_time(whole.shift()); + + return 0; +} + +static LISP before_ds(LISP d1, LISP ds) +{ + // Return all entries in ds whose cadr equals d1's car + // i.e. all diphones which match d1's left phone + LISP m=NIL,l; + + for (l=ds; l != NIL; l=cdr(l)) + if (streq(get_c_string(car(d1)),get_c_string(car(cdr(car(l)))))) + m=cons(car(l),m); + return m; +} + +static LISP after_ds(LISP d1, LISP ds) +{ + // Return all entries in ds whose car equals d1's cadr + // i..e all diphones which match d1's right phone + LISP m=NIL,l; + + for (l=ds; l != NIL; l=cdr(l)) + if (streq(get_c_string(car(cdr(d1))),get_c_string(car(car(l))))) + m=cons(car(l),m); + return m; +} + +static float find_best_left(LISP d,LISP ds,LISP weights) +{ + // Find the best join point with each of phones described + // in d + EST_Track a,b; + LISP l; + int i,j,best,bestj;; + double b_dist,dist; + float best_pos; + + get_track_for_phone(a,get_c_string(FILEID(d)), + get_c_float(START(d)),get_c_float(MID(d))); + + // Cummulate the costs for each possible cut point + double *counts = new double[a.num_frames()]; + for (i=0; i < a.num_frames(); i++) + counts[i] = 0; + double *w = new double[siod_llength(weights)]; + for (l=weights,i=0; i < siod_llength(weights); i++,l=cdr(l)) + w[i] = 
get_c_float(car(l)); + + for (l=ds; l != NIL; l=cdr(l)) + { // for each matching phone + get_track_for_phone(b,get_c_string(FILEID(car(l))), + get_c_float(MID(car(l))),get_c_float(END(car(l)))); + best=1; + + b_dist = frame_distance(a, 1, b, 0, a.num_channels(),w); + + for (i=1; i < a.num_frames()-1; i++) + { + for (j=0; j < b.num_frames(); j++) + { + dist = frame_distance(a, i, b, j, a.num_channels(),w); + if (dist < b_dist) + { + b_dist = dist; + best = i; + bestj = j; + } + } + } + // You should probably find minimise the std +// printf("best pos %d %s-%s %s-%s\n",best, +// get_c_string(LEFT_PHONE(d)),get_c_string(RIGHT_PHONE(d)), +// get_c_string(LEFT_PHONE(car(l))),get_c_string(RIGHT_PHONE(car(l)))); + counts[best] += 1; // sum the best possible +// counts[best] += b_dist; // sum the best possible + } + + // Now find out the best position + best = 0; + for (i=0; i < a.num_frames(); i++) + { + if (counts[i] > counts[best]) + best = i; + } + + // Change frame number back to ms offset + best_pos = frametoms(mstoframe(get_c_float(START(d)),a.shift()) + + best,a.shift()); + delete counts; + delete w; + return best_pos; +} + +static float find_best_right(LISP d,LISP ds,LISP weights) +{ + // Find the best join point with each of phones described + // in d + EST_Track a,b; + LISP l; + int i,j,best,bestj;; + double b_dist,dist; + float best_pos; + + get_track_for_phone(a,get_c_string(FILEID(d)), + get_c_float(MID(d)),get_c_float(END(d))); + + // Cummulate the costs for each possible cut point + double *counts = new double[a.num_frames()]; + for (i=0; i < a.num_frames(); i++) + counts[i] = 0; + double *w = new double[siod_llength(weights)]; + for (l=weights,i=0; i < siod_llength(weights); i++,l=cdr(l)) + w[i] = get_c_float(car(l)); + + for (l=ds; l != NIL; l=cdr(l)) + { // for each matching phone + get_track_for_phone(b,get_c_string(FILEID(car(l))), + get_c_float(START(car(l))), + get_c_float(MID(car(l)))); + best=1; + b_dist = frame_distance( a, 1, b, 0, 
a.num_channels(),w); + for (i=1; i < a.num_frames()-1; i++) + { + for (j=0; j < b.num_frames(); j++) + { + dist = frame_distance( a, i, b, j, a.num_channels(),w); + if (dist < b_dist) + { + b_dist = dist; + best = i; + bestj = j; + } + } + } + // You should probably find minimise the std + counts[best] += 1; // sum the best possible +// counts[best] += b_dist; // sum the best possible + } + + // Now find out the best position + best = 0; + for (i=0; i < a.num_frames(); i++) + { + if (counts[i] > counts[best]) + best = i; + } + + // Change frame number back to ms offset + best_pos = frametoms(mstoframe(get_c_float(MID(d)),a.shift()) + + best,a.shift()); + delete counts; + delete w; + return best_pos; +} + +static float frametoms(int frame,float frame_shift) +{ + return (frame*frame_shift)*1000.0; +} + +static int mstoframe(float ms,float frame_shift) +{ + return (int)((ms/1000.0)/frame_shift); +} + +// RJC - change for Track reorg. + +static float frame_distance(EST_Track &a, int fa, + EST_Track &b, int fb, + int size, double *weights) +{ + float cost = 0.0,diff; + int i; + + for (i=0; i < size; i++) + { + if (weights[i] != 0.0) + { + diff = (a(fa,i)-b(fb,i)); + cost += diff*diff*weights[i]; + } + } + + return cost; +} + diff --git a/src/modules/donovan/Makefile b/src/modules/donovan/Makefile new file mode 100644 index 0000000..b3c9a03 --- /dev/null +++ b/src/modules/donovan/Makefile @@ -0,0 +1,55 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996,1997 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## The code in this directory was developed by ## +## Steve Isard and Alistair Conkie ## +########################################################################### +TOP=../../.. 
+DIRNAME=src/modules/donovan +H = t2s.h donovan.h +SRCSC = makewav.c load_diphs.c coeffs.c excitation.c pitch.c durations.c +SRCSCXX = donovan.cc +SRCS = $(SRCSC) $(SRCSCXX) + +OBJS = $(SRCSC:.c=.o) $(SRCSCXX:.cc=.o) + +FILES = Makefile $(SRCS) $(H) + +LOCAL_INCLUDES = -I../include + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/donovan/coeffs.c b/src/modules/donovan/coeffs.c new file mode 100644 index 0000000..5ce888c --- /dev/null +++ b/src/modules/donovan/coeffs.c @@ -0,0 +1,72 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Steve Isard */ +/* Date : 1984 */ +/* This version was gifted by Steve for this new */ +/* copyright, the original retains their original copyright */ +/* */ +/*************************************************************************/ +#include +#include +#include "t2s.h" + +/* transform a set of reflection coefficients to linear prediction + * coefficients using algorithm from HCL-H + */ + +void rfctolpc(float *buf) +{ + float a,b; + register float *cptr; + register int n,k; + + /* HCL-H algorithm goes through coeffs in reverse of order in which they + * appear here, working on buffer numbered from 1 upwards. To get same + * effect, we make cptr point at space just after last coeff and let n and + * k go down from -1 instead of up from 1. + */ + cptr = buf + NCOEFFS; /* There should be NCOEFFS coeffs. Point just + * after last one. 
+ */ + + for(n = -1; n >= -NCOEFFS; n--) { + *(cptr+n) = -(*(cptr+n)); + for(k = -1; 2*k >= n; k--) { + a = *(cptr+k); + b = *(cptr+n-k); + *(cptr+k) = a - b * *(cptr+n); + *(cptr+n-k) = b - a * *(cptr+n); + } + } +} + diff --git a/src/modules/donovan/donovan.cc b/src/modules/donovan/donovan.cc new file mode 100644 index 0000000..18e8cb7 --- /dev/null +++ b/src/modules/donovan/donovan.cc @@ -0,0 +1,360 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alan W Black (Steve Isard and Alistair Conkie) */ +/* Date : July 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Interface to Donovan (Isard) LPC diphone code */ +/* */ +/* Uses the FreeSpeech version of the code. Those .c files are */ +/* with both systems and hopefully will continue to be so. 
Only the */ +/* necessary parts of FreeSpeech are included (the waveform synthesizer) */ +/* The rest is plugged in from this file */ +/* */ +/* Note the FreeSpeech code is GNU Copyright not like the above */ +/* */ +/*=======================================================================*/ +#include +#include +#include "festival.h" +#include "donovan.h" + +static CONFIG *make_config(void); +static void delete_config(CONFIG *config); +static SPN *make_spn(EST_Utterance &u); +static void delete_spn(SPN *ps); +static ACOUSTIC *make_as(SPN *ps); +static void delete_as(ACOUSTIC *ss); + +static CONFIG *don_config=0; +static short *outbuff = 0; +static int outpos = 0; +static int outmax = 0; + +static void don_make_silence(int duration); + +LISP FT_Donovan_Load_Diphones(LISP params) +{ + if (don_config != 0) + delete_config(don_config); + don_config = make_config(); + + don_config->index_file=wstrdup(get_param_str("index_file",params,"index")); + don_config->diphone_file = + wstrdup(get_param_str("diphone_file",params,"diphs")); + + if (load_speech(don_config) != 0) // load in the diphones + festival_error(); + + return NIL; +} + +static void delete_config(CONFIG *config) +{ + (void) config; +} + +static CONFIG *make_config(void) +{ + // Allocate a new config structure + CONFIG *config = walloc(CONFIG,1); + config->input_file = 0; + config->output_file = 0; + config->index_file = 0; + config->diphone_file = 0; + config->hash_file = 0; + config->format = 0; + config->ifd = 0; + config->ofd = 0; + config->xfd = 0; + config->dfd = 0; + + return config; +} + +LISP FT_Donovan_Synthesize_Utt(LISP utt) +{ + EST_Utterance *u = get_c_utt(utt); + EST_Item *item = 0; + SPN *ps; + ACOUSTIC *as; + + if (nindex == 0) + { + cerr << "Festival: no donovan diphones loaded\n"; + festival_error(); + } + + don_random_seed = 1; // so resynthesizing an utterance is always the same + + /* Build structure */ + ps = make_spn(*u); // get info from utterance structure + + /* This doesn't work if 
there are less than two phones so */ + /* deal with those cases explicitly */ + if (ps->p_sz < 1) + outpos = 0; + else if (ps->p_sz < 2) + don_make_silence(ps->duration[0]); + else + { + as = make_as(ps); + + /* do the actual synthesis */ + phonstoframes(ps,as); + durations(ps,as); + calc_pitch(ps,as); + makewave(don_config,as); + + delete_as(as); + } + delete_spn(ps); + + // Add wave as stream into utterance + EST_Wave *w = new EST_Wave; + w->resize(outpos,1); + for (int i=0; i < w->length(); i++) + w->a_no_check(i) = outbuff[i]; + w->set_sample_rate(SR); + + item = u->create_relation("Wave")->append(); + item->set_val("wave",est_val(w)); + + return utt; +} + +static void don_make_silence(int duration) +{ + // Make silence for size of the first (and only) phone + // duration is in sample points + int i; + short *buff = walloc(short,duration); + for (i=0; i < duration; i++) + buff[i] = 0; + audio_play(buff,sizeof(short),duration,NULL); + wfree(buff); +} + +static SPN *make_spn(EST_Utterance &u) +{ + // Build initial structure for donovan code + SPN *ps = walloc(SPN,1); + EST_Relation *seg = u.relation("Segment"); + EST_Relation *targ = u.relation("Target"); + EST_Item *s; + LISP cps; + const char *ph_name; + EST_Item *rt; + int i,j; + float pos,seg_start,seg_dur; + + ps->p_sz = seg->length(); + ps->p_max = ps->p_sz+1; + ps->t_sz = num_leaves(targ->head()); + ps->t_max = ps->t_sz+1; + ps->phons = walloc(char *,ps->p_max); + ps->duration = walloc(int,ps->p_max); + ps->cum_dur = walloc(int,ps->p_max); + ps->pb = walloc(int,ps->p_max); + ps->scale = walloc(float,ps->p_max); + ps->diphs = walloc(char *,ps->p_max); + for (i=0; i < ps->p_sz; i++) + ps->diphs[i] = walloc(char,8); + + ps->pc_targs = walloc(int,ps->t_max); + ps->targ_phon = walloc(int,ps->t_max); + ps->targ_freq = walloc(int,ps->t_max); + ps->abs_targ = walloc(int,ps->t_max); + + for (j=i=0,s=seg->first(); s != 0; s=s->next(),i++) + { + if (((cps=ft_get_param("PhoneSet")) == NIL) || +
((streq(get_c_string(cps),"holmes")))) + ph_name = s->name(); + else + ph_name = map_phone(s->name(),get_c_string(cps),"holmes"); + ps->phons[i] = wstrdup(ph_name); + seg_start = ffeature(s,"segment_start"); + seg_dur = ffeature(s,"segment_duration"); + ps->duration[i] = (int)(seg_dur * 10000); /* in frames */ + if (i>0) + ps->cum_dur[i] = ps->cum_dur[i-1]; + else + ps->cum_dur[i] = 0; + ps->cum_dur[i] += ps->duration[i]; + for (rt = daughter1(s,"Target"); + rt != 0; + rt = rt->next(),j++) + { + ps->targ_phon[j] = i; + ps->targ_freq[j] = rt->I("f0"); + pos = ((rt->F("pos") - seg_start) / seg_dur) * 99.9; + ps->pc_targs[j] = (int)pos; + } + } + + // makewave thinks it's writing to a file but it's actually writing + // to an in-core buffer. We allocate it here and make it bigger if + // necessary + if (outbuff != NULL) + wfree(outbuff); + if (i == 0) + outmax = 10; + else + outmax = (int)(ps->cum_dur[i-1]*1.1); + outbuff = walloc(short,outmax); + outpos = 0; + + return ps; + +} + +static void delete_spn(SPN *ps) +{ + // claim back the space from ps + int i; + + if (ps == NULL) + return; + + for (i=0; i < ps->p_sz; i++) + { + wfree(ps->diphs[i]); + wfree(ps->phons[i]); + } + wfree(ps->phons); + wfree(ps->duration); + wfree(ps->cum_dur); + wfree(ps->pb); + wfree(ps->scale); + wfree(ps->diphs); + + wfree(ps->pc_targs); + wfree(ps->targ_phon); + wfree(ps->targ_freq); + wfree(ps->abs_targ); + + wfree(ps); + + return; +} + +static ACOUSTIC *make_as(SPN *ps) +{ + ACOUSTIC *as = walloc(ACOUSTIC,1); + int nframes = ps->cum_dur[ps->p_sz-1]; + int npp = nframes * 2; + + as->p_sz = 0; + as->f_sz = 0; + as->p_max = npp; + as->f_max = nframes; + + as->mcebuf = walloc(FRAME*,nframes); + as->duration = walloc(short,nframes); + as->pitch = walloc(short,npp); + + return as; +} + +static void delete_as(ACOUSTIC *as) +{ + + if (as == NULL) + return; + wfree(as->mcebuf); + wfree(as->duration); + wfree(as->pitch); + wfree(as); + + return; +} + +void as_realloc(int nframes, int npp, ACOUSTIC *as) 
+{ + // I don't think this will ever be called in Festival + (void)nframes; + (void)npp; + (void)as; + + cerr << "Donovan diphones: as_realloc called unexpectedly\n"; + festival_error(); + +} + +void audio_play(short *start,int sz,int number,CONFIG *config) +{ + // The lower level system thinks its calling an original function + // but here we intercept the output and put in a EST_Wave class + // This function will be called a number of times, but we know + // (roughly) what the maximum size will be, but just in case + // we make it bigger if necessary + (void)config; + + if (outpos+number > outmax) + { + int noutmax = (int)((float)(outpos+number)*1.1); + short *noutbuf = walloc(short,noutmax); + memmove(noutbuf,outbuff,sizeof(short)*outpos); + wfree(outbuff); + outbuff = noutbuf; + outmax = noutmax; + } + + memmove(&outbuff[outpos],start,number*sz); + outpos += number; +} + +void festival_donovan_init(void) +{ + + // Donovan (Isard) LPC diphone set + + proclaim_module("donovan"); + + init_subr_1("Donovan_Init",FT_Donovan_Load_Diphones, + "(Donovan_Init PARAMS)\n\ + Initialize the Donovan LPC diphone database. PARAMS are an assoc list\n\ + of parameter name and value. The two parameters are index_file (value is\n\ + a pathname for \"diphlocs.txt\") and diphone_file (value is a pathname\n\ + for \"lpcdiphs.bin\"). [see LPC diphone synthesizer]"); + festival_def_utt_module("Donovan_Synthesize",FT_Donovan_Synthesize_Utt, + "(Donovan_Synthesize UTT)\n\ + Synthesize a waveform using the Donovan LPC diphone synthesizer.\n\ + This is called from Synthesize when the Synth_Method Parameter has the\n\ + value Donovan. 
[see LPC diphone synthesizer]"); + +} + + + diff --git a/src/modules/donovan/donovan.h b/src/modules/donovan/donovan.h new file mode 100644 index 0000000..58c7d2d --- /dev/null +++ b/src/modules/donovan/donovan.h @@ -0,0 +1,56 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alan W Black */ +/* Date : July 1996 */ +/*-----------------------------------------------------------------------*/ +/* */ +/* Interface to Donovan (Isard) LPC diphone code */ +/* */ +/*=======================================================================*/ +#ifndef __DONOVAN_H__ +#define __DONOVAN_H__ + +#ifdef __cplusplus +extern "C" { +#endif + +/* Maybe cut this down a bit */ +#include "t2s.h" + +extern int don_random_seed; + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/src/modules/donovan/durations.c b/src/modules/donovan/durations.c new file mode 100644 index 0000000..c9723df --- /dev/null +++ b/src/modules/donovan/durations.c @@ -0,0 +1,87 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. 
*/ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. */ +/* */ +/*************************************************************************/ +/* Author : Alistair Conkie */ +/* Date : 1996 */ +/* This version was gifted by Alistair for this new */ +/* copyright, the original retains their original copyright */ +/* */ +/*************************************************************************/ +#include +#include "t2s.h" + +static int min(int a, int b); +static float fmax(float a, float b); + +void durations(SPN *ps, ACOUSTIC *as) +{ + int durdist; + int interdist; + float multiplier_i; + float proportion; + int i; + int j; + + for(i=0;i < ps->p_sz;i++) + ps->scale[i] = (float)ps->duration[i] / + (float)((ps->pb[i+1]-ps->pb[i])*FR_SZ); + + ps->cum_dur[0] = 0; /* do cumulative at same time */ + for(i=0,j=0;i < as->f_sz;i++) { + if(i == ps->pb[j]) { + if(j != 0) { + ps->cum_dur[j] = ps->duration[j-1] + ps->cum_dur[j-1]; + } + as->duration[i] = FR_SZ; + ps->duration[j] = FR_SZ; /* saves adding later */ + j++; + } else { + durdist = min(i-ps->pb[j-1],ps->pb[j]-i); + interdist = ps->pb[j] - ps->pb[j-1]; + proportion = (float)durdist/(float)interdist; + multiplier_i = fmax(0.01,4.0*proportion*(ps->scale[j-1]-1.0)+1.0); + as->duration[i] = FR_SZ*multiplier_i; + ps->duration[j-1] += as->duration[i]; + } + } +} + +static int min(int a, int b) +{ + return((a < b)?a:b); +} + +static float fmax(float a, float b) +{ + return((a > b)?a:b); +} + +diff --git a/src/modules/donovan/excitation.c b/src/modules/donovan/excitation.c new
file mode 100644 index 0000000..7e115cc --- /dev/null +++ b/src/modules/donovan/excitation.c @@ -0,0 +1,94 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Steve Isard and Alistair Conkie */ +/* Date : 1984 and 1996 */ +/* This version was gifted by Steve for this new */ +/* copyright, the original retains their original copyright */ +/* */ +/*************************************************************************/ +#include +#include +#include "t2s.h" + +#define SQRTSIX 0.408248 + +int don_random_seed = 1; + +static short ial(void); + +/* + wkspace[0] pitch period index + wkspace[1] where in pitch period +*/ + +float iexc(short voiced, ACOUSTIC *as, short *wkspace) +{ + switch(wkspace[1]) { + case 0: + wkspace[1] = as->pitch[wkspace[0]++] - 1; + if(voiced) + return((float)SQRTSIX); + break; + case 1: + wkspace[1]--; + if(voiced) + return((float)2*SQRTSIX); + break; + case 2: + wkspace[1]--; + if(voiced) + return((float)SQRTSIX); + break; + default: + wkspace[1]--; + if(voiced) + return((float)0); + break; + } + return( (float)(ial()) - 0.5); + +} + +static short ial() /* random number generator */ +{ + int seed = don_random_seed; + short i1,i2,i3; + + i1 = 1&(seed); + i2 = (((seed)&4) >> 2); + i3 = i1^i2; + seed = (((seed) >> 1) + (i3 << 10)); + don_random_seed = seed; + return(i3); +} + diff --git a/src/modules/donovan/load_diphs.c b/src/modules/donovan/load_diphs.c new file mode 100644 index 0000000..fb97856 --- /dev/null +++ b/src/modules/donovan/load_diphs.c @@ -0,0 +1,186 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alistair Conkie */ +/* Date : 1996 */ +/* This version was gifted by Alistair for this new */ +/* copyright, the original retains their original copyright */ +/* */ +/*************************************************************************/ +#include +#include +#include "EST_cutils.h" +#include "t2s.h" + +static int load_index(CONFIG *config); +static int load_diphs(CONFIG *config); + +/* awb added */ +int nindex = 0; +static ENTRY *indx = 0; +static FRAME *dico = 0; + +int load_speech(CONFIG *config) +{ + + if (load_index(config) != 0) /* in alphabetical order */ + return -1; + if (load_diphs(config) != 0) + return -1; + return 0; +} + +static int load_index(CONFIG *config) +{ + char s[100]; + int i; + + if (indx == 0) + indx = walloc(ENTRY,NDIPHS); + + if((config->xfd=fopen(config->index_file,"rb")) == NULL) { + (void)fprintf(stderr,"Can't open file %s\n",config->index_file); + return -1; + } + + for(i=0;(fgets(s,100,config->xfd) != NULL) && (i < NDIPHS);i++) { + sscanf(s,"%s %d %d %d",indx[i].diph,&indx[i].beg,&indx[i].mid,&indx[i].end); + } + nindex = i; + + fclose(config->xfd); + return 0; +} + +static int load_diphs(CONFIG *config) +{ + int i,j; + + if (dico == 0) + dico = walloc(FRAME,NFRAMES); + + if((config->dfd=fopen(config->diphone_file,"rb")) == NULL) { + fprintf(stderr,"Can't open file %s\n",config->diphone_file); + return -1; + } + + /* zero the first one... */ + for(i=0;idfd) != 0) && + (i < NFRAMES);i++) + { + ; + } + + /* check the first little bit is as we expect... 
*/ + if ((dico[1].frame[0] != 181) || (dico[1].frame[1] != 176)) + { + if ((SWAPSHORT(dico[1].frame[0]) == 181) && + (SWAPSHORT(dico[1].frame[1]) == 176)) + { /* Its bytes swapped */ + for (j=1;jdiphone_file); + fclose(config->dfd); + return -1; + } + } + + fclose(config->dfd); + return 0; +} + +int lookup(char *diph) +{ + int low, high, mid; + + + low = 0; + high = nindex-1; + while(low <= high) { + mid = (low+high) / 2; + if(strcmp(diph,indx[mid].diph)<0) + high = mid-1; + else if(strcmp(diph,indx[mid].diph)>0) + low = mid+1; + else + return(mid); + } + return(-1); +} + +void phonstoframes(SPN *ps, ACOUSTIC *as) +{ + int i,j; + int ref; + + as->f_sz = 0; + + for(i=0;i < ps->p_sz-1;i++) + sprintf(ps->diphs[i],"%s-%s",ps->phons[i],ps->phons[i+1]); + + ps->pb[0] = 0; /* Gets treated a little bit specially */ + + /* insert zero frame */ + as->mcebuf[as->f_sz++] = &dico[0]; + + for(i=0;i < ps->p_sz-1;i++) { + ref = lookup(ps->diphs[i]); /* gives back the reference no. */ + if(ref == -1) { + (void)fprintf(stderr,"Diphone not found - %s\n",ps->diphs[i]); + ref = 0; + } + if(as->f_sz+50 > as->f_max) { + as_realloc(as->f_max*2,as->p_max,as); + } + for(j=indx[ref].beg;j<=indx[ref].end;j++) { + if(j==indx[ref].mid) + ps->pb[i+1] = as->f_sz; + as->mcebuf[as->f_sz++] = &dico[j]; + } + } + as->mcebuf[as->f_sz++] = &dico[0]; + as->mcebuf[as->f_sz++] = &dico[0]; + as->mcebuf[as->f_sz++] = &dico[0]; + + ps->pb[ps->p_sz] = as->f_sz-1; + +} + diff --git a/src/modules/donovan/makewav.c b/src/modules/donovan/makewav.c new file mode 100644 index 0000000..b26f31e --- /dev/null +++ b/src/modules/donovan/makewav.c @@ -0,0 +1,128 @@ +/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. 
*/ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Steve Isard and Alistair Conkie */ +/* 1984 and 1996 */ +/* This version was gifted by Steve and Alistair for this new */ +/* copyright, the original retains their original copyright */ +/* */ +/*************************************************************************/ +#include +#include +#include "t2s.h" + +void makewave(CONFIG *config, ACOUSTIC *as) +{ + short *ptptr; + short ptbuf[OUT_BUF+NCOEFFS]; + float rbuf[NCOEFFS],amp; + int i,j,k,l; + int voiced; + int zz = 0; + short *mcebuf; + short u_cache, s_cache = 0; + short wkspace[2] = {0,0}; + + for(zz=0;zzf_sz;i++) { + + mcebuf = &(as->mcebuf[i]->frame[0]); + + voiced = mcebuf[1]/2; + + if(voiced == 0) { + amp = 2*(float)sqrt((double)mcebuf[0]); + } else { + amp = (float) sqrt((double) (mcebuf[0] * voiced) ); + } + + for(k=FR_DATA-NCOEFFS;k < FR_DATA;k++) + rbuf[k-(FR_DATA-NCOEFFS)] = (float)mcebuf[k]/32767; + + rfctolpc(rbuf); /* convert to lpc coeffs */ + + for(j=0;j < as->duration[i];j++,ptptr++) { + register float v,*cptr; + register short *backptr; + short *endptr; + float exc; + + exc = iexc(voiced,as,wkspace); + v = (float)(exc == 0 ? 0 : amp*10*exc); /* 10 is NEW */ + + /* following loop depends on an initial NCOEFFS zeros preceding + * ptbuf to make *(--backptr) zero the first time through. + * We are depending on the ILS header before ptbuf to supply + * these zeros. If waves are ever to be synthesized without + * an ILS header on the front, special provision will have to be + * made. 
+ */ + + endptr = ptptr - NCOEFFS; + backptr = ptptr; + cptr = rbuf + NCOEFFS; /* point at (empty) last cell of rbuf, + * because we are going to use coeffs in + * reverse order + */ + for(; backptr > endptr;) + v += (float)( *(--backptr) * *(--cptr) ); + + /* this is where we need to concern ourselves with */ + /* flushing the buffer from time to time */ + ptbuf[zz++] = (short)(v); + if(zz>=(OUT_BUF+NCOEFFS)) { + for(l=zz-NCOEFFS;l +#include "t2s.h" + +static int interpolated_freq(int k, SPN *ps); +static int interpolate(int a,int b,int c,int d,int e); + +void calc_pitch(SPN *ps, ACOUSTIC *as) +{ + int j,k; + int y; + int l = 0; + int k_old = 0; + int k_fine = 0; + int x = 0; + + for(j=0;j < ps->t_sz;j++) + ps->abs_targ[j] = ps->cum_dur[ps->targ_phon[j]] + + ps->pc_targs[j]*ps->duration[ps->targ_phon[j]]/100.0; + + for(k=0;k < ps->cum_dur[ps->p_sz];k+=100) { + y = interpolated_freq(k,ps); + x += 100*y; + while(x>SR) { + k_fine = k + interpolate(x-100*y,0,x,100,10000); + x -= SR; + as->pitch[l++] = k_fine-k_old; + if(l == as->p_max) { + as_realloc(as->f_max,as->p_max*2,as); + } + k_old = k_fine; + } + } + as->p_sz = l; + as->pitch[0] += FR_SZ/2; /* to compensate for mismatch */ +} + +static int interpolated_freq(int k, SPN *ps) +{ + int i; + int freq; + + if(!ps->t_sz) + return(DEF_F0); + else if(k < ps->abs_targ[0]) + return(ps->targ_freq[0]); + else if(k>=ps->abs_targ[ps->t_sz-1]) + return(ps->targ_freq[ps->t_sz-1]); + for(i=1;i < ps->t_sz;i++) { + if((k < ps->abs_targ[i]) && (k>=ps->abs_targ[i-1])) + { + freq = interpolate(ps->abs_targ[i-1], + ps->targ_freq[i-1], + ps->abs_targ[i], + ps->targ_freq[i],k); + return(freq); + } + } + return(-1); /* should never arrive here */ +} + +static int interpolate(int a,int b,int c,int d,int e) +{ + int f; + + f = (c*b + d*e - e*b -a*d)/(c-a); + + return(f); +} diff --git a/src/modules/donovan/t2s.h b/src/modules/donovan/t2s.h new file mode 100644 index 0000000..195e948 --- /dev/null +++ b/src/modules/donovan/t2s.h @@ -0,0 +1,284 @@
+/*************************************************************************/ +/* */ +/* Centre for Speech Technology Research */ +/* University of Edinburgh, UK */ +/* Copyright (c) 1996,1997 */ +/* All Rights Reserved. */ +/* */ +/* Permission is hereby granted, free of charge, to use and distribute */ +/* this software and its documentation without restriction, including */ +/* without limitation the rights to use, copy, modify, merge, publish, */ +/* distribute, sublicense, and/or sell copies of this work, and to */ +/* permit persons to whom this work is furnished to do so, subject to */ +/* the following conditions: */ +/* 1. The code must retain the above copyright notice, this list of */ +/* conditions and the following disclaimer. */ +/* 2. Any modifications must be clearly marked as such. */ +/* 3. Original authors' names are not deleted. */ +/* 4. The authors' names are not used to endorse or promote products */ +/* derived from this software without specific prior written */ +/* permission. */ +/* */ +/* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ +/* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ +/* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ +/* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ +/* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ +/* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ +/* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ +/* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ +/* THIS SOFTWARE. 
*/ +/* */ +/*************************************************************************/ +/* Author : Alistair Conkie and Steve Isard */ +/*-----------------------------------------------------------------------*/ + +#ifndef _T2S_H_ +#define _T2S_H_ +#define NDIPHS 3000 +#define NFRAMES 23000 +#define FR_DATA 16 /* shorts per frame, coeffs + assorted */ + +#define FW 1 +#define CW 2 +#define PUNCT 3 + +#define DEF_F0 125 +#define SR 10000 /* sample rate */ +#define FR_SZ 132 /* standard frame size */ + +/* malloc defaults */ +#define DEF_BUFFER 1024 +#define DEF_LING_LIST 100 +#define DEF_SPL 100 +#define DEF_PHONS 100 +#define DEF_TARGS 100 +#define DEF_FRAMES 100 +#define DEF_PM 100 + +#define PHON_SZ 5 +#define DIPH_SZ 10 + +#define OUT_BUF 2048 + +#define NCOEFFS 12 + +/* non-rhotic vowel classification for assim.c */ +#define V_DEL_R 0 +#define V_AIR 1 +#define V_EER 2 +#define V_OOR 3 +#define V_R2SCHWA 4 + +/* various typedefs */ + +typedef struct { + char *input_file; + char *output_file; + char *index_file; + char *diphone_file; + char *hash_file; + char *format; + int type; /* format by any other name */ + FILE *ifd; + FILE *ofd; + FILE *xfd; + FILE *dfd; + void *db; + int fw_num; + int sonority_num; + int dur0_num; +} CONFIG; + +typedef struct { + int max; + int sz; + char *ptr; +} BUFFER; + +typedef struct { + char *word; + int type; + char *transcription; +} LING; + +typedef struct { + int max; + int sz; + LING **text; +} LING_LIST; + +typedef struct key { + char *keyword; + int keycount; +} KEY; + +typedef struct { + char phoneme[5]; + int syll; + int dur; + char *sprosod1; + char *sprosod2; + float strength1; + float strength2; /* for combined elements */ +} SPROSOD; + +typedef struct { + int max; + int sz; + SPROSOD **phoneme; +} SPROSOD_LIST; + +typedef struct { + char diph[10]; + int beg; + int mid; + int end; +} ENTRY; + +typedef struct { + short frame[FR_DATA]; +} FRAME; + +typedef struct { + int p_sz; + int p_max; + int t_sz; + int t_max; + char 
**phons; + int *duration; + int *cum_dur; + int *pc_targs; + int *targ_phon; + int *targ_freq; + int *abs_targ; /* maybe in samples */ + int *pb; + float *scale; + char **diphs; +} SPN; + +typedef struct { + int f_sz; + int p_sz; + int f_max; + int p_max; + FRAME **mcebuf; + short *duration; /* since variants may be required */ + short *pitch; +} ACOUSTIC; + + +extern KEY fw[]; +extern KEY son[]; +extern KEY dur0[]; + +/* now definitions of global data */ + +/* awb -- deleted */ +/* extern ENTRY indx[NDIPHS]; */ +/* extern FRAME dico[NFRAMES]; */ +extern int nindex; +extern char *dbName; + +/* program prototypes */ + +/* audio.c */ +void audio_open(CONFIG *config); +void audio_play(short *start,int sz,int number,CONFIG *config); +void audio_close(CONFIG *config); +void audio_flush(CONFIG *config); + +/* makewave.c */ +void makewave(CONFIG *config, ACOUSTIC *as); + +/* coeffs.c */ +void rfctolpc(float *buf); + +/* conv.c */ +void conv(CONFIG *config, LING_LIST *ling_list, SPROSOD_LIST *spl); +void spl_cpy(int index,int syll, char *phon, int dur, char *type, float strength, SPROSOD_LIST *spl); +void spl_cat(int index,char *type, float strength, SPROSOD_LIST *spl); +int vowel(char *ph) ; + +/* durations.c */ +void durations(SPN *ps, ACOUSTIC *as); + +/* excitation.c */ +float iexc(short voiced, ACOUSTIC *as, short *wkspace); + +/* go.c */ +void go(CONFIG *config, BUFFER *buffer, LING_LIST *ling_list, SPROSOD_LIST *spl, SPN *ps, ACOUSTIC *as); + +/* grammar.c */ +void grammar(LING_LIST *ling_list); + +/* interface.c */ +char *nrl_rules(char *in); + +/* load_diphs.c */ +int load_speech(CONFIG *config); +int lookup(char *diph); +void phonstoframes(SPN *ps, ACOUSTIC *as); + +/* nrl_edin.c */ +void nrl_edin_conv(char *str, char *str2); + +/* pitch.c */ +void calc_pitch(SPN *ps, ACOUSTIC *as); + +/* prosody.c */ +void prosody(SPROSOD_LIST *spl, SPN *ps); + +/* space.c */ +void init(CONFIG *config, BUFFER *buffer, LING_LIST *ling_list, SPROSOD_LIST *spl, SPN *ps, ACOUSTIC 
*as); +void terminate(CONFIG *config, BUFFER *buffer, LING_LIST *ling_list, SPROSOD_LIST *spl, SPN *ps, ACOUSTIC *as); +void buffer_malloc(int num,BUFFER *buffer); +void buffer_realloc(int num, BUFFER *buffer); +void buffer_free(BUFFER *buffer); +void ling_list_malloc(int num, LING_LIST *ling_list); +void ling_list_realloc(int num, LING_LIST *ling_list); +void ling_list_free(LING_LIST *ling_list); +void spl_malloc(int num, SPROSOD_LIST *spl); +void spl_realloc(int num, SPROSOD_LIST *spl); +void spl_free(SPROSOD_LIST *spl); +void ps_malloc(int nphons, int ntargs, SPN *ps); +void ps_realloc(int nphons, int ntargs, SPN *ps); +void ps_free(SPN *ps); +void as_malloc(int nframes, int npp, ACOUSTIC *as); +void as_realloc(int nframes, int npp, ACOUSTIC *as); +void as_free(ACOUSTIC *as); + +/* syllab.c */ +char *syllabify(char *string, CONFIG *config); +char *stress(char *input); + +/* t2s.c */ +void process_sentence(CONFIG *config, BUFFER *buffer, LING_LIST *ling_list, SPROSOD_LIST *spl, SPN *ps, ACOUSTIC *as); + +/* tags.c */ +void tags(CONFIG *config, BUFFER *buffer, LING_LIST *ling_list); + +/* transcribe.c */ +void transcribe(CONFIG *config, LING_LIST *ling_list); + +/* ulaw.c */ +unsigned char linear2ulaw(int sample); +int ulaw2linear(unsigned char ulawbyte); + +/* utils.c */ +char **split(char *in); +void tidy_split(char **root); +KEY *binary(char *word, KEY tab[], int n); + +/* library prototypes +int fprintf(FILE *stream, char *format, ... ); +int printf(const char *format, ... ); +int getopt(int argc,char **argv, char *optstring); +int sscanf(char *s,char * format, ... 
); +int fread (char *ptr, int size, int nitems, FILE *stream); +int fwrite (char *ptr, int size, int nitems, FILE *stream); +int fclose(FILE *stream); +*/ + + +#endif /* _T2S_H_ */ diff --git a/src/modules/hts_engine/AUTHORS b/src/modules/hts_engine/AUTHORS new file mode 100644 index 0000000..8cd43d2 --- /dev/null +++ b/src/modules/hts_engine/AUTHORS @@ -0,0 +1,13 @@ +The festopt_hts_engine is an HMM-based speech synthesis module. This software is +released under the New and Simplified BSD license. See the COPYING file in the +same directory as this file for the license. + +The festopt_hts_engine has been developed by several members of the HTS Working +Group and some graduate students at Nagoya Institute of Technology: + + Keiichi Tokuda http://www.sp.nitech.ac.jp/~tokuda/ + (Produce and Design) + Keiichiro Oura http://www.sp.nitech.ac.jp/~uratec/ + (Design and Development, Main Maintainer) + Junichi Yamagishi http://homepages.inf.ed.ac.uk/jyamagis/ + Alan W. Black http://www.cs.cmu.edu/~awb/ diff --git a/src/modules/hts_engine/COPYING b/src/modules/hts_engine/COPYING new file mode 100644 index 0000000..23abfed --- /dev/null +++ b/src/modules/hts_engine/COPYING @@ -0,0 +1,43 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* festopt_hts_engine developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. 
*/ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. */ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. 
*/ +/* ----------------------------------------------------------------- */ diff --git a/src/modules/hts_engine/HTS_audio.c b/src/modules/hts_engine/HTS_audio.c new file mode 100644 index 0000000..bd555b1 --- /dev/null +++ b/src/modules/hts_engine/HTS_audio.c @@ -0,0 +1,249 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. */ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. */ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_AUDIO_C +#define HTS_AUDIO_C + +#ifdef __cplusplus +#define HTS_AUDIO_C_START extern "C" { +#define HTS_AUDIO_C_END } +#else +#define HTS_AUDIO_C_START +#define HTS_AUDIO_C_END +#endif /* __cplusplus */ + +HTS_AUDIO_C_START; + +/* hts_engine libraries */ +#include "HTS_hidden.h" + +/* for Windows & Windows Mobile */ +#if defined(AUDIO_PLAY_WIN32) || defined(AUDIO_PLAY_WINCE) + +#define AUDIO_WAIT_BUFF_MS 10 /* wait time (ms) */ +#define AUDIO_CHANNEL 1 /* monaural */ + +/* HTS_Audio_callback_function: callback function from audio device */ +static void CALLBACK HTS_Audio_callback_function(HWAVEOUT hwaveout, UINT msg, + DWORD user_data, DWORD param1, + DWORD param2) +{ + WAVEHDR *wavehdr = (WAVEHDR *) param1; + HTS_Audio *as = (HTS_Audio *) user_data; + + if (msg == MM_WOM_DONE && wavehdr && (wavehdr->dwFlags & WHDR_DONE)) { + if (as->now_buff_1 == TRUE && wavehdr == &(as->buff_1)) { + as->now_buff_1 = FALSE; + } else if (as->now_buff_2 == TRUE && wavehdr == &(as->buff_2)) { + as->now_buff_2 = FALSE; + } + } +} + +/* HTS_Audio_write_buffer: send buffer to audio device */ +static void HTS_Audio_write_buffer(HTS_Audio * as) +{ + MMRESULT error; + + if (as->which_buff == 1) { + while (as->now_buff_1 == TRUE) + Sleep(AUDIO_WAIT_BUFF_MS); + as->now_buff_1 = TRUE; + as->which_buff = 2; + memcpy(as->buff_1.lpData, as->buff, 
as->buff_size * sizeof(short)); + as->buff_1.dwBufferLength = as->buff_size * sizeof(short); + error = waveOutWrite(as->hwaveout, &(as->buff_1), sizeof(WAVEHDR)); + } else { + while (as->now_buff_2 == TRUE) + Sleep(AUDIO_WAIT_BUFF_MS); + as->now_buff_2 = TRUE; + as->which_buff = 1; + memcpy(as->buff_2.lpData, as->buff, as->buff_size * sizeof(short)); + as->buff_2.dwBufferLength = as->buff_size * sizeof(short); + error = waveOutWrite(as->hwaveout, &(as->buff_2), sizeof(WAVEHDR)); + } + + if (error != MMSYSERR_NOERROR) + HTS_error(0, + "hts_engine: Cannot send datablocks to your output audio device to play waveform.\n"); +} + +/* HTS_Audio_open: open audio device */ +void HTS_Audio_open(HTS_Audio * as, int sampling_rate, int max_buff_size) +{ + MMRESULT error; + + /* queue */ + as->which_buff = 1; + as->now_buff_1 = FALSE; + as->now_buff_2 = FALSE; + as->max_buff_size = max_buff_size; + as->buff = (short *) HTS_calloc(max_buff_size, sizeof(short)); + + /* format */ + as->waveformatex.wFormatTag = WAVE_FORMAT_PCM; + as->waveformatex.nChannels = AUDIO_CHANNEL; + as->waveformatex.nSamplesPerSec = sampling_rate; + as->waveformatex.wBitsPerSample = sizeof(short) * 8; + as->waveformatex.nBlockAlign = AUDIO_CHANNEL + * as->waveformatex.wBitsPerSample / 8; + as->waveformatex.nAvgBytesPerSec = sampling_rate + * as->waveformatex.nBlockAlign; + /* open */ + error = + waveOutOpen(&as->hwaveout, WAVE_MAPPER, &as->waveformatex, + (DWORD) HTS_Audio_callback_function, (DWORD) as, + CALLBACK_FUNCTION); + if (error != MMSYSERR_NOERROR) + HTS_error(0, + "hts_engine: Failed to open your output audio device to play waveform.\n"); + + /* prepare */ + as->buff_1.lpData = (LPSTR) HTS_calloc(max_buff_size, sizeof(short)); + as->buff_1.dwBufferLength = max_buff_size * sizeof(short); + as->buff_1.dwFlags = WHDR_BEGINLOOP | WHDR_ENDLOOP; + as->buff_1.dwLoops = 1; + as->buff_1.lpNext = 0; + as->buff_1.reserved = 0; + error = waveOutPrepareHeader(as->hwaveout, &(as->buff_1), sizeof(WAVEHDR)); + 
if (error != MMSYSERR_NOERROR) + HTS_error(0, + "hts_engine: Cannot initialize audio datablocks to play waveform.\n"); + as->buff_2.lpData = (LPSTR) HTS_calloc(max_buff_size, sizeof(short)); + as->buff_2.dwBufferLength = max_buff_size * sizeof(short); + as->buff_2.dwFlags = WHDR_BEGINLOOP | WHDR_ENDLOOP; + as->buff_2.dwLoops = 1; + as->buff_2.lpNext = 0; + as->buff_2.reserved = 0; + error = waveOutPrepareHeader(as->hwaveout, &(as->buff_2), sizeof(WAVEHDR)); + if (error != MMSYSERR_NOERROR) + HTS_error(0, + "hts_engine: Cannot initialize audio datablocks to play waveform.\n"); +} + +/* HTS_Audio_write: send data to audio device */ +void HTS_Audio_write(HTS_Audio * as, short data) +{ + as->buff[as->buff_size] = data; + as->buff_size++; + if (as->buff_size == as->max_buff_size) { + HTS_Audio_write_buffer(as); + as->buff_size = 0; + } +} + +/* HTS_Audio_close: close audio device */ +void HTS_Audio_close(HTS_Audio * as) +{ + MMRESULT error; + + if (as->buff_size != 0) + HTS_Audio_write_buffer(as); + while (as->now_buff_1 == TRUE) + Sleep(AUDIO_WAIT_BUFF_MS); + while (as->now_buff_2 == TRUE) + Sleep(AUDIO_WAIT_BUFF_MS); + /* stop audio */ + error = waveOutReset(as->hwaveout); + if (error != MMSYSERR_NOERROR) + HTS_error(0, + "hts_engine: Cannot stop and reset your output audio device.\n"); + /* unprepare */ + error = waveOutUnprepareHeader(as->hwaveout, &(as->buff_1), sizeof(WAVEHDR)); + if (error != MMSYSERR_NOERROR) + HTS_error(0, + "hts_engine: Cannot cleanup the audio datablocks to play waveform.\n"); + error = waveOutUnprepareHeader(as->hwaveout, &(as->buff_2), sizeof(WAVEHDR)); + if (error != MMSYSERR_NOERROR) + HTS_error(0, + "hts_engine: Cannot cleanup the audio datablocks to play waveform.\n"); + /* close */ + error = waveOutClose(as->hwaveout); + if (error != MMSYSERR_NOERROR) + HTS_error(0, "hts_engine: Failed to close your output audio device.\n"); + HTS_free(as->buff_1.lpData); + HTS_free(as->buff_2.lpData); + HTS_free(as->buff); +} +#endif /* 
AUDIO_PLAY_WIN32 || AUDIO_PLAY_WINCE */ + +/* for ALSA on Linux */ +#ifdef AUDIO_PLAY_ALSA +/* HTS_Audio_open: open audio device (dummy) */ +void HTS_Audio_open(HTS_Audio * as, int sampling_rate, int max_buff_size) +{ +} + +/* HTS_Audio_write: send data to audio device (dummy) */ +void HTS_Audio_write(HTS_Audio * as, short data) +{ +} + +/* HTS_Audio_close: close audio device */ +void HTS_Audio_close(HTS_Audio * as) +{ +} +#endif /* AUDIO_PLAY_ALSA */ + +/* for others */ +#ifdef AUDIO_PLAY_NONE +/* HTS_Audio_open: open audio device (dummy) */ +void HTS_Audio_open(HTS_Audio * as, int sampling_rate, int max_buff_size) +{ +} + +/* HTS_Audio_write: send data to audio device (dummy) */ +void HTS_Audio_write(HTS_Audio * as, short data) +{ +} + +/* HTS_Audio_close: close audio device (dummy) */ +void HTS_Audio_close(HTS_Audio * as) +{ +} +#endif /* AUDIO_PLAY_NONE */ + +HTS_AUDIO_C_END; + +#endif /* !HTS_AUDIO_C */ diff --git a/src/modules/hts_engine/HTS_engine.c b/src/modules/hts_engine/HTS_engine.c new file mode 100644 index 0000000..5f2dfed --- /dev/null +++ b/src/modules/hts_engine/HTS_engine.c @@ -0,0 +1,836 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. 
*/ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. */ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. 
*/ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_ENGINE_C +#define HTS_ENGINE_C + +#ifdef __cplusplus +#define HTS_ENGINE_C_START extern "C" { +#define HTS_ENGINE_C_END } +#else +#define HTS_ENGINE_C_START +#define HTS_ENGINE_C_END +#endif /* __cplusplus */ + +HTS_ENGINE_C_START; + +#include <string.h> /* for strcpy() */ + +/* hts_engine libraries */ +#include "HTS_hidden.h" + +/* HTS_Engine_initialize: initialize engine */ +void HTS_Engine_initialize(HTS_Engine * engine, int nstream) +{ + int i; + + /* default value for control parameter */ + engine->global.stage = 0; + engine->global.use_log_gain = FALSE; + engine->global.sampling_rate = 16000; + engine->global.fperiod = 80; + engine->global.alpha = 0.42; + engine->global.beta = 0.0; + engine->global.audio_buff_size = 0; + engine->global.msd_threshold = + (double *) HTS_calloc(nstream, sizeof(double)); + for (i = 0; i < nstream; i++) + engine->global.msd_threshold[i] = 0.5; + + /* interpolation weight */ + engine->global.parameter_iw = + (double **) HTS_calloc(nstream, sizeof(double *)); + engine->global.gv_iw = (double **) HTS_calloc(nstream, sizeof(double *)); + engine->global.duration_iw = NULL; + for (i = 0; i < nstream; i++) + engine->global.parameter_iw[i] = NULL; + for (i = 0; i < nstream; i++) + engine->global.gv_iw[i] = NULL; + + /* GV weight */ + engine->global.gv_weight = (double *) HTS_calloc(nstream, sizeof(double)); + for (i = 0; i < nstream; i++) + engine->global.gv_weight[i] = 1.0; + + /* stop flag */ + engine->global.stop = FALSE; + /* volume */ + engine->global.volume = 1.0; + + /* initialize model set */ + HTS_ModelSet_initialize(&engine->ms, nstream); + /* initialize label list */ + HTS_Label_initialize(&engine->label); + /* initialize state sequence set */ + HTS_SStreamSet_initialize(&engine->sss); + /* initialize pstream set */ + HTS_PStreamSet_initialize(&engine->pss); + /* initialize gstream set */ + HTS_GStreamSet_initialize(&engine->gss); +} + +/* 
HTS_Engine_load_duration_from_fn: load duration pdfs, trees and number of states from file names */ +void HTS_Engine_load_duration_from_fn(HTS_Engine * engine, char **pdf_fn, + char **tree_fn, int interpolation_size) +{ + int i; + FILE **pdf_fp, **tree_fp; + + pdf_fp = (FILE **) HTS_calloc(interpolation_size, sizeof(FILE *)); + tree_fp = (FILE **) HTS_calloc(interpolation_size, sizeof(FILE *)); + for (i = 0; i < interpolation_size; i++) { + pdf_fp[i] = HTS_get_fp(pdf_fn[i], "rb"); + tree_fp[i] = HTS_get_fp(tree_fn[i], "r"); + } + HTS_Engine_load_duration_from_fp(engine, pdf_fp, tree_fp, + interpolation_size); + for (i = 0; i < interpolation_size; i++) { + fclose(pdf_fp[i]); + fclose(tree_fp[i]); + } + HTS_free(pdf_fp); + HTS_free(tree_fp); +} + +/* HTS_Engine_load_duration_from_fp: load duration pdfs, trees and number of states from file pointers */ +void HTS_Engine_load_duration_from_fp(HTS_Engine * engine, FILE ** pdf_fp, + FILE ** tree_fp, int interpolation_size) +{ + int i; + + HTS_ModelSet_load_duration(&engine->ms, pdf_fp, tree_fp, interpolation_size); + engine->global.duration_iw = + (double *) HTS_calloc(interpolation_size, sizeof(double)); + for (i = 0; i < interpolation_size; i++) + engine->global.duration_iw[i] = 1.0 / interpolation_size; +} + +/* HTS_Engine_load_parameter_from_fn: load parameter pdfs, trees and windows from file names */ +void HTS_Engine_load_parameter_from_fn(HTS_Engine * engine, char **pdf_fn, + char **tree_fn, char **win_fn, + int stream_index, HTS_Boolean msd_flag, + int window_size, int interpolation_size) +{ + int i; + FILE **pdf_fp, **tree_fp, **win_fp; + + pdf_fp = (FILE **) HTS_calloc(interpolation_size, sizeof(FILE *)); + tree_fp = (FILE **) HTS_calloc(interpolation_size, sizeof(FILE *)); + win_fp = (FILE **) HTS_calloc(window_size, sizeof(FILE *)); + for (i = 0; i < interpolation_size; i++) { + pdf_fp[i] = HTS_get_fp(pdf_fn[i], "rb"); + tree_fp[i] = HTS_get_fp(tree_fn[i], "r"); + } + for (i = 0; i < window_size; i++) + win_fp[i] 
= HTS_get_fp(win_fn[i], "r"); + HTS_Engine_load_parameter_from_fp(engine, pdf_fp, tree_fp, win_fp, + stream_index, msd_flag, + window_size, interpolation_size); + for (i = 0; i < interpolation_size; i++) { + fclose(pdf_fp[i]); + fclose(tree_fp[i]); + } + for (i = 0; i < window_size; i++) + fclose(win_fp[i]); + HTS_free(pdf_fp); + HTS_free(tree_fp); + HTS_free(win_fp); +} + +/* HTS_Engine_load_parameter_from_fp: load parameter pdfs, trees and windows from file pointers */ +void HTS_Engine_load_parameter_from_fp(HTS_Engine * engine, FILE ** pdf_fp, + FILE ** tree_fp, FILE ** win_fp, + int stream_index, HTS_Boolean msd_flag, + int window_size, int interpolation_size) +{ + int i; + + HTS_ModelSet_load_parameter(&engine->ms, pdf_fp, tree_fp, win_fp, + stream_index, msd_flag, + window_size, interpolation_size); + engine->global.parameter_iw[stream_index] = + (double *) HTS_calloc(interpolation_size, sizeof(double)); + for (i = 0; i < interpolation_size; i++) + engine->global.parameter_iw[stream_index][i] = 1.0 / interpolation_size; +} + +/* HTS_Engine_load_gv_from_fn: load GV pdfs and trees from file names */ +void HTS_Engine_load_gv_from_fn(HTS_Engine * engine, char **pdf_fn, + char **tree_fn, int stream_index, + int interpolation_size) +{ + int i; + FILE **pdf_fp, **tree_fp; + + pdf_fp = (FILE **) HTS_calloc(interpolation_size, sizeof(FILE *)); + if (tree_fn) + tree_fp = (FILE **) HTS_calloc(interpolation_size, sizeof(FILE *)); + else + tree_fp = NULL; + for (i = 0; i < interpolation_size; i++) { + pdf_fp[i] = HTS_get_fp(pdf_fn[i], "rb"); + if (tree_fn) { + if (tree_fn[i]) + tree_fp[i] = HTS_get_fp(tree_fn[i], "r"); + else + tree_fp[i] = NULL; + } + } + HTS_Engine_load_gv_from_fp(engine, pdf_fp, tree_fp, stream_index, + interpolation_size); + for (i = 0; i < interpolation_size; i++) { + fclose(pdf_fp[i]); + if (tree_fp && tree_fp[i]) + fclose(tree_fp[i]); + } + HTS_free(pdf_fp); + if (tree_fp) + HTS_free(tree_fp); +} + +/* HTS_Engine_load_gv_from_fp: load GV pdfs and 
trees from file pointers */ +void HTS_Engine_load_gv_from_fp(HTS_Engine * engine, FILE ** pdf_fp, + FILE ** tree_fp, int stream_index, + int interpolation_size) +{ + int i; + + HTS_ModelSet_load_gv(&engine->ms, pdf_fp, tree_fp, stream_index, + interpolation_size); + engine->global.gv_iw[stream_index] = + (double *) HTS_calloc(interpolation_size, sizeof(double)); + for (i = 0; i < interpolation_size; i++) + engine->global.gv_iw[stream_index][i] = 1.0 / interpolation_size; +} + +/* HTS_Engine_load_gv_switch_from_fn: load GV switch from file name */ +void HTS_Engine_load_gv_switch_from_fn(HTS_Engine * engine, char *fn) +{ + FILE *fp = HTS_get_fp(fn, "r"); + + HTS_Engine_load_gv_switch_from_fp(engine, fp); + fclose(fp); +} + +/* HTS_Engine_load_gv_switch_from_fp: load GV switch from file pointer */ +void HTS_Engine_load_gv_switch_from_fp(HTS_Engine * engine, FILE * fp) +{ + HTS_ModelSet_load_gv_switch(&engine->ms, fp); +} + +/* HTS_Engine_set_sampling_rate: set sampling rate */ +void HTS_Engine_set_sampling_rate(HTS_Engine * engine, int i) +{ + if (i < 1) + i = 1; + if (i > 48000) + i = 48000; + engine->global.sampling_rate = i; +} + +/* HTS_Engine_get_sampling_rate: get sampling rate */ +int HTS_Engine_get_sampling_rate(HTS_Engine * engine) +{ + return engine->global.sampling_rate; +} + +/* HTS_Engine_set_fperiod: set frame shift */ +void HTS_Engine_set_fperiod(HTS_Engine * engine, int i) +{ + if (i < 1) + i = 1; + if (i > 48000) + i = 48000; + engine->global.fperiod = i; +} + +/* HTS_Engine_get_fperiod: get frame shift */ +int HTS_Engine_get_fperiod(HTS_Engine * engine) +{ + return engine->global.fperiod; +} + +/* HTS_Engine_set_alpha: set alpha */ +void HTS_Engine_set_alpha(HTS_Engine * engine, double f) +{ + if (f < 0.0) + f = 0.0; + if (f > 1.0) + f = 1.0; + engine->global.alpha = f; +} + +/* HTS_Engine_set_gamma: set gamma (Gamma = -1/i: if i=0 then Gamma=0) */ +void HTS_Engine_set_gamma(HTS_Engine * engine, int i) +{ + if (i < 0) + i = 0; + engine->global.stage 
= i; +} + +/* HTS_Engine_set_log_gain: set log gain flag (for LSP) */ +void HTS_Engine_set_log_gain(HTS_Engine * engine, HTS_Boolean i) +{ + engine->global.use_log_gain = i; +} + +/* HTS_Engine_set_beta: set beta */ +void HTS_Engine_set_beta(HTS_Engine * engine, double f) +{ + if (f < -0.8) + f = -0.8; + if (f > 0.8) + f = 0.8; + engine->global.beta = f; +} + +/* HTS_Engine_set_audio_buff_size: set audio buffer size */ +void HTS_Engine_set_audio_buff_size(HTS_Engine * engine, int i) +{ + if (i < 0) + i = 0; + if (i > 48000) + i = 48000; + engine->global.audio_buff_size = i; +} + +/* HTS_Engine_get_audio_buff_size: get audio buffer size */ +int HTS_Engine_get_audio_buff_size(HTS_Engine * engine) +{ + return engine->global.audio_buff_size; +} + +/* HTS_Engine_set_msd_threshold: set MSD threshold */ +void HTS_Engine_set_msd_threshold(HTS_Engine * engine, int stream_index, + double f) +{ + if (f < 0.0) + f = 0.0; + if (f > 1.0) + f = 1.0; + engine->global.msd_threshold[stream_index] = f; +} + +/* HTS_Engine_set_duration_interpolation_weight: set interpolation weight for duration */ +void HTS_Engine_set_duration_interpolation_weight(HTS_Engine * engine, + int interpolation_index, + double f) +{ + engine->global.duration_iw[interpolation_index] = f; +} + +/* HTS_Engine_set_parameter_interpolation_weight: set interpolation weight for parameter */ +void HTS_Engine_set_parameter_interpolation_weight(HTS_Engine * engine, + int stream_index, + int interpolation_index, + double f) +{ + engine->global.parameter_iw[stream_index][interpolation_index] = f; +} + +/* HTS_Engine_set_gv_interpolation_weight: set interpolation weight for GV */ +void HTS_Engine_set_gv_interpolation_weight(HTS_Engine * engine, + int stream_index, + int interpolation_index, double f) +{ + engine->global.gv_iw[stream_index][interpolation_index] = f; +} + +/* HTS_Engine_set_gv_weight: set GV weight */ +void HTS_Engine_set_gv_weight(HTS_Engine * engine, int stream_index, double f) +{ + if (f < 0.0) + f = 
0.0; + if (f > 2.0) + f = 2.0; + engine->global.gv_weight[stream_index] = f; +} + +/* HTS_Engine_set_stop_flag: set stop flag */ +void HTS_Engine_set_stop_flag(HTS_Engine * engine, HTS_Boolean b) +{ + engine->global.stop = b; +} + +/* HTS_Engine_set_volume: set volume */ +void HTS_Engine_set_volume(HTS_Engine * engine, double f) +{ + if (f < 0.0) + f = 0.0; + engine->global.volume = f; +} + +/* HTS_Engine_get_total_state: get total number of state */ +int HTS_Engine_get_total_state(HTS_Engine * engine) +{ + return HTS_SStreamSet_get_total_state(&engine->sss); +} + +/* HTS_Engine_set_state_mean: set mean value of state */ +void HTS_Engine_set_state_mean(HTS_Engine * engine, int stream_index, + int state_index, int vector_index, double f) +{ + HTS_SStreamSet_set_mean(&engine->sss, stream_index, state_index, + vector_index, f); +} + +/* HTS_Engine_get_state_mean: get mean value of state */ +double HTS_Engine_get_state_mean(HTS_Engine * engine, int stream_index, + int state_index, int vector_index) +{ + return HTS_SStreamSet_get_mean(&engine->sss, stream_index, state_index, + vector_index); +} + +/* HTS_Engine_get_state_duration: get state duration */ +int HTS_Engine_get_state_duration(HTS_Engine * engine, int state_index) +{ + return HTS_SStreamSet_get_duration(&engine->sss, state_index); +} + +/* HTS_Engine_get_nstate: get number of state */ +int HTS_Engine_get_nstate(HTS_Engine * engine) +{ + return HTS_ModelSet_get_nstate(&engine->ms); +} + +/* HTS_Engine_load_label_from_fn: load label from file name */ +void HTS_Engine_load_label_from_fn(HTS_Engine * engine, char *fn) +{ + HTS_Label_load_from_fn(&engine->label, engine->global.sampling_rate, + engine->global.fperiod, fn); +} + +/* HTS_Engine_load_label_from_fp: load label from file pointer */ +void HTS_Engine_load_label_from_fp(HTS_Engine * engine, FILE * fp) +{ + HTS_Label_load_from_fp(&engine->label, engine->global.sampling_rate, + engine->global.fperiod, fp); +} + +/* HTS_Engine_load_label_from_string: load 
label from string */ +void HTS_Engine_load_label_from_string(HTS_Engine * engine, char *data) +{ + HTS_Label_load_from_string(&engine->label, engine->global.sampling_rate, + engine->global.fperiod, data); +} + +/* HTS_Engine_load_label_from_string_list: load label from string list */ +void HTS_Engine_load_label_from_string_list(HTS_Engine * engine, char **data, + int size) +{ + HTS_Label_load_from_string_list(&engine->label, engine->global.sampling_rate, + engine->global.fperiod, data, size); +} + +/* HTS_Engine_create_sstream: parse label and determine state duration */ +void HTS_Engine_create_sstream(HTS_Engine * engine) +{ + HTS_SStreamSet_create(&engine->sss, &engine->ms, &engine->label, + engine->global.duration_iw, + engine->global.parameter_iw, engine->global.gv_iw); +} + +/* HTS_Engine_create_pstream: generate speech parameter vector sequence */ +void HTS_Engine_create_pstream(HTS_Engine * engine) +{ + HTS_PStreamSet_create(&engine->pss, &engine->sss, + engine->global.msd_threshold, + engine->global.gv_weight); +} + +/* HTS_Engine_create_gstream: synthesize speech */ +void HTS_Engine_create_gstream(HTS_Engine * engine) +{ + HTS_GStreamSet_create(&engine->gss, &engine->pss, engine->global.stage, + engine->global.use_log_gain, + engine->global.sampling_rate, engine->global.fperiod, + engine->global.alpha, engine->global.beta, + &engine->global.stop, engine->global.volume, + engine->global.audio_buff_size); +} + +/* HTS_Engine_save_information: output trace information */ +void HTS_Engine_save_information(HTS_Engine * engine, FILE * fp) +{ + int i, j, k, l, m, n; + double temp; + HTS_Global *global = &engine->global; + HTS_ModelSet *ms = &engine->ms; + HTS_Label *label = &engine->label; + HTS_SStreamSet *sss = &engine->sss; + HTS_PStreamSet *pss = &engine->pss; + + /* global parameter */ + fprintf(fp, "[Global parameter]\n"); + fprintf(fp, "Sampling frequency -> %8d(Hz)\n", + global->sampling_rate); + fprintf(fp, "Frame period -> %8d(point)\n", +
global->fperiod); + fprintf(fp, " %8.5f(msec)\n", + 1e+3 * global->fperiod / global->sampling_rate); + fprintf(fp, "All-pass constant -> %8.5f\n", + (float) global->alpha); + fprintf(fp, "Gamma -> %8.5f\n", + (float) (global->stage == 0 ? 0.0 : -1.0 / global->stage)); + if (global->stage != 0) + fprintf(fp, "Log gain flag -> %s\n", + global->use_log_gain ? "TRUE" : "FALSE"); + fprintf(fp, "Postfiltering coefficient -> %8.5f\n", + (float) global->beta); + fprintf(fp, "Audio buffer size -> %8d(sample)\n", + global->audio_buff_size); + fprintf(fp, "\n"); + + /* duration parameter */ + fprintf(fp, "[Duration parameter]\n"); + fprintf(fp, "Number of states -> %8d\n", + HTS_ModelSet_get_nstate(ms)); + fprintf(fp, " Interpolation -> %8d\n", + HTS_ModelSet_get_duration_interpolation_size(ms)); + /* check interpolation */ + for (i = 0, temp = 0.0; + i < HTS_ModelSet_get_duration_interpolation_size(ms); i++) + temp += global->duration_iw[i]; + for (i = 0; i < HTS_ModelSet_get_duration_interpolation_size(ms); i++) + if (global->duration_iw[i] != 0.0) + global->duration_iw[i] /= temp; + for (i = 0; i < HTS_ModelSet_get_duration_interpolation_size(ms); i++) + fprintf(fp, + " Interpolation weight[%2d] -> %8.0f(%%)\n", i, + (float) (100 * global->duration_iw[i])); + fprintf(fp, "\n"); + + fprintf(fp, "[Stream parameter]\n"); + for (i = 0; i < HTS_ModelSet_get_nstream(ms); i++) { + /* stream parameter */ + fprintf(fp, "Stream[%2d] vector length -> %8d\n", i, + HTS_ModelSet_get_vector_length(ms, i)); + fprintf(fp, " Dynamic window size -> %8d\n", + HTS_ModelSet_get_window_size(ms, i)); + /* interpolation */ + fprintf(fp, " Interpolation -> %8d\n", + HTS_ModelSet_get_parameter_interpolation_size(ms, i)); + for (j = 0, temp = 0.0; + j < HTS_ModelSet_get_parameter_interpolation_size(ms, i); j++) + temp += global->parameter_iw[i][j]; + for (j = 0; j < HTS_ModelSet_get_parameter_interpolation_size(ms, i); j++) + if (global->parameter_iw[i][j] != 0.0) + global->parameter_iw[i][j] /= 
temp; + for (j = 0; j < HTS_ModelSet_get_parameter_interpolation_size(ms, i); j++) + fprintf(fp, + " Interpolation weight[%2d] -> %8.0f(%%)\n", j, + (float) (100 * global->parameter_iw[i][j])); + /* MSD */ + if (HTS_ModelSet_is_msd(ms, i)) { /* for MSD */ + fprintf(fp, " MSD flag -> TRUE\n"); + fprintf(fp, " MSD threshold -> %8.5f\n", + global->msd_threshold[i]); + } else { /* for non MSD */ + fprintf(fp, " MSD flag -> FALSE\n"); + } + /* GV */ + if (HTS_ModelSet_use_gv(ms, i)) { + fprintf(fp, " GV flag -> TRUE\n"); + if (HTS_ModelSet_have_gv_switch(ms)) { + if (HTS_ModelSet_have_gv_tree(ms, i)) { + fprintf(fp, + " GV type -> CDGV\n"); + fprintf(fp, + " -> +SWITCH\n"); + } else + fprintf(fp, + " GV type -> SWITCH\n"); + } else { + if (HTS_ModelSet_have_gv_tree(ms, i)) + fprintf(fp, + " GV type -> CDGV\n"); + else + fprintf(fp, + " GV type -> NORMAL\n"); + } + fprintf(fp, " GV weight -> %8.0f(%%)\n", + (float) (100 * global->gv_weight[i])); + fprintf(fp, " GV interpolation size -> %8d\n", + HTS_ModelSet_get_gv_interpolation_size(ms, i)); + /* interpolation */ + for (j = 0, temp = 0.0; + j < HTS_ModelSet_get_gv_interpolation_size(ms, i); j++) + temp += global->gv_iw[i][j]; + for (j = 0; j < HTS_ModelSet_get_gv_interpolation_size(ms, i); j++) + if (global->gv_iw[i][j] != 0.0) + global->gv_iw[i][j] /= temp; + for (j = 0; j < HTS_ModelSet_get_gv_interpolation_size(ms, i); j++) + fprintf(fp, + " GV interpolation weight[%2d] -> %8.0f(%%)\n", j, + (float) (100 * global->gv_iw[i][j])); + } else { + fprintf(fp, " GV flag -> FALSE\n"); + } + } + fprintf(fp, "\n"); + + /* generated sequence */ + fprintf(fp, "[Generated sequence]\n"); + fprintf(fp, "Number of HMMs -> %8d\n", + HTS_Label_get_size(label)); + fprintf(fp, "Number of states -> %8d\n", + HTS_Label_get_size(label) * HTS_ModelSet_get_nstate(ms)); + fprintf(fp, "Length of this speech -> %8.3f(sec)\n", + (float) ((double) HTS_PStreamSet_get_total_frame(pss) * + global->fperiod / global->sampling_rate)); + fprintf(fp, " ->
%8.3d(frames)\n", + HTS_PStreamSet_get_total_frame(pss) * global->fperiod); + + for (i = 0; i < HTS_Label_get_size(label); i++) { + fprintf(fp, "HMM[%2d]\n", i); + fprintf(fp, " Name -> %s\n", + HTS_Label_get_string(label, i)); + fprintf(fp, " Duration\n"); + for (j = 0; j < HTS_ModelSet_get_duration_interpolation_size(ms); j++) { + fprintf(fp, " Interpolation[%2d]\n", j); + HTS_ModelSet_get_duration_index(ms, HTS_Label_get_string(label, i), &k, + &l, j); + fprintf(fp, " Tree index -> %8d\n", k); + fprintf(fp, " PDF index -> %8d\n", l); + } + for (j = 0; j < HTS_ModelSet_get_nstate(ms); j++) { + fprintf(fp, " State[%2d]\n", j + 2); + fprintf(fp, " Length -> %8d(frames)\n", + HTS_SStreamSet_get_duration(sss, + i * HTS_ModelSet_get_nstate(ms) + + j)); + for (k = 0; k < HTS_ModelSet_get_nstream(ms); k++) { + fprintf(fp, " Stream[%2d]\n", k); + if (HTS_ModelSet_is_msd(ms, k)) { + if (HTS_SStreamSet_get_msd + (sss, k, + i * HTS_ModelSet_get_nstate(ms) + j) > + global->msd_threshold[k]) + fprintf(fp, + " MSD flag -> TRUE\n"); + else + fprintf(fp, + " MSD flag -> FALSE\n"); + } + for (l = 0; + l < HTS_ModelSet_get_parameter_interpolation_size(ms, k); + l++) { + fprintf(fp, " Interpolation[%2d]\n", l); + HTS_ModelSet_get_parameter_index(ms, + HTS_Label_get_string(label, i), + &m, &n, k, j + 2, l); + fprintf(fp, " Tree index -> %8d\n", + m); + fprintf(fp, " PDF index -> %8d\n", + n); + } + } + } + } +} + +/* HTS_Engine_save_label: output label with time */ +void HTS_Engine_save_label(HTS_Engine * engine, FILE * fp) +{ + int i, j; + int frame, state, duration; + + HTS_Label *label = &engine->label; + HTS_SStreamSet *sss = &engine->sss; + const int nstate = HTS_ModelSet_get_nstate(&engine->ms); + const double rate = + engine->global.fperiod * 1e+7 / engine->global.sampling_rate; + + for (i = 0, state = 0, frame = 0; i < HTS_Label_get_size(label); i++) { + for (j = 0, duration = 0; j < nstate; j++) + duration += HTS_SStreamSet_get_duration(sss, state++); + /* in HTK & HTS 
format */ + fprintf(fp, "%d %d %s\n", (int) (frame * rate), + (int) ((frame + duration) * rate), + HTS_Label_get_string(label, i)); + frame += duration; + } +} + +#ifndef HTS_EMBEDDED +/* HTS_Engine_save_generated_parameter: output generated parameter */ +void HTS_Engine_save_generated_parameter(HTS_Engine * engine, FILE * fp, + int stream_index) +{ + int i, j; + float temp; + HTS_GStreamSet *gss = &engine->gss; + + for (i = 0; i < HTS_GStreamSet_get_total_frame(gss); i++) + for (j = 0; j < HTS_GStreamSet_get_static_length(gss, stream_index); j++) { + temp = (float) HTS_GStreamSet_get_parameter(gss, stream_index, i, j); + fwrite(&temp, sizeof(float), 1, fp); + } +} +#endif /* !HTS_EMBEDDED */ + +/* HTS_Engine_save_generated_speech: output generated speech */ +void HTS_Engine_save_generated_speech(HTS_Engine * engine, FILE * fp) +{ + int i; + short temp; + HTS_GStreamSet *gss = &engine->gss; + + for (i = 0; i < HTS_GStreamSet_get_total_nsample(gss); i++) { + temp = HTS_GStreamSet_get_speech(gss, i); + fwrite(&temp, sizeof(short), 1, fp); + } +} + +/* HTS_Engine_save_riff: output RIFF format file */ +void HTS_Engine_save_riff(HTS_Engine * engine, FILE * fp) +{ + int i; + short temp; + + HTS_GStreamSet *gss = &engine->gss; + char data_01_04[] = { 'R', 'I', 'F', 'F' }; + int data_05_08 = HTS_GStreamSet_get_total_nsample(gss) * sizeof(short) + 36; + char data_09_12[] = { 'W', 'A', 'V', 'E' }; + char data_13_16[] = { 'f', 'm', 't', ' ' }; + int data_17_20 = 16; + short data_21_22 = 1; /* PCM */ + short data_23_24 = 1; /* monaural */ + int data_25_28 = engine->global.sampling_rate; + int data_29_32 = engine->global.sampling_rate * sizeof(short); + short data_33_34 = sizeof(short); + short data_35_36 = (short) (sizeof(short) * 8); + char data_37_40[] = { 'd', 'a', 't', 'a' }; + int data_41_44 = HTS_GStreamSet_get_total_nsample(gss) * sizeof(short); + + /* write header */ + HTS_fwrite_little_endian(data_01_04, sizeof(char), 4, fp); + HTS_fwrite_little_endian(&data_05_08,
sizeof(int), 1, fp); + HTS_fwrite_little_endian(data_09_12, sizeof(char), 4, fp); + HTS_fwrite_little_endian(data_13_16, sizeof(char), 4, fp); + HTS_fwrite_little_endian(&data_17_20, sizeof(int), 1, fp); + HTS_fwrite_little_endian(&data_21_22, sizeof(short), 1, fp); + HTS_fwrite_little_endian(&data_23_24, sizeof(short), 1, fp); + HTS_fwrite_little_endian(&data_25_28, sizeof(int), 1, fp); + HTS_fwrite_little_endian(&data_29_32, sizeof(int), 1, fp); + HTS_fwrite_little_endian(&data_33_34, sizeof(short), 1, fp); + HTS_fwrite_little_endian(&data_35_36, sizeof(short), 1, fp); + HTS_fwrite_little_endian(data_37_40, sizeof(char), 4, fp); + HTS_fwrite_little_endian(&data_41_44, sizeof(int), 1, fp); + /* write data */ + for (i = 0; i < HTS_GStreamSet_get_total_nsample(gss); i++) { + temp = HTS_GStreamSet_get_speech(gss, i); + HTS_fwrite_little_endian(&temp, sizeof(short), 1, fp); + } +} + +/* HTS_Engine_refresh: free model per one time synthesis */ +void HTS_Engine_refresh(HTS_Engine * engine) +{ + /* free generated parameter stream set */ + HTS_GStreamSet_clear(&engine->gss); + /* free parameter stream set */ + HTS_PStreamSet_clear(&engine->pss); + /* free state stream set */ + HTS_SStreamSet_clear(&engine->sss); + /* free label list */ + HTS_Label_clear(&engine->label); + /* stop flag */ + engine->global.stop = FALSE; +} + +/* HTS_Engine_clear: free engine */ +void HTS_Engine_clear(HTS_Engine * engine) +{ + int i; + + HTS_free(engine->global.msd_threshold); + HTS_free(engine->global.duration_iw); + for (i = 0; i < HTS_ModelSet_get_nstream(&engine->ms); i++) { + HTS_free(engine->global.parameter_iw[i]); + if (engine->global.gv_iw[i]) + HTS_free(engine->global.gv_iw[i]); + } + HTS_free(engine->global.parameter_iw); + HTS_free(engine->global.gv_iw); + HTS_free(engine->global.gv_weight); + + HTS_ModelSet_clear(&engine->ms); +} + +/* HTS_get_copyright: write copyright to string */ +void HTS_get_copyright(char *str) +{ + int i, nCopyright = HTS_NCOPYRIGHT; + char url[] = 
HTS_URL, version[] = HTS_VERSION; + char *copyright[] = { HTS_COPYRIGHT }; + + sprintf(str, "\nThe HMM-based speech synthesis system (HTS)\n"); + sprintf(str, "%shts_engine API version %s (%s)\n", str, version, url); + for (i = 0; i < nCopyright; i++) { + if (i == 0) + sprintf(str, "%sCopyright (C) %s\n", str, copyright[i]); + else + sprintf(str, "%s %s\n", str, copyright[i]); + } + sprintf(str, "%sAll rights reserved.\n", str); + + return; +} + +/* HTS_show_copyright: write copyright to file pointer */ +void HTS_show_copyright(FILE * fp) +{ + char buf[HTS_MAXBUFLEN]; + + HTS_get_copyright(buf); + fprintf(fp, "%s", buf); + + return; +} + +HTS_ENGINE_C_END; + +#endif /* !HTS_ENGINE_C */ diff --git a/src/modules/hts_engine/HTS_engine.h b/src/modules/hts_engine/HTS_engine.h new file mode 100644 index 0000000..a63847c --- /dev/null +++ b/src/modules/hts_engine/HTS_engine.h @@ -0,0 +1,863 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. 
*/ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. */ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. 
*/ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_ENGINE_H +#define HTS_ENGINE_H + +#ifdef __cplusplus +#define HTS_ENGINE_H_START extern "C" { +#define HTS_ENGINE_H_END } +#else +#define HTS_ENGINE_H_START +#define HTS_ENGINE_H_END +#endif /* __CPLUSPLUS */ + +HTS_ENGINE_H_START; + +#include <stdio.h> + +/* ------------------------ copyright ---------------------------- */ + +#ifdef PACKAGE_VERSION +#define HTS_VERSION PACKAGE_VERSION +#else +#define HTS_VERSION "1.04" +#endif +#define HTS_URL "http://hts-engine.sourceforge.net/" +#define HTS_COPYRIGHT "2001-2010 Nagoya Institute of Technology", \ + "2001-2008 Tokyo Institute of Technology" +#define HTS_NCOPYRIGHT 2 + +/* HTS_show_copyright: write copyright to file pointer */ +void HTS_show_copyright(FILE * fp); + +/* HTS_get_copyright: write copyright to string */ +void HTS_get_copyright(char *str); + +/* -------------------------- common ----------------------------- */ + +typedef int HTS_Boolean; +#ifndef TRUE +#define TRUE 1 +#endif /* !TRUE */ +#ifndef FALSE +#define FALSE 0 +#endif /* !FALSE */ + +#define ZERO 1.0e-10 /* ~(0) */ +#define LZERO (-1.0e+10) /* ~log(0) */ +#define LTPI 1.83787706640935 /* log(2*PI) */ + +/* -------------------------- model ------------------------------ */ + +/* HTS_Window: Window coefficients to calculate dynamic features. */ +typedef struct _HTS_Window { + int size; /* number of windows (static + deltas) */ + int *l_width; /* left width of windows */ + int *r_width; /* right width of windows */ + double **coefficient; /* window coefficient */ + int max_width; /* maximum width of windows */ +} HTS_Window; + +/* HTS_Pattern: List of patterns in a question and a tree. */ +typedef struct _HTS_Pattern { + char *string; /* pattern string */ + struct _HTS_Pattern *next; /* pointer to the next pattern */ +} HTS_Pattern; + +/* HTS_Question: List of questions in a tree.
*/ +typedef struct _HTS_Question { + char *string; /* name of this question */ + HTS_Pattern *head; /* pointer to the head of pattern list */ + struct _HTS_Question *next; /* pointer to the next question */ +} HTS_Question; + +/* HTS_Node: List of tree nodes in a tree. */ +typedef struct _HTS_Node { + int index; /* index of this node */ + int pdf; /* index of PDF for this node ( leaf node only ) */ + struct _HTS_Node *yes; /* pointer to its child node (yes) */ + struct _HTS_Node *no; /* pointer to its child node (no) */ + struct _HTS_Node *next; /* pointer to the next node */ + HTS_Question *quest; /* question applied at this node */ +} HTS_Node; + +/* HTS_Tree: List of decision trees in a model. */ +typedef struct _HTS_Tree { + HTS_Pattern *head; /* pointer to the head of pattern list for this tree */ + struct _HTS_Tree *next; /* pointer to next tree */ + HTS_Node *root; /* root node of this tree */ + int state; /* state index of this tree */ +} HTS_Tree; + +/* HTS_Model: Set of PDFs, decision trees and questions. */ +typedef struct _HTS_Model { + int vector_length; /* vector length (include static and dynamic features) */ + int ntree; /* # of trees */ + int *npdf; /* # of PDFs at each tree */ + double ***pdf; /* PDFs */ + HTS_Tree *tree; /* pointer to the list of trees */ + HTS_Question *question; /* pointer to the list of questions */ +} HTS_Model; + +/* HTS_Stream: Set of models and a window. */ +typedef struct _HTS_Stream { + int vector_length; /* vector_length (include static and dynamic features) */ + HTS_Model *model; /* models */ + HTS_Window window; /* window coefficients */ + HTS_Boolean msd_flag; /* flag for MSD */ + int interpolation_size; /* # of models for interpolation */ +} HTS_Stream; + +/* HTS_ModelSet: Set of duration models, HMMs and GV models. 
*/ +typedef struct _HTS_ModelSet { + HTS_Stream duration; /* duration PDFs and trees */ + HTS_Stream *stream; /* parameter PDFs, trees and windows */ + HTS_Stream *gv; /* GV PDFs */ + HTS_Model gv_switch; /* GV switch */ + int nstate; /* # of HMM states */ + int nstream; /* # of stream */ +} HTS_ModelSet; + +/* ----------------------- model method -------------------------- */ + +/* HTS_ModelSet_initialize: initialize model set */ +void HTS_ModelSet_initialize(HTS_ModelSet * ms, int nstream); + +/* HTS_ModelSet_load_duration: load duration model and number of state */ +void HTS_ModelSet_load_duration(HTS_ModelSet * ms, FILE ** pdf_fp, + FILE ** tree_fp, int interpolation_size); + +/* HTS_ModelSet_load_parameter: load parameter model */ +void HTS_ModelSet_load_parameter(HTS_ModelSet * ms, FILE ** pdf_fp, + FILE ** tree_fp, FILE ** win_fp, + int stream_index, HTS_Boolean msd_flag, + int window_size, int interpolation_size); + +/* HTS_ModelSet_load_gv: load GV model */ +void HTS_ModelSet_load_gv(HTS_ModelSet * ms, FILE ** pdf_fp, FILE ** tree_fp, + int stream_index, int interpolation_size); + +/* HTS_ModelSet_have_gv_tree: if context-dependent GV is used, return true */ +HTS_Boolean HTS_ModelSet_have_gv_tree(HTS_ModelSet * ms, int stream_index); + +/* HTS_ModelSet_load_gv_switch: load GV switch */ +void HTS_ModelSet_load_gv_switch(HTS_ModelSet * ms, FILE * fp); + +/* HTS_ModelSet_have_gv_switch: if GV switch is used, return true */ +HTS_Boolean HTS_ModelSet_have_gv_switch(HTS_ModelSet * ms); + +/* HTS_ModelSet_get_nstate: get number of state */ +int HTS_ModelSet_get_nstate(HTS_ModelSet * ms); + +/* HTS_ModelSet_get_nstream: get number of stream */ +int HTS_ModelSet_get_nstream(HTS_ModelSet * ms); + +/* HTS_ModelSet_get_vector_length: get vector length */ +int HTS_ModelSet_get_vector_length(HTS_ModelSet * ms, int stream_index); + +/* HTS_ModelSet_is_msd: get MSD flag */ +HTS_Boolean HTS_ModelSet_is_msd(HTS_ModelSet * ms, int stream_index); + +/* 
HTS_ModelSet_get_window_size: get dynamic window size */ +int HTS_ModelSet_get_window_size(HTS_ModelSet * ms, int stream_index); + +/* HTS_ModelSet_get_window_left_width: get left width of dynamic window */ +int HTS_ModelSet_get_window_left_width(HTS_ModelSet * ms, int stream_index, + int window_index); + +/* HTS_ModelSet_get_window_right_width: get right width of dynamic window */ +int HTS_ModelSet_get_window_right_width(HTS_ModelSet * ms, int stream_index, + int window_index); + +/* HTS_ModelSet_get_window_coefficient: get coefficient of dynamic window */ +double HTS_ModelSet_get_window_coefficient(HTS_ModelSet * ms, int stream_index, + int window_index, + int coefficient_index); + +/* HTS_ModelSet_get_window_max_width: get max width of dynamic window */ +int HTS_ModelSet_get_window_max_width(HTS_ModelSet * ms, int stream_index); + +/* HTS_ModelSet_get_duration_interpolation_size: get interpolation size (duration model) */ +int HTS_ModelSet_get_duration_interpolation_size(HTS_ModelSet * ms); + +/* HTS_ModelSet_get_parameter_interpolation_size: get interpolation size (parameter model) */ +int HTS_ModelSet_get_parameter_interpolation_size(HTS_ModelSet * ms, + int stream_index); + +/* HTS_ModelSet_get_gv_interpolation_size: get interpolation size (GV model) */ +int HTS_ModelSet_get_gv_interpolation_size(HTS_ModelSet * ms, int stream_index); + +/* HTS_ModelSet_use_gv: get GV flag */ +HTS_Boolean HTS_ModelSet_use_gv(HTS_ModelSet * ms, int index); + +/* HTS_ModelSet_get_duration_index: get index of duration tree and PDF */ +void HTS_ModelSet_get_duration_index(HTS_ModelSet * ms, char *string, + int *tree_index, int *pdf_index, + int interpolation_index); + +/* HTS_ModelSet_get_duration: get duration using interpolation weight */ +void HTS_ModelSet_get_duration(HTS_ModelSet * ms, char *string, double *mean, + double *vari, double *iw); + +/* HTS_ModelSet_get_parameter_index: get index of parameter tree and PDF */ +void HTS_ModelSet_get_parameter_index(HTS_ModelSet * ms, 
char *string, + int *tree_index, int *pdf_index, + int stream_index, int state_index, + int interpolation_index); + +/* HTS_ModelSet_get_parameter: get parameter using interpolation weight */ +void HTS_ModelSet_get_parameter(HTS_ModelSet * ms, char *string, double *mean, + double *vari, double *msd, int stream_index, + int state_index, double *iw); + +/* HTS_ModelSet_get_gv: get GV using interpolation weight */ +void HTS_ModelSet_get_gv(HTS_ModelSet * ms, char *string, double *mean, + double *vari, int stream_index, double *iw); + +/* HTS_ModelSet_get_gv_switch: get GV switch */ +HTS_Boolean HTS_ModelSet_get_gv_switch(HTS_ModelSet * ms, char *string); + +/* HTS_ModelSet_clear: free model set */ +void HTS_ModelSet_clear(HTS_ModelSet * ms); + +/* -------------------------- label ------------------------------ */ + +/* HTS_LabelString: individual label string with time information */ +typedef struct _HTS_LabelString { + struct _HTS_LabelString *next; /* pointer to next label string */ + char *name; /* label string */ + double start; /* start frame specified in the given label */ + double end; /* end frame specified in the given label */ +} HTS_LabelString; + +/* HTS_Label: list of label strings */ +typedef struct _HTS_Label { + HTS_LabelString *head; /* pointer to the head of label string */ + int size; /* # of label strings */ + HTS_Boolean frame_flag; /* flag for frame length modification */ + double speech_speed; /* speech speed rate */ +} HTS_Label; + +/* ----------------------- label method -------------------------- */ + +/* HTS_Label_initialize: initialize label */ +void HTS_Label_initialize(HTS_Label * label); + +/* HTS_Label_load_from_fn: load label from file name */ +void HTS_Label_load_from_fn(HTS_Label * label, int sampling_rate, int fperiod, + char *fn); + +/* HTS_Label_load_from_fp: load label list from file pointer */ +void HTS_Label_load_from_fp(HTS_Label * label, int sampling_rate, int fperiod, + FILE * fp); + +/* HTS_Label_load_from_string: load 
label from string */ +void HTS_Label_load_from_string(HTS_Label * label, int sampling_rate, + int fperiod, char *data); + +/* HTS_Label_load_from_string_list: load label list from string list */ +void HTS_Label_load_from_string_list(HTS_Label * label, int sampling_rate, + int fperiod, char **data, int size); + +/* HTS_Label_set_speech_speed: set speech speed rate */ +void HTS_Label_set_speech_speed(HTS_Label * label, double f); + +/* HTS_Label_set_frame_specified_flag: set frame specified flag */ +void HTS_Label_set_frame_specified_flag(HTS_Label * label, HTS_Boolean i); + +/* HTS_Label_get_size: get number of label string */ +int HTS_Label_get_size(HTS_Label * label); + +/* HTS_Label_get_string: get label string */ +char *HTS_Label_get_string(HTS_Label * label, int string_index); + +/* HTS_Label_get_frame_specified_flag: get frame specified flag */ +HTS_Boolean HTS_Label_get_frame_specified_flag(HTS_Label * label); + +/* HTS_Label_get_start_frame: get start frame */ +double HTS_Label_get_start_frame(HTS_Label * label, int string_index); + +/* HTS_Label_get_end_frame: get end frame */ +double HTS_Label_get_end_frame(HTS_Label * label, int string_index); + +/* HTS_Label_get_speech_speed: get speech speed rate */ +double HTS_Label_get_speech_speed(HTS_Label * label); + +/* HTS_Label_clear: free label */ +void HTS_Label_clear(HTS_Label * label); + +/* -------------------------- sstream ---------------------------- */ + +/* HTS_SStream: individual state stream */ +typedef struct _HTS_SStream { + int vector_length; /* vector length (include static and dynamic features) */ + double **mean; /* mean vector sequence */ + double **vari; /* variance vector sequence */ + double *msd; /* MSD parameter sequence */ + int win_size; /* # of windows (static + deltas) */ + int *win_l_width; /* left width of windows */ + int *win_r_width; /* right width of windows */ + double **win_coefficient; /* window coefficients */ + int win_max_width; /* maximum width of windows */ + double
*gv_mean; /* mean vector of GV */ + double *gv_vari; /* variance vector of GV */ + HTS_Boolean *gv_switch; /* GV flag sequence */ +} HTS_SStream; + +/* HTS_SStreamSet: set of state stream */ +typedef struct _HTS_SStreamSet { + HTS_SStream *sstream; /* state streams */ + int nstream; /* # of streams */ + int nstate; /* # of states */ + int *duration; /* duration sequence */ + int total_state; /* total state */ + int total_frame; /* total frame */ +} HTS_SStreamSet; + +/* ----------------------- sstream method ------------------------ */ + +/* HTS_SStreamSet_initialize: initialize state stream set */ +void HTS_SStreamSet_initialize(HTS_SStreamSet * sss); + +/* HTS_SStreamSet_create: parse label and determine state duration */ +void HTS_SStreamSet_create(HTS_SStreamSet * sss, HTS_ModelSet * ms, + HTS_Label * label, double *duration_iw, + double **parameter_iw, double **gv_iw); + +/* HTS_SStreamSet_get_nstream: get number of stream */ +int HTS_SStreamSet_get_nstream(HTS_SStreamSet * sss); + +/* HTS_SStreamSet_get_vector_length: get vector length */ +int HTS_SStreamSet_get_vector_length(HTS_SStreamSet * sss, int stream_index); + +/* HTS_SStreamSet_is_msd: get MSD flag */ +HTS_Boolean HTS_SStreamSet_is_msd(HTS_SStreamSet * sss, int stream_index); + +/* HTS_SStreamSet_get_total_state: get total number of state */ +int HTS_SStreamSet_get_total_state(HTS_SStreamSet * sss); + +/* HTS_SStreamSet_get_total_frame: get total number of frame */ +int HTS_SStreamSet_get_total_frame(HTS_SStreamSet * sss); + +/* HTS_SStreamSet_get_msd: get msd parameter */ +double HTS_SStreamSet_get_msd(HTS_SStreamSet * sss, int stream_index, + int state_index); + +/* HTS_SStreamSet_get_window_size: get dynamic window size */ +int HTS_SStreamSet_get_window_size(HTS_SStreamSet * sss, int stream_index); + +/* HTS_SStreamSet_get_window_left_width: get left width of dynamic window */ +int HTS_SStreamSet_get_window_left_width(HTS_SStreamSet * sss, int stream_index, + int window_index); + +/*
HTS_SStreamSet_get_window_right_width: get right width of dynamic window */ +int HTS_SStreamSet_get_window_right_width(HTS_SStreamSet * sss, + int stream_index, int window_index); + +/* HTS_SStreamSet_get_window_coefficient: get coefficient of dynamic window */ +double HTS_SStreamSet_get_window_coefficient(HTS_SStreamSet * sss, + int stream_index, int window_index, + int coefficient_index); + +/* HTS_SStreamSet_get_window_max_width: get max width of dynamic window */ +int HTS_SStreamSet_get_window_max_width(HTS_SStreamSet * sss, int stream_index); + +/* HTS_SStreamSet_use_gv: get GV flag */ +HTS_Boolean HTS_SStreamSet_use_gv(HTS_SStreamSet * sss, int stream_index); + +/* HTS_SStreamSet_get_duration: get state duration */ +int HTS_SStreamSet_get_duration(HTS_SStreamSet * sss, int state_index); + +/* HTS_SStreamSet_get_mean: get mean parameter */ +double HTS_SStreamSet_get_mean(HTS_SStreamSet * sss, int stream_index, + int state_index, int vector_index); + +/* HTS_SStreamSet_set_mean: set mean parameter */ +void HTS_SStreamSet_set_mean(HTS_SStreamSet * sss, int stream_index, + int state_index, int vector_index, double f); + +/* HTS_SStreamSet_get_vari: get variance parameter */ +double HTS_SStreamSet_get_vari(HTS_SStreamSet * sss, int stream_index, + int state_index, int vector_index); + +/* HTS_SStreamSet_set_vari: set variance parameter */ +void HTS_SStreamSet_set_vari(HTS_SStreamSet * sss, int stream_index, + int state_index, int vector_index, double f); + +/* HTS_SStreamSet_get_gv_mean: get GV mean parameter */ +double HTS_SStreamSet_get_gv_mean(HTS_SStreamSet * sss, int stream_index, + int vector_index); + +/* HTS_SStreamSet_get_gv_vari: get GV variance parameter */ +double HTS_SStreamSet_get_gv_vari(HTS_SStreamSet * sss, int stream_index, + int vector_index); + +/* HTS_SStreamSet_set_gv_switch: set GV switch */ +void HTS_SStreamSet_set_gv_switch(HTS_SStreamSet * sss, int stream_index, + int state_index, HTS_Boolean i); + +/* HTS_SStreamSet_get_gv_switch: get GV
switch */ +HTS_Boolean HTS_SStreamSet_get_gv_switch(HTS_SStreamSet * sss, int stream_index, + int state_index); + +/* HTS_SStreamSet_clear: free state stream set */ +void HTS_SStreamSet_clear(HTS_SStreamSet * sss); + +/* -------------------------- pstream ---------------------------- */ + +/* HTS_SMatrices: Matrices/Vectors used in the speech parameter generation algorithm. */ +typedef struct _HTS_SMatrices { + double **mean; /* mean vector sequence */ + double **ivar; /* inverse diag variance sequence */ + double *g; /* vector used in the forward substitution */ + double **wuw; /* W' U^-1 W */ + double *wum; /* W' U^-1 mu */ +} HTS_SMatrices; + +/* HTS_PStream: Individual PDF stream. */ +typedef struct _HTS_PStream { + int vector_length; /* vector length (include static and dynamic features) */ + int static_length; /* static features length */ + int length; /* stream length */ + int width; /* width of dynamic window */ + double **par; /* output parameter vector */ + HTS_SMatrices sm; /* matrices for parameter generation */ + int win_size; /* # of windows (static + deltas) */ + int *win_l_width; /* left width of windows */ + int *win_r_width; /* right width of windows */ + double **win_coefficient; /* window coefficients */ + HTS_Boolean *msd_flag; /* Boolean sequence for MSD */ + double *gv_mean; /* mean vector of GV */ + double *gv_vari; /* variance vector of GV */ + HTS_Boolean *gv_switch; /* GV flag sequence */ + int gv_length; /* frame length for GV calculation */ +} HTS_PStream; + +/* HTS_PStreamSet: Set of PDF streams. 
*/ +typedef struct _HTS_PStreamSet { + HTS_PStream *pstream; /* PDF streams */ + int nstream; /* # of PDF streams */ + int total_frame; /* total frame */ +} HTS_PStreamSet; + +/* ----------------------- pstream method ------------------------ */ + +/* HTS_PStreamSet_initialize: initialize parameter stream set */ +void HTS_PStreamSet_initialize(HTS_PStreamSet * pss); + +/* HTS_PStreamSet_create: parameter generation using GV weight */ +void HTS_PStreamSet_create(HTS_PStreamSet * pss, HTS_SStreamSet * sss, + double *msd_threshold, double *gv_weight); + +/* HTS_PStreamSet_get_nstream: get number of stream */ +int HTS_PStreamSet_get_nstream(HTS_PStreamSet * pss); + +/* HTS_PStreamSet_get_static_length: get static features length */ +int HTS_PStreamSet_get_static_length(HTS_PStreamSet * pss, int stream_index); + +/* HTS_PStreamSet_get_total_frame: get total number of frame */ +int HTS_PStreamSet_get_total_frame(HTS_PStreamSet * pss); + +/* HTS_PStreamSet_get_parameter: get parameter */ +double HTS_PStreamSet_get_parameter(HTS_PStreamSet * pss, int stream_index, + int frame_index, int vector_index); + +/* HTS_PStreamSet_get_parameter_vector: get parameter vector */ +double *HTS_PStreamSet_get_parameter_vector(HTS_PStreamSet * pss, + int stream_index, int frame_index); + +/* HTS_PStreamSet_get_msd_flag: get generated MSD flag per frame */ +HTS_Boolean HTS_PStreamSet_get_msd_flag(HTS_PStreamSet * pss, int stream_index, + int frame_index); + +/* HTS_PStreamSet_is_msd: get MSD flag */ +HTS_Boolean HTS_PStreamSet_is_msd(HTS_PStreamSet * pss, int stream_index); + +/* HTS_PStreamSet_clear: free parameter stream set */ +void HTS_PStreamSet_clear(HTS_PStreamSet * pss); + +/* -------------------------- gstream ---------------------------- */ + +#ifndef HTS_EMBEDDED +/* HTS_GStream: Generated parameter stream. 
*/ +typedef struct _HTS_GStream { + int static_length; /* static features length */ + double **par; /* generated parameter */ +} HTS_GStream; +#endif /* !HTS_EMBEDDED */ + +/* HTS_GStreamSet: Set of generated parameter stream. */ +typedef struct _HTS_GStreamSet { + int total_nsample; /* total sample */ + int total_frame; /* total frame */ + int nstream; /* # of streams */ +#ifndef HTS_EMBEDDED + HTS_GStream *gstream; /* generated parameter streams */ +#endif /* !HTS_EMBEDDED */ + short *gspeech; /* generated speech */ +} HTS_GStreamSet; + +/* ----------------------- gstream method ------------------------ */ + +/* HTS_GStreamSet_initialize: initialize generated parameter stream set */ +void HTS_GStreamSet_initialize(HTS_GStreamSet * gss); + +/* HTS_GStreamSet_create: generate speech */ +void HTS_GStreamSet_create(HTS_GStreamSet * gss, HTS_PStreamSet * pss, + int stage, HTS_Boolean use_log_gain, + int sampling_rate, int fperiod, double alpha, + double beta, HTS_Boolean * stop, double volume, + int audio_buff_size); + +/* HTS_GStreamSet_get_total_nsample: get total number of sample */ +int HTS_GStreamSet_get_total_nsample(HTS_GStreamSet * gss); + +/* HTS_GStreamSet_get_total_frame: get total number of frame */ +int HTS_GStreamSet_get_total_frame(HTS_GStreamSet * gss); + +#ifndef HTS_EMBEDDED +/* HTS_GStreamSet_get_static_length: get static features length */ +int HTS_GStreamSet_get_static_length(HTS_GStreamSet * gss, int stream_index); +#endif /* !HTS_EMBEDDED */ + +/* HTS_GStreamSet_get_speech: get synthesized speech parameter */ +short HTS_GStreamSet_get_speech(HTS_GStreamSet * gss, int sample_index); + +#ifndef HTS_EMBEDDED +/* HTS_GStreamSet_get_parameter: get generated parameter */ +double HTS_GStreamSet_get_parameter(HTS_GStreamSet * gss, int stream_index, + int frame_index, int vector_index); +#endif /* !HTS_EMBEDDED */ + +/* HTS_GStreamSet_clear: free generated parameter stream set */ +void HTS_GStreamSet_clear(HTS_GStreamSet * gss); + +/* 
-------------------------- engine ----------------------------- */ + +/* HTS_Global: Global settings. */ +typedef struct _HTS_Global { + int stage; /* Gamma=-1/stage: if stage=0 then Gamma=0 */ + HTS_Boolean use_log_gain; /* log gain flag (for LSP) */ + int sampling_rate; /* sampling rate */ + int fperiod; /* frame period */ + double alpha; /* all-pass constant */ + double beta; /* postfiltering coefficient */ + int audio_buff_size; /* audio buffer size (for audio device) */ + double *msd_threshold; /* MSD thresholds */ + double *duration_iw; /* weights for duration interpolation */ + double **parameter_iw; /* weights for parameter interpolation */ + double **gv_iw; /* weights for GV interpolation */ + double *gv_weight; /* GV weights */ + HTS_Boolean stop; /* stop flag */ + double volume; /* volume */ +} HTS_Global; + +/* HTS_Engine: Engine itself. */ +typedef struct _HTS_Engine { + HTS_Global global; /* global settings */ + HTS_ModelSet ms; /* set of duration models, HMMs and GV models */ + HTS_Label label; /* label */ + HTS_SStreamSet sss; /* set of state streams */ + HTS_PStreamSet pss; /* set of PDF streams */ + HTS_GStreamSet gss; /* set of generated parameter streams */ +} HTS_Engine; + +/* ----------------------- engine method ------------------------- */ + +/* HTS_Engine_initialize: initialize engine */ +void HTS_Engine_initialize(HTS_Engine * engine, int nstream); + +/* HTS_Engine_load_duration_from_fn: load duration pdfs, trees and number of states from file names */ +void HTS_Engine_load_duration_from_fn(HTS_Engine * engine, char **pdf_fn, + char **tree_fn, int interpolation_size); + +/* HTS_Engine_load_duration_from_fp: load duration pdfs, trees and number of states from file pointers */ +void HTS_Engine_load_duration_from_fp(HTS_Engine * engine, FILE ** pdf_fp, + FILE ** tree_fp, int interpolation_size); + +/* HTS_Engine_load_parameter_from_fn: load parameter pdfs, trees and windows from file names */ +void HTS_Engine_load_parameter_from_fn(HTS_Engine * 
engine, char **pdf_fn, + char **tree_fn, char **win_fn, + int stream_index, HTS_Boolean msd_flag, + int window_size, int interpolation_size); + +/* HTS_Engine_load_parameter_from_fp: load parameter pdfs, trees and windows from file pointers */ +void HTS_Engine_load_parameter_from_fp(HTS_Engine * engine, FILE ** pdf_fp, + FILE ** tree_fp, FILE ** win_fp, + int stream_index, HTS_Boolean msd_flag, + int window_size, int interpolation_size); + +/* HTS_Engine_load_gv_from_fn: load GV pdfs and trees from file names */ +void HTS_Engine_load_gv_from_fn(HTS_Engine * engine, char **pdf_fn, + char **tree_fn, int stream_index, + int interpolation_size); + +/* HTS_Engine_load_gv_from_fp: load GV pdfs and trees from file pointers */ +void HTS_Engine_load_gv_from_fp(HTS_Engine * engine, FILE ** pdf_fp, + FILE ** tree_fp, int stream_index, + int interpolation_size); + +/* HTS_Engine_load_gv_switch_from_fn: load GV switch from file names */ +void HTS_Engine_load_gv_switch_from_fn(HTS_Engine * engine, char *fn); + +/* HTS_Engine_load_gv_switch_from_fp: load GV switch from file pointers */ +void HTS_Engine_load_gv_switch_from_fp(HTS_Engine * engine, FILE * fp); + +/* HTS_Engine_set_sampling_rate: set sampling rate */ +void HTS_Engine_set_sampling_rate(HTS_Engine * engine, int i); + +/* HTS_Engine_get_sampling_rate: get sampling rate */ +int HTS_Engine_get_sampling_rate(HTS_Engine * engine); + +/* HTS_Engine_set_fperiod: set frame shift */ +void HTS_Engine_set_fperiod(HTS_Engine * engine, int i); + +/* HTS_Engine_get_fperiod: get frame shift */ +int HTS_Engine_get_fperiod(HTS_Engine * engine); + +/* HTS_Engine_set_alpha: set alpha */ +void HTS_Engine_set_alpha(HTS_Engine * engine, double f); + +/* HTS_Engine_set_gamma: set gamma (Gamma=-1/i: if i=0 then Gamma=0) */ +void HTS_Engine_set_gamma(HTS_Engine * engine, int i); + +/* HTS_Engine_set_log_gain: set log gain flag (for LSP) */ +void HTS_Engine_set_log_gain(HTS_Engine * engine, HTS_Boolean i); + +/* HTS_Engine_set_beta: set beta */ 
+void HTS_Engine_set_beta(HTS_Engine * engine, double f); + +/* HTS_Engine_set_audio_buff_size: set audio buffer size */ +void HTS_Engine_set_audio_buff_size(HTS_Engine * engine, int i); + +/* HTS_Engine_get_audio_buff_size: get audio buffer size */ +int HTS_Engine_get_audio_buff_size(HTS_Engine * engine); + +/* HTS_Engine_set_msd_threshold: set MSD threshold */ +void HTS_Engine_set_msd_threshold(HTS_Engine * engine, int stream_index, + double f); + +/* HTS_Engine_set_duration_interpolation_weight: set interpolation weight for duration */ +void HTS_Engine_set_duration_interpolation_weight(HTS_Engine * engine, + int interpolation_index, + double f); + +/* HTS_Engine_set_parameter_interpolation_weight: set interpolation weight for parameter */ +void HTS_Engine_set_parameter_interpolation_weight(HTS_Engine * engine, + int stream_index, + int interpolation_index, + double f); + +/* HTS_Engine_set_gv_interpolation_weight: set interpolation weight for GV */ +void HTS_Engine_set_gv_interpolation_weight(HTS_Engine * engine, + int stream_index, + int interpolation_index, double f); + +/* HTS_Engine_set_gv_weight: set GV weight */ +void HTS_Engine_set_gv_weight(HTS_Engine * engine, int stream_index, double f); + +/* HTS_Engine_set_stop_flag: set stop flag */ +void HTS_Engine_set_stop_flag(HTS_Engine * engine, HTS_Boolean b); + +/* HTS_Engine_set_volume: set volume */ +void HTS_Engine_set_volume(HTS_Engine * engine, double f); + +/* HTS_Engine_get_total_state: get total number of state */ +int HTS_Engine_get_total_state(HTS_Engine * engine); + +/* HTS_Engine_set_state_mean: set mean value of state */ +void HTS_Engine_set_state_mean(HTS_Engine * engine, int stream_index, + int state_index, int vector_index, double f); + +/* HTS_Engine_get_state_mean: get mean value of state */ +double HTS_Engine_get_state_mean(HTS_Engine * engine, int stream_index, + int state_index, int vector_index); + +/* HTS_Engine_get_state_duration: get state duration */ +int 
HTS_Engine_get_state_duration(HTS_Engine * engine, int state_index); + +/* HTS_Engine_get_nstate: get number of state */ +int HTS_Engine_get_nstate(HTS_Engine * engine); + +/* HTS_Engine_load_label_from_fn: load label from file name */ +void HTS_Engine_load_label_from_fn(HTS_Engine * engine, char *fn); + +/* HTS_Engine_load_label_from_fp: load label from file pointer */ +void HTS_Engine_load_label_from_fp(HTS_Engine * engine, FILE * fp); + +/* HTS_Engine_load_label_from_string: load label from string */ +void HTS_Engine_load_label_from_string(HTS_Engine * engine, char *data); + +/* HTS_Engine_load_label_from_string_list: load label from string list */ +void HTS_Engine_load_label_from_string_list(HTS_Engine * engine, char **data, + int size); + +/* HTS_Engine_create_sstream: parse label and determine state duration */ +void HTS_Engine_create_sstream(HTS_Engine * engine); + +/* HTS_Engine_create_pstream: generate speech parameter vector sequence */ +void HTS_Engine_create_pstream(HTS_Engine * engine); + +/* HTS_Engine_create_gstream: synthesize speech */ +void HTS_Engine_create_gstream(HTS_Engine * engine); + +/* HTS_Engine_save_information: output trace information */ +void HTS_Engine_save_information(HTS_Engine * engine, FILE * fp); + +/* HTS_Engine_save_label: output label with time */ +void HTS_Engine_save_label(HTS_Engine * engine, FILE * fp); + +#ifndef HTS_EMBEDDED +/* HTS_Engine_save_generated_parameter: output generated parameter */ +void HTS_Engine_save_generated_parameter(HTS_Engine * engine, FILE * fp, + int stream_index); +#endif /* !HTS_EMBEDDED */ + +/* HTS_Engine_save_generated_speech: output generated speech */ +void HTS_Engine_save_generated_speech(HTS_Engine * engine, FILE * fp); + +/* HTS_Engine_save_riff: output RIFF format file */ +void HTS_Engine_save_riff(HTS_Engine * engine, FILE * wavfp); + +/* HTS_Engine_refresh: free memory per one time synthesis */ +void HTS_Engine_refresh(HTS_Engine * engine); + +/* HTS_Engine_clear: free engine */ +void 
HTS_Engine_clear(HTS_Engine * engine); + +/* -------------------------- audio ------------------------------ */ + +#if !defined(AUDIO_PLAY_WINCE) && !defined(AUDIO_PLAY_WIN32) && !defined(AUDIO_PLAY_NONE) +#if defined(__WINCE__) || defined(_WINCE) || defined(__WINCE) +#define AUDIO_PLAY_WINCE +#elif defined(__WIN32__) || defined(__WIN32) || defined(_WIN32) || defined(WIN32) || defined(__CYGWIN__) || defined(__MINGW32__) +#define AUDIO_PLAY_WIN32 +#else +#define AUDIO_PLAY_NONE +#endif /* WINCE || WIN32 */ +#endif /* !AUDIO_PLAY_WINCE && !AUDIO_PLAY_WIN32 && !AUDIO_PLAY_NONE */ + +/* HTS_Audio: For MS Windows (Windows Mobile) audio output device. */ +#if defined (AUDIO_PLAY_WIN32) || defined(AUDIO_PLAY_WINCE) +#include <windows.h> +#include <mmsystem.h> +typedef struct _HTS_Audio { + HWAVEOUT hwaveout; /* audio device handle */ + WAVEFORMATEX waveformatex; /* wave formatex */ + short *buff; /* current buffer */ + int buff_size; /* current buffer size */ + int which_buff; /* double buffering flag */ + HTS_Boolean now_buff_1; /* double buffering flag */ + HTS_Boolean now_buff_2; /* double buffering flag */ + WAVEHDR buff_1; /* buffer */ + WAVEHDR buff_2; /* buffer */ + int max_buff_size; /* buffer size of audio output device */ +} HTS_Audio; +#endif /* AUDIO_PLAY_WIN32 || AUDIO_PLAY_WINCE */ + +/* HTS_Audio: For Linux, etc. 
*/ +#ifdef AUDIO_PLAY_NONE +typedef struct _HTS_Audio { + int i; /* make compiler happy */ +} HTS_Audio; +#endif /* AUDIO_PLAY_NONE */ + +/* -------------------------- vocoder ---------------------------- */ + +/* HTS_Vocoder: structure for setting of vocoder */ +typedef struct _HTS_Vocoder { + int stage; /* Gamma=-1/stage: if stage=0 then Gamma=0 */ + double gamma; /* Gamma */ + HTS_Boolean use_log_gain; /* log gain flag (for LSP) */ + int fprd; /* frame shift */ + int iprd; /* interpolation period */ + int seed; /* seed of random generator */ + unsigned long next; /* temporary variable for random generator */ + HTS_Boolean gauss; /* flag to use Gaussian noise */ + double rate; /* sampling rate */ + double p1; /* used in excitation generation */ + double pc; /* used in excitation generation */ + double p; /* used in excitation generation */ + double inc; /* used in excitation generation */ + int sw; /* switch used in random generator */ + int x; /* excitation signal */ + HTS_Audio *audio; /* pointer for audio device */ + double *freqt_buff; /* used in freqt */ + int freqt_size; /* buffer size for freqt */ + double *spectrum2en_buff; /* used in spectrum2en */ + int spectrum2en_size; /* buffer size for spectrum2en */ + double r1, r2, s; /* used in random generator */ + double *postfilter_buff; /* used in postfiltering */ + int postfilter_size; /* buffer size for postfiltering */ + double *c, *cc, *cinc, *d1; /* used in the MLSA/MGLSA filter */ + double *pade; /* used in mlsadf */ + double *lsp2lpc_buff; /* used in lsp2lpc */ + int lsp2lpc_size; /* buffer size of lsp2lpc */ + double *gc2gc_buff; /* used in gc2gc */ + int gc2gc_size; /* buffer size for gc2gc */ +} HTS_Vocoder; + +/* ----------------------- vocoder method ------------------------ */ + +/* HTS_Vocoder_initialize: initialize vocoder */ +void HTS_Vocoder_initialize(HTS_Vocoder * v, const int m, const int stage, + HTS_Boolean use_log_gain, const int rate, + const int fperiod, int buff_size); + +/* 
HTS_Vocoder_synthesize: pulse/noise excitation and MLSA/MGLSA filter based waveform synthesis */ +void HTS_Vocoder_synthesize(HTS_Vocoder * v, const int m, double lf0, + double *spectrum, double alpha, double beta, + double volume, short *rawdata); + +/* HTS_Vocoder_postfilter_mcp: postfilter for MCP */ +void HTS_Vocoder_postfilter_mcp(HTS_Vocoder * v, double *mcp, const int m, + double alpha, double beta); + +/* HTS_Vocoder_clear: clear vocoder */ +void HTS_Vocoder_clear(HTS_Vocoder * v); + +HTS_ENGINE_H_END; + +#endif /* !HTS_ENGINE_H */ diff --git a/src/modules/hts_engine/HTS_gstream.c b/src/modules/hts_engine/HTS_gstream.c new file mode 100644 index 0000000..4a606bf --- /dev/null +++ b/src/modules/hts_engine/HTS_gstream.c @@ -0,0 +1,230 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. 
*/ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. */ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. 
*/ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_GSTREAM_C +#define HTS_GSTREAM_C + +#ifdef __cplusplus +#define HTS_GSTREAM_C_START extern "C" { +#define HTS_GSTREAM_C_END } +#else +#define HTS_GSTREAM_C_START +#define HTS_GSTREAM_C_END +#endif /* __CPLUSPLUS */ + +HTS_GSTREAM_C_START; + +/* hts_engine libraries */ +#include "HTS_hidden.h" + +/* HTS_GStreamSet_initialize: initialize generated parameter stream set */ +void HTS_GStreamSet_initialize(HTS_GStreamSet * gss) +{ + gss->nstream = 0; + gss->total_frame = 0; + gss->total_nsample = 0; +#ifndef HTS_EMBEDDED + gss->gstream = NULL; +#endif /* !HTS_EMBEDDED */ + gss->gspeech = NULL; +} + +/* HTS_GStreamSet_create: generate speech */ +/* (stream[0] == spectrum && stream[1] == lf0) */ +void HTS_GStreamSet_create(HTS_GStreamSet * gss, HTS_PStreamSet * pss, + int stage, HTS_Boolean use_log_gain, + int sampling_rate, int fperiod, double alpha, + double beta, HTS_Boolean * stop, double volume, + int audio_buff_size) +{ + int i, j, k; +#ifdef HTS_EMBEDDED + double lf0; +#endif /* HTS_EMBEDDED */ + int msd_frame; + HTS_Vocoder v; + + /* check */ +#ifdef HTS_EMBEDDED + if (gss->gspeech) +#else + if (gss->gstream || gss->gspeech) +#endif /* HTS_EMBEDDED */ + HTS_error(1, + "HTS_GStreamSet_create: HTS_GStreamSet is not initialized.\n"); + + /* initialize */ + gss->nstream = HTS_PStreamSet_get_nstream(pss); + gss->total_frame = HTS_PStreamSet_get_total_frame(pss); + gss->total_nsample = fperiod * gss->total_frame; +#ifndef HTS_EMBEDDED + gss->gstream = (HTS_GStream *) HTS_calloc(gss->nstream, sizeof(HTS_GStream)); + for (i = 0; i < gss->nstream; i++) { + gss->gstream[i].static_length = HTS_PStreamSet_get_static_length(pss, i); + gss->gstream[i].par = + (double **) HTS_calloc(gss->total_frame, sizeof(double *)); + for (j = 0; j < gss->total_frame; j++) + gss->gstream[i].par[j] = + (double *) HTS_calloc(gss->gstream[i].static_length, + sizeof(double)); + } +#endif /* !HTS_EMBEDDED */ 
+ gss->gspeech = (short *) HTS_calloc(gss->total_nsample, sizeof(short)); + +#ifndef HTS_EMBEDDED + /* copy generated parameter */ + for (i = 0; i < gss->nstream; i++) { + if (HTS_PStreamSet_is_msd(pss, i)) { /* for MSD */ + for (j = 0, msd_frame = 0; j < gss->total_frame; j++) + if (HTS_PStreamSet_get_msd_flag(pss, i, j)) { + for (k = 0; k < gss->gstream[i].static_length; k++) + gss->gstream[i].par[j][k] = + HTS_PStreamSet_get_parameter(pss, i, msd_frame, k); + msd_frame++; + } else + for (k = 0; k < gss->gstream[i].static_length; k++) + gss->gstream[i].par[j][k] = LZERO; + } else { /* for non MSD */ + for (j = 0; j < gss->total_frame; j++) + for (k = 0; k < gss->gstream[i].static_length; k++) + gss->gstream[i].par[j][k] = + HTS_PStreamSet_get_parameter(pss, i, j, k); + } + } +#endif /* !HTS_EMBEDDED */ + + /* check */ + if (gss->nstream != 2) + HTS_error(1, + "HTS_GStreamSet_create: The number of streams should be 2.\n"); + if (HTS_PStreamSet_get_static_length(pss, 1) != 1) + HTS_error(1, + "HTS_GStreamSet_create: The size of lf0 static vector should be 1.\n"); + + /* synthesize speech waveform */ +#ifdef HTS_EMBEDDED + HTS_Vocoder_initialize(&v, HTS_PStreamSet_get_static_length(pss, 0) - 1, + stage, use_log_gain, sampling_rate, fperiod, + audio_buff_size); + for (i = 0, msd_frame = 0; i < gss->total_frame && (*stop) == FALSE; i++) { + lf0 = LZERO; + if (HTS_PStreamSet_get_msd_flag(pss, 1, i)) + lf0 = HTS_PStreamSet_get_parameter(pss, 1, msd_frame++, 0); + HTS_Vocoder_synthesize(&v, HTS_PStreamSet_get_static_length(pss, 0) - 1, + lf0, HTS_PStreamSet_get_parameter_vector(pss, 0, + i), alpha, + beta, volume, &gss->gspeech[i * fperiod]); + } +#else + HTS_Vocoder_initialize(&v, gss->gstream[0].static_length - 1, stage, + use_log_gain, sampling_rate, fperiod, + audio_buff_size); + for (i = 0; i < gss->total_frame && (*stop) == FALSE; i++) { + HTS_Vocoder_synthesize(&v, gss->gstream[0].static_length - 1, + gss->gstream[1].par[i][0], + &gss->gstream[0].par[i][0], alpha, 
beta, volume, + &gss->gspeech[i * fperiod]); + } +#endif /* HTS_EMBEDDED */ + HTS_Vocoder_clear(&v); +} + +/* HTS_GStreamSet_get_total_nsample: get total number of sample */ +int HTS_GStreamSet_get_total_nsample(HTS_GStreamSet * gss) +{ + return gss->total_nsample; +} + +/* HTS_GStreamSet_get_total_frame: get total number of frame */ +int HTS_GStreamSet_get_total_frame(HTS_GStreamSet * gss) +{ + return gss->total_frame; +} + +#ifndef HTS_EMBEDDED +/* HTS_GStreamSet_get_static_length: get static features length */ +int HTS_GStreamSet_get_static_length(HTS_GStreamSet * gss, int stream_index) +{ + return gss->gstream[stream_index].static_length; +} +#endif /* !HTS_EMBEDDED */ + +/* HTS_GStreamSet_get_speech: get synthesized speech parameter */ +short HTS_GStreamSet_get_speech(HTS_GStreamSet * gss, int sample_index) +{ + return gss->gspeech[sample_index]; +} + +#ifndef HTS_EMBEDDED +/* HTS_GStreamSet_get_parameter: get generated parameter */ +double HTS_GStreamSet_get_parameter(HTS_GStreamSet * gss, int stream_index, + int frame_index, int vector_index) +{ + return gss->gstream[stream_index].par[frame_index][vector_index]; +} +#endif /* !HTS_EMBEDDED */ + +/* HTS_GStreamSet_clear: free generated parameter stream set */ +void HTS_GStreamSet_clear(HTS_GStreamSet * gss) +{ + int i, j; + +#ifndef HTS_EMBEDDED + if (gss->gstream) { + for (i = 0; i < gss->nstream; i++) { + for (j = 0; j < gss->total_frame; j++) + HTS_free(gss->gstream[i].par[j]); + HTS_free(gss->gstream[i].par); + } + HTS_free(gss->gstream); + } +#endif /* !HTS_EMBEDDED */ + if (gss->gspeech) + HTS_free(gss->gspeech); + HTS_GStreamSet_initialize(gss); +} + +HTS_GSTREAM_C_END; + +#endif /* !HTS_GSTREAM_C */ diff --git a/src/modules/hts_engine/HTS_hidden.h b/src/modules/hts_engine/HTS_hidden.h new file mode 100644 index 0000000..43e73a0 --- /dev/null +++ b/src/modules/hts_engine/HTS_hidden.h @@ -0,0 +1,173 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech 
Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. */ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. */ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_HIDDEN_H +#define HTS_HIDDEN_H + +#ifdef __cplusplus +#define HTS_HIDDEN_H_START extern "C" { +#define HTS_HIDDEN_H_END } +#else +#define HTS_HIDDEN_H_START +#define HTS_HIDDEN_H_END +#endif /* __CPLUSPLUS */ + +HTS_HIDDEN_H_START; + +/* hts_engine libraries */ +#include "HTS_engine.h" + +/* -------------------------- misc ------------------------------- */ + +#if !defined(WORDS_BIGENDIAN) && !defined(WORDS_LITTLEENDIAN) +#define WORDS_LITTLEENDIAN +#endif /* !WORDS_BIGENDIAN && !WORDS_LITTLEENDIAN */ +#if defined(WORDS_BIGENDIAN) && defined(WORDS_LITTLEENDIAN) +#undef WORDS_BIGENDIAN +#endif /* WORDS_BIGENDIAN && WORDS_LITTLEENDIAN */ + +#define HTS_MAXBUFLEN 1024 + +/* HTS_error: output error message */ +void HTS_error(const int error, char *message, ...); + +/* HTS_get_fp: wrapper for fopen */ +FILE *HTS_get_fp(const char *name, const char *opt); + +/* HTS_get_pattern_token: get pattern token */ +void HTS_get_pattern_token(FILE * fp, char *buff); + +/* HTS_get_token: get token (separator are space,tab,line break) */ +HTS_Boolean HTS_get_token(FILE * fp, char *buff); + +/* HTS_get_token_from_string: get token from string (separator are space,tab,line break) */ +HTS_Boolean HTS_get_token_from_string(char *string, int *index, char *buff); + +/* HTS_fwrite_little_endian: fwrite with byteswap */ +int 
HTS_fwrite_little_endian(void *p, const int size, const int num, FILE * fp); + +/* HTS_fread_big_endian: fread with byteswap */ +int HTS_fread_big_endian(void *p, const int size, const int num, FILE * fp); + +/* HTS_calloc: wrapper for calloc */ +char *HTS_calloc(const size_t num, const size_t size); + +/* HTS_strdup: wrapper for strdup */ +char *HTS_strdup(const char *string); + +/* HTS_alloc_matrix: allocate double matrix */ +double **HTS_alloc_matrix(const int x, const int y); + +/* HTS_free_matrix: free double matrix */ +void HTS_free_matrix(double **p, const int x); + +/* HTS_free: wrapper for free */ +void HTS_free(void *p); + +/* -------------------------- pstream ---------------------------- */ + +/* check variance in finv() */ +#define INFTY ((double) 1.0e+38) +#define INFTY2 ((double) 1.0e+19) +#define INVINF ((double) 1.0e-38) +#define INVINF2 ((double) 1.0e-19) + +/* GV */ +#define STEPINIT 0.1 +#define STEPDEC 0.5 +#define STEPINC 1.2 +#define W1 1.0 +#define W2 1.0 +#define GV_MAX_ITERATION 5 + +/* -------------------------- audio ------------------------------ */ + +/* HTS_Audio_open: open audio device */ +void HTS_Audio_open(HTS_Audio * as, int sampling_rate, int max_buff_size); + +/* HTS_Audio_write: send data to audio device */ +void HTS_Audio_write(HTS_Audio * as, short data); + +/* HTS_Audio_close: close audio device */ +void HTS_Audio_close(HTS_Audio * as); + +/* -------------------------- vocoder ---------------------------- */ + +#ifndef PI +#define PI 3.14159265358979323846 +#endif /* !PI */ +#ifndef PI2 +#define PI2 6.28318530717958647692 +#endif /* !PI2 */ + +#define RANDMAX 32767 + +#define IPERIOD 1 +#define SEED 1 +#define B0 0x00000001 +#define B28 0x10000000 +#define B31 0x80000000 +#define B31_ 0x7fffffff +#define Z 0x00000000 + +#ifdef HTS_EMBEDDED +#define GAUSS FALSE +#define PADEORDER 4 /* pade order (for MLSA filter) */ +#define IRLENG 64 /* length of impulse response */ +#else +#define GAUSS TRUE +#define PADEORDER 5 +#define 
IRLENG 96 +#endif /* HTS_EMBEDDED */ + +/* for MGLSA filter */ +#define NORMFLG1 TRUE +#define NORMFLG2 FALSE +#define MULGFLG1 TRUE +#define MULGFLG2 FALSE +#define NGAIN FALSE + +HTS_HIDDEN_H_END; + +#endif /* !HTS_HIDDEN_H */ diff --git a/src/modules/hts_engine/HTS_label.c b/src/modules/hts_engine/HTS_label.c new file mode 100644 index 0000000..969d5a8 --- /dev/null +++ b/src/modules/hts_engine/HTS_label.c @@ -0,0 +1,330 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. 
*/ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. */ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_LABEL_C +#define HTS_LABEL_C + +#ifdef __cplusplus +#define HTS_LABEL_C_START extern "C" { +#define HTS_LABEL_C_END } +#else +#define HTS_LABEL_C_START +#define HTS_LABEL_C_END +#endif /* __CPLUSPLUS */ + +HTS_LABEL_C_START; + +#include <stdlib.h> /* for atof() */ +#include <ctype.h> /* for isgraph(),isdigit() */ + +/* hts_engine libraries */ +#include "HTS_hidden.h" + +static HTS_Boolean isdigit_string(char *str) +{ + int i; + + if (sscanf(str, "%d", &i) == 1) + return TRUE; + else + return FALSE; +} + +/* HTS_Label_initialize: initialize label */ +void HTS_Label_initialize(HTS_Label * label) +{ + label->head = NULL; + label->size = 0; + label->frame_flag = FALSE; + label->speech_speed = 1.0; +} + +/* HTS_Label_check_time: check label */ +static void HTS_Label_check_time(HTS_Label * label) +{ + HTS_LabelString *lstring = label->head; + HTS_LabelString *next = NULL; + + if (lstring) + lstring->start = 0.0; + while (lstring) { + next = lstring->next; + if (!next) + break; + if (lstring->end < 0.0 && next->start >= 0.0) + lstring->end = next->start; + else if 
(lstring->end >= 0.0 && next->start < 0.0) + next->start = lstring->end; + if (lstring->start < 0.0) + lstring->start = -1.0; + if (lstring->end < 0.0) + lstring->end = -1.0; + lstring = next; + } +} + +/* HTS_Label_load_from_fn: load label from file name */ +void HTS_Label_load_from_fn(HTS_Label * label, int sampling_rate, int fperiod, + char *fn) +{ + FILE *fp = HTS_get_fp(fn, "r"); + HTS_Label_load_from_fp(label, sampling_rate, fperiod, fp); + fclose(fp); +} + +/* HTS_Label_load_from_fp: load label from file pointer */ +void HTS_Label_load_from_fp(HTS_Label * label, int sampling_rate, int fperiod, + FILE * fp) +{ + char buff[HTS_MAXBUFLEN]; + HTS_LabelString *lstring = NULL; + double start, end; + const double rate = (double) sampling_rate / ((double) fperiod * 1e+7); + + if (label->head || label->size != 0) + HTS_error(1, "HTS_Label_load_from_fp: label is not initialized.\n"); + /* parse label file */ + while (HTS_get_token(fp, buff)) { + if (!isgraph((int) buff[0])) + break; + label->size++; + + if (lstring) { + lstring->next = + (HTS_LabelString *) HTS_calloc(1, sizeof(HTS_LabelString)); + lstring = lstring->next; + } else { /* first time */ + lstring = (HTS_LabelString *) HTS_calloc(1, sizeof(HTS_LabelString)); + label->head = lstring; + } + if (isdigit_string(buff)) { /* has frame information */ + start = atof(buff); + HTS_get_token(fp, buff); + end = atof(buff); + HTS_get_token(fp, buff); + lstring->start = rate * start; + lstring->end = rate * end; + } else { + lstring->start = -1.0; + lstring->end = -1.0; + } + lstring->next = NULL; + lstring->name = HTS_strdup(buff); + } + HTS_Label_check_time(label); +} + +/* HTS_Label_load_from_string: load label from string */ +void HTS_Label_load_from_string(HTS_Label * label, int sampling_rate, + int fperiod, char *data) +{ + char buff[HTS_MAXBUFLEN]; + HTS_LabelString *lstring = NULL; + int data_index = 0; /* data index */ + double start, end; + const double rate = (double) sampling_rate / ((double) fperiod * 
1e+7); + + if (label->head || label->size != 0) + HTS_error(1, "HTS_Label_load_from_string: label list is not initialized.\n"); + /* copy label */ + while (HTS_get_token_from_string(data, &data_index, buff)) { + if (!isgraph((int) buff[0])) + break; + label->size++; + + if (lstring) { + lstring->next = + (HTS_LabelString *) HTS_calloc(1, sizeof(HTS_LabelString)); + lstring = lstring->next; + } else { /* first time */ + lstring = (HTS_LabelString *) HTS_calloc(1, sizeof(HTS_LabelString)); + label->head = lstring; + } + if (isdigit_string(buff)) { /* has frame information */ + start = atof(buff); + HTS_get_token_from_string(data, &data_index, buff); + end = atof(buff); + HTS_get_token_from_string(data, &data_index, buff); + lstring->start = rate * start; + lstring->end = rate * end; + } else { + lstring->start = -1.0; + lstring->end = -1.0; + } + lstring->next = NULL; + lstring->name = HTS_strdup(buff); + } + HTS_Label_check_time(label); +} + +/* HTS_Label_load_from_string_list: load label from string list */ +void HTS_Label_load_from_string_list(HTS_Label * label, int sampling_rate, + int fperiod, char **data, int size) +{ + char buff[HTS_MAXBUFLEN]; + HTS_LabelString *lstring = NULL; + int i; + int data_index; + double start, end; + const double rate = (double) sampling_rate / ((double) fperiod * 1e+7); + + if (label->head || label->size != 0) + HTS_error(1, "HTS_Label_load_from_string_list: label list is not initialized.\n"); + /* copy label */ + for (i = 0; i < size; i++) { + if (!isgraph((int) data[i][0])) + break; + label->size++; + + if (lstring) { + lstring->next = + (HTS_LabelString *) HTS_calloc(1, sizeof(HTS_LabelString)); + lstring = lstring->next; + } else { /* first time */ + lstring = (HTS_LabelString *) HTS_calloc(1, sizeof(HTS_LabelString)); + label->head = lstring; + } + data_index = 0; + if (isdigit_string(data[i])) { /* has frame information */ + HTS_get_token_from_string(data[i], &data_index, buff); + start = atof(buff); + HTS_get_token_from_string(data[i], 
&data_index, buff); + end = atof(buff); + HTS_get_token_from_string(data[i], &data_index, buff); + lstring->name = HTS_strdup(buff); + lstring->start = rate * start; + lstring->end = rate * end; + } else { + lstring->start = -1.0; + lstring->end = -1.0; + lstring->name = HTS_strdup(data[i]); + } + lstring->next = NULL; + } + HTS_Label_check_time(label); +} + +/* HTS_Label_set_frame_specified_flag: set frame specified flag */ +void HTS_Label_set_frame_specified_flag(HTS_Label * label, HTS_Boolean i) +{ + label->frame_flag = i; +} + +/* HTS_Label_set_speech_speed: set speech speed rate */ +void HTS_Label_set_speech_speed(HTS_Label * label, double f) +{ + if (f > 0.0 && f <= 10.0) + label->speech_speed = f; +} + +/* HTS_Label_get_size: get number of label string */ +int HTS_Label_get_size(HTS_Label * label) +{ + return label->size; +} + +/* HTS_Label_get_string: get label string */ +char *HTS_Label_get_string(HTS_Label * label, int string_index) +{ + HTS_LabelString *lstring = label->head; + + while (string_index-- && lstring) + lstring = lstring->next; + if (!lstring) + return NULL; + return lstring->name; +} + +/* HTS_Label_get_frame_specified_flag: get frame specified flag */ +HTS_Boolean HTS_Label_get_frame_specified_flag(HTS_Label * label) +{ + return label->frame_flag; +} + +/* HTS_Label_get_start_frame: get start frame */ +double HTS_Label_get_start_frame(HTS_Label * label, int string_index) +{ + HTS_LabelString *lstring = label->head; + + while (string_index-- && lstring) + lstring = lstring->next; + if (!lstring) + return -1.0; + return lstring->start; +} + +/* HTS_Label_get_end_frame: get end frame */ +double HTS_Label_get_end_frame(HTS_Label * label, int string_index) +{ + HTS_LabelString *lstring = label->head; + + while (string_index-- && lstring) + lstring = lstring->next; + if (!lstring) + return -1.0; + return lstring->end; +} + +/* HTS_Label_get_speech_speed: get speech speed rate */ +double HTS_Label_get_speech_speed(HTS_Label * label) 
+{ + return label->speech_speed; +} + +/* HTS_Label_clear: free label */ +void HTS_Label_clear(HTS_Label * label) +{ + HTS_LabelString *lstring, *next_lstring; + + for (lstring = label->head; lstring; lstring = next_lstring) { + next_lstring = lstring->next; + HTS_free(lstring->name); + HTS_free(lstring); + } + HTS_Label_initialize(label); +} + +HTS_LABEL_C_END; + +#endif /* !HTS_LABEL_C */ diff --git a/src/modules/hts_engine/HTS_misc.c b/src/modules/hts_engine/HTS_misc.c new file mode 100644 index 0000000..c927c99 --- /dev/null +++ b/src/modules/hts_engine/HTS_misc.c @@ -0,0 +1,306 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. 
*/ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. */ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_MISC_C +#define HTS_MISC_C + +#ifdef __cplusplus +#define HTS_MISC_C_START extern "C" { +#define HTS_MISC_C_END } +#else +#define HTS_MISC_C_START +#define HTS_MISC_C_END +#endif /* __CPLUSPLUS */ + +HTS_MISC_C_START; + +#include <stdlib.h> /* for exit(),calloc(),free() */ +#include <stdarg.h> /* for va_list */ +#include <string.h> /* for strcpy(),strlen() */ + +/* hts_engine libraries */ +#include "HTS_hidden.h" + +#ifdef FESTIVAL +#include "EST_walloc.h" +#endif /* FESTIVAL */ + +/* HTS_byte_swap: byte swap */ +static int HTS_byte_swap(void *p, const int size, const int block) +{ + char *q, tmp; + int i, j; + + q = (char *) p; + + for (i = 0; i < block; i++) { + for (j = 0; j < (size / 2); j++) { + tmp = *(q + j); + *(q + j) = *(q + (size - 1 - j)); + *(q + (size - 1 - j)) = tmp; + } + q += size; + } + + return i; +} + +/* HTS_error: output error message */ +void HTS_error(const int error, char *message, ...) 
+{ + va_list arg; + + fflush(stdout); + fflush(stderr); + + if (error > 0) + fprintf(stderr, "\nError: "); + else + fprintf(stderr, "\nWarning: "); + + va_start(arg, message); + vfprintf(stderr, message, arg); + va_end(arg); + + fflush(stderr); + + if (error > 0) + exit(error); +} + +/* HTS_get_fp: wrapper for fopen */ +FILE *HTS_get_fp(const char *name, const char *opt) +{ + FILE *fp = fopen(name, opt); + + if (fp == NULL) + HTS_error(2, "HTS_get_fp: Cannot open %s.\n", name); + + return (fp); +} + +/* HTS_get_pattern_token: get pattern token */ +void HTS_get_pattern_token(FILE * fp, char *buff) +{ + char c; + int i; + HTS_Boolean squote = FALSE, dquote = FALSE; + + c = fgetc(fp); + + while (c == ' ' || c == '\n') + c = fgetc(fp); + + if (c == '\'') { /* single quote case */ + c = fgetc(fp); + squote = TRUE; + } + + if (c == '\"') { /*double quote case */ + c = fgetc(fp); + dquote = TRUE; + } + + if (c == ',') { /*special character ',' */ + strcpy(buff, ","); + return; + } + + i = 0; + while (1) { + buff[i++] = c; + c = fgetc(fp); + if (squote && c == '\'') + break; + if (dquote && c == '\"') + break; + if (!squote && !dquote) { + if (c == ' ') + break; + if (c == '\n') + break; + if (feof(fp)) + break; + } + } + + buff[i] = '\0'; +} + +/* HTS_get_token: get token (separator are space,tab,line break) */ +HTS_Boolean HTS_get_token(FILE * fp, char *buff) +{ + char c; + int i; + + if (feof(fp)) + return FALSE; + c = fgetc(fp); + while (c == ' ' || c == '\n' || c == '\t') { + if (feof(fp)) + return FALSE; + c = getc(fp); + } + + for (i = 0; c != ' ' && c != '\n' && c != '\t' && !feof(fp); i++) { + buff[i] = c; + c = fgetc(fp); + } + + buff[i] = '\0'; + return TRUE; +} + +/* HTS_get_token_from_string: get token from string (separator are space,tab,line break) */ +HTS_Boolean HTS_get_token_from_string(char *string, int *index, char *buff) +{ + char c; + int i; + + c = string[(*index)]; + if (c == '\0') + return FALSE; + c = string[(*index)++]; + if (c == '\0') + return 
FALSE; + while (c == ' ' || c == '\n' || c == '\t') { + if (c == '\0') + return FALSE; + c = string[(*index)++]; + } + for (i = 0; c != ' ' && c != '\n' && c != '\t' && c != '\0'; i++) { + buff[i] = c; + c = string[(*index)++]; + } + + buff[i] = '\0'; + return TRUE; +} + +/* HTS_fread_big_endian: fread with byteswap */ +int HTS_fread_big_endian(void *p, const int size, const int num, FILE * fp) +{ + const int block = fread(p, size, num, fp); + +#ifdef WORDS_LITTLEENDIAN + HTS_byte_swap(p, size, block); +#endif /* WORDS_LITTLEENDIAN */ + + return block; +} + +/* HTS_fwrite_little_endian: fwrite with byteswap */ +int HTS_fwrite_little_endian(void *p, const int size, const int num, FILE * fp) +{ + const int block = num * size; + +#ifdef WORDS_BIGENDIAN + HTS_byte_swap(p, size, block); +#endif /* WORDS_BIGENDIAN */ + fwrite(p, size, num, fp); + + return block; +} + +/* HTS_calloc: wrapper for calloc */ +char *HTS_calloc(const size_t num, const size_t size) +{ +#ifdef FESTIVAL + char *mem = (char *) safe_wcalloc(num * size); +#else + char *mem = (char *) calloc(num, size); +#endif /* FESTIVAL */ + + if (mem == NULL) + HTS_error(1, "HTS_calloc: Cannot allocate memory.\n"); + + return mem; +} + +/* HTS_Free: wrapper for free */ +void HTS_free(void *ptr) +{ +#ifdef FESTIVAL + wfree(ptr); +#else + free(ptr); +#endif /* FESTIVAL */ +} + +/* HTS_strdup: wrapper for strdup */ +char *HTS_strdup(const char *string) +{ +#ifdef FESTIVAL + return (wstrdup(string)); +#else + char *buff = (char *) HTS_calloc(strlen(string) + 1, sizeof(char)); + strcpy(buff, string); + return buff; +#endif /* FESTIVAL */ +} + +/* HTS_alloc_matrix: allocate double matrix */ +double **HTS_alloc_matrix(const int x, const int y) +{ + int i; + double **p = (double **) HTS_calloc(x, sizeof(double *)); + + for (i = 0; i < x; i++) + p[i] = (double *) HTS_calloc(y, sizeof(double)); + return p; +} + +/* HTS_free_matrix: free double matrix */ +void HTS_free_matrix(double **p, const int x) +{ + int i; + + for (i 
= x - 1; i >= 0; i--) + HTS_free(p[i]); + HTS_free(p); +} + +HTS_MISC_C_END; + +#endif /* !HTS_MISC_C */ diff --git a/src/modules/hts_engine/HTS_model.c b/src/modules/hts_engine/HTS_model.c new file mode 100644 index 0000000..4c4a734 --- /dev/null +++ b/src/modules/hts_engine/HTS_model.c @@ -0,0 +1,1157 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. */ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. */ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_MODEL_C +#define HTS_MODEL_C + +#ifdef __cplusplus +#define HTS_MODEL_C_START extern "C" { +#define HTS_MODEL_C_END } +#else +#define HTS_MODEL_C_START +#define HTS_MODEL_C_END +#endif /* __CPLUSPLUS */ + +HTS_MODEL_C_START; + +#include <stdlib.h> /* for atoi(),abs() */ +#include <string.h> /* for strlen(),strstr(),strrchr(),strcmp() */ +#include <ctype.h> /* for isdigit() */ + +/* hts_engine libraries */ +#include "HTS_hidden.h" + +/* HTS_dp_match: recursive matching */ +static HTS_Boolean HTS_dp_match(const char *string, const char *pattern, + const int pos, const int max) +{ + if (pos > max) + return FALSE; + if (string[0] == '\0' && pattern[0] == '\0') + return TRUE; + if (pattern[0] == '*') { + if (HTS_dp_match(string + 1, pattern, pos + 1, max) == 1) + return TRUE; + else + return HTS_dp_match(string, pattern + 1, pos, max); + } + if (string[0] == pattern[0] || pattern[0] == '?') { + if (HTS_dp_match(string + 1, pattern + 1, pos + 1, max + 1) == 1) + return TRUE; + } + + return FALSE; +} + +/* HTS_pattern_match: pattern matching function */ +static HTS_Boolean HTS_pattern_match(const char *string, const char *pattern) +{ + int i, j; + int buff_length, max = 0, nstar = 0, nquestion = 0; + char buff[HTS_MAXBUFLEN]; + const int pattern_length = strlen(pattern); + + for (i = 0; i < pattern_length; i++) { + switch (pattern[i]) { + case '*': + 
nstar++; + break; + case '?': + nquestion++; + max++; + break; + default: + max++; + } + } + if (nstar == 2 && nquestion == 0 && pattern[0] == '*' + && pattern[i - 1] == '*') { + /* only string matching is required */ + buff_length = i - 2; + for (i = 0, j = 1; i < buff_length; i++, j++) + buff[i] = pattern[j]; + buff[buff_length] = '\0'; + if (strstr(string, buff) != NULL) + return TRUE; + else + return FALSE; + } else + return HTS_dp_match(string, pattern, 0, (int) (strlen(string) - max)); +} + +/* HTS_is_num: check given buffer is number or not */ +static HTS_Boolean HTS_is_num(const char *buff) +{ + int i; + const int length = (int) strlen(buff); + + for (i = 0; i < length; i++) + if (!(isdigit((int) buff[i]) || (buff[i] == '-'))) + return FALSE; + + return TRUE; +} + +/* HTS_name2num: convert name of node to number */ +static int HTS_name2num(const char *buff) +{ + int i; + + for (i = strlen(buff) - 1; '0' <= buff[i] && buff[i] <= '9' && i >= 0; i--); + i++; + + return atoi(&buff[i]); +} + +/* HTS_get_state_num: return the number of state */ +static int HTS_get_state_num(char *string) +{ + char *left, *right; + + if (((left = strchr(string, '[')) == NULL) + || ((right = strrchr(string, ']')) == NULL)) + return 0; + *right = '\0'; + string = left + 1; + + return atoi(string); +} + +/* HTS_Question_load: Load questions from file */ +static void HTS_Question_load(HTS_Question * question, FILE * fp) +{ + char buff[HTS_MAXBUFLEN]; + HTS_Pattern *pattern, *last_pattern; + + /* get question name */ + HTS_get_pattern_token(fp, buff); + question->string = HTS_strdup(buff); + question->head = NULL; + /* get pattern list */ + HTS_get_pattern_token(fp, buff); + last_pattern = NULL; + if (strcmp(buff, "{") == 0) { + while (1) { + HTS_get_pattern_token(fp, buff); + pattern = (HTS_Pattern *) HTS_calloc(1, sizeof(HTS_Pattern)); + if (question->head) + last_pattern->next = pattern; + else /* first time */ + question->head = pattern; + pattern->string = HTS_strdup(buff); + 
pattern->next = NULL; + HTS_get_pattern_token(fp, buff); + if (!strcmp(buff, "}")) + break; + last_pattern = pattern; + } + } +} + +/* HTS_Question_match: check given string match given question */ +static HTS_Boolean HTS_Question_match(const HTS_Question * question, + const char *string) +{ + HTS_Pattern *pattern; + + for (pattern = question->head; pattern; pattern = pattern->next) + if (HTS_pattern_match(string, pattern->string)) + return TRUE; + + return FALSE; +} + +/* HTS_Question_find_question: find question from question list */ +static HTS_Question *HTS_Question_find_question(HTS_Question * question, + const char *buff) +{ + + for (; question; question = question->next) + if (strcmp(buff, question->string) == 0) + return question; + + HTS_error(1, "HTS_Question_find_question: Cannot find question %s.\n", buff); + return NULL; /* make compiler happy */ +} + +/* HTS_Question_clear: clear loaded question */ +static void HTS_Question_clear(HTS_Question * question) +{ + HTS_Pattern *pattern, *next_pattern; + + HTS_free(question->string); + for (pattern = question->head; pattern; pattern = next_pattern) { + next_pattern = pattern->next; + HTS_free(pattern->string); + HTS_free(pattern); + } +} + +/* HTS_Node_find: find node for given number */ +static HTS_Node *HTS_Node_find(HTS_Node * node, const int num) +{ + for (; node; node = node->next) + if (node->index == num) + return node; + + HTS_error(1, "HTS_Node_find: Cannot find node %d.\n", num); + return NULL; /* make compiler happy */ +} + +/* HTS_Node_clear: recursive function to free Node */ +static void HTS_Node_clear(HTS_Node * node) +{ + if (node->yes != NULL) + HTS_Node_clear(node->yes); + if (node->no != NULL) + HTS_Node_clear(node->no); + HTS_free(node); +} + +/* HTS_Tree_parse_pattern: parse pattern specified for each tree */ +static void HTS_Tree_parse_pattern(HTS_Tree * tree, char *string) +{ + char *left, *right; + HTS_Pattern *pattern, *last_pattern; + + tree->head = NULL; + last_pattern = NULL; + /* 
parse tree pattern */ + if ((left = strchr(string, '{')) != NULL) { /* pattern is specified */ + string = left + 1; + if (*string == '(') + ++string; + + right = strrchr(string, '}'); + if (string < right && *(right - 1) == ')') + --right; + *right = ','; + + /* parse pattern */ + while ((left = strchr(string, ',')) != NULL) { + pattern = (HTS_Pattern *) HTS_calloc(1, sizeof(HTS_Pattern)); + if (tree->head) { + last_pattern->next = pattern; + } else { + tree->head = pattern; + } + *left = '\0'; + pattern->string = HTS_strdup(string); + string = left + 1; + pattern->next = NULL; + last_pattern = pattern; + } + } +} + +/* HTS_Tree_load: Load trees */ +static void HTS_Tree_load(HTS_Tree * tree, FILE * fp, HTS_Question * question) +{ + char buff[HTS_MAXBUFLEN]; + HTS_Node *node, *last_node; + + HTS_get_pattern_token(fp, buff); + node = (HTS_Node *) HTS_calloc(1, sizeof(HTS_Node)); + node->index = 0; + tree->root = last_node = node; + + if (strcmp(buff, "{") == 0) { + while (HTS_get_pattern_token(fp, buff), strcmp(buff, "}") != 0) { + node = HTS_Node_find(last_node, atoi(buff)); + HTS_get_pattern_token(fp, buff); /* load question at this node */ + + node->quest = HTS_Question_find_question(question, buff); + node->yes = (HTS_Node *) HTS_calloc(1, sizeof(HTS_Node)); + node->no = (HTS_Node *) HTS_calloc(1, sizeof(HTS_Node)); + + HTS_get_pattern_token(fp, buff); + if (HTS_is_num(buff)) + node->no->index = atoi(buff); + else + node->no->pdf = HTS_name2num(buff); + node->no->next = last_node; + last_node = node->no; + + HTS_get_pattern_token(fp, buff); + if (HTS_is_num(buff)) + node->yes->index = atoi(buff); + else + node->yes->pdf = HTS_name2num(buff); + node->yes->next = last_node; + last_node = node->yes; + } + } else { + node->pdf = HTS_name2num(buff); + } +} + +/* HTS_Node_search: tree search */ +static int HTS_Tree_search_node(HTS_Tree * tree, const char *string) +{ + HTS_Node *node = tree->root; + + while (node != NULL) { + if (node->quest == NULL) + return node->pdf; 
+ if (HTS_Question_match(node->quest, string)) { + if (node->yes->pdf > 0) + return node->yes->pdf; + node = node->yes; + } else { + if (node->no->pdf > 0) + return node->no->pdf; + node = node->no; + } + } + + HTS_error(1, "HTS_Tree_search_node: Cannot find node.\n"); + return -1; /* make compiler happy */ +} + +/* HTS_Tree_clear: clear given tree */ +static void HTS_Tree_clear(HTS_Tree * tree) +{ + HTS_Pattern *pattern, *next_pattern; + + for (pattern = tree->head; pattern; pattern = next_pattern) { + next_pattern = pattern->next; + HTS_free(pattern->string); + HTS_free(pattern); + } + + HTS_Node_clear(tree->root); +} + +/* HTS_Window_initialize: initialize dynamic window */ +static void HTS_Window_initialize(HTS_Window * win) +{ + win->size = 0; + win->l_width = NULL; + win->r_width = NULL; + win->coefficient = NULL; + win->max_width = 0; +} + +/* HTS_Window_load: load dynamic windows */ +static void HTS_Window_load(HTS_Window * win, FILE ** fp, int size) +{ + int i, j; + int fsize, length; + char buff[HTS_MAXBUFLEN]; + + win->size = size; + win->l_width = (int *) HTS_calloc(win->size, sizeof(int)); + win->r_width = (int *) HTS_calloc(win->size, sizeof(int)); + win->coefficient = (double **) HTS_calloc(win->size, sizeof(double *)); + /* set delta coefficients */ + for (i = 0; i < win->size; i++) { + HTS_get_token(fp[i], buff); + fsize = atoi(buff); + /* read coefficients */ + win->coefficient[i] = (double *) HTS_calloc(fsize, sizeof(double)); + for (j = 0; j < fsize; j++) { + HTS_get_token(fp[i], buff); + win->coefficient[i][j] = (double) atof(buff); + } + /* set pointer */ + length = fsize / 2; + win->coefficient[i] += length; + win->l_width[i] = -length; + win->r_width[i] = length; + if (fsize % 2 == 0) + win->r_width[i]--; + } + /* calculate max_width to determine size of band matrix */ + win->max_width = 0; + for (i = 0; i < win->size; i++) { + if (win->max_width < abs(win->l_width[i])) + win->max_width = abs(win->l_width[i]); + if (win->max_width < 
abs(win->r_width[i])) + win->max_width = abs(win->r_width[i]); + } +} + +/* HTS_Window_clear: free dynamic window */ +static void HTS_Window_clear(HTS_Window * win) +{ + int i; + + if (win->coefficient) { + for (i = win->size - 1; i >= 0; i--) { + win->coefficient[i] += win->l_width[i]; + HTS_free(win->coefficient[i]); + } + HTS_free(win->coefficient); + } + if (win->l_width) + HTS_free(win->l_width); + if (win->r_width) + HTS_free(win->r_width); + + HTS_Window_initialize(win); +} + +/* HTS_Model_initialize: initialize model */ +static void HTS_Model_initialize(HTS_Model * model) +{ + model->vector_length = 0; + model->ntree = 0; + model->npdf = NULL; + model->pdf = NULL; + model->tree = NULL; + model->question = NULL; +} + +/* HTS_Model_load_pdf: load pdfs */ +static void HTS_Model_load_pdf(HTS_Model * model, FILE * fp, int ntree, + HTS_Boolean msd_flag) +{ + int i, j, k, l, m; + float temp; + int ssize; + + /* check */ + if (fp == NULL) + HTS_error(1, "HTS_Model_load_pdf: File for pdfs is not specified.\n"); + + /* load pdf */ + model->ntree = ntree; + /* read MSD flag */ + HTS_fread_big_endian(&i, sizeof(int), 1, fp); + if ((i != 0 || msd_flag != FALSE) && (i != 1 || msd_flag != TRUE)) + HTS_error(1, "HTS_Model_load_pdf: Failed to load header of pdfs.\n"); + /* read stream size */ + HTS_fread_big_endian(&ssize, sizeof(int), 1, fp); + if (ssize < 1) + HTS_error(1, "HTS_Model_load_pdf: Failed to load header of pdfs.\n"); + /* read vector size */ + HTS_fread_big_endian(&model->vector_length, sizeof(int), 1, fp); + if (model->vector_length < 0) + HTS_error(1, + "HTS_Model_load_pdf: # of HMM states %d should be positive.\n", + model->vector_length); + model->npdf = (int *) HTS_calloc(ntree, sizeof(int)); + model->npdf -= 2; + /* read the number of pdfs */ + HTS_fread_big_endian(&model->npdf[2], sizeof(int), ntree, fp); + for (i = 2; i <= ntree + 1; i++) + if (model->npdf[i] < 0) + HTS_error(1, + "HTS_Model_load_pdf: # of pdfs at %d-th state should be positive.\n", + 
i); + model->pdf = (double ***) HTS_calloc(ntree, sizeof(double **)); + model->pdf -= 2; + /* read means and variances */ + if (msd_flag) { /* for MSD */ + for (j = 2; j <= ntree + 1; j++) { + model->pdf[j] = (double **) + HTS_calloc(model->npdf[j], sizeof(double *)); + model->pdf[j]--; + for (k = 1; k <= model->npdf[j]; k++) { + model->pdf[j][k] = (double *) + HTS_calloc(2 * model->vector_length + 1, sizeof(double)); + for (l = 0; l < ssize; l++) { + for (m = 0; m < model->vector_length / ssize; m++) { + HTS_fread_big_endian(&temp, sizeof(float), 1, fp); + model->pdf[j][k][l * model->vector_length / ssize + m] = + (double) temp; + HTS_fread_big_endian(&temp, sizeof(float), 1, fp); + model->pdf[j][k][l * model->vector_length / ssize + m + + model->vector_length] = (double) temp; + } + HTS_fread_big_endian(&temp, sizeof(float), 1, fp); + if (l == 0) { + if (temp < 0.0 || temp > 1.0) + HTS_error(1, + "HTS_Model_load_pdf: MSD weight should be within 0.0 to 1.0.\n"); + model->pdf[j][k][2 * model->vector_length] = (double) temp; + } + HTS_fread_big_endian(&temp, sizeof(float), 1, fp); + } + } + } + } else { /* for non MSD */ + for (j = 2; j <= ntree + 1; j++) { + model->pdf[j] = + (double **) HTS_calloc(model->npdf[j], sizeof(double *)); + model->pdf[j]--; + for (k = 1; k <= model->npdf[j]; k++) { + model->pdf[j][k] = + (double *) HTS_calloc(2 * model->vector_length, sizeof(double)); + for (l = 0; l < model->vector_length; l++) { + HTS_fread_big_endian(&temp, sizeof(float), 1, fp); + model->pdf[j][k][l] = (double) temp; + HTS_fread_big_endian(&temp, sizeof(float), 1, fp); + model->pdf[j][k][l + model->vector_length] = (double) temp; + } + } + } + } +} + +/* HTS_Model_load_tree: load trees */ +static void HTS_Model_load_tree(HTS_Model * model, FILE * fp) +{ + char buff[HTS_MAXBUFLEN]; + HTS_Question *question, *last_question; + HTS_Tree *tree, *last_tree; + int state; + + /* check */ + if (fp == NULL) + HTS_error(1, "HTS_Model_load_tree: File for trees is not 
specified.\n"); + + model->ntree = 0; + last_question = NULL; + last_tree = NULL; + while (!feof(fp)) { + HTS_get_pattern_token(fp, buff); + /* parse questions */ + if (strcmp(buff, "QS") == 0) { + question = (HTS_Question *) HTS_calloc(1, sizeof(HTS_Question)); + HTS_Question_load(question, fp); + if (model->question) + last_question->next = question; + else + model->question = question; + question->next = NULL; + last_question = question; + } + /* parse trees */ + state = HTS_get_state_num(buff); + if (state != 0) { + tree = (HTS_Tree *) HTS_calloc(1, sizeof(HTS_Tree)); + tree->next = NULL; + tree->root = NULL; + tree->head = NULL; + tree->state = state; + HTS_Tree_parse_pattern(tree, buff); + HTS_Tree_load(tree, fp, model->question); + if (model->tree) + last_tree->next = tree; + else + model->tree = tree; + tree->next = NULL; + last_tree = tree; + model->ntree++; + } + } + /* No Tree information in tree file */ + if (model->tree == NULL) + HTS_error(1, "HTS_Model_load_tree: No trees are loaded.\n"); +} + +/* HTS_Model_clear: free pdfs and trees */ +static void HTS_Model_clear(HTS_Model * model) +{ + int i, j; + HTS_Question *question, *next_question; + HTS_Tree *tree, *next_tree; + + for (question = model->question; question; question = next_question) { + next_question = question->next; + HTS_Question_clear(question); + HTS_free(question); + } + for (tree = model->tree; tree; tree = next_tree) { + next_tree = tree->next; + HTS_Tree_clear(tree); + HTS_free(tree); + } + if (model->pdf) { + for (i = 2; i <= model->ntree + 1; i++) { + for (j = 1; j <= model->npdf[i]; j++) { + HTS_free(model->pdf[i][j]); + } + model->pdf[i]++; + HTS_free(model->pdf[i]); + } + model->pdf += 2; + HTS_free(model->pdf); + } + if (model->npdf) { + model->npdf += 2; + HTS_free(model->npdf); + } + HTS_Model_initialize(model); +} + +/* HTS_Stream_initialize: initialize stream */ +static void HTS_Stream_initialize(HTS_Stream * stream) +{ + stream->vector_length = 0; + stream->model = NULL; + 
HTS_Window_initialize(&stream->window); + stream->msd_flag = FALSE; + stream->interpolation_size = 0; +} + +/* HTS_Stream_load_pdf: load pdf */ +static void HTS_Stream_load_pdf(HTS_Stream * stream, FILE ** fp, int ntree, + HTS_Boolean msd_flag, int interpolation_size) +{ + int i; + + /* initialize */ + stream->msd_flag = msd_flag; + stream->interpolation_size = interpolation_size; + stream->model = + (HTS_Model *) HTS_calloc(interpolation_size, sizeof(HTS_Model)); + /* load pdfs */ + for (i = 0; i < stream->interpolation_size; i++) { + HTS_Model_initialize(&stream->model[i]); + HTS_Model_load_pdf(&stream->model[i], fp[i], ntree, stream->msd_flag); + } + /* check */ + for (i = 1; i < stream->interpolation_size; i++) + if (stream->model[0].vector_length != stream->model[i].vector_length) + HTS_error(1, + "HTS_Stream_load_pdf: # of states are different in between given modelsets.\n"); + /* set */ + stream->vector_length = stream->model[0].vector_length; +} + +/* HTS_Stream_load_pdf_and_tree: load PDFs and trees */ +static void HTS_Stream_load_pdf_and_tree(HTS_Stream * stream, FILE ** pdf_fp, + FILE ** tree_fp, HTS_Boolean msd_flag, + int interpolation_size) +{ + int i; + + /* initialize */ + stream->msd_flag = msd_flag; + stream->interpolation_size = interpolation_size; + stream->model = + (HTS_Model *) HTS_calloc(interpolation_size, sizeof(HTS_Model)); + /* load */ + for (i = 0; i < stream->interpolation_size; i++) { + if (!pdf_fp[i]) + HTS_error(1, + "HTS_Stream_load_pdf_and_tree: File for duration PDFs is not specified.\n"); + if (!tree_fp[i]) + HTS_error(1, + "HTS_Stream_load_pdf_and_tree: File for duration trees is not specified.\n"); + HTS_Model_initialize(&stream->model[i]); + HTS_Model_load_tree(&stream->model[i], tree_fp[i]); + HTS_Model_load_pdf(&stream->model[i], pdf_fp[i], stream->model[i].ntree, + stream->msd_flag); + } + /* check */ + for (i = 1; i < stream->interpolation_size; i++) + if (stream->model[0].vector_length != stream->model[i].vector_length) 
+ HTS_error(1, + "HTS_Stream_load_pdf_and_tree: Vector sizes of state output vectors are different in between given modelsets.\n"); + /* set */ + stream->vector_length = stream->model[0].vector_length; +} + +/* HTS_Stream_load_dynamic_window: load windows */ +static void HTS_Stream_load_dynamic_window(HTS_Stream * stream, FILE ** fp, + int size) +{ + HTS_Window_load(&stream->window, fp, size); +} + +/* HTS_Stream_clear: free stream */ +static void HTS_Stream_clear(HTS_Stream * stream) +{ + int i; + + if (stream->model) { + for (i = 0; i < stream->interpolation_size; i++) + HTS_Model_clear(&stream->model[i]); + HTS_free(stream->model); + } + HTS_Window_clear(&stream->window); + HTS_Stream_initialize(stream); +} + +/* HTS_ModelSet_initialize: initialize model set */ +void HTS_ModelSet_initialize(HTS_ModelSet * ms, int nstream) +{ + HTS_Stream_initialize(&ms->duration); + ms->stream = NULL; + ms->gv = NULL; + HTS_Model_initialize(&ms->gv_switch); + ms->nstate = -1; + ms->nstream = nstream; +} + +/* HTS_ModelSet_load_duration: load duration model and number of state */ +void HTS_ModelSet_load_duration(HTS_ModelSet * ms, FILE ** pdf_fp, + FILE ** tree_fp, int interpolation_size) +{ + /* check */ + if (pdf_fp == NULL) + HTS_error(1, + "HTS_ModelSet_load_duration: File for duration PDFs is not specified.\n"); + if (tree_fp == NULL) + HTS_error(1, + "HTS_ModelSet_load_duration: File for duration trees is not specified.\n"); + + HTS_Stream_load_pdf_and_tree(&ms->duration, pdf_fp, tree_fp, FALSE, + interpolation_size); + ms->nstate = ms->duration.vector_length; +} + +/* HTS_ModelSet_load_parameter: load model */ +void HTS_ModelSet_load_parameter(HTS_ModelSet * ms, FILE ** pdf_fp, + FILE ** tree_fp, FILE ** win_fp, + int stream_index, HTS_Boolean msd_flag, + int window_size, int interpolation_size) +{ + int i; + + /* check */ + if (pdf_fp == NULL) + HTS_error(1, + "HTS_ModelSet_load_parameter: File for pdfs is not specified.\n"); + if (tree_fp == NULL) + HTS_error(1, + 
"HTS_ModelSet_load_parameter: File for trees is not specified.\n"); + if (win_fp == NULL) + HTS_error(1, + "HTS_ModelSet_load_parameter: File for wins is not specified.\n"); + /* initialize */ + if (!ms->stream) { + ms->stream = (HTS_Stream *) HTS_calloc(ms->nstream, sizeof(HTS_Stream)); + for (i = 0; i < ms->nstream; i++) + HTS_Stream_initialize(&ms->stream[i]); + } + /* load */ + HTS_Stream_load_pdf_and_tree(&ms->stream[stream_index], pdf_fp, tree_fp, + msd_flag, interpolation_size); + HTS_Stream_load_dynamic_window(&ms->stream[stream_index], win_fp, + window_size); +} + +/* HTS_ModelSet_load_gv: load GV model */ +void HTS_ModelSet_load_gv(HTS_ModelSet * ms, FILE ** pdf_fp, FILE ** tree_fp, + int stream_index, int interpolation_size) +{ + int i; + + /* check */ + if (pdf_fp == NULL) + HTS_error(1, + "HTS_ModelSet_load_gv: File for GV pdfs is not specified.\n"); + /* initialize */ + if (!ms->gv) { + ms->gv = (HTS_Stream *) HTS_calloc(ms->nstream, sizeof(HTS_Stream)); + for (i = 0; i < ms->nstream; i++) + HTS_Stream_initialize(&ms->gv[i]); + } + if (tree_fp) + HTS_Stream_load_pdf_and_tree(&ms->gv[stream_index], pdf_fp, tree_fp, + FALSE, interpolation_size); + else + HTS_Stream_load_pdf(&ms->gv[stream_index], pdf_fp, 1, FALSE, + interpolation_size); +} + +/* HTS_ModelSet_have_gv_tree: if context-dependent GV is used, return true */ +HTS_Boolean HTS_ModelSet_have_gv_tree(HTS_ModelSet * ms, int stream_index) +{ + int i; + + for (i = 0; i < ms->gv[stream_index].interpolation_size; i++) + if (ms->gv[stream_index].model[i].tree == NULL) + return FALSE; + return TRUE; +} + +/* HTS_ModelSet_load_gv_switch: load GV switch */ +void HTS_ModelSet_load_gv_switch(HTS_ModelSet * ms, FILE * fp) +{ + if (fp != NULL) + HTS_Model_load_tree(&ms->gv_switch, fp); +} + +/* HTS_ModelSet_have_gv_switch: if GV switch is used, return true */ +HTS_Boolean HTS_ModelSet_have_gv_switch(HTS_ModelSet * ms) +{ + if (ms->gv_switch.tree != NULL) + return TRUE; + else + return FALSE; +} + +/* 
HTS_ModelSet_get_nstate: get number of state */ +int HTS_ModelSet_get_nstate(HTS_ModelSet * ms) +{ + return ms->nstate; +} + +/* HTS_ModelSet_get_nstream: get number of stream */ +int HTS_ModelSet_get_nstream(HTS_ModelSet * ms) +{ + return ms->nstream; +} + +/* HTS_ModelSet_get_vector_length: get vector length */ +int HTS_ModelSet_get_vector_length(HTS_ModelSet * ms, int stream_index) +{ + return ms->stream[stream_index].vector_length; +} + +/* HTS_ModelSet_is_msd: get MSD flag */ +HTS_Boolean HTS_ModelSet_is_msd(HTS_ModelSet * ms, int stream_index) +{ + return ms->stream[stream_index].msd_flag; +} + +/* HTS_ModelSet_get_window_size: get dynamic window size */ +int HTS_ModelSet_get_window_size(HTS_ModelSet * ms, int stream_index) +{ + return ms->stream[stream_index].window.size; +} + +/* HTS_ModelSet_get_window_left_width: get left width of dynamic window */ +int HTS_ModelSet_get_window_left_width(HTS_ModelSet * ms, int stream_index, + int window_index) +{ + return ms->stream[stream_index].window.l_width[window_index]; +} + +/* HTS_ModelSet_get_window_right_width: get right width of dynamic window */ +int HTS_ModelSet_get_window_right_width(HTS_ModelSet * ms, int stream_index, + int window_index) +{ + return ms->stream[stream_index].window.r_width[window_index]; +} + +/* HTS_ModelSet_get_window_coefficient: get coefficient of dynamic window */ +double HTS_ModelSet_get_window_coefficient(HTS_ModelSet * ms, int stream_index, + int window_index, + int coefficient_index) +{ + return ms->stream[stream_index].window. 
+ coefficient[window_index][coefficient_index]; +} + +/* HTS_ModelSet_get_window_max_width: get max width of dynamic window */ +int HTS_ModelSet_get_window_max_width(HTS_ModelSet * ms, int stream_index) +{ + return ms->stream[stream_index].window.max_width; +} + +/* HTS_ModelSet_get_duration_interpolation_size: get interpolation size (duration model) */ +int HTS_ModelSet_get_duration_interpolation_size(HTS_ModelSet * ms) +{ + return ms->duration.interpolation_size; +} + +/* HTS_ModelSet_get_parameter_interpolation_size: get interpolation size (parameter model) */ +int HTS_ModelSet_get_parameter_interpolation_size(HTS_ModelSet * ms, + int stream_index) +{ + return ms->stream[stream_index].interpolation_size; +} + +/* HTS_ModelSet_get_gv_interpolation_size: get interpolation size (GV model) */ +int HTS_ModelSet_get_gv_interpolation_size(HTS_ModelSet * ms, int stream_index) +{ + return ms->gv[stream_index].interpolation_size; +} + +/* HTS_ModelSet_use_gv: get GV flag */ +HTS_Boolean HTS_ModelSet_use_gv(HTS_ModelSet * ms, int stream_index) +{ + if (!ms->gv) + return FALSE; + if (ms->gv[stream_index].vector_length > 0) + return TRUE; + return FALSE; +} + +/* HTS_ModelSet_get_duration_index: get index of duration tree and PDF */ +void HTS_ModelSet_get_duration_index(HTS_ModelSet * ms, char *string, + int *tree_index, int *pdf_index, + int interpolation_index) +{ + HTS_Tree *tree; + HTS_Pattern *pattern; + HTS_Boolean find; + + find = FALSE; + (*tree_index) = 2; + (*pdf_index) = 1; + for (tree = ms->duration.model[interpolation_index].tree; tree; + tree = tree->next) { + pattern = tree->head; + if (!pattern) + find = TRUE; + for (; pattern; pattern = pattern->next) + if (HTS_pattern_match(string, pattern->string)) { + find = TRUE; + break; + } + if (find) + break; + (*tree_index)++; + } + + if (tree == NULL) + HTS_error(1, "HTS_ModelSet_get_duration_index: Cannot find model %s.\n", + string); + (*pdf_index) = HTS_Tree_search_node(tree, string); +} + +/* 
HTS_ModelSet_get_duration: get duration using interpolation weight */ +void HTS_ModelSet_get_duration(HTS_ModelSet * ms, char *string, double *mean, + double *vari, double *iw) +{ + int i, j; + int tree_index, pdf_index; + const int vector_length = ms->duration.vector_length; + + for (i = 0; i < ms->nstate; i++) { + mean[i] = 0.0; + vari[i] = 0.0; + } + for (i = 0; i < ms->duration.interpolation_size; i++) { + HTS_ModelSet_get_duration_index(ms, string, &tree_index, &pdf_index, i); + for (j = 0; j < ms->nstate; j++) { + mean[j] += iw[i] * ms->duration.model[i].pdf[tree_index][pdf_index][j]; + vari[j] += iw[i] * iw[i] * ms->duration.model[i] + .pdf[tree_index][pdf_index][j + vector_length]; + } + } +} + +/* HTS_ModelSet_get_parameter_index: get index of parameter tree and PDF */ +void HTS_ModelSet_get_parameter_index(HTS_ModelSet * ms, char *string, + int *tree_index, int *pdf_index, + int stream_index, int state_index, + int interpolation_index) +{ + HTS_Tree *tree; + HTS_Pattern *pattern; + HTS_Boolean find; + + find = FALSE; + (*tree_index) = 2; + (*pdf_index) = 1; + for (tree = ms->stream[stream_index].model[interpolation_index].tree; tree; + tree = tree->next) { + if (tree->state == state_index) { + pattern = tree->head; + if (!pattern) + find = TRUE; + for (; pattern; pattern = pattern->next) + if (HTS_pattern_match(string, pattern->string)) { + find = TRUE; + break; + } + if (find) + break; + } + (*tree_index)++; + } + + if (tree == NULL) + HTS_error(1, "HTS_ModelSet_get_parameter_index: Cannot find model %s.\n", + string); + (*pdf_index) = HTS_Tree_search_node(tree, string); +} + +/* HTS_ModelSet_get_parameter: get parameter using interpolation weight */ +void HTS_ModelSet_get_parameter(HTS_ModelSet * ms, char *string, double *mean, + double *vari, double *msd, int stream_index, + int state_index, double *iw) +{ + int i, j; + int tree_index, pdf_index; + const int vector_length = ms->stream[stream_index].vector_length; + + for (i = 0; i < vector_length; i++) 
{ + mean[i] = 0.0; + vari[i] = 0.0; + } + if (msd) + *msd = 0.0; + for (i = 0; i < ms->stream[stream_index].interpolation_size; i++) { + HTS_ModelSet_get_parameter_index(ms, string, &tree_index, &pdf_index, + stream_index, state_index, i); + for (j = 0; j < vector_length; j++) { + mean[j] += iw[i] * + ms->stream[stream_index].model[i].pdf[tree_index][pdf_index][j]; + vari[j] += iw[i] * iw[i] * ms->stream[stream_index].model[i] + .pdf[tree_index][pdf_index][j + vector_length]; + } + if (ms->stream[stream_index].msd_flag) { + *msd += iw[i] * ms->stream[stream_index].model[i] + .pdf[tree_index][pdf_index][2 * vector_length]; + } + } +} + +/* HTS_ModelSet_get_gv_index: get index of GV tree and PDF */ +void HTS_ModelSet_get_gv_index(HTS_ModelSet * ms, char *string, int *tree_index, + int *pdf_index, int stream_index, + int interpolation_index) +{ + HTS_Tree *tree; + HTS_Pattern *pattern; + HTS_Boolean find; + + find = FALSE; + (*tree_index) = 2; + (*pdf_index) = 1; + + if (HTS_ModelSet_have_gv_tree(ms, stream_index) == FALSE) + return; + for (tree = ms->gv[stream_index].model[interpolation_index].tree; tree; + tree = tree->next) { + pattern = tree->head; + if (!pattern) + find = TRUE; + for (; pattern; pattern = pattern->next) + if (HTS_pattern_match(string, pattern->string)) { + find = TRUE; + break; + } + if (find) + break; + (*tree_index)++; + } + + if (tree == NULL) + HTS_error(1, "HTS_ModelSet_get_gv_index: Cannot find model %s.\n", + string); + (*pdf_index) = HTS_Tree_search_node(tree, string); +} + +/* HTS_ModelSet_get_gv: get GV using interpolation weight */ +void HTS_ModelSet_get_gv(HTS_ModelSet * ms, char *string, double *mean, + double *vari, int stream_index, double *iw) +{ + int i, j; + int tree_index, pdf_index; + const int vector_length = ms->gv[stream_index].vector_length; + + for (i = 0; i < vector_length; i++) { + mean[i] = 0.0; + vari[i] = 0.0; + } + for (i = 0; i < ms->gv[stream_index].interpolation_size; i++) { + HTS_ModelSet_get_gv_index(ms, 
string, &tree_index, &pdf_index, + stream_index, i); + for (j = 0; j < vector_length; j++) { + mean[j] += iw[i] * + ms->gv[stream_index].model[i].pdf[tree_index][pdf_index][j]; + vari[j] += iw[i] * iw[i] * ms->gv[stream_index].model[i] + .pdf[tree_index][pdf_index][j + vector_length]; + } + } +} + +/* HTS_ModelSet_get_gv_switch_index: get index of GV switch tree and PDF */ +void HTS_ModelSet_get_gv_switch_index(HTS_ModelSet * ms, char *string, + int *tree_index, int *pdf_index) +{ + HTS_Tree *tree; + HTS_Pattern *pattern; + HTS_Boolean find; + + find = FALSE; + (*tree_index) = 2; + (*pdf_index) = 1; + for (tree = ms->gv_switch.tree; tree; tree = tree->next) { + pattern = tree->head; + if (!pattern) + find = TRUE; + for (; pattern; pattern = pattern->next) + if (HTS_pattern_match(string, pattern->string)) { + find = TRUE; + break; + } + if (find) + break; + (*tree_index)++; + } + + if (tree == NULL) + HTS_error(1, "HTS_ModelSet_get_gv_switch_index: Cannot find model %s.\n", + string); + (*pdf_index) = HTS_Tree_search_node(tree, string); +} + +/* HTS_ModelSet_get_gv_switch: get GV switch */ +HTS_Boolean HTS_ModelSet_get_gv_switch(HTS_ModelSet * ms, char *string) +{ + int tree_index, pdf_index; + + if (ms->gv_switch.tree == NULL) + return TRUE; + HTS_ModelSet_get_gv_switch_index(ms, string, &tree_index, &pdf_index); + if (pdf_index == 1) + return FALSE; + else + return TRUE; +} + +/* HTS_ModelSet_clear: free model set */ +void HTS_ModelSet_clear(HTS_ModelSet * ms) +{ + int i; + + HTS_Stream_clear(&ms->duration); + if (ms->stream) { + for (i = 0; i < ms->nstream; i++) + HTS_Stream_clear(&ms->stream[i]); + HTS_free(ms->stream); + } + if (ms->gv) { + for (i = 0; i < ms->nstream; i++) + HTS_Stream_clear(&ms->gv[i]); + HTS_free(ms->gv); + } + HTS_Model_clear(&ms->gv_switch); + HTS_ModelSet_initialize(ms, -1); +} + +HTS_MODEL_C_END; + +#endif /* !HTS_MODEL_C */ diff --git a/src/modules/hts_engine/HTS_pstream.c b/src/modules/hts_engine/HTS_pstream.c new file mode 100644 
index 0000000..f31e7e6 --- /dev/null +++ b/src/modules/hts_engine/HTS_pstream.c @@ -0,0 +1,537 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. */ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. */ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_PSTREAM_C +#define HTS_PSTREAM_C + +#ifdef __cplusplus +#define HTS_PSTREAM_C_START extern "C" { +#define HTS_PSTREAM_C_END } +#else +#define HTS_PSTREAM_C_START +#define HTS_PSTREAM_C_END +#endif /* __CPLUSPLUS */ + +HTS_PSTREAM_C_START; + +#include <math.h> /* for sqrt() */ + +/* hts_engine libraries */ +#include "HTS_hidden.h" + +/* HTS_finv: calculate 1.0/variance function */ +static double HTS_finv(const double x) +{ + if (x >= INFTY2) + return 0.0; + if (x <= -INFTY2) + return 0.0; + if (x <= INVINF2 && x >= 0) + return INFTY; + if (x >= -INVINF2 && x < 0) + return -INFTY; + + return (1.0 / x); +} + +/* HTS_PStream_calc_wuw_and_wum: calculate W'U^{-1}W and W'U^{-1}M */ +static void HTS_PStream_calc_wuw_and_wum(HTS_PStream * pst, const int m) +{ + int t, i, j, k; + double wu; + + for (t = 0; t < pst->length; t++) { + /* initialize */ + pst->sm.wum[t] = 0.0; + for (i = 0; i < pst->width; i++) + pst->sm.wuw[t][i] = 0.0; + + /* calc WUW & WUM */ + for (i = 0; i < pst->win_size; i++) + for (j = pst->win_l_width[i]; j <= pst->win_r_width[i]; j++) + if ((t + j >= 0) && (t + j < pst->length) + && (pst->win_coefficient[i][-j] != 0.0)) { + wu = pst->win_coefficient[i][-j] * + pst->sm.ivar[t + j][i * pst->static_length + m]; + pst->sm.wum[t] += + wu * pst->sm.mean[t + j][i * pst->static_length + m]; + for (k = 0; (k < pst->width) 
&& (t + k < pst->length); k++) + if ((k - j <= pst->win_r_width[i]) + && (pst->win_coefficient[i][k - j] != 0.0)) + pst->sm.wuw[t][k] += wu * pst->win_coefficient[i][k - j]; + } + } +} + + +/* HTS_PStream_ldl_factorization: Factorize W'*U^{-1}*W to L*D*L' (L: lower triangular, D: diagonal) */ +static void HTS_PStream_ldl_factorization(HTS_PStream * pst) +{ + int t, i, j; + + for (t = 0; t < pst->length; t++) { + for (i = 1; (i < pst->width) && (t >= i); i++) + pst->sm.wuw[t][0] -= pst->sm.wuw[t - i][i] * + pst->sm.wuw[t - i][i] * pst->sm.wuw[t - i][0]; + + for (i = 1; i < pst->width; i++) { + for (j = 1; (i + j < pst->width) && (t >= j); j++) + pst->sm.wuw[t][i] -= pst->sm.wuw[t - j][j] * + pst->sm.wuw[t - j][i + j] * pst->sm.wuw[t - j][0]; + pst->sm.wuw[t][i] /= pst->sm.wuw[t][0]; + } + } +} + +/* HTS_PStream_forward_substitution: forward substitution for mlpg */ +static void HTS_PStream_forward_substitution(HTS_PStream * pst) +{ + int t, i; + + for (t = 0; t < pst->length; t++) { + pst->sm.g[t] = pst->sm.wum[t]; + for (i = 1; (i < pst->width) && (t >= i); i++) + pst->sm.g[t] -= pst->sm.wuw[t - i][i] * pst->sm.g[t - i]; + } +} + +/* HTS_PStream_backward_substitution: backward substitution for mlpg */ +static void HTS_PStream_backward_substitution(HTS_PStream * pst, const int m) +{ + int t, i; + + for (t = pst->length - 1; t >= 0; t--) { + pst->par[t][m] = pst->sm.g[t] / pst->sm.wuw[t][0]; + for (i = 1; (i < pst->width) && (t + i < pst->length); i++) + pst->par[t][m] -= pst->sm.wuw[t][i] * pst->par[t + i][m]; + } +} + +/* HTS_PStream_calc_gv: subfunction for mlpg using GV */ +static void HTS_PStream_calc_gv(HTS_PStream * pst, const int m, double *mean, + double *vari) +{ + int t; + + *mean = 0.0; + for (t = 0; t < pst->length; t++) + if (pst->gv_switch[t]) + *mean += pst->par[t][m]; + *mean /= pst->gv_length; + *vari = 0.0; + for (t = 0; t < pst->length; t++) + if (pst->gv_switch[t]) + *vari += (pst->par[t][m] - *mean) * (pst->par[t][m] - *mean); + *vari /= 
pst->gv_length; +} + +/* HTS_PStream_conv_gv: subfunction for mlpg using GV */ +static void HTS_PStream_conv_gv(HTS_PStream * pst, const int m) +{ + int t; + double ratio; + double mean; + double vari; + + HTS_PStream_calc_gv(pst, m, &mean, &vari); + ratio = sqrt(pst->gv_mean[m] / vari); + for (t = 0; t < pst->length; t++) + if (pst->gv_switch[t]) + pst->par[t][m] = ratio * (pst->par[t][m] - mean) + mean; +} + +/* HTS_PStream_calc_derivative: subfunction for mlpg using GV */ +static double HTS_PStream_calc_derivative(HTS_PStream * pst, const int m) +{ + int t, i; + double mean; + double vari; + double dv; + double h; + double gvobj; + double hmmobj; + const double w = 1.0 / (pst->win_size * pst->length); + + HTS_PStream_calc_gv(pst, m, &mean, &vari); + gvobj = -0.5 * W2 * vari * pst->gv_vari[m] * (vari - 2.0 * pst->gv_mean[m]); + dv = -2.0 * pst->gv_vari[m] * (vari - pst->gv_mean[m]) / pst->length; + + for (t = 0; t < pst->length; t++) { + pst->sm.g[t] = pst->sm.wuw[t][0] * pst->par[t][m]; + for (i = 1; i < pst->width; i++) { + if (t + i < pst->length) + pst->sm.g[t] += pst->sm.wuw[t][i] * pst->par[t + i][m]; + if (t + 1 > i) + pst->sm.g[t] += pst->sm.wuw[t - i][i] * pst->par[t - i][m]; + } + } + + for (t = 0, hmmobj = 0.0; t < pst->length; t++) { + hmmobj += W1 * w * pst->par[t][m] * (pst->sm.wum[t] - 0.5 * pst->sm.g[t]); + h = -W1 * w * pst->sm.wuw[t][1 - 1] + - W2 * 2.0 / (pst->length * pst->length) * + ((pst->length - 1) * pst->gv_vari[m] * (vari - pst->gv_mean[m]) + + 2.0 * pst->gv_vari[m] * (pst->par[t][m] - mean) * (pst->par[t][m] - + mean)); + if (pst->gv_switch[t]) + pst->sm.g[t] = + 1.0 / h * (W1 * w * (-pst->sm.g[t] + pst->sm.wum[t]) + + W2 * dv * (pst->par[t][m] - mean)); + else + pst->sm.g[t] = 1.0 / h * (W1 * w * (-pst->sm.g[t] + pst->sm.wum[t])); + } + + return (-(hmmobj + gvobj)); +} + +/* HTS_PStream_gv_parmgen: function for mlpg using GV */ +static void HTS_PStream_gv_parmgen(HTS_PStream * pst, const int m) +{ + int t, i; + double step = STEPINIT; 
+ double prev = -LZERO; + double obj; + + if (pst->gv_length == 0) + return; + + HTS_PStream_conv_gv(pst, m); + if (GV_MAX_ITERATION > 0) { + HTS_PStream_calc_wuw_and_wum(pst, m); + for (i = 1; i <= GV_MAX_ITERATION; i++) { + obj = HTS_PStream_calc_derivative(pst, m); + if (obj > prev) + step *= STEPDEC; + if (obj < prev) + step *= STEPINC; + for (t = 0; t < pst->length; t++) + pst->par[t][m] += step * pst->sm.g[t]; + prev = obj; + } + } +} + +/* HTS_PStream_mlpg: generate sequence of speech parameter vector maximizing its output probability for given pdf sequence */ +static void HTS_PStream_mlpg(HTS_PStream * pst) +{ + int m; + + if (pst->length == 0) + return; + + for (m = 0; m < pst->static_length; m++) { + HTS_PStream_calc_wuw_and_wum(pst, m); + HTS_PStream_ldl_factorization(pst); /* LDL factorization */ + HTS_PStream_forward_substitution(pst); /* forward substitution */ + HTS_PStream_backward_substitution(pst, m); /* backward substitution */ + if (pst->gv_length > 0) + HTS_PStream_gv_parmgen(pst, m); + } +} + +/* HTS_PStreamSet_initialize: initialize parameter stream set */ +void HTS_PStreamSet_initialize(HTS_PStreamSet * pss) +{ + pss->pstream = NULL; + pss->nstream = 0; + pss->total_frame = 0; +} + +/* HTS_PStreamSet_create: parameter generation using GV weight */ +void HTS_PStreamSet_create(HTS_PStreamSet * pss, HTS_SStreamSet * sss, + double *msd_threshold, double *gv_weight) +{ + int i, j, k, l, m; + int frame, msd_frame, state; + + HTS_PStream *pst; + HTS_Boolean not_bound; + + if (pss->nstream) + HTS_error(1, "HTS_PStreamSet_create: HTS_PStreamSet should be clear.\n"); + + /* initialize */ + pss->nstream = HTS_SStreamSet_get_nstream(sss); + pss->pstream = (HTS_PStream *) HTS_calloc(pss->nstream, sizeof(HTS_PStream)); + pss->total_frame = HTS_SStreamSet_get_total_frame(sss); + + /* create */ + for (i = 0; i < pss->nstream; i++) { + pst = &pss->pstream[i]; + if (HTS_SStreamSet_is_msd(sss, i)) { /* for MSD */ + pst->length = 0; + for (state = 0; state < 
HTS_SStreamSet_get_total_state(sss); state++) + if (HTS_SStreamSet_get_msd(sss, i, state) > msd_threshold[i]) + pst->length += HTS_SStreamSet_get_duration(sss, state); + pst->msd_flag = + (HTS_Boolean *) HTS_calloc(pss->total_frame, sizeof(HTS_Boolean)); + for (state = 0, frame = 0; state < HTS_SStreamSet_get_total_state(sss); + state++) + if (HTS_SStreamSet_get_msd(sss, i, state) > msd_threshold[i]) + for (j = 0; j < HTS_SStreamSet_get_duration(sss, state); j++) { + pst->msd_flag[frame] = TRUE; + frame++; + } else + for (j = 0; j < HTS_SStreamSet_get_duration(sss, state); j++) { + pst->msd_flag[frame] = FALSE; + frame++; + } + } else { /* for non MSD */ + pst->length = pss->total_frame; + pst->msd_flag = NULL; + } + pst->vector_length = HTS_SStreamSet_get_vector_length(sss, i); + pst->width = HTS_SStreamSet_get_window_max_width(sss, i) * 2 + 1; /* band width of R */ + pst->win_size = HTS_SStreamSet_get_window_size(sss, i); + pst->static_length = pst->vector_length / pst->win_size; + pst->sm.mean = HTS_alloc_matrix(pst->length, pst->vector_length); + pst->sm.ivar = HTS_alloc_matrix(pst->length, pst->vector_length); + pst->sm.wum = (double *) HTS_calloc(pst->length, sizeof(double)); + pst->sm.wuw = HTS_alloc_matrix(pst->length, pst->width); + pst->sm.g = (double *) HTS_calloc(pst->length, sizeof(double)); + pst->par = HTS_alloc_matrix(pst->length, pst->static_length); + /* copy dynamic window */ + pst->win_l_width = (int *) HTS_calloc(pst->win_size, sizeof(int)); + pst->win_r_width = (int *) HTS_calloc(pst->win_size, sizeof(int)); + pst->win_coefficient = + (double **) HTS_calloc(pst->win_size, sizeof(double *)); + for (j = 0; j < pst->win_size; j++) { + pst->win_l_width[j] = HTS_SStreamSet_get_window_left_width(sss, i, j); + pst->win_r_width[j] = HTS_SStreamSet_get_window_right_width(sss, i, j); + if (pst->win_l_width[j] + pst->win_r_width[j] == 0) + pst->win_coefficient[j] = (double *) + HTS_calloc(-2 * pst->win_l_width[j] + 1, sizeof(double)); + else + 
pst->win_coefficient[j] = (double *) + HTS_calloc(-2 * pst->win_l_width[j], sizeof(double)); + pst->win_coefficient[j] -= pst->win_l_width[j]; + for (k = pst->win_l_width[j]; k <= pst->win_r_width[j]; k++) + pst->win_coefficient[j][k] = + HTS_SStreamSet_get_window_coefficient(sss, i, j, k); + } + /* copy GV */ + if (HTS_SStreamSet_use_gv(sss, i)) { + pst->gv_mean = + (double *) HTS_calloc(pst->static_length, sizeof(double)); + pst->gv_vari = + (double *) HTS_calloc(pst->static_length, sizeof(double)); + for (j = 0; j < pst->static_length; j++) { + pst->gv_mean[j] = + HTS_SStreamSet_get_gv_mean(sss, i, j) * gv_weight[i]; + pst->gv_vari[j] = HTS_SStreamSet_get_gv_vari(sss, i, j); + } + pst->gv_switch = + (HTS_Boolean *) HTS_calloc(pst->length, sizeof(HTS_Boolean)); + if (HTS_SStreamSet_is_msd(sss, i)) { /* for MSD */ + for (state = 0, frame = 0, msd_frame = 0; + state < HTS_SStreamSet_get_total_state(sss); state++) + for (j = 0; j < HTS_SStreamSet_get_duration(sss, state); + j++, frame++) + if (pst->msd_flag[frame]) + pst->gv_switch[msd_frame++] = + HTS_SStreamSet_get_gv_switch(sss, i, state); + } else { /* for non MSD */ + for (state = 0, frame = 0; + state < HTS_SStreamSet_get_total_state(sss); state++) + for (j = 0; j < HTS_SStreamSet_get_duration(sss, state); j++) + pst->gv_switch[frame++] = + HTS_SStreamSet_get_gv_switch(sss, i, state); + } + for (j = 0, pst->gv_length = 0; j < pst->length; j++) + if (pst->gv_switch[j]) + pst->gv_length++; + } else { + pst->gv_switch = NULL; + pst->gv_length = 0; + pst->gv_mean = NULL; + pst->gv_vari = NULL; + } + /* copy pdfs */ + if (HTS_SStreamSet_is_msd(sss, i)) { /* for MSD */ + for (state = 0, frame = 0, msd_frame = 0; + state < HTS_SStreamSet_get_total_state(sss); state++) + for (j = 0; j < HTS_SStreamSet_get_duration(sss, state); j++) { + if (pst->msd_flag[frame]) { + /* check current frame is MSD boundary or not */ + for (k = 0; k < pst->win_size; k++) { + not_bound = TRUE; + for (l = pst->win_l_width[k]; l <= 
pst->win_r_width[k]; + l++) + if (frame + l < 0 || pss->total_frame <= frame + l + || !pst->msd_flag[frame + l]) { + not_bound = FALSE; + break; + } + for (l = 0; l < pst->static_length; l++) { + m = pst->static_length * k + l; + pst->sm.mean[msd_frame][m] = + HTS_SStreamSet_get_mean(sss, i, state, m); + if (not_bound || k == 0) + pst->sm.ivar[msd_frame][m] = + HTS_finv(HTS_SStreamSet_get_vari + (sss, i, state, m)); + else + pst->sm.ivar[msd_frame][m] = 0.0; + } + } + msd_frame++; + } + frame++; + } + } else { /* for non MSD */ + for (state = 0, frame = 0; + state < HTS_SStreamSet_get_total_state(sss); state++) { + for (j = 0; j < HTS_SStreamSet_get_duration(sss, state); j++) { + for (k = 0; k < pst->win_size; k++) { + not_bound = TRUE; + for (l = pst->win_l_width[k]; l <= pst->win_r_width[k]; l++) + if (frame + l < 0 || pss->total_frame <= frame + l) { + not_bound = FALSE; + break; + } + for (l = 0; l < pst->static_length; l++) { + m = pst->static_length * k + l; + pst->sm.mean[frame][m] = + HTS_SStreamSet_get_mean(sss, i, state, m); + if (not_bound || k == 0) + pst->sm.ivar[frame][m] = + HTS_finv(HTS_SStreamSet_get_vari(sss, i, state, m)); + else + pst->sm.ivar[frame][m] = 0.0; + } + } + frame++; + } + } + } + /* parameter generation */ + HTS_PStream_mlpg(pst); + } +} + +/* HTS_PStreamSet_get_nstream: get number of stream */ +int HTS_PStreamSet_get_nstream(HTS_PStreamSet * pss) +{ + return pss->nstream; +} + +/* HTS_PStreamSet_get_static_length: get static features length */ +int HTS_PStreamSet_get_static_length(HTS_PStreamSet * pss, int stream_index) +{ + return pss->pstream[stream_index].static_length; +} + +/* HTS_PStreamSet_get_total_frame: get total number of frame */ +int HTS_PStreamSet_get_total_frame(HTS_PStreamSet * pss) +{ + return pss->total_frame; +} + +/* HTS_PStreamSet_get_parameter: get parameter */ +double HTS_PStreamSet_get_parameter(HTS_PStreamSet * pss, + int stream_index, int frame_index, + int vector_index) +{ + return 
pss->pstream[stream_index].par[frame_index][vector_index]; +} + +/* HTS_PStreamSet_get_parameter_vector: get parameter vector*/ +double *HTS_PStreamSet_get_parameter_vector(HTS_PStreamSet * pss, + int stream_index, int frame_index) +{ + return pss->pstream[stream_index].par[frame_index]; +} + +/* HTS_PStreamSet_get_msd_flag: get generated MSD flag per frame */ +HTS_Boolean HTS_PStreamSet_get_msd_flag(HTS_PStreamSet * pss, + int stream_index, int frame_index) +{ + return pss->pstream[stream_index].msd_flag[frame_index]; +} + +/* HTS_PStreamSet_is_msd: get MSD flag */ +HTS_Boolean HTS_PStreamSet_is_msd(HTS_PStreamSet * pss, int stream_index) +{ + return pss->pstream[stream_index].msd_flag ? TRUE : FALSE; +} + +/* HTS_PStreamSet_clear: free parameter stream set */ +void HTS_PStreamSet_clear(HTS_PStreamSet * pss) +{ + int i, j; + HTS_PStream *pstream; + + if (pss->pstream) { + for (i = 0; i < pss->nstream; i++) { + pstream = &pss->pstream[i]; + HTS_free(pstream->sm.wum); + HTS_free(pstream->sm.g); + HTS_free_matrix(pstream->sm.wuw, pstream->length); + HTS_free_matrix(pstream->sm.ivar, pstream->length); + HTS_free_matrix(pstream->sm.mean, pstream->length); + HTS_free_matrix(pstream->par, pstream->length); + if (pstream->msd_flag) + HTS_free(pstream->msd_flag); + for (j = pstream->win_size - 1; j >= 0; j--) { + pstream->win_coefficient[j] += pstream->win_l_width[j]; + HTS_free(pstream->win_coefficient[j]); + } + if (pstream->gv_mean) + HTS_free(pstream->gv_mean); + if (pstream->gv_vari) + HTS_free(pstream->gv_vari); + HTS_free(pstream->win_coefficient); + HTS_free(pstream->win_l_width); + HTS_free(pstream->win_r_width); + if (pstream->gv_switch) + HTS_free(pstream->gv_switch); + } + HTS_free(pss->pstream); + } + HTS_PStreamSet_initialize(pss); +} + +HTS_PSTREAM_C_END; + +#endif /* !HTS_PSTREAM_C */ diff --git a/src/modules/hts_engine/HTS_sstream.c b/src/modules/hts_engine/HTS_sstream.c new file mode 100644 index 0000000..da8e859 --- /dev/null +++ 
b/src/modules/hts_engine/HTS_sstream.c @@ -0,0 +1,476 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. */ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. */ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_SSTREAM_C +#define HTS_SSTREAM_C + +#ifdef __cplusplus +#define HTS_SSTREAM_C_START extern "C" { +#define HTS_SSTREAM_C_END } +#else +#define HTS_SSTREAM_C_START +#define HTS_SSTREAM_C_END +#endif /* __CPLUSPLUS */ + +HTS_SSTREAM_C_START; + +/* hts_engine libraries */ +#include "HTS_hidden.h" + +static void HTS_set_duration(int *duration, double *mean, double *vari, + double *remain, int size, double frame_length) +{ + int i; + double temp1, temp2; + double rho = 0.0; + + if (frame_length != 0.0) { /* if frame length is specified, rho is determined */ + temp1 = 0.0; + temp2 = 0.0; + for (i = 0; i < size; i++) { + temp1 += mean[i]; + temp2 += vari[i]; + } + rho = (frame_length - temp1) / temp2; + } + for (i = 0; i < size; i++) { + temp1 = mean[i] + rho * vari[i] + *remain; + duration[i] = (int) (temp1 + 0.5); + if (duration[i] < 1) + duration[i] = 1; + *remain = temp1 - (double) duration[i]; + } +} + +/* HTS_SStreamSet_initialize: initialize state stream set */ +void HTS_SStreamSet_initialize(HTS_SStreamSet * sss) +{ + sss->nstream = 0; + sss->nstate = 0; + sss->sstream = NULL; + sss->duration = NULL; + sss->total_state = 0; + sss->total_frame = 0; +} + +/* HTS_SStreamSet_create: parse label and determine state duration */ +void HTS_SStreamSet_create(HTS_SStreamSet * sss, HTS_ModelSet * ms, + HTS_Label * label, double 
*duration_iw, + double **parameter_iw, double **gv_iw) +{ + int i, j, k; + double temp1, temp2; + int state; + HTS_SStream *sst; + double *duration_mean, *duration_vari; + double duration_remain; + double frame_length; + int next_time; + int next_state; + + /* initialize state sequence */ + sss->nstate = HTS_ModelSet_get_nstate(ms); + sss->nstream = HTS_ModelSet_get_nstream(ms); + sss->total_frame = 0; + sss->total_state = HTS_Label_get_size(label) * sss->nstate; + sss->duration = (int *) HTS_calloc(sss->total_state, sizeof(int)); + sss->sstream = (HTS_SStream *) HTS_calloc(sss->nstream, sizeof(HTS_SStream)); + for (i = 0; i < sss->nstream; i++) { + sst = &sss->sstream[i]; + sst->vector_length = HTS_ModelSet_get_vector_length(ms, i); + sst->mean = (double **) HTS_calloc(sss->total_state, sizeof(double *)); + sst->vari = (double **) HTS_calloc(sss->total_state, sizeof(double *)); + if (HTS_ModelSet_is_msd(ms, i)) + sst->msd = (double *) HTS_calloc(sss->total_state, sizeof(double)); + else + sst->msd = NULL; + for (j = 0; j < sss->total_state; j++) { + sst->mean[j] = + (double *) HTS_calloc(sst->vector_length, sizeof(double)); + sst->vari[j] = + (double *) HTS_calloc(sst->vector_length, sizeof(double)); + } + sst->gv_switch = + (HTS_Boolean *) HTS_calloc(sss->total_state, sizeof(HTS_Boolean)); + for (j = 0; j < sss->total_state; j++) + sst->gv_switch[j] = TRUE; + } + + /* check interpolation weights */ + for (i = 0, temp1 = 0.0; + i < HTS_ModelSet_get_duration_interpolation_size(ms); i++) + temp1 += duration_iw[i]; + for (i = 0; i < HTS_ModelSet_get_duration_interpolation_size(ms); i++) + if (duration_iw[i] != 0.0) + duration_iw[i] /= temp1; + for (i = 0; i < sss->nstream; i++) { + for (j = 0, temp1 = 0.0; + j < HTS_ModelSet_get_parameter_interpolation_size(ms, i); j++) + temp1 += parameter_iw[i][j]; + for (j = 0; j < HTS_ModelSet_get_parameter_interpolation_size(ms, i); j++) + if (parameter_iw[i][j] != 0.0) + parameter_iw[i][j] /= temp1; + if 
(HTS_ModelSet_use_gv(ms, i)) { + for (j = 0, temp1 = 0.0; + j < HTS_ModelSet_get_gv_interpolation_size(ms, i); j++) + temp1 += gv_iw[i][j]; + for (j = 0; j < HTS_ModelSet_get_gv_interpolation_size(ms, i); j++) + if (gv_iw[i][j] != 0.0) + gv_iw[i][j] /= temp1; + } + } + + /* determine state duration */ + duration_mean = + (double *) HTS_calloc(sss->nstate * HTS_Label_get_size(label), + sizeof(double)); + duration_vari = + (double *) HTS_calloc(sss->nstate * HTS_Label_get_size(label), + sizeof(double)); + duration_remain = 0.0; + for (i = 0; i < HTS_Label_get_size(label); i++) + HTS_ModelSet_get_duration(ms, HTS_Label_get_string(label, i), + &duration_mean[i * sss->nstate], + &duration_vari[i * sss->nstate], duration_iw); + if (HTS_Label_get_frame_specified_flag(label)) { + /* use duration set by user */ + next_time = 0; + next_state = 0; + state = 0; + for (i = 0; i < HTS_Label_get_size(label); i++) { + temp1 = HTS_Label_get_start_frame(label, i); + temp2 = HTS_Label_get_end_frame(label, i); + if (temp2 >= 0) { + HTS_set_duration(&sss->duration[next_state], + &duration_mean[next_state], + &duration_vari[next_state], &duration_remain, + state + sss->nstate - next_state, + temp2 - next_time); + for (j = next_state; j < state + sss->nstate; j++) + next_time += sss->duration[j]; + next_state = state + sss->nstate; + } else if (i + 1 == HTS_Label_get_size(label)) { + HTS_set_duration(&sss->duration[next_state], + &duration_mean[next_state], + &duration_vari[next_state], &duration_remain, + state + sss->nstate - next_state, 0.0); + } + state += sss->nstate; + } + } else { + /* determine frame length */ + if (HTS_Label_get_speech_speed(label) != 1.0) { + temp1 = 0.0; + for (i = 0; i < HTS_Label_get_size(label) * sss->nstate; i++) { + temp1 += duration_mean[i]; + } + frame_length = temp1 / HTS_Label_get_speech_speed(label); + } else { + frame_length = 0.0; + } + /* set state duration */ + HTS_set_duration(sss->duration, duration_mean, duration_vari, + &duration_remain, + 
HTS_Label_get_size(label) * sss->nstate, frame_length); + } + HTS_free(duration_mean); + HTS_free(duration_vari); + + /* get parameter */ + for (i = 0, state = 0; i < HTS_Label_get_size(label); i++) { + for (j = 2; j <= sss->nstate + 1; j++) { + sss->total_frame += sss->duration[state]; + for (k = 0; k < sss->nstream; k++) { + sst = &sss->sstream[k]; + if (sst->msd) + HTS_ModelSet_get_parameter(ms, HTS_Label_get_string(label, i), + sst->mean[state], sst->vari[state], + &sst->msd[state], k, j, + parameter_iw[k]); + else + HTS_ModelSet_get_parameter(ms, HTS_Label_get_string(label, i), + sst->mean[state], sst->vari[state], + NULL, k, j, parameter_iw[k]); + } + state++; + } + } + + /* copy dynamic window */ + for (i = 0; i < sss->nstream; i++) { + sst = &sss->sstream[i]; + sst->win_size = HTS_ModelSet_get_window_size(ms, i); + sst->win_max_width = HTS_ModelSet_get_window_max_width(ms, i); + sst->win_l_width = (int *) HTS_calloc(sst->win_size, sizeof(int)); + sst->win_r_width = (int *) HTS_calloc(sst->win_size, sizeof(int)); + sst->win_coefficient = + (double **) HTS_calloc(sst->win_size, sizeof(double)); + for (j = 0; j < sst->win_size; j++) { + sst->win_l_width[j] = HTS_ModelSet_get_window_left_width(ms, i, j); + sst->win_r_width[j] = HTS_ModelSet_get_window_right_width(ms, i, j); + if (sst->win_l_width[j] + sst->win_r_width[j] == 0) + sst->win_coefficient[j] = + (double *) HTS_calloc(-2 * sst->win_l_width[j] + 1, + sizeof(double)); + else + sst->win_coefficient[j] = + (double *) HTS_calloc(-2 * sst->win_l_width[j], sizeof(double)); + sst->win_coefficient[j] -= sst->win_l_width[j]; + for (k = sst->win_l_width[j]; k <= sst->win_r_width[j]; k++) + sst->win_coefficient[j][k] = + HTS_ModelSet_get_window_coefficient(ms, i, j, k); + } + } + + /* determine GV */ + for (i = 0; i < sss->nstream; i++) { + sst = &sss->sstream[i]; + if (HTS_ModelSet_use_gv(ms, i)) { + sst->gv_mean = + (double *) HTS_calloc(sst->vector_length / sst->win_size, + sizeof(double)); + sst->gv_vari = + 
(double *) HTS_calloc(sst->vector_length / sst->win_size, + sizeof(double)); + HTS_ModelSet_get_gv(ms, HTS_Label_get_string(label, 0), sst->gv_mean, + sst->gv_vari, i, gv_iw[i]); + } else { + sst->gv_mean = NULL; + sst->gv_vari = NULL; + } + } + + if (HTS_ModelSet_have_gv_switch(ms) == TRUE) + for (i = 0; i < HTS_Label_get_size(label); i++) + if (HTS_ModelSet_get_gv_switch(ms, HTS_Label_get_string(label, i)) == + FALSE) + for (j = 0; j < sss->nstream; j++) + for (k = 0; k < sss->nstate; k++) + sss->sstream[j].gv_switch[i * sss->nstate + k] = FALSE; +} + +/* HTS_SStreamSet_get_nstream: get number of stream */ +int HTS_SStreamSet_get_nstream(HTS_SStreamSet * sss) +{ + return sss->nstream; +} + +/* HTS_SStreamSet_get_vector_length: get vector length */ +int HTS_SStreamSet_get_vector_length(HTS_SStreamSet * sss, int stream_index) +{ + return sss->sstream[stream_index].vector_length; +} + +/* HTS_SStreamSet_is_msd: get MSD flag */ +HTS_Boolean HTS_SStreamSet_is_msd(HTS_SStreamSet * sss, int stream_index) +{ + return sss->sstream[stream_index].msd ? 
TRUE : FALSE; +} + +/* HTS_SStreamSet_get_total_state: get total number of states */ +int HTS_SStreamSet_get_total_state(HTS_SStreamSet * sss) +{ + return sss->total_state; +} + +/* HTS_SStreamSet_get_total_frame: get total number of frames */ +int HTS_SStreamSet_get_total_frame(HTS_SStreamSet * sss) +{ + return sss->total_frame; +} + +/* HTS_SStreamSet_get_msd: get MSD parameter */ +double HTS_SStreamSet_get_msd(HTS_SStreamSet * sss, int stream_index, + int state_index) +{ + return sss->sstream[stream_index].msd[state_index]; +} + +/* HTS_SStreamSet_get_window_size: get dynamic window size */ +int HTS_SStreamSet_get_window_size(HTS_SStreamSet * sss, int stream_index) +{ + return sss->sstream[stream_index].win_size; +} + +/* HTS_SStreamSet_get_window_left_width: get left width of dynamic window */ +int HTS_SStreamSet_get_window_left_width(HTS_SStreamSet * sss, + int stream_index, int window_index) +{ + return sss->sstream[stream_index].win_l_width[window_index]; +} + +/* HTS_SStreamSet_get_window_right_width: get right width of dynamic window */ +int HTS_SStreamSet_get_window_right_width(HTS_SStreamSet * sss, + int stream_index, int window_index) +{ + return sss->sstream[stream_index].win_r_width[window_index]; +} + +/* HTS_SStreamSet_get_window_coefficient: get coefficient of dynamic window */ +double HTS_SStreamSet_get_window_coefficient(HTS_SStreamSet * sss, + int stream_index, + int window_index, + int coefficient_index) +{ + return sss->sstream[stream_index]. + win_coefficient[window_index][coefficient_index]; +} + +/* HTS_SStreamSet_get_window_max_width: get max width of dynamic window */ +int HTS_SStreamSet_get_window_max_width(HTS_SStreamSet * sss, int stream_index) +{ + return sss->sstream[stream_index].win_max_width; +} + +/* HTS_SStreamSet_use_gv: get GV flag */ +HTS_Boolean HTS_SStreamSet_use_gv(HTS_SStreamSet * sss, int stream_index) +{ + return sss->sstream[stream_index].gv_mean ?
TRUE : FALSE; +} + +/* HTS_SStreamSet_get_duration: get state duration */ +int HTS_SStreamSet_get_duration(HTS_SStreamSet * sss, int state_index) +{ + return sss->duration[state_index]; +} + +/* HTS_SStreamSet_get_mean: get mean parameter */ +double HTS_SStreamSet_get_mean(HTS_SStreamSet * sss, + int stream_index, + int state_index, int vector_index) +{ + return sss->sstream[stream_index].mean[state_index][vector_index]; +} + +/* HTS_SStreamSet_set_mean: set mean parameter */ +void HTS_SStreamSet_set_mean(HTS_SStreamSet * sss, int stream_index, + int state_index, int vector_index, double f) +{ + sss->sstream[stream_index].mean[state_index][vector_index] = f; +} + +/* HTS_SStreamSet_get_vari: get variance parameter */ +double HTS_SStreamSet_get_vari(HTS_SStreamSet * sss, + int stream_index, + int state_index, int vector_index) +{ + return sss->sstream[stream_index].vari[state_index][vector_index]; +} + +/* HTS_SStreamSet_set_vari: set variance parameter */ +void HTS_SStreamSet_set_vari(HTS_SStreamSet * sss, int stream_index, + int state_index, int vector_index, double f) +{ + sss->sstream[stream_index].vari[state_index][vector_index] = f; +} + +/* HTS_SStreamSet_get_gv_mean: get GV mean parameter */ +double HTS_SStreamSet_get_gv_mean(HTS_SStreamSet * sss, + int stream_index, int vector_index) +{ + return sss->sstream[stream_index].gv_mean[vector_index]; +} + +/* HTS_SStreamSet_get_gv_vari: get GV variance parameter */ +double HTS_SStreamSet_get_gv_vari(HTS_SStreamSet * sss, + int stream_index, int vector_index) +{ + return sss->sstream[stream_index].gv_vari[vector_index]; +} + +/* HTS_SStreamSet_set_gv_switch: set GV switch */ +void HTS_SStreamSet_set_gv_switch(HTS_SStreamSet * sss, int stream_index, + int state_index, HTS_Boolean i) +{ + sss->sstream[stream_index].gv_switch[state_index] = i; +} + +/* HTS_SStreamSet_get_gv_switch: get GV switch */ +HTS_Boolean HTS_SStreamSet_get_gv_switch(HTS_SStreamSet * sss, int stream_index, + int state_index) +{ + return
sss->sstream[stream_index].gv_switch[state_index]; +} + +/* HTS_SStreamSet_clear: free state stream set */ +void HTS_SStreamSet_clear(HTS_SStreamSet * sss) +{ + int i, j; + HTS_SStream *sst; + + if (sss->sstream) { + for (i = 0; i < sss->nstream; i++) { + sst = &sss->sstream[i]; + for (j = 0; j < sss->total_state; j++) { + HTS_free(sst->mean[j]); + HTS_free(sst->vari[j]); + } + if (sst->msd) + HTS_free(sst->msd); + HTS_free(sst->mean); + HTS_free(sst->vari); + for (j = sst->win_size - 1; j >= 0; j--) { + sst->win_coefficient[j] += sst->win_l_width[j]; + HTS_free(sst->win_coefficient[j]); + } + HTS_free(sst->win_coefficient); + HTS_free(sst->win_l_width); + HTS_free(sst->win_r_width); + if (sst->gv_mean) + HTS_free(sst->gv_mean); + if (sst->gv_vari) + HTS_free(sst->gv_vari); + HTS_free(sst->gv_switch); + } + HTS_free(sss->sstream); + } + if (sss->duration) + HTS_free(sss->duration); + + HTS_SStreamSet_initialize(sss); +} + +HTS_SSTREAM_C_END; + +#endif /* !HTS_SSTREAM_C */ diff --git a/src/modules/hts_engine/HTS_vocoder.c b/src/modules/hts_engine/HTS_vocoder.c new file mode 100644 index 0000000..bd6cdd6 --- /dev/null +++ b/src/modules/hts_engine/HTS_vocoder.c @@ -0,0 +1,856 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* hts_engine API developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. 
*/ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. */ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. 
*/ +/* ----------------------------------------------------------------- */ + +#ifndef HTS_VOCODER_C +#define HTS_VOCODER_C + +#ifdef __cplusplus +#define HTS_VOCODER_C_START extern "C" { +#define HTS_VOCODER_C_END } +#else +#define HTS_VOCODER_C_START +#define HTS_VOCODER_C_END +#endif /* __CPLUSPLUS */ + +HTS_VOCODER_C_START; + +#include <math.h> /* for sqrt(),log(),exp(),pow(),cos() */ + +/* hts_engine libraries */ +#include "HTS_hidden.h" + +/* HTS_movem: move memory */ +static void HTS_movem(double *a, double *b, const int nitem) +{ + long i = (long) nitem; + + if (a > b) + while (i--) + *b++ = *a++; + else { + a += i; + b += i; + while (i--) + *--b = *--a; + } +} + +/* HTS_mlsafir: sub functions for MLSA filter */ +static double HTS_mlsafir(const double x, double *b, const int m, + const double a, const double aa, double *d) +{ + double y = 0.0; + int i; + + d[0] = x; + d[1] = aa * d[0] + a * d[1]; + + for (i = 2; i <= m; i++) + d[i] += a * (d[i + 1] - d[i - 1]); + + for (i = 2; i <= m; i++) + y += d[i] * b[i]; + + for (i = m + 1; i > 1; i--) + d[i] = d[i - 1]; + + return (y); +} + +/* HTS_mlsadf1: sub functions for MLSA filter */ +static double HTS_mlsadf1(double x, double *b, const int m, + const double a, const double aa, + const int pd, double *d, const double *ppade) +{ + double v, out = 0.0, *pt; + int i; + + pt = &d[pd + 1]; + + for (i = pd; i >= 1; i--) { + d[i] = aa * pt[i - 1] + a * d[i]; + pt[i] = d[i] * b[1]; + v = pt[i] * ppade[i]; + x += (1 & i) ? v : -v; + out += v; + } + + pt[0] = x; + out += x; + + return (out); +} + +/* HTS_mlsadf2: sub functions for MLSA filter */ +static double HTS_mlsadf2(double x, double *b, const int m, + const double a, const double aa, + const int pd, double *d, const double *ppade) +{ + double v, out = 0.0, *pt; + int i; + + pt = &d[pd * (m + 2)]; + + for (i = pd; i >= 1; i--) { + pt[i] = HTS_mlsafir(pt[i - 1], b, m, a, aa, &d[(i - 1) * (m + 2)]); + v = pt[i] * ppade[i]; + + x += (1 & i) ?
v : -v; + out += v; + } + + pt[0] = x; + out += x; + + return (out); +} + +/* HTS_mlsadf: functions for MLSA filter */ +static double HTS_mlsadf(double x, double *b, int m, const double a, + const int pd, double *d, double *pade) +{ + const double aa = 1 - a * a; + const double *ppade = &(pade[pd * (pd + 1) / 2]); + + x = HTS_mlsadf1(x, b, m, a, aa, pd, d, ppade); + x = HTS_mlsadf2(x, b, m, a, aa, pd, &d[2 * (pd + 1)], ppade); + + return (x); +} + +/* HTS_rnd: functions for random noise generation */ +static double HTS_rnd(unsigned long *next) +{ + double r; + + *next = *next * 1103515245L + 12345; + r = (*next / 65536L) % 32768L; + + return (r / RANDMAX); +} + +/* HTS_nrandom: functions for gaussian random noise generation */ +static double HTS_nrandom(HTS_Vocoder * v) +{ + if (v->sw == 0) { + v->sw = 1; + do { + v->r1 = 2 * HTS_rnd(&v->next) - 1; + v->r2 = 2 * HTS_rnd(&v->next) - 1; + v->s = v->r1 * v->r1 + v->r2 * v->r2; + } while (v->s > 1 || v->s == 0); + v->s = sqrt(-2 * log(v->s) / v->s); + return (v->r1 * v->s); + } else { + v->sw = 0; + return (v->r2 * v->s); + } +} + +/* HTS_srnd: functions for gaussian random noise generation */ +static unsigned long HTS_srnd(unsigned long seed) +{ + return (seed); +} + +/* HTS_mseq: function for M-sequence random noise generation */ +static int HTS_mseq(HTS_Vocoder * v) +{ + int x0, x28; + + v->x >>= 1; + if (v->x & B0) + x0 = 1; + else + x0 = -1; + if (v->x & B28) + x28 = 1; + else + x28 = -1; + if (x0 + x28) + v->x &= B31_; + else + v->x |= B31; + + return (x0); +} + +/* HTS_mc2b: transform mel-cepstrum to MLSA digital filter coefficients */ +static void HTS_mc2b(double *mc, double *b, int m, const double a) +{ + if (mc != b) { + if (a != 0.0) { + b[m] = mc[m]; + for (m--; m >= 0; m--) + b[m] = mc[m] - a * b[m + 1]; + } else + HTS_movem(mc, b, m + 1); + } else if (a != 0.0) + for (m--; m >= 0; m--) + b[m] -= a * b[m + 1]; +} + +/* HTS_b2mc: transform MLSA digital filter coefficients to mel-cepstrum */ +static void
HTS_b2mc(double *b, double *mc, int m, const double a) +{ + double d, o; + + d = mc[m] = b[m]; + for (m--; m >= 0; m--) { + o = b[m] + a * d; + d = b[m]; + mc[m] = o; + } +} + +/* HTS_freqt: frequency transformation */ +static void HTS_freqt(HTS_Vocoder * v, double *c1, const int m1, + double *c2, const int m2, const double a) +{ + int i, j; + const double b = 1 - a * a; + double *g; + + if (m2 > v->freqt_size) { + if (v->freqt_buff != NULL) + HTS_free(v->freqt_buff); + v->freqt_buff = (double *) HTS_calloc(m2 + m2 + 2, sizeof(double)); + v->freqt_size = m2; + } + g = v->freqt_buff + v->freqt_size + 1; + + for (i = 0; i < m2 + 1; i++) + g[i] = 0.0; + + for (i = -m1; i <= 0; i++) { + if (0 <= m2) + g[0] = c1[-i] + a * (v->freqt_buff[0] = g[0]); + if (1 <= m2) + g[1] = b * v->freqt_buff[0] + a * (v->freqt_buff[1] = g[1]); + for (j = 2; j <= m2; j++) + g[j] = v->freqt_buff[j - 1] + + a * ((v->freqt_buff[j] = g[j]) - g[j - 1]); + } + + HTS_movem(g, c2, m2 + 1); +} + +/* HTS_c2ir: The minimum phase impulse response is evaluated from the minimum phase cepstrum */ +static void HTS_c2ir(double *c, const int nc, double *h, const int leng) +{ + int n, k, upl; + double d; + + h[0] = exp(c[0]); + for (n = 1; n < leng; n++) { + d = 0; + upl = (n >= nc) ? 
nc - 1 : n; + for (k = 1; k <= upl; k++) + d += k * c[k] * h[n - k]; + h[n] = d / n; + } +} + +/* HTS_b2en: calculate frame energy */ +static double HTS_b2en(HTS_Vocoder * v, double *b, const int m, const double a) +{ + int i; + double en = 0.0; + double *cep; + double *ir; + + if (v->spectrum2en_size < m) { + if (v->spectrum2en_buff != NULL) + HTS_free(v->spectrum2en_buff); + v->spectrum2en_buff = + (double *) HTS_calloc((m + 1) + 2 * IRLENG, sizeof(double)); + v->spectrum2en_size = m; + } + cep = v->spectrum2en_buff + m + 1; + ir = cep + IRLENG; + + HTS_b2mc(b, v->spectrum2en_buff, m, a); + HTS_freqt(v, v->spectrum2en_buff, m, cep, IRLENG - 1, -a); + HTS_c2ir(cep, IRLENG, ir, IRLENG); + + for (i = 0; i < IRLENG; i++) + en += ir[i] * ir[i]; + + return (en); +} + +/* HTS_ignorm: inverse gain normalization */ +static void HTS_ignorm(double *c1, double *c2, int m, const double g) +{ + double k; + if (g != 0.0) { + k = pow(c1[0], g); + for (; m >= 1; m--) + c2[m] = k * c1[m]; + c2[0] = (k - 1.0) / g; + } else { + HTS_movem(&c1[1], &c2[1], m); + c2[0] = log(c1[0]); + } +} + +/* HTS_gnorm: gain normalization */ +static void HTS_gnorm(double *c1, double *c2, int m, const double g) +{ + double k; + if (g != 0.0) { + k = 1.0 + g * c1[0]; + for (; m >= 1; m--) + c2[m] = c1[m] / k; + c2[0] = pow(k, 1.0 / g); + } else { + HTS_movem(&c1[1], &c2[1], m); + c2[0] = exp(c1[0]); + } +} + +/* HTS_lsp2lpc: transform LSP to LPC */ +static void HTS_lsp2lpc(HTS_Vocoder * v, double *lsp, double *a, int m) +{ + int i, k, mh1, mh2, flag_odd; + double xx, xf, xff; + double *p, *q; + double *a0, *a1, *a2, *b0, *b1, *b2; + + flag_odd = 0; + if (m % 2 == 0) + mh1 = mh2 = m / 2; + else { + mh1 = (m + 1) / 2; + mh2 = (m - 1) / 2; + flag_odd = 1; + } + + if (m > v->lsp2lpc_size) { + if (v->lsp2lpc_buff != NULL) + HTS_free(v->lsp2lpc_buff); + v->lsp2lpc_buff = (double *) HTS_calloc(5 * m + 6, sizeof(double)); + v->lsp2lpc_size = m; + } + p = v->lsp2lpc_buff + m; + q = p + mh1; + a0 = q + mh2; + a1 
= a0 + (mh1 + 1); + a2 = a1 + (mh1 + 1); + b0 = a2 + (mh1 + 1); + b1 = b0 + (mh2 + 1); + b2 = b1 + (mh2 + 1); + + HTS_movem(lsp, v->lsp2lpc_buff, m); + + for (i = 0; i < mh1 + 1; i++) + a0[i] = 0.0; + for (i = 0; i < mh1 + 1; i++) + a1[i] = 0.0; + for (i = 0; i < mh1 + 1; i++) + a2[i] = 0.0; + for (i = 0; i < mh2 + 1; i++) + b0[i] = 0.0; + for (i = 0; i < mh2 + 1; i++) + b1[i] = 0.0; + for (i = 0; i < mh2 + 1; i++) + b2[i] = 0.0; + + /* lsp filter parameters */ + for (i = k = 0; i < mh1; i++, k += 2) + p[i] = -2.0 * cos(v->lsp2lpc_buff[k]); + for (i = k = 0; i < mh2; i++, k += 2) + q[i] = -2.0 * cos(v->lsp2lpc_buff[k + 1]); + + /* impulse response of analysis filter */ + xx = 1.0; + xf = xff = 0.0; + + for (k = 0; k <= m; k++) { + if (flag_odd) { + a0[0] = xx; + b0[0] = xx - xff; + xff = xf; + xf = xx; + } else { + a0[0] = xx + xf; + b0[0] = xx - xf; + xf = xx; + } + + for (i = 0; i < mh1; i++) { + a0[i + 1] = a0[i] + p[i] * a1[i] + a2[i]; + a2[i] = a1[i]; + a1[i] = a0[i]; + } + + for (i = 0; i < mh2; i++) { + b0[i + 1] = b0[i] + q[i] * b1[i] + b2[i]; + b2[i] = b1[i]; + b1[i] = b0[i]; + } + + if (k != 0) + a[k - 1] = -0.5 * (a0[mh1] + b0[mh2]); + xx = 0.0; + } + + for (i = m - 1; i >= 0; i--) + a[i + 1] = -a[i]; + a[0] = 1.0; +} + +/* HTS_gc2gc: generalized cepstral transformation */ +static void HTS_gc2gc(HTS_Vocoder * v, double *c1, const int m1, + const double g1, double *c2, const int m2, + const double g2) +{ + int i, min, k, mk; + double ss1, ss2, cc; + + if (m1 > v->gc2gc_size) { + if (v->gc2gc_buff != NULL) + HTS_free(v->gc2gc_buff); + v->gc2gc_buff = (double *) HTS_calloc(m1 + 1, sizeof(double)); + v->gc2gc_size = m1; + } + + HTS_movem(c1, v->gc2gc_buff, m1 + 1); + + c2[0] = v->gc2gc_buff[0]; + for (i = 1; i <= m2; i++) { + ss1 = ss2 = 0.0; + min = m1 < i ? 
m1 : i - 1; + for (k = 1; k <= min; k++) { + mk = i - k; + cc = v->gc2gc_buff[k] * c2[mk]; + ss2 += k * cc; + ss1 += mk * cc; + } + + if (i <= m1) + c2[i] = v->gc2gc_buff[i] + (g2 * ss2 - g1 * ss1) / i; + else + c2[i] = (g2 * ss2 - g1 * ss1) / i; + } +} + +/* HTS_mgc2mgc: frequency and generalized cepstral transformation */ +static void HTS_mgc2mgc(HTS_Vocoder * v, double *c1, int m1, + const double a1, double g1, double *c2, int m2, + const double a2, double g2) +{ + double a; + + if (a1 == a2) { + HTS_gnorm(c1, c1, m1, g1); + HTS_gc2gc(v, c1, m1, g1, c2, m2, g2); + HTS_ignorm(c2, c2, m2, g2); + } else { + a = (a2 - a1) / (1 - a1 * a2); + HTS_freqt(v, c1, m1, c2, m2, a); + HTS_gnorm(c2, c2, m2, g1); + HTS_gc2gc(v, c2, m2, g1, c2, m2, g2); + HTS_ignorm(c2, c2, m2, g2); + } +} + +/* HTS_lsp2mgc: transform LSP to MGC */ +static void HTS_lsp2mgc(HTS_Vocoder * v, double *lsp, double *mgc, + const int m, const double alpha) +{ + int i; + /* lsp2lpc */ + HTS_lsp2lpc(v, lsp + 1, mgc, m); + if (v->use_log_gain) + mgc[0] = exp(lsp[0]); + else + mgc[0] = lsp[0]; + + /* mgc2mgc */ + if (NORMFLG1) + HTS_ignorm(mgc, mgc, m, v->gamma); + else if (MULGFLG1) + mgc[0] = (1.0 - mgc[0]) * v->stage; + if (MULGFLG1) + for (i = m; i >= 1; i--) + mgc[i] *= -v->stage; + HTS_mgc2mgc(v, mgc, m, alpha, v->gamma, mgc, m, alpha, v->gamma); + if (NORMFLG2) + HTS_gnorm(mgc, mgc, m, v->gamma); + else if (MULGFLG2) + mgc[0] = mgc[0] * v->gamma + 1.0; + if (MULGFLG2) + for (i = m; i >= 1; i--) + mgc[i] *= v->gamma; +} + +/* HTS_mglsadff: sub functions for MGLSA filter */ +static double HTS_mglsadff(double x, double *b, const int m, const double a, + double *d) +{ + int i; + + double y; + y = d[0] * b[1]; + for (i = 1; i < m; i++) { + d[i] += a * (d[i + 1] - d[i - 1]); + y += d[i] * b[i + 1]; + } + x -= y; + + for (i = m; i > 0; i--) + d[i] = d[i - 1]; + d[0] = a * d[0] + (1 - a * a) * x; + return x; +} + +/* HTS_mglsadf: sub functions for MGLSA filter */ +static double HTS_mglsadf(double x, double 
*b, const int m, const double a, + int n, double *d) +{ + int i; + + for (i = 0; i < n; i++) + x = HTS_mglsadff(x, b, m, a, &d[i * (m + 1)]); + + return x; +} + +static double HTS_white_noise(HTS_Vocoder * v) +{ + if (v->gauss) + return (double) HTS_nrandom(v); + else + return HTS_mseq(v); +} + +static void HTS_Vocoder_initialize_excitation(HTS_Vocoder * v) +{ + v->p1 = 0.0; + v->pc = 0.0; + v->p = 0.0; + v->inc = 0.0; +} + +static void HTS_Vocoder_start_excitation(HTS_Vocoder * v, double pitch) +{ + if (v->p1 != 0.0 && pitch != 0.0) + v->inc = (pitch - v->p1) * v->iprd / v->fprd; + else { + v->inc = 0.0; + v->p1 = 0.0; + v->pc = pitch; + } + v->p = pitch; +} + +static double HTS_Vocoder_get_excitation(HTS_Vocoder * v, int fprd_index, + int iprd_index) +{ + double x; + + if (v->p1 == 0.0) + x = HTS_white_noise(v); + else { + if ((v->pc += 1.0) >= v->p1) { + x = sqrt(v->p1); + v->pc -= v->p1; + } else + x = 0.0; + } + if (!--iprd_index) + v->p1 += v->inc; + return x; +} + +static void HTS_Vocoder_end_excitation(HTS_Vocoder * v) +{ + v->p1 = v->p; +} + +/* HTS_Vocoder_initialize: initialize vocoder */ +void HTS_Vocoder_initialize(HTS_Vocoder * v, const int m, const int stage, + HTS_Boolean use_log_gain, const int rate, + const int fperiod, int buff_size) +{ + /* set parameter */ + v->stage = stage; + if (stage != 0) + v->gamma = -1.0 / v->stage; + else + v->gamma = 0.0; + v->use_log_gain = use_log_gain; + v->fprd = fperiod; + v->iprd = IPERIOD; + v->seed = SEED; + v->next = SEED; + v->gauss = GAUSS; + v->rate = rate; + v->p1 = -1.0; + v->sw = 0; + v->x = 0x55555555; + /* open audio device */ + if (0 < buff_size && buff_size <= 48000) { + v->audio = (HTS_Audio *) HTS_calloc(1, sizeof(HTS_Audio)); + HTS_Audio_open(v->audio, rate, buff_size); + } else + v->audio = NULL; + /* init buffer */ + v->freqt_buff = NULL; + v->freqt_size = 0; + v->gc2gc_buff = NULL; + v->gc2gc_size = 0; + v->lsp2lpc_buff = NULL; + v->lsp2lpc_size = 0; + v->postfilter_buff = NULL; + 
v->postfilter_size = 0; + v->spectrum2en_buff = NULL; + v->spectrum2en_size = 0; + v->pade = NULL; + if (v->stage == 0) { /* for MCP */ + v->c = + (double *) HTS_calloc(m * (3 + PADEORDER) + 5 * PADEORDER + 6, + sizeof(double)); + v->cc = v->c + m + 1; + v->cinc = v->cc + m + 1; + v->d1 = v->cinc + m + 1; + v->pade = (double *) HTS_calloc(21, sizeof(double)); + v->pade[0] = 1.00000000000; + v->pade[1] = 1.00000000000; + v->pade[2] = 0.00000000000; + v->pade[3] = 1.00000000000; + v->pade[4] = 0.00000000000; + v->pade[5] = 0.00000000000; + v->pade[6] = 1.00000000000; + v->pade[7] = 0.00000000000; + v->pade[8] = 0.00000000000; + v->pade[9] = 0.00000000000; + v->pade[10] = 1.00000000000; + v->pade[11] = 0.49992730000; + v->pade[12] = 0.10670050000; + v->pade[13] = 0.01170221000; + v->pade[14] = 0.00056562790; + v->pade[15] = 1.00000000000; + v->pade[16] = 0.49993910000; + v->pade[17] = 0.11070980000; + v->pade[18] = 0.01369984000; + v->pade[19] = 0.00095648530; + v->pade[20] = 0.00003041721; + } else { /* for LSP */ + v->c = (double *) HTS_calloc((m + 1) * (v->stage + 3), sizeof(double)); + v->cc = v->c + m + 1; + v->cinc = v->cc + m + 1; + v->d1 = v->cinc + m + 1; + } +} + +/* HTS_Vocoder_synthesize: pulse/noise excitation and MLSA/MGLSA filter based waveform synthesis */ +void HTS_Vocoder_synthesize(HTS_Vocoder * v, const int m, double lf0, + double *spectrum, double alpha, double beta, + double volume, short *rawdata) +{ + double x; + int i, j; + short xs; + int rawidx = 0; + double p; + + /* lf0 -> pitch */ + if (lf0 == LZERO) + p = 0.0; + else + p = v->rate / exp(lf0); + + /* first time */ + if (v->p1 < 0.0) { + if (v->gauss && (v->seed != 1)) + v->next = HTS_srnd((unsigned) v->seed); + HTS_Vocoder_initialize_excitation(v); + if (v->stage == 0) { /* for MCP */ + HTS_mc2b(spectrum, v->c, m, alpha); + } else { /* for LSP */ + if (v->use_log_gain) + v->c[0] = LZERO; + else + v->c[0] = ZERO; + for (i = 1; i <= m; i++) + v->c[i] = i * PI / (m + 1); + HTS_lsp2mgc(v,
v->c, v->c, m, alpha); + HTS_mc2b(v->c, v->c, m, alpha); + HTS_gnorm(v->c, v->c, m, v->gamma); + for (i = 1; i <= m; i++) + v->c[i] *= v->gamma; + } + } + + HTS_Vocoder_start_excitation(v, p); + if (v->stage == 0) { /* for MCP */ + HTS_Vocoder_postfilter_mcp(v, spectrum, m, alpha, beta); + HTS_mc2b(spectrum, v->cc, m, alpha); + for (i = 0; i <= m; i++) + v->cinc[i] = (v->cc[i] - v->c[i]) * v->iprd / v->fprd; + } else { /* for LSP */ + HTS_lsp2mgc(v, spectrum, v->cc, m, alpha); + HTS_mc2b(v->cc, v->cc, m, alpha); + HTS_gnorm(v->cc, v->cc, m, v->gamma); + for (i = 1; i <= m; i++) + v->cc[i] *= v->gamma; + for (i = 0; i <= m; i++) + v->cinc[i] = (v->cc[i] - v->c[i]) * v->iprd / v->fprd; + } + + for (j = 0, i = (v->iprd + 1) / 2; j < v->fprd; j++) { + x = HTS_Vocoder_get_excitation(v, j, i); + if (v->stage == 0) { /* for MCP */ + if (x != 0.0) + x *= exp(v->c[0]); + x = HTS_mlsadf(x, v->c, m, alpha, PADEORDER, v->d1, v->pade); + } else { /* for LSP */ + if (!NGAIN) + x *= v->c[0]; + x = HTS_mglsadf(x, v->c, m, alpha, v->stage, v->d1); + } + x *= volume; + + /* output */ + if (x > 32767.0) + xs = 32767; + else if (x < -32768.0) + xs = -32768; + else + xs = (short) x; + if (rawdata) + rawdata[rawidx++] = xs; + if (v->audio) + HTS_Audio_write(v->audio, xs); + + if (!--i) { + for (i = 0; i <= m; i++) + v->c[i] += v->cinc[i]; + i = v->iprd; + } + } + + HTS_Vocoder_end_excitation(v); + HTS_movem(v->cc, v->c, m + 1); +} + +/* HTS_Vocoder_postfilter_mcp: postfilter for MCP */ +void HTS_Vocoder_postfilter_mcp(HTS_Vocoder * v, double *mcp, const int m, + double alpha, double beta) +{ + double e1, e2; + int k; + + if (beta > 0.0 && m > 1) { + if (v->postfilter_size < m) { + if (v->postfilter_buff != NULL) + HTS_free(v->postfilter_buff); + v->postfilter_buff = (double *) HTS_calloc(m + 1, sizeof(double)); + v->postfilter_size = m; + } + HTS_mc2b(mcp, v->postfilter_buff, m, alpha); + e1 = HTS_b2en(v, v->postfilter_buff, m, alpha); + + v->postfilter_buff[1] -= beta * alpha * mcp[2]; 
+ for (k = 2; k <= m; k++) + v->postfilter_buff[k] *= (1.0 + beta); + + e2 = HTS_b2en(v, v->postfilter_buff, m, alpha); + v->postfilter_buff[0] += log(e1 / e2) / 2; + HTS_b2mc(v->postfilter_buff, mcp, m, alpha); + } +} + +/* HTS_Vocoder_clear: clear vocoder */ +void HTS_Vocoder_clear(HTS_Vocoder * v) +{ + if (v != NULL) { + /* free buffer */ + if (v->freqt_buff != NULL) { + HTS_free(v->freqt_buff); + v->freqt_buff = NULL; + } + v->freqt_size = 0; + if (v->gc2gc_buff != NULL) { + HTS_free(v->gc2gc_buff); + v->gc2gc_buff = NULL; + } + v->gc2gc_size = 0; + if (v->lsp2lpc_buff != NULL) { + HTS_free(v->lsp2lpc_buff); + v->lsp2lpc_buff = NULL; + } + v->lsp2lpc_size = 0; + if (v->postfilter_buff != NULL) { + HTS_free(v->postfilter_buff); + v->postfilter_buff = NULL; + } + v->postfilter_size = 0; + if (v->spectrum2en_buff != NULL) { + HTS_free(v->spectrum2en_buff); + v->spectrum2en_buff = NULL; + } + v->spectrum2en_size = 0; + if (v->pade != NULL) { + HTS_free(v->pade); + v->pade = NULL; + } + if (v->c != NULL) { + HTS_free(v->c); + v->c = NULL; + } + /* close audio device */ + if (v->audio != NULL) { + HTS_Audio_close(v->audio); + HTS_free(v->audio); + v->audio = NULL; + } + } +} + +HTS_VOCODER_C_END; + +#endif /* !HTS_VOCODER_C */ diff --git a/src/modules/hts_engine/INSTALL b/src/modules/hts_engine/INSTALL new file mode 100644 index 0000000..662904d --- /dev/null +++ b/src/modules/hts_engine/INSTALL @@ -0,0 +1,16 @@ +Installation Instructions +************************* + +1. After unpacking the tar.gz file, cd to the festival directory. + +2. Run configure script with appropriate options. + + % ./configure + + For detail, please see. + + % ./configure --help + +3. Run make. 
+ + % make diff --git a/src/modules/hts_engine/Makefile b/src/modules/hts_engine/Makefile new file mode 100644 index 0000000..e91fd29 --- /dev/null +++ b/src/modules/hts_engine/Makefile @@ -0,0 +1,66 @@ +# ----------------------------------------------------------------- # +# The HMM-Based Speech Synthesis System (HTS) # +# festopt_hts_engine developed by HTS Working Group # +# http://hts-engine.sourceforge.net/ # +# ----------------------------------------------------------------- # +# # +# Copyright (c) 2001-2010 Nagoya Institute of Technology # +# Department of Computer Science # +# # +# 2001-2008 Tokyo Institute of Technology # +# Interdisciplinary Graduate School of # +# Science and Engineering # +# # +# All rights reserved. # +# # +# Redistribution and use in source and binary forms, with or # +# without modification, are permitted provided that the following # +# conditions are met: # +# # +# - Redistributions of source code must retain the above copyright # +# notice, this list of conditions and the following disclaimer. # +# - Redistributions in binary form must reproduce the above # +# copyright notice, this list of conditions and the following # +# disclaimer in the documentation and/or other materials provided # +# with the distribution. # +# - Neither the name of the HTS working group nor the names of its # +# contributors may be used to endorse or promote products derived # +# from this software without specific prior written permission. # +# # +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND # +# CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, # +# INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF # +# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE # +# DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS # +# BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, # +# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED # +# TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON # +# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, # +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # +# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE # +# POSSIBILITY OF SUCH DAMAGE. # +# ----------------------------------------------------------------- # + +TOP=../../.. +DIRNAME=src/modules/hts_engine + +H = HTS_engine.h HTS_hidden.h + +CPPSRCS = fest2hts_engine.cc +CSRCS = HTS_audio.c HTS_engine.c HTS_gstream.c HTS_label.c HTS_misc.c HTS_model.c HTS_pstream.c HTS_sstream.c HTS_vocoder.c + +SRCS = $(CPPSRCS) $(CSRCS) + +OBJS = $(CPPSRCS:.cc=.o) $(CSRCS:.c=.o) +INFOS = README COPYING INSTALL AUTHORS + +FILES=Makefile $(SRCS) $(H) $(INFOS) + +LOCAL_INCLUDES = -I ../include -D FESTIVAL + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules diff --git a/src/modules/hts_engine/README b/src/modules/hts_engine/README new file mode 100644 index 0000000..4955ef6 --- /dev/null +++ b/src/modules/hts_engine/README @@ -0,0 +1,126 @@ +=============================================================================== + The festopt_hts_engine version 1.01 + release December 25, 2010 + + +The festopt_hts_engine is an HMM-based speech synthesis module. It has been +developed by the HTS working group (see "Who we are" below) and some +graduate students at Nagoya Institute of Technology (see "AUTHORS" in the same +directory). 
+ +******************************************************************************* + Copying +******************************************************************************* + +The festopt_hts_engine is released under the New and Simplified BSD license +(see http://www.opensource.org/). Using and distributing this software is +free (without restriction including without limitation the rights to use, modify, +merge, publish, distribute, sublicense, and/or sell copies of this work, and to +permit persons to whom this work is furnished to do so) subject to the +conditions in the following license: + +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* festopt_hts_engine developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. 
*/ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. */ +/* ----------------------------------------------------------------- */ + +Although this software is free, we still offer no warranties and no +maintenance. We will continue to endeavor to fix bugs and answer queries when +we can, but are not in a position to guarantee it. We will consider consultancy if +desired; please contact us for details. + +If you are using the festopt_hts_engine in commercial environments, even though +no license is required, we would be grateful if you let us know as it helps +justify ourselves to our various sponsors. We also strongly encourage you to + + * refer to the use of festopt_hts_engine in any publications that use this + software + * report bugs, where possible with bug fixes, that are found + +See "COPYING" in the same directory for details. + +******************************************************************************* + Installation +******************************************************************************* + +See "INSTALL" in the same directory for details. 
+ +******************************************************************************* + Documentation +******************************************************************************* + +Reference manual of festopt_hts_engine is available at + +http://hts-engine.sourceforge.net/ + +******************************************************************************* + Acknowledgements +******************************************************************************* + +Keiichi Tokuda +Keiichiro Oura +Junichi Yamagishi +Alan W. Black + +******************************************************************************* + Who we are +******************************************************************************* + +The HTS working group is a voluntary group for developing the HMM-Based Speech +Synthesis System. Current members are + + Keiichi Tokuda http://www.sp.nitech.ac.jp/~tokuda/ + (Produce and Design) + Keiichiro Oura http://www.sp.nitech.ac.jp/~uratec/ + (Design and Development, Main Maintainer) + Kei Hashimoto http://www.sp.nitech.ac.jp/~bonanza/ + Heiga Zen + Junichi Yamagishi http://homepages.inf.ed.ac.uk/jyamagis/ + Tomoki Toda http://spalab.naist.jp/~tomoki/index_e.html + Takashi Nose + Shinji Sako http://www.mmsp.nitech.ac.jp/~sako/ + Alan W. Black http://www.cs.cmu.edu/~awb/ + +and the members are dynamically changing. 
The current formal contact address of +HTS working group and a mailing list for HTS users can be found at +http://hts.sp.nitech.ac.jp/ +=============================================================================== diff --git a/src/modules/hts_engine/fest2hts_engine.cc b/src/modules/hts_engine/fest2hts_engine.cc new file mode 100644 index 0000000..294b475 --- /dev/null +++ b/src/modules/hts_engine/fest2hts_engine.cc @@ -0,0 +1,250 @@ +/* ----------------------------------------------------------------- */ +/* The HMM-Based Speech Synthesis System (HTS) */ +/* festopt_hts_engine developed by HTS Working Group */ +/* http://hts-engine.sourceforge.net/ */ +/* ----------------------------------------------------------------- */ +/* */ +/* Copyright (c) 2001-2010 Nagoya Institute of Technology */ +/* Department of Computer Science */ +/* */ +/* 2001-2008 Tokyo Institute of Technology */ +/* Interdisciplinary Graduate School of */ +/* Science and Engineering */ +/* */ +/* All rights reserved. */ +/* */ +/* Redistribution and use in source and binary forms, with or */ +/* without modification, are permitted provided that the following */ +/* conditions are met: */ +/* */ +/* - Redistributions of source code must retain the above copyright */ +/* notice, this list of conditions and the following disclaimer. */ +/* - Redistributions in binary form must reproduce the above */ +/* copyright notice, this list of conditions and the following */ +/* disclaimer in the documentation and/or other materials provided */ +/* with the distribution. */ +/* - Neither the name of the HTS working group nor the names of its */ +/* contributors may be used to endorse or promote products derived */ +/* from this software without specific prior written permission. 
*/ +/* */ +/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */ +/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */ +/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */ +/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */ +/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */ +/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */ +/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */ +/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ +/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */ +/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */ +/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */ +/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */ +/* POSSIBILITY OF SUCH DAMAGE. */ +/* ----------------------------------------------------------------- */ + +/* standard C libraries */ +#include +#include +#include +#include +#include "festival.h" + +/* header of hts_engine API */ +#ifdef __cplusplus +extern "C" { +#endif +#include "HTS_engine.h" +#ifdef __cplusplus +} +#endif +/* Getfp: wrapper for fopen */ + static FILE *Getfp(const char *name, const char *opt) +{ + FILE *fp = fopen(name, opt); + + if (fp == NULL) { + cerr << "Getfp: Cannot open " << name << endl; + festival_error(); + } + + return (fp); +} + +/* HTS_Synthesize_Utt: generate speech from utt by using hts_engine API */ +static LISP HTS_Synthesize_Utt(LISP utt) +{ + EST_Utterance *u = get_c_utt(utt); + EST_Item *item = 0; + LISP hts_engine_params = NIL; + LISP hts_output_params = NIL; + + char *fn_ms_lf0 = NULL, *fn_ms_mcp = NULL, *fn_ms_dur = NULL; + char *fn_ts_lf0 = NULL, *fn_ts_mcp = NULL, *fn_ts_dur = NULL; + char *fn_ws_lf0[3], *fn_ws_mcp[3]; + char *fn_ms_gvl = NULL, *fn_ms_gvm = NULL; + char *fn_ts_gvl = NULL, *fn_ts_gvm = NULL; + char *fn_gv_switch = NULL; + + FILE *labfp = NULL; + FILE *lf0fp = NULL, *mcpfp = NULL, *rawfp = NULL, 
*durfp = NULL; + + HTS_Engine engine; + + int sampling_rate; + int fperiod; + double alpha; + int stage; + double beta; + double uv_threshold; + + /* get params */ + hts_engine_params = + siod_get_lval("hts_engine_params", + "festopt_hts_engine: no parameters set for module"); + hts_output_params = + siod_get_lval("hts_output_params", + "festopt_hts_engine: no output parameters set for module"); + + /* get model file names */ + fn_ms_dur = (char *) get_param_str("-md", hts_engine_params, "hts/dur.pdf"); + fn_ms_mcp = (char *) get_param_str("-mm", hts_engine_params, "hts/mgc.pdf"); + fn_ms_lf0 = (char *) get_param_str("-mf", hts_engine_params, "hts/lf0.pdf"); + fn_ts_dur = + (char *) get_param_str("-td", hts_engine_params, "hts/tree-dur.inf"); + fn_ts_mcp = + (char *) get_param_str("-tm", hts_engine_params, "hts/tree-mgc.inf"); + fn_ts_lf0 = + (char *) get_param_str("-tf", hts_engine_params, "hts/tree-lf0.inf"); + fn_ws_mcp[0] = + (char *) get_param_str("-dm1", hts_engine_params, "hts/mgc.win1"); + fn_ws_mcp[1] = + (char *) get_param_str("-dm2", hts_engine_params, "hts/mgc.win2"); + fn_ws_mcp[2] = + (char *) get_param_str("-dm3", hts_engine_params, "hts/mgc.win3"); + fn_ws_lf0[0] = + (char *) get_param_str("-df1", hts_engine_params, "hts/lf0.win1"); + fn_ws_lf0[1] = + (char *) get_param_str("-df2", hts_engine_params, "hts/lf0.win2"); + fn_ws_lf0[2] = + (char *) get_param_str("-df3", hts_engine_params, "hts/lf0.win3"); + fn_ms_gvm = + (char *) get_param_str("-cm", hts_engine_params, "hts/gv-mgc.pdf"); + fn_ms_gvl = + (char *) get_param_str("-cf", hts_engine_params, "hts/gv-lf0.pdf"); + fn_ts_gvm = + (char *) get_param_str("-em", hts_engine_params, "hts/tree-gv-mgc.inf"); + fn_ts_gvl = + (char *) get_param_str("-ef", hts_engine_params, "hts/tree-gv-lf0.inf"); + fn_gv_switch = + (char *) get_param_str("-k", hts_engine_params, "hts/gv-switch.inf"); + + /* open input file pointers */ + labfp = + Getfp(get_param_str("-labelfile", hts_output_params, "utt.feats"), "r"); + + 
/* open output file pointers */ + rawfp = Getfp(get_param_str("-or", hts_output_params, "tmp.raw"), "wb"); + lf0fp = Getfp(get_param_str("-of", hts_output_params, "tmp.lf0"), "wb"); + mcpfp = Getfp(get_param_str("-om", hts_output_params, "tmp.mgc"), "wb"); + durfp = Getfp(get_param_str("-od", hts_output_params, "tmp.lab"), "wb"); + + /* get other params */ + sampling_rate = (int) get_param_float("-s", hts_engine_params, 16000.0); + fperiod = (int) get_param_float("-p", hts_engine_params, 80.0); + alpha = (double) get_param_float("-a", hts_engine_params, 0.42); + stage = (int) get_param_float("-g", hts_engine_params, 0.0); + beta = (double) get_param_float("-b", hts_engine_params, 0.0); + uv_threshold = (double) get_param_float("-u", hts_engine_params, 0.5); + + /* initialize */ + HTS_Engine_initialize(&engine, 2); + HTS_Engine_set_sampling_rate(&engine, sampling_rate); + HTS_Engine_set_fperiod(&engine, fperiod); + HTS_Engine_set_alpha(&engine, alpha); + HTS_Engine_set_gamma(&engine, stage); + HTS_Engine_set_beta(&engine, beta); + HTS_Engine_set_msd_threshold(&engine, 1, uv_threshold); + HTS_Engine_set_audio_buff_size(&engine, 0); + + /* load models */ + HTS_Engine_load_duration_from_fn(&engine, &fn_ms_dur, &fn_ts_dur, 1); + HTS_Engine_load_parameter_from_fn(&engine, &fn_ms_mcp, &fn_ts_mcp, fn_ws_mcp, + 0, FALSE, 3, 1); + HTS_Engine_load_parameter_from_fn(&engine, &fn_ms_lf0, &fn_ts_lf0, fn_ws_lf0, + 1, TRUE, 3, 1); + HTS_Engine_load_gv_from_fn(&engine, &fn_ms_gvm, &fn_ts_gvm, 0, 1); + HTS_Engine_load_gv_from_fn(&engine, &fn_ms_gvl, &fn_ts_gvl, 1, 1); + HTS_Engine_load_gv_switch_from_fn(&engine, fn_gv_switch); + + /* generate speech */ + if (u->relation("Segment")->first()) { /* only if there are segments */ + HTS_Engine_load_label_from_fp(&engine, labfp); + HTS_Engine_create_sstream(&engine); + HTS_Engine_create_pstream(&engine); + HTS_Engine_create_gstream(&engine); + if (rawfp != NULL) + HTS_Engine_save_generated_speech(&engine, rawfp); + if (durfp != NULL) + 
HTS_Engine_save_label(&engine, durfp); + if (lf0fp != NULL) + HTS_Engine_save_generated_parameter(&engine, lf0fp, 1); + if (mcpfp != NULL) + HTS_Engine_save_generated_parameter(&engine, mcpfp, 0); + HTS_Engine_refresh(&engine); + } + + /* free */ + HTS_Engine_clear(&engine); + + /* close output file pointers */ + if (rawfp != NULL) + fclose(rawfp); + if (durfp != NULL) + fclose(durfp); + if (lf0fp != NULL) + fclose(lf0fp); + if (mcpfp != NULL) + fclose(mcpfp); + + /* close input file pointers */ + if (labfp != NULL) + fclose(labfp); + + /* Load back in the waveform */ + EST_Wave *w = new EST_Wave; + w->resample(sampling_rate); + + if (u->relation("Segment")->first()) /* only if there are segments */ + w->load_file(get_param_str("-or", hts_output_params, "tmp.raw"), "raw", + sampling_rate, "short", str_to_bo("native"), 1); + + item = u->create_relation("Wave")->append(); + item->set_val("wave", est_val(w)); + + /* Load back in the segment times */ + EST_Relation *r = new EST_Relation; + EST_Item *s,*o; + + r->load(get_param_str("-od", hts_output_params, "tmp.lab"),"htk"); + + for(o = r->first(), s = u->relation("Segment")->first() ; (o != NULL) && (s != NULL) ; o = o->next(), s = s->next() ) + if (o->S("name").before("+").after("-").matches(s->S("name"))) + s->set("end",o->F("end")); + else + cerr << "HTS_Synthesize_Utt: Output segment mismatch" << endl; + + delete r; + + return utt; +} + +void festival_hts_engine_init(void) +{ + char buf[1024]; + + HTS_get_copyright(buf); + proclaim_module("hts_engine", buf); + + festival_def_utt_module("HTS_Synthesize", HTS_Synthesize_Utt, + "(HTS_Synthesize UTT)\n Synthesize a waveform using the hts_engine and the current models"); +} diff --git a/src/modules/java/Makefile b/src/modules/java/Makefile new file mode 100644 index 0000000..ad499e5 --- /dev/null +++ b/src/modules/java/Makefile @@ -0,0 +1,45 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## 
University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### + +TOP=../../.. 
+DIRNAME=src/modules/java + +BUILD_DIRS = cstr +NEED_JAVA=1 + +FILES = Makefile java.mak java_cpp.mak java_media.mak jsapi.mak NoInit + +ALL = .sub_directories .javalib + +include $(TOP)/config/common_make_rules + diff --git a/src/modules/java/NoInit b/src/modules/java/NoInit new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/src/modules/java/NoInit @@ -0,0 +1 @@ + diff --git a/src/modules/java/cstr/Makefile b/src/modules/java/cstr/Makefile new file mode 100644 index 0000000..352a0f5 --- /dev/null +++ b/src/modules/java/cstr/Makefile @@ -0,0 +1,46 @@ + +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### + +TOP=../../../.. +DIRNAME=src/modules/java/cstr + +BUILD_DIRS = festival testPrograms +NEED_JAVA=1 + +FILES = Makefile + +ALL = .sub_directories + +include $(TOP)/config/common_make_rules + diff --git a/src/modules/java/cstr/festival/Client.java b/src/modules/java/cstr/festival/Client.java new file mode 100644 index 0000000..0ec5290 --- /dev/null +++ b/src/modules/java/cstr/festival/Client.java @@ -0,0 +1,221 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission is hereby granted, free of charge, to use and distribute \\ + // this software and its documentation without restriction, including \\ + // without limitation the rights to use, copy, modify, merge, publish, \\ + // distribute, sublicense, and/or sell copies of this work, and to \\ + // permit persons to whom this work is furnished to do so, subject to \\ + // the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // 4. 
The authors' names are not used to endorse or promote products \\ + // derived from this software without specific prior written \\ + // permission. \\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // Mainline for Java festival client. 
\\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + +package cstr.festival ; + +import java.lang.*; +import java.util.*; +import java.awt.*; +import java.io.*; +import java.net.*; + +import cstr.festival.client.*; +import cstr.festival.scheme.*; + +class RequestHandler implements RequestListener +{ + Thread mainline; + + public RequestHandler(Thread main) + { + mainline=main; + } + + public void requestRunning(Request r) + { + System.out.println("Request "+r.command+" running"); + } + + public void requestResult(Request r, Object res) + { + System.out.println("Request "+r.command+" result="+res); + } + + public void requestError(Request r, String mes) + { + System.out.println("Request "+r.command+" error "+mes); + } + + public void requestFinished(Request r) + { + System.out.println("Request "+r.command+" finished"); + } +} + +public class Client { + + public static void useage(String error) + { + if (error != null) + { + System.out.println(""); + System.out.println(" "+error); + } + System.out.println(""); + System.out.println("Usage: festival_client_java [options] file..."); + System.out.println(""); + System.out.println(" --help Show this message."); + System.out.println(" --server Connect to named host."); + System.out.println(" --port Use the given port."); + System.out.println(" --sync Wait for each expression to finish"); + System.out.println(" executing before reading next."); + System.out.println(" --wait Wait for all expressions to finish"); + System.out.println(" executing before exiting."); + + System.exit(error!=null?1:0); + } + + public static void main (String[] args) + { + String server="localhost"; + int port=1314; + int i; + boolean waitAtEnd=false; + boolean sync=false; + + for(i=0; i=args.length) + useage("Need name after --server"); + + server = args[i]; + } + else if (args[i].equals("--port")) + { + i++; + if (i>=args.length) + useage("Need number after --port"); + + try { + port = 
Integer.parseInt(args[i]); + } catch (NumberFormatException ex) { + useage("Not a valid port number '" + args[i] + "'"); + } + + } + else if (args[i].equals("--wait")) + waitAtEnd=true; + else if (args[i].equals("--sync")) + sync=true; + else if (args[i].startsWith("--")) + useage("Unknown argument "+args[i]); + else + break; + } + + System.out.println("Server='"+server+":"+port+"'"); + + Session s=null; + + try { + s = new Session(server, port); + } catch (UnknownHostException ex) { + useage("Unknown host '"+ex.getMessage()+"'"); + } catch (IOException ex) { + useage("Can't connect '"+ex.getMessage()+"'"); + } + + s.initialise(); + + System.out.println("Connected"); + + if (i == args.length) + { + args = new String[] { "-" }; + i=0; + } + + RequestListener handler = new RequestHandler(Thread.currentThread()); + + file: + for(; i "); + System.out.flush(); + } + + exp=sreader.nextExprString(); + + if (exp==null) + break; + + System.out.println(": "+exp); + Request r = s.request(exp, handler); + if (sync) + while (!r.isFinished()) + r.waitForUpdate(); + } + } catch (IOException ioex) { + System.out.println(" Error "+ioex.getMessage()); + break file; + } + System.out.println("--EOF--"); + } + + System.out.println("call terminate"); + s.terminate(waitAtEnd); + } +}; diff --git a/src/modules/java/cstr/festival/Makefile b/src/modules/java/cstr/festival/Makefile new file mode 100644 index 0000000..a47e700 --- /dev/null +++ b/src/modules/java/cstr/festival/Makefile @@ -0,0 +1,48 @@ + +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### + +TOP=../../../../.. 
+DIRNAME=src/modules/java/cstr/festival + +BUILD_DIRS = client scheme +ALL_DIRS = $(BUILD_DIRS) jsapi +JAVA_CLASSES = Client +NEED_JAVA=1 + +FILES = Makefile $(JAVA_CLASSES:=.java) + +ALL = .java .sub_directories + +include $(TOP)/config/common_make_rules + diff --git a/src/modules/java/cstr/festival/client/Festival.java b/src/modules/java/cstr/festival/client/Festival.java new file mode 100644 index 0000000..07b56bb --- /dev/null +++ b/src/modules/java/cstr/festival/client/Festival.java @@ -0,0 +1,368 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission is hereby granted, free of charge, to use and distribute \\ + // this software and its documentation without restriction, including \\ + // without limitation the rights to use, copy, modify, merge, publish, \\ + // distribute, sublicense, and/or sell copies of this work, and to \\ + // permit persons to whom this work is furnished to do so, subject to \\ + // the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // 4. The authors' names are not used to endorse or promote products \\ + // derived from this software without specific prior written \\ + // permission. 
\\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // The thread which actually communicates with festival. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + + +package cstr.festival.client ; + +import java.lang.*; +import java.util.*; +import java.awt.*; +import java.io.*; +import java.net.*; + +import cstr.est.*; + +class Job +{ + Integer id; + String command; + Session session; + + public Job(Integer i, String c, Session s) + { + id=i; + command=c; + session=s; + } +} + +class Festival extends Thread +{ + public static final int FT_SCHEME = 1; + public static final int FT_WAVE = 2; + + protected static byte [] endm; + protected Socket s; + protected String hostname=null; + protected InetAddress address = null; + protected int port=-1; + protected boolean closeOnExit; + + protected PrintWriter out; + protected InputStream in; + + protected boolean ok; + + protected JobQueue jobs; + + private byte buffer[]; + private int buffered_p; + private int buffered_e; + + static { + String end="ft_StUfF_key"; + endm = new byte[end.length()]; + + for(int i=0;i=0) + { + val.append(line.substring(0, endm)); + pending = line.substring(endm+12); + break; + } + else + { + 
val.append(line); + val.append("\n"); + } + + line = in.readLine(); + } + return val.toString(); + */ + } +} diff --git a/src/modules/java/cstr/festival/client/JobQueue.java b/src/modules/java/cstr/festival/client/JobQueue.java new file mode 100644 index 0000000..51f8ec0 --- /dev/null +++ b/src/modules/java/cstr/festival/client/JobQueue.java @@ -0,0 +1,104 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission is hereby granted, free of charge, to use and distribute \\ + // this software and its documentation without restriction, including \\ + // without limitation the rights to use, copy, modify, merge, publish, \\ + // distribute, sublicense, and/or sell copies of this work, and to \\ + // permit persons to whom this work is furnished to do so, subject to \\ + // the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // 4. The authors' names are not used to endorse or promote products \\ + // derived from this software without specific prior written \\ + // permission. \\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. 
\\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // A thread safe queue based on Vector. Could be faster. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + + +package cstr.festival.client ; + +import java.lang.*; +import java.util.*; +import java.awt.*; + +public class JobQueue +{ + Vector q; + + public JobQueue(int space) + { + q = new Vector(space); + } + + public JobQueue() + { + q = new Vector(); + } + + public synchronized void add(Object thing) + { + q.addElement(thing); + } + + public synchronized void remove(Object thing) + { + q.removeElement(thing); + } + + public synchronized boolean isEmpty() + { + return q.isEmpty(); + } + + public synchronized Object top() + { + Object o; + + if (q.isEmpty()) + o = null; + else + o = q.firstElement(); + return o; + } + + public synchronized Object get() + { + Object o; + + if (q.isEmpty()) + o = null; + else + { + o = q.firstElement(); + q.removeElementAt(0); + } + return o; + } + + public synchronized Enumeration elements() + { + return q.elements(); + } +} diff --git a/src/modules/java/cstr/festival/client/Makefile b/src/modules/java/cstr/festival/client/Makefile new file mode 100644 index 0000000..e5f50c8 --- /dev/null +++ b/src/modules/java/cstr/festival/client/Makefile @@ -0,0 +1,46 @@ + +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### + +TOP=../../../../../.. 
+DIRNAME=src/modules/java/cstr/festival/client + +JAVA_CLASSES = Session Festival JobQueue Request RequestListener +NEED_JAVA=1 + +FILES = Makefile $(JAVA_CLASSES:=.java) + +ALL = .java .sub_directories + +include $(TOP)/config/common_make_rules + diff --git a/src/modules/java/cstr/festival/client/Request.java b/src/modules/java/cstr/festival/client/Request.java new file mode 100644 index 0000000..a202a8e --- /dev/null +++ b/src/modules/java/cstr/festival/client/Request.java @@ -0,0 +1,132 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission is hereby granted, free of charge, to use and distribute \\ + // this software and its documentation without restriction, including \\ + // without limitation the rights to use, copy, modify, merge, publish, \\ + // distribute, sublicense, and/or sell copies of this work, and to \\ + // permit persons to whom this work is furnished to do so, subject to \\ + // the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // 4. The authors' names are not used to endorse or promote products \\ + // derived from this software without specific prior written \\ + // permission. 
\\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // Record of a request made to a session. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + + +package cstr.festival.client ; + +import java.lang.*; +import java.util.*; +import java.awt.*; + +public class Request +{ + public String command; + public Integer id; + public Session session; + + private Vector results; + protected boolean running=false; + protected String error=null; + protected boolean finished=false; + + private RequestListener listener; + + public Request(String c, Integer i, Session s) + { + command=c; + id=i; + session=s; + + results= new Vector(1); + } + + public synchronized boolean isFinished() + { + return finished; + } + + public synchronized void waitForUpdate() + { + try { wait(); } catch (InterruptedException e) { } + // System.out.println("wait done"); + } + + + public synchronized void notifyResult(Object r) + { + results.addElement(r); + if (listener != null) + listener.requestResult(this, r); + notifyAll(); + } + + public synchronized void notifyRunning() + { + running = true; + if (listener != null) + listener.requestRunning(this); + notifyAll(); + } + + public synchronized void 
notifyError(String message) + { + error = message; + if (listener != null) + listener.requestError(this, message); + notifyAll(); + } + + public synchronized void notifyFinished() + { + finished = true; + if (listener != null) + listener.requestFinished(this); + notifyAll(); + } + + public synchronized void addRequestListener(RequestListener l) + { + listener = l; + + if (running) + l.requestRunning(this); + for(int i=0; i < results.size(); i++) + l.requestResult(this, results.elementAt(i)); + if (error != null) + l.requestError(this, error); + if (finished) + l.requestFinished(this); + } + + public synchronized void removeRequestListener(RequestListener l) + { + if (listener==l) + listener = null; + } +} + diff --git a/src/modules/java/cstr/festival/client/RequestListener.java b/src/modules/java/cstr/festival/client/RequestListener.java new file mode 100644 index 0000000..471dfc9 --- /dev/null +++ b/src/modules/java/cstr/festival/client/RequestListener.java @@ -0,0 +1,51 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission is hereby granted, free of charge, to use and distribute \\ + // this software and its documentation without restriction, including \\ + // without limitation the rights to use, copy, modify, merge, publish, \\ + // distribute, sublicense, and/or sell copies of this work, and to \\ + // permit persons to whom this work is furnished to do so, subject to \\ + // the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // 4. 
The authors' names are not used to endorse or promote products \\ + // derived from this software without specific prior written \\ + // permission. \\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // Interface for things which keep track of requests to the server. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + +package cstr.festival.client ; + +import java.lang.*; +import java.util.*; +import java.awt.*; + +public interface RequestListener +{ + public abstract void requestRunning(Request r); + public abstract void requestResult(Request r, Object res); + public abstract void requestError(Request r, String mes); + public abstract void requestFinished(Request r); +} diff --git a/src/modules/java/cstr/festival/client/Session.java b/src/modules/java/cstr/festival/client/Session.java new file mode 100644 index 0000000..acc2b66 --- /dev/null +++ b/src/modules/java/cstr/festival/client/Session.java @@ -0,0 +1,186 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved.
\\ + // Permission is hereby granted, free of charge, to use and distribute \\ + // this software and its documentation without restriction, including \\ + // without limitation the rights to use, copy, modify, merge, publish, \\ + // distribute, sublicense, and/or sell copies of this work, and to \\ + // permit persons to whom this work is furnished to do so, subject to \\ + // the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // 4. The authors' names are not used to endorse or promote products \\ + // derived from this software without specific prior written \\ + // permission. \\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // Objects representing sessions on festival servers. 
\\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + +package cstr.festival.client ; + +import java.lang.*; +import java.util.*; +import java.awt.*; +import java.io.*; +import java.net.*; + +import cstr.est.*; + +class RequestRecorder implements RequestListener +{ + Object result; + + public RequestRecorder() + { + } + + public void requestRunning(Request r) + { + } + + public void requestResult(Request r, Object res) + { + result=res; + } + + public void requestError(Request r, String mes) + { + } + + public void requestFinished(Request r) + { + } +} + +public class Session { + protected Festival festival; + private int lastID=0; + protected Hashtable requests; + + public Session(Socket sock) + { + festival = new Festival(sock); + requests = new Hashtable(10); + } + + public Session(InetAddress addr, int p) + throws IOException + { + festival = new Festival(addr, p); + requests = new Hashtable(10); + } + + public Session(String host, int p) + throws IOException, UnknownHostException + { + festival = new Festival(InetAddress.getByName(host), p); + requests = new Hashtable(10); + } + + protected void finalize() + { + terminate(false); + } + + public void initialise() + { + festival.connect(); + } + + public void terminate(boolean carefully) + { + festival.disconnect(carefully); + } + + public Request request(String c, RequestListener l) + { + Integer id = new Integer(newID()); + Request r = new Request(c,id, this); + if (l!= null) + r.addRequestListener(l); + requests.put(id, r); + + festival.newJob(id, c, this); + + return r; + } + + public Object synchronousRequest(String c) + { + RequestRecorder l = new RequestRecorder(); + + Request r = request(c, l); + + while (!r.isFinished()) + r.waitForUpdate(); + + return l.result; + } + + public void notifyRunning(Integer id) + { + Request r = (Request)requests.get(id); + + r.notifyRunning(); + } + + public void notifyError(Integer id, String message) + { + Request r = 
(Request)requests.get(id); + + r.notifyError(message); + } + + public void notifyResult(Integer id, int type, byte [] result ) + { + Request r = (Request)requests.get(id); + Object oRes = result; + + if (type == Festival.FT_WAVE) + try { + Wave wv = new Wave(result); + oRes = wv; + } catch (UnsupportedEncodingException ex) { + oRes = ex; + } + else if (type == Festival.FT_SCHEME) + { + oRes = new String(result); + } + + r.notifyResult(oRes); + } + + public void notifyFinished(Integer id) + { + Request r = (Request)requests.get(id); + // it's finished, so we can forget about it. + requests.remove(id); + r.notifyFinished(); + } + + private int newID() + { + return ++lastID; + } +} diff --git a/src/modules/java/cstr/festival/jsapi/EngineCentral.java b/src/modules/java/cstr/festival/jsapi/EngineCentral.java new file mode 100644 index 0000000..4ff1714 --- /dev/null +++ b/src/modules/java/cstr/festival/jsapi/EngineCentral.java @@ -0,0 +1,98 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission to use, copy, modify, distribute this software and its \\ + // documentation for research, educational and individual use only, is \\ + // hereby granted without fee, subject to the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // This software may not be used for commercial purposes without \\ + // specific prior written permission from the authors. 
\\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // Class which describes what festival voices are available. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + +package cstr.festival.jsapi ; + +import java.lang.*; +import java.util.*; +import java.awt.*; + +import javax.speech.*; +import javax.speech.synthesis.*; + +public class EngineCentral + implements javax.speech.EngineCentral +{ + private static String server; + private static int port; + + static + { + server = System.getProperty("festival.server", + "localhost"); + + port = Integer.getInteger("festival.port", + 1314).intValue(); + } + + Voice rab = new Voice("Roger Burroughs", + Voice.GENDER_MALE, + Voice.AGE_MIDDLE_ADULT, + "LPC Diphone" + ); + Voice ked = new Voice("Kurt Dusterhoff", + Voice.GENDER_MALE, + Voice.AGE_YOUNGER_ADULT, + "LPC Diphone" + ); + + SynthesizerModeDesc festival_rab + = new FestivalModeDesc(server, + port, + server+":"+Integer.toString(port), + Locale.UK, + Boolean.TRUE, + new Voice [] {rab} + ); + + SynthesizerModeDesc festival_ked + = new FestivalModeDesc(server, + port, + server+":"+Integer.toString(port), + Locale.US, + Boolean.TRUE, + new Voice [] {ked} + ); + + public 
EngineList createEngineList(EngineModeDesc require) + throws SecurityException + { + EngineList engines = new EngineList(); + + engines.addElement(festival_rab); + engines.addElement(festival_ked); + + return engines; + } +} diff --git a/src/modules/java/cstr/festival/jsapi/FestivalModeDesc.java b/src/modules/java/cstr/festival/jsapi/FestivalModeDesc.java new file mode 100644 index 0000000..1e021f3 --- /dev/null +++ b/src/modules/java/cstr/festival/jsapi/FestivalModeDesc.java @@ -0,0 +1,82 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission to use, copy, modify, distribute this software and its \\ + // documentation for research, educational and individual use only, is \\ + // hereby granted without fee, subject to the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // This software may not be used for commercial purposes without \\ + // specific prior written permission from the authors. \\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. 
\\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + + + +package cstr.festival.jsapi ; + +import java.lang.*; +import java.util.*; +import java.io.*; +import java.net.*; + +import javax.speech.*; +import javax.speech.synthesis.*; + +public class FestivalModeDesc + extends SynthesizerModeDesc + implements EngineCreate +{ + private String server; + private int port; + + FestivalModeDesc(String s, + int p, + String mode, + Locale l, + Boolean running, + Voice [] voices) + { + super("Festival", mode, l, running, voices); + server=s; + port=p; + } + + public int getPort() + { + return port; + } + + public String getServer() + { + return server; + } + + public Engine createEngine() + throws IllegalArgumentException, EngineException + { + return new FestivalSynthesizer(this); + } + + +} diff --git a/src/modules/java/cstr/festival/jsapi/FestivalQueueItem.java b/src/modules/java/cstr/festival/jsapi/FestivalQueueItem.java new file mode 100644 index 0000000..53f939d --- /dev/null +++ b/src/modules/java/cstr/festival/jsapi/FestivalQueueItem.java @@ -0,0 +1,159 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission to use, copy, modify, distribute this software and its \\ + // documentation for research, educational and individual use only, is \\ + // hereby granted without fee, subject to the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. 
Original authors' names are not deleted. \\ + // This software may not be used for commercial purposes without \\ + // specific prior written permission from the authors. \\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // A special case of SynthesizerQueueItem for festival. 
\\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + + +package cstr.festival.jsapi ; + +import java.lang.*; +import java.util.*; +import java.awt.*; + +import javax.speech.*; +import javax.speech.synthesis.*; + +import cstr.festival.client.*; +import cstr.est.*; + +class SynthesisRequestListener + implements RequestListener +{ + FestivalQueueItem item; + FestivalSynthesizer synth; + + public SynthesisRequestListener(FestivalQueueItem i, + FestivalSynthesizer s) + { + item=i; + synth=s; + } + + public void requestRunning(Request r) + { + } + + public void requestResult(Request r, Object res) + { + // System.out.println("Result="+res); + if (res == null) + return; + else if (res instanceof Wave) + item.queue.add((Wave)res); + } + + public void requestError(Request r, String mes) + { + System.err.println("Festival Error: "+mes); + item.queue.finished(); + } + + public void requestFinished(Request r) + { + item.queue.finished(); + } +} + +public class FestivalQueueItem + extends SynthesizerQueueItem + implements Runnable +{ + private FestivalSynthesizer synth; + private Request fRequest=null; + private RequestListener fListener=null; + private int toPlay = -1; + private Thread thread=null; + private String mode; + MessageQueue queue; + + public FestivalQueueItem(FestivalSynthesizer s, String text, String m, SpeakableListener listener) + { + super(text, text, m.equals("text"), listener); + synth=s; + mode=m; + } + + public FestivalQueueItem(FestivalSynthesizer s, Speakable source, SpeakableListener listener) + { + super(source, source.getJSMLText(), false, listener); + synth=s; + mode="jsml"; + } + + public void startSynthesis() + { + SynthesisRequestListener l = new SynthesisRequestListener(this, synth); + queue = new MessageQueue(); + fRequest = synth.getSession().request( + "(tts_text \""+text+"\" '"+mode+")", + l); + } + + public void startPlaying() + { + thread = new Thread(this); + thread.setPriority(thread.getPriority()+2); 
+ thread.start(); + } + + public boolean running() + { + return thread!=null; + } + + public void cancel() + { + if (fRequest != null && fListener != null) + fRequest.removeRequestListener(fListener); + fRequest=null; + fListener=null; + } + + public void run() + { + synth.speakableStarted(this, listener); + // System.out.println("Start playing "+text); + while(queue.isActive()) + { + Wave wv = (Wave)queue.get(); + // System.out.println("Play "+wv); + if (wv != null) + { + wv.play(); + } + } + // System.out.println("Stop playing "+text); + synth.speakableEnded(this, listener); + } + + +} diff --git a/src/modules/java/cstr/festival/jsapi/FestivalSynthesizer.java b/src/modules/java/cstr/festival/jsapi/FestivalSynthesizer.java new file mode 100644 index 0000000..21d34c4 --- /dev/null +++ b/src/modules/java/cstr/festival/jsapi/FestivalSynthesizer.java @@ -0,0 +1,278 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission to use, copy, modify, distribute this software and its \\ + // documentation for research, educational and individual use only, is \\ + // hereby granted without fee, subject to the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // This software may not be used for commercial purposes without \\ + // specific prior written permission from the authors. 
\\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // A JSAPI synthesizer which talks to festival. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + +package cstr.festival.jsapi ; + +import java.lang.*; +import java.util.*; +import java.io.*; +import java.net.*; + +import javax.speech.*; +import javax.speech.synthesis.*; + +import cstr.festival.client.*; + +public class FestivalSynthesizer + extends com.sun.speech.engine.BaseEngine + implements javax.speech.synthesis.Synthesizer +{ + FestivalModeDesc desc; + private String server; + private int port; + private Session session; + private Vector speakableListeners = new Vector(0); + private JobQueue queue; + private String voice; + + public FestivalSynthesizer(FestivalModeDesc d) + { + desc=d; + server=d.getServer(); + port=d.getPort(); + + session = null; + queue = new JobQueue(); + } + + public void allocate() + throws EngineException, EngineStateError + { + super.allocate(); + try { + session = new Session(server, port); + } catch (IOException ex) { + super.deallocate(); + throw new EngineException("IO Exception: "+ex.getMessage()); + } + + session.initialise(); + + Object res =
session.synchronousRequest("(tts_return_to_client)"); + + // System.out.println("(tts_return_to_client)="+res); + + Object res2 = session.synchronousRequest("(Parameter.set 'Wavefiletype 'snd)"); + + // System.out.println("(Parameter.set 'Wavefiletype 'snd)="+res2); + + setVoice("rab"); + } + + public void setVoice(String v) + { + voice=v; + Object res = session.synchronousRequest("(voice_"+ voice +"_diphone)"); + } + + public void deallocate() + throws EngineException, EngineStateError + { + super.deallocate(); + session.terminate(false); + } + + public Session getSession() + { + return session; + } + + public EngineModeDesc getEngineModeDesc() + { + return desc; + } + + public void addSpeakableListener(SpeakableListener listener) + { + speakableListeners.addElement(listener); + } + + public void removeSpeakableListener(SpeakableListener listener) + { + speakableListeners.removeElement(listener); + } + + public void cancel() + { + cancelJob((FestivalQueueItem)queue.top()); + } + + public void cancel(Object it) + { + FestivalQueueItem item = findJobFor(it); + if (item != null) + cancelJob(item); + } + + public void cancelAll() + { + Enumeration jobs = queue.elements(); + + while (jobs.hasMoreElements()) + { + FestivalQueueItem item = (FestivalQueueItem)jobs.nextElement(); + cancelJob(item); + } + } + + public Enumeration enumerateQueue() + { + return queue.elements(); + } + + public SynthesizerProperties getSynthesizerProperties() + { + return null; + } + + public String phoneme(String p) + { + return ""; + } + + public void speak(String text, SpeakableListener listener) + { + FestivalQueueItem item = new FestivalQueueItem(this, text, "jsml", listener); + + addJob(item); + } + + public void speak(URL url, SpeakableListener listener) + { + String text = "URL not yet supported"; + FestivalQueueItem item = new FestivalQueueItem(this, text, "text", listener); + + addJob(item); + } + + + public void speak(Speakable text, SpeakableListener listener) + { + 
FestivalQueueItem item = new FestivalQueueItem(this, text, listener); + + addJob(item); + } + + public void speakPlainText(String text, SpeakableListener listener) + { + FestivalQueueItem item = new FestivalQueueItem(this, text, "text", listener); + + addJob(item); + } + + protected void checkQueue() + { + if (queue.isEmpty()) + { + engineState &= ~Synthesizer.QUEUE_NOT_EMPTY; + engineState |= Synthesizer.QUEUE_EMPTY; + } + else + { + // Non-empty queue: clear QUEUE_EMPTY and set QUEUE_NOT_EMPTY. + engineState &= ~Synthesizer.QUEUE_EMPTY; + engineState |= Synthesizer.QUEUE_NOT_EMPTY; + toTop((FestivalQueueItem)queue.top()); + } + } + + protected void addJob(FestivalQueueItem item) + { + queue.add(item); + item.startSynthesis(); + checkQueue(); + } + + protected void cancelJob(FestivalQueueItem item) + { + queue.remove(item); + item.cancel(); + SpeakableEvent e = new SpeakableEvent(item.getSource(), SpeakableEvent.SPEAKABLE_CANCELLED); + if (item.getSpeakableListener() != null) + item.getSpeakableListener().speakableCancelled(e); + for(int i=0; i argument you + can + + +Next you must run a festival in server mode and let the JSAPI code +know where it is. See the festival documentation for instructions on +how to run it. When you have done that, you need to set the following +Java system properties (e.g. with -D on the java command line): + + festival.server=myhost.yoyodyne.com + festival.port=1314 + +The jsapi_example program has a `--server host:port' argument and also +looks in the FESTIVAL_SERVER environment variable for information in +the same format. + +You can now run the jsapi_example program. This program simply passes +each command line argument to the synthesizer. For instance: + + $ FESTIVAL_SERVER=myhost.yoyodyne.com:1314 + + $ export FESTIVAL_SERVER + + $ bin/jsapi_example "Say this. Then say this." "Wasn't that fun!" + +This gives two strings to the synthesizer, resulting in three waveforms +being played (because the first contains two sentences).
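As a rough illustration of the lookup described above (the festival.server and festival.port properties, with FESTIVAL_SERVER as a host:port alternative), the sketch below resolves a server location in Java. It is not the code used by jsapi_example: the ServerLocation class, its method names, the precedence (system properties before FESTIVAL_SERVER) and the localhost:1314 fallback are all assumptions made for this example.

```java
// Hypothetical helper for illustration only; not part of the Festival sources.
final class ServerLocation {
    // Assumed fallbacks; 1314 mirrors the port used in the examples above.
    static final String DEFAULT_HOST = "localhost";
    static final int DEFAULT_PORT = 1314;

    final String host;
    final int port;

    ServerLocation(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Parse a "host:port" string; a bare "host" gets the default port.
    // A non-numeric port raises NumberFormatException.
    static ServerLocation parse(String spec) {
        int colon = spec.lastIndexOf(':');
        if (colon < 0) {
            return new ServerLocation(spec, DEFAULT_PORT);
        }
        String host = spec.substring(0, colon);
        int port = Integer.parseInt(spec.substring(colon + 1));
        return new ServerLocation(host, port);
    }

    // Pick a location from already-fetched settings (null means "not set").
    // Assumed precedence: system properties, then environment, then defaults.
    static ServerLocation resolve(String propServer, String propPort, String envSpec) {
        if (propServer != null) {
            int port = (propPort != null) ? Integer.parseInt(propPort) : DEFAULT_PORT;
            return new ServerLocation(propServer, port);
        }
        if (envSpec != null && !envSpec.isEmpty()) {
            return parse(envSpec);
        }
        return new ServerLocation(DEFAULT_HOST, DEFAULT_PORT);
    }

    // Read the two system properties and the environment variable named above.
    static ServerLocation fromEnvironment() {
        return resolve(System.getProperty("festival.server"),
                       System.getProperty("festival.port"),
                       System.getenv("FESTIVAL_SERVER"));
    }

    @Override
    public String toString() {
        return host + ":" + port;
    }
}
```

Keeping resolve() separate from fromEnvironment() means the precedence rule can be exercised without touching the real process environment.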
+ +Give jsapi_example the -v argument to get more details of what is +happening. + + $ bin/jsapi_example -v "Say this. Then say this." "Wasn't that fun!" + engine=Festival [myhost.yoyodyne.com:1314] + locale = en_GB + mode = myhost.yoyodyne.com:1314 + running = true + voices = Roger Burroughs + Text is 'Say this. Then say this.' + SpeakableEvent 'Say this. Then say this.' speakableStarted + SpeakableEvent 'Say this. Then say this.' topOfQueue + Text is 'Wasn't that fun!' + SpeakableEvent 'Say this. Then say this.' speakableEnded + SpeakableEvent 'Wasn't that fun!' speakableStarted + SpeakableEvent 'Wasn't that fun!' topOfQueue + SpeakableEvent 'Wasn't that fun!' speakableEnded + finishing + +The code for this program is in + + src/modules/java/cstr/testPrograms/SayHelloWorld.java + +6 Issues +-------- + +Some issues remain as to the mapping between festival and JSAPI. + +JSAPI defines a three-level hierarchy Engine/Mode/Voice where an +Engine/Mode pair defines a unique, single-locale synthesizer. Festival +has a flat space of voices, which could of course be partitioned into +locales, but introduces the additional question of which server we are +connecting to. Is the Engine `Festival' or is it `the Festival running +on myhost.yoyodyne.com at port 1314'? In the latter case the engine +doesn't identify the class of all festival synthesizers. Perhaps the +host and port number should be the mode, but then it doesn't map to a +unique locale (since one server could have, say, English and Spanish +voices). + +Related to this, how should the location of the server be passed to +the JSAPI code? The current method of using the two system properties +limits us to one server. + +Some work needs to be done on making the JSAPI interface query the +server, to get a list of modes and voices, and also to get and set +properties.
+ + diff --git a/src/modules/java/cstr/festival/scheme/Makefile b/src/modules/java/cstr/festival/scheme/Makefile new file mode 100644 index 0000000..86537d0 --- /dev/null +++ b/src/modules/java/cstr/festival/scheme/Makefile @@ -0,0 +1,46 @@ + +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### + +TOP=../../../../../.. 
+DIRNAME=src/modules/java/cstr/festival/scheme + +JAVA_CLASSES = SchemeTokenizer SchemeReader ReflectingSchemeReader +NEED_JAVA=1 + +FILES = Makefile $(JAVA_CLASSES:=.java) + +ALL = .java + +include $(TOP)/config/common_make_rules + diff --git a/src/modules/java/cstr/festival/scheme/ReflectingSchemeReader.java b/src/modules/java/cstr/festival/scheme/ReflectingSchemeReader.java new file mode 100644 index 0000000..203959e --- /dev/null +++ b/src/modules/java/cstr/festival/scheme/ReflectingSchemeReader.java @@ -0,0 +1,168 @@ + + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission is hereby granted, free of charge, to use and distribute \\ + // this software and its documentation without restriction, including \\ + // without limitation the rights to use, copy, modify, merge, publish, \\ + // distribute, sublicense, and/or sell copies of this work, and to \\ + // permit persons to whom this work is furnished to do so, subject to \\ + // the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // 4. The authors' names are not used to endorse or promote products \\ + // derived from this software without specific prior written \\ + // permission. 
\\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // A Scheme reader which returns the s expression as a string suitable \\ + // for passing on to a scheme interpreter, eg for sending to festival. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + + +package cstr.festival.scheme ; + +import java.lang.*; +import java.util.*; +import java.awt.*; +import java.io.*; + + +public class ReflectingSchemeReader extends SchemeReader + +{ + + public ReflectingSchemeReader(Reader r) + { + super(r); + } + + private int parseSexp(StringBuffer b) + throws IOException + { + + tk.nextToken(); + + while (tk.ttype == StreamTokenizer.TT_EOL) + { + b.append(" "); + tk.nextToken(); + } + + if (tk.ttype == StreamTokenizer.TT_EOF) + return SE_EOF; + + if (tk.ttype == '\'') + { + b.append("'"); + int t = parseSexp(b); + return t; + } + + if (tk.ttype == ')') + { + b.append(") "); + return SE_CB; + } + + if (tk.ttype == '(') + { + b.append("("); + while (true) + { + int se_type = parseSexp(b); + + if (se_type == SE_EOF + || se_type == SE_CB) + return SE_LIST; + } + } + + if (tk.ttype == StreamTokenizer.TT_WORD) + { + b.append(tk.sval); + b.append(" "); + return SE_ATOM; + } + + if
(tk.ttype == '"') + { + b.append('"'); + int s=b.length(); + b.append(tk.sval); + for(int i=s; i= ' ' && tk.ttype <= '\u00ff') + { + b.append((char)tk.ttype); + b.append(' '); + return SE_ATOM; + } + + System.out.println("UNEXPECTED "+tk.ttype+" "+tk.sval); + return SE_EOF; + } + + public Object nextExpr() + throws IOException + { + return (Object)nextExprString(); + } + + public String nextExprString() + throws IOException + { + StringBuffer exp = new StringBuffer(80); + + int type = parseSexp(exp); + + return type == SE_EOF ? (String)null : exp.toString(); + } +} diff --git a/src/modules/java/cstr/festival/scheme/SchemeReader.java b/src/modules/java/cstr/festival/scheme/SchemeReader.java new file mode 100644 index 0000000..2a127cf --- /dev/null +++ b/src/modules/java/cstr/festival/scheme/SchemeReader.java @@ -0,0 +1,66 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission is hereby granted, free of charge, to use and distribute \\ + // this software and its documentation without restriction, including \\ + // without limitation the rights to use, copy, modify, merge, publish, \\ + // distribute, sublicense, and/or sell copies of this work, and to \\ + // permit persons to whom this work is furnished to do so, subject to \\ + // the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // 4. The authors' names are not used to endorse or promote products \\ + // derived from this software without specific prior written \\ + // permission. 
\\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // A simple scheme reader. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + + +package cstr.festival.scheme ; + +import java.lang.*; +import java.util.*; +import java.awt.*; +import java.io.*; + + +public abstract class SchemeReader + +{ + static final int SE_EOF=0; + static final int SE_ATOM=1; + static final int SE_LIST=2; + static final int SE_CB=3; + + SchemeTokenizer tk; + + public SchemeReader(Reader r) + { + tk = new SchemeTokenizer(r); + } + + public abstract Object nextExpr() throws IOException ; +} + + diff --git a/src/modules/java/cstr/festival/scheme/SchemeTokenizer.java b/src/modules/java/cstr/festival/scheme/SchemeTokenizer.java new file mode 100644 index 0000000..0b26847 --- /dev/null +++ b/src/modules/java/cstr/festival/scheme/SchemeTokenizer.java @@ -0,0 +1,151 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. 
\\ + // Permission is hereby granted, free of charge, to use and distribute \\ + // this software and its documentation without restriction, including \\ + // without limitation the rights to use, copy, modify, merge, publish, \\ + // distribute, sublicense, and/or sell copies of this work, and to \\ + // permit persons to whom this work is furnished to do so, subject to \\ + // the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // 4. The authors' names are not used to endorse or promote products \\ + // derived from this software without specific prior written \\ + // permission. \\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // A tokeniser for scheme expressions. \\ + // \\ + // This should obviously be a subclass of java.io.StreamTokenizer, \\ + // but that is as much use as a chocolate teapot so we have to go from \\ + // first principles. 
\\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + +package cstr.festival.scheme ; + +import java.lang.*; +import java.util.*; +import java.awt.*; +import java.io.*; + +public class SchemeTokenizer +{ + public static final int TT_EOF = StreamTokenizer.TT_EOF; + public static final int TT_WORD = StreamTokenizer.TT_WORD; + static final int TT_NOTHING = -4; + + protected Reader r; + + public String sval; + public int ttype; + + protected int pending =-1; + + public SchemeTokenizer(Reader rd) + { + r=rd; + + ttype=TT_NOTHING; + } + + public int nextToken() throws IOException + { + int c; + + if (pending >=0) + { + c=pending; + pending = -1; + } + else + c = r.read(); + + // Skip whitespace and comments + while (true) + { + while (Character.isWhitespace((char)c)) + c = r.read(); + if (c == ';') + { + while (c != '\n' && c != '\r') + c = r.read(); + while (c == '\n' || c == '\r') + c = r.read(); + } + else + break; + } + + if (c <0) + ttype = TT_EOF; + else if (c == '"') + { + ttype = c; + StringBuffer b = new StringBuffer(100); + boolean escape=false; + + while ((c=r.read()) != '"' || escape) + { + if (escape) + { + if (c == 'n') + c='\n'; + else if (c == 'r') + c='\r'; + else if (c == 't') + c='\t'; + + b.append((char)c); + escape=false; + } + else if (c == '\\') + escape=true; + else + { + b.append((char)c); + escape=false; + } + } + + sval = b.toString(); + } + else if (Character.isLetterOrDigit((char)c) || c=='.' || c=='_' || c=='-' || c=='*' || c==':') + { + ttype = TT_WORD; + StringBuffer b = new StringBuffer(100); + + b.append((char)c); + + while (Character.isLetterOrDigit((char)(c=r.read())) || c=='.' 
|| c=='_' || c=='-' || c=='*' || c==':') + b.append((char)c); + + pending=c; + + sval = b.toString(); + } + else + ttype = c; + + return ttype; + } + +} diff --git a/src/modules/java/cstr/testPrograms/Makefile b/src/modules/java/cstr/testPrograms/Makefile new file mode 100644 index 0000000..78aa0b0 --- /dev/null +++ b/src/modules/java/cstr/testPrograms/Makefile @@ -0,0 +1,48 @@ + +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. 
## +## ## +########################################################################### + +TOP=../../../../.. +DIRNAME=src/modules/java/cstr/testPrograms + +BUILD_DIRS = +EXTRA_JAVA_CLASSES = SayHelloWorld +JAVA_CLASSES = +NEED_JAVA=1 + +FILES = Makefile $(JAVA_CLASSES:=.java) $(EXTRA_JAVA_CLASSES:=.java) + +ALL = .java + +-include ../../Makefile.version +include $(TOP)/config/common_make_rules diff --git a/src/modules/java/cstr/testPrograms/SayHelloWorld.java b/src/modules/java/cstr/testPrograms/SayHelloWorld.java new file mode 100644 index 0000000..407ba58 --- /dev/null +++ b/src/modules/java/cstr/testPrograms/SayHelloWorld.java @@ -0,0 +1,337 @@ + + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Centre for Speech Technology Research \\ + // University of Edinburgh, UK \\ + // Copyright (c) 1996,1997 \\ + // All Rights Reserved. \\ + // Permission to use, copy, modify, distribute this software and its \\ + // documentation for research, educational and individual use only, is \\ + // hereby granted without fee, subject to the following conditions: \\ + // 1. The code must retain the above copyright notice, this list of \\ + // conditions and the following disclaimer. \\ + // 2. Any modifications must be clearly marked as such. \\ + // 3. Original authors' names are not deleted. \\ + // This software may not be used for commercial purposes without \\ + // specific prior written permission from the authors. 
\\ + // THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK \\ + // DISCLAIM ALL WARRANTIES With REGARD TO THIS SOFTWARE, INCLUDING \\ + // ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT \\ + // SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE \\ + // FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES \\ + // WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN \\ + // AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, \\ + // ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF \\ + // THIS SOFTWARE. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + // \\ + // Author: Richard Caley (rjc@cstr.ed.ac.uk) \\ + // -------------------------------------------------------------------- \\ + // Simple exercise of the java speech API synthesis methods. \\ + // \\ + //\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\ + + +package cstr.testPrograms ; + +import java.lang.*; +import java.util.*; + +import javax.speech.*; +import javax.speech.synthesis.*; + +import cstr.festival.jsapi.*; + +class MySpeakableListener + extends SpeakableAdapter +{ + SpeakableEvent lastEvent=null; + + public MySpeakableListener() + { + lastEvent=null; + } + + public synchronized void topOfQueue(SpeakableEvent e) + { + lastEvent=e; + notifyAll(); + } + + public synchronized void markerReached(SpeakableEvent e) + { + lastEvent=e; + } + + public synchronized void speakableStarted(SpeakableEvent e) + { + lastEvent=e; + } + + public synchronized void speakableCancelled(SpeakableEvent e) + { + lastEvent=e; + // Wake any thread blocked in waitForEnd(), which also ends on cancellation. + notifyAll(); + } + + public synchronized void speakablePaused(SpeakableEvent e) + { + lastEvent=e; + } + + public synchronized void speakableResumed(SpeakableEvent e) + { + lastEvent=e; + } + + public synchronized void speakableEnded(SpeakableEvent e) + { + lastEvent=e; + notifyAll(); + } + + public synchronized void wordStarted(SpeakableEvent e) + { +
lastEvent=e; + System.out.println("SpeakableEvent wordStarted"); + } + + public synchronized boolean waitForEnd() + { + while (lastEvent == null + || ( lastEvent.getId() != SpeakableEvent.SPEAKABLE_ENDED + && lastEvent.getId() != SpeakableEvent.SPEAKABLE_CANCELLED + ) + ) + { + try { wait(); } + catch (InterruptedException e) + { + return false; + } + } + return true; + } +} + +class VerboseSpeakableListener + extends MySpeakableListener + implements SpeakableListener +{ + public VerboseSpeakableListener() + { + super(); + } + + public synchronized void topOfQueue(SpeakableEvent e) + { + super.topOfQueue(e); + System.out.println("SpeakableEvent '"+ e.getSource() +"' topOfQueue"); + } + + public synchronized void markerReached(SpeakableEvent e) + { + super. markerReached(e); + System.out.println("SpeakableEvent '"+ e.getSource() +"' markerReached"); + } + + public synchronized void speakableStarted(SpeakableEvent e) + { + super.speakableStarted(e); + System.out.println("SpeakableEvent '"+ e.getSource() +"' speakableStarted"); + } + + public synchronized void speakableCancelled(SpeakableEvent e) + { + super.speakableCancelled(e); + System.out.println("SpeakableEvent '"+ e.getSource() +"' speakableCancelled"); + } + + public synchronized void speakablePaused(SpeakableEvent e) + { + super.speakablePaused(e); + System.out.println("SpeakableEvent '"+ e.getSource() +"' speakablePaused"); + } + + public synchronized void speakableResumed(SpeakableEvent e) + { + super.speakableResumed(e); + System.out.println("SpeakableEvent '"+ e.getSource() +"' speakableResumed"); + } + + public synchronized void speakableEnded(SpeakableEvent e) + { + super.speakableEnded(e); + System.out.println("SpeakableEvent '"+ e.getSource() +"' speakableEnded"); + } + + public synchronized void wordStarted(SpeakableEvent e) + { + super.wordStarted(e); + System.out.println("SpeakableEvent '"+ e.getSource() +"' wordStarted"); + } + +} + +public class SayHelloWorld +{ + public static void main(String 
[] args) + { + SynthesizerModeDesc desc=null; + boolean verbose=false; + boolean sync=false; + String [] text; + + int i; + for(i=0; i +#include "festival.h" +#include "parser.h" +#include "EST_SCFG_Chart.h" + +LISP FT_PParse_Utt(LISP utt) +{ + // Parse Words (using part of speech tags) using given + // probabilistic grammar + EST_Utterance *u = get_c_utt(utt); + LISP rules; + + rules = siod_get_lval("scfg_grammar", NULL); + if (rules == NULL) + return utt; + + EST_SCFG grammar(rules); + + scfg_parse(u->relation("Word"),"phr_pos", + u->create_relation("Syntax"),grammar); + + return utt; +} + +LISP FT_MultiParse_Utt(LISP utt) +{ + // You give them a parser and they just want more ... + // Because in some modes an utterance may contain multiple sentences + // and the grammars we have only deal in more traditional + // sentences, this tries to split the utterance into sentences, + // parse them individually, and add them to a single Syntax + // relation as a list of trees. + EST_Utterance *u = get_c_utt(utt); + LISP rules, eos_tree; + EST_Item *s,*e,*st,*et; + + rules = siod_get_lval("scfg_grammar", NULL); + if (rules == NULL) + return utt; + eos_tree = siod_get_lval("scfg_eos_tree",NULL); + u->create_relation("Syntax"); + EST_SCFG_Chart chart; + chart.set_grammar_rules(rules); + + for (st=u->relation("Token")->head(); st; st = st->next()) + { + for (et=st->next(); et; et=et->next()) + if (wagon_predict(et,eos_tree) != 0) + break; + // Now find related words + s = first_leaf(st)->as_relation("Word"); + e = first_leaf(et->next())->as_relation("Word"); + chart.setup_wfst(s,e,"phr_pos"); + chart.parse(); + chart.extract_parse(u->relation("Syntax"),s,e,TRUE); + st = et; + } + + return utt; +} + +void MultiParse(EST_Utterance &u) +{ + // You give them a parser and they just want more ...
+ // Because in some modes an utterance may contain multiple sentences + // and the grammars we have only deal in more traditional + // sentences, this tries to split the utterance into sentences, + // parse them individually, and add them to a single Syntax + // relation as a list of trees. + LISP rules, eos_tree; + EST_Item *s, *w; + + rules = siod_get_lval("scfg_grammar", NULL); + if (rules == NULL) + EST_error("Couldn't find grammar rules\n"); + eos_tree = siod_get_lval("scfg_eos_tree",NULL); + u.create_relation("Syntax"); + EST_SCFG_Chart chart; + chart.set_grammar_rules(rules); + + // produce a parse wherever there is a sentence end marker or + // the end of utterance. + + for (w = s = u.relation("Word")->head(); w; w = w->next()) + if (w->f_present("sentence_end") || (w->next() == 0)) + { + chart.setup_wfst(s, w->next(), "phr_pos"); + chart.parse(); + chart.extract_parse(u.relation("Syntax"), s, w->next(), TRUE); + s = w->next(); + } +} + +void festival_parser_init(void) +{ + proclaim_module("parser"); + + festival_def_utt_module("ProbParse",FT_PParse_Utt, + "(ProbParse UTT)\n\ + Parse part of speech tags in Word relation. Loads the grammar \n\ + from scfg_grammar_filename and saves the best parse\n\ + in the Syntax Relation."); + festival_def_utt_module("MultiProbParse",FT_MultiParse_Utt, + "(MultiProbParse UTT)\n\ + Parse part of speech tags in Word relation. Unlike ProbParse this \n\ + allows multiple sentences to appear in the one utterance. The CART \n\ + tree in eos_tree is used to define end of sentence.
Loads the \n\ + grammar from scfg_grammar_filename and saves the best parse\n\ + in the Syntax Relation."); +} diff --git a/src/modules/rxp/Makefile b/src/modules/rxp/Makefile new file mode 100644 index 0000000..fdf0522 --- /dev/null +++ b/src/modules/rxp/Makefile @@ -0,0 +1,59 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996,1997 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. ## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. 
## + ## ## + ########################################################################### + +TOP=../../.. +DIRNAME=src/modules/rxp +H = + +SRCSC = +SRCSCXX = ttsxml.cc + +SRCS = $(SRCSC) $(SRCSCXX) + +OBJS = $(SRCSC:.c=.o) $(SRCSCXX:.cc=.o) + +FILES = Makefile rxp.mak $(SRCS) $(H) + +LOCAL_INCLUDES = -I../include -I$(EST)/include/rxp + +LOCAL_DEFINES = -DCHAR_SIZE=8 + +VC_LOCAL_DEFINES= /DCHAR_SIZE=8 + +INLIB = $(TOP)/src/lib/libFestival.a + +ALL = .buildlib + +include $(TOP)/config/common_make_rules + + diff --git a/src/modules/rxp/rxp.mak b/src/modules/rxp/rxp.mak new file mode 100644 index 0000000..095999b --- /dev/null +++ b/src/modules/rxp/rxp.mak @@ -0,0 +1,50 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Author: Alan W Black ## + ## -------------------------------------------------------------------- ## + ## Richard's (Tobin) XML parser ## + ## ## + ########################################################################### + +INCLUDE_RXP=1 + +MOD_DESC_RXP=Festival Access To RXP XML Parser + +ifeq ($(DIRNAME),src/modules) + EXTRA_LIB_BUILD_DIRS := $(EXTRA_LIB_BUILD_DIRS) rxp +endif + + + + diff --git a/src/modules/rxp/ttsxml.cc b/src/modules/rxp/ttsxml.cc new file mode 100644 index 0000000..06bca48 --- /dev/null +++ b/src/modules/rxp/ttsxml.cc @@ -0,0 +1,216 @@ + /*************************************************************************/ + /* */ + /* Centre for Speech Technology Research */ + /* University of Edinburgh, UK */ + /* Copyright (c) 1998 */ + /* All Rights Reserved. */ + /* */ + /* Permission is hereby granted, free of charge, to use and distribute */ + /* this software and its documentation without restriction, including */ + /* without limitation the rights to use, copy, modify, merge, publish, */ + /* distribute, sublicense, and/or sell copies of this work, and to */ + /* permit persons to whom this work is furnished to do so, subject to */ + /* the following conditions: */ + /* 1. 
The code must retain the above copyright notice, this list of */ + /* conditions and the following disclaimer. */ + /* 2. Any modifications must be clearly marked as such. */ + /* 3. Original authors' names are not deleted. */ + /* 4. The authors' names are not used to endorse or promote products */ + /* derived from this software without specific prior written */ + /* permission. */ + /* */ + /* THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK */ + /* DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING */ + /* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT */ + /* SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE */ + /* FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES */ + /* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN */ + /* AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, */ + /* ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF */ + /* THIS SOFTWARE. */ + /* */ + /*************************************************************************/ + /* */ + /* Author : Alan W Black */ + /* Date : May 1998 */ + /* ------------------------------------------------------------------- */ + /* Interface between festival and the Richard's XML parser */ + /* */ + /* Provides a LISP function to do the analysis so that this directory */ + /* reasonable be optional */ + /* */ + /*************************************************************************/ + +#include "EST_Pathname.h" +#include "festival.h" +#include "text.h" +#include "rxp.h" + +// So we can share the known_ids table. 
+#include "ling_class/EST_utterance_xml.h" + + +static InputSource entity_open(Entity ent, void *arg); + +static LISP tts_file_xml(LISP filename) +{ + // Parse the xml file using the LTG's xml parser + EST_String inname = get_c_string(filename); + EST_String line, type, remainder; + Parser p; + Entity ent = 0; + InputSource source = 0; + LISP element_defs; + LISP utt = NIL; // for cummulation of tokens + + if (inname == "-") + source = SourceFromStream("",stdin); + else + { + ent = NewExternalEntity(0,0,strdup8(inname),0,0); + if (ent) + source = EntityOpen(ent); + } + + if (!source) + { + cerr << "xml: unable to open input file \"" << inname << "\"" << endl; + festival_error(); + } + element_defs = siod_get_lval("xxml_elements",NULL); + p = NewParser(); + ParserSetEntityOpener(p, entity_open); + ParserSetFlag(p, ReturnDefaultedAttributes, 1); + if (ParserPush(p, source) == -1) + { + cerr << "xml: parser error\n" << endl; + festival_error(); + } + + while (1) + { + XBit bit = ReadXBit(p); + if (bit->type == XBIT_eof) + break; + else if (bit->type == XBIT_start) + { + Attribute b; + LISP att=NIL; + for (b=bit->attributes; b; b=b->next) + att = cons(cons(rintern(b->definition->name), + cons(cons(rintern(b->value),NIL),NIL)),att); + utt = xxml_call_element_function( + EST_String("(")+bit->element_definition->name,att, + element_defs,utt); + } + else if (bit->type == XBIT_end) + { + utt = xxml_call_element_function( + EST_String(")")+bit->element_definition->name,NIL, + element_defs,utt); + } + else if (bit->type == XBIT_empty) + { + Attribute b; + LISP att=NIL; + for (b=bit->attributes; b; b=b->next) + att = cons(cons(rintern(b->definition->name), + cons(cons(rintern(b->value),NIL),NIL)),att); + utt = xxml_call_element_function( + EST_String(bit->element_definition->name),att, + element_defs,utt); + } + else if (bit->type == XBIT_pcdata) + { + utt = xxml_get_tokens(bit->pcdata_chars, + siod_get_lval("xxml_word_features",NULL), + utt); + } + else if (bit->type == 
XBIT_cdsect) + { + utt = xxml_get_tokens(bit->cdsect_chars, + siod_get_lval("xxml_word_features",NULL), + utt); + } + else if (bit->type == XBIT_pi) + { + cerr << "xml: ignoring pi " << bit->pi_chars << endl; + } + else if (bit->type == XBIT_error) + { + ParserPerror(p,bit); + festival_error(); + } + else + { + // ignore it + } + FreeXBit(bit); + } + // Last call (should synthesize trailing tokens) + utt = xxml_call_element_function(" ",NIL,element_defs,utt); + + FreeDtd(p->dtd); + FreeParser(p); + if (ent) FreeEntity(ent); + return NIL; +} + +static LISP xml_register_id(LISP pattern_l, LISP result_l) +{ + EST_String pattern = get_c_string(pattern_l); + EST_String result = get_c_string(result_l); + + utterance_xml_register_id(pattern, result); + return NIL; +} + +static LISP xml_registered_ids() +{ + EST_StrList ids; + utterance_xml_registered_ids(ids); + LISP result= NIL; + + EST_Litem *p; + + for(p=ids.head(); p != NULL; p=p->next()) + { + EST_String pat = ids(p); + p=p->next(); + EST_String res = ids(p); + result = cons( + cons(strcons(pat.length(), pat), + strcons(res.length(), res)), + result); + } + + return result; +} + +void festival_rxp_init() +{ + proclaim_module("rxp"); + + init_subr_1("tts_file_xml",tts_file_xml, + "(tts_file_xml FILE)\n\ + Low level tts processor for XML files. This assumes that element\n\ + instructions are set up in the variable xxml_elements."); + + init_subr_2("xml_register_id", xml_register_id, + "(xml_register_id PATTERN RESULT) \n\ + Add a rule for where to find XML entities such as DTDs.\n\ + The pattern is a regular expression, the result is a string\n\ + with substitutions. 
If the PATTERN matches the a PUBLIC\n\ + or SYSTEM identifier of an XML entity, the RESULT is expanded\n\ + and then used as a filename."); + + init_subr_0("xml_registered_ids", xml_registered_ids, + "(xml_registered_ids) \n\ + Return the current list of places to look for XML entities."); +} + +static InputSource entity_open(Entity ent, void *arg) +{ + (void)arg; + return utterance_xml_try_and_open(ent); +} diff --git a/src/scripts/Makefile b/src/scripts/Makefile new file mode 100644 index 0000000..89d3b43 --- /dev/null +++ b/src/scripts/Makefile @@ -0,0 +1,50 @@ + ########################################################################### + ## ## + ## Centre for Speech Technology Research ## + ## University of Edinburgh, UK ## + ## Copyright (c) 1996 ## + ## All Rights Reserved. ## + ## ## + ## Permission is hereby granted, free of charge, to use and distribute ## + ## this software and its documentation without restriction, including ## + ## without limitation the rights to use, copy, modify, merge, publish, ## + ## distribute, sublicense, and/or sell copies of this work, and to ## + ## permit persons to whom this work is furnished to do so, subject to ## + ## the following conditions: ## + ## 1. The code must retain the above copyright notice, this list of ## + ## conditions and the following disclaimer. ## + ## 2. Any modifications must be clearly marked as such. ## + ## 3. Original authors' names are not deleted. ## + ## 4. The authors' names are not used to endorse or promote products ## + ## derived from this software without specific prior written ## + ## permission. 
## + ## ## + ## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## + ## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## + ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## + ## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## + ## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## + ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## + ## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## + ## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## + ## THIS SOFTWARE. ## + ## ## + ########################################################################### + ## ## + ## Makfile for script directory ## + ## --------------------------------------------------------------------- ## + ## Author: Richard Caley (rjc@cstr.ed.ac.uk) ## + ## ## + ########################################################################### + +TOP=../.. +DIRNAME=src/scripts +SCRIPTS= festival_server.sh festival_server_control.sh +EXTRA_SCRIPTS = jsapi_example.sh festival_client_java.sh +FILES = Makefile shared_setup_sh shared_setup_prl shared_script $(SCRIPTS) $(EXTRA_SCRIPTS) +INSTALL = +ALL = $(SCRIPTS) + +include $(TOP)/config/common_make_rules +include $(EST)/config/rules/bin_process.mak + diff --git a/src/scripts/festival_client_java.sh b/src/scripts/festival_client_java.sh new file mode 100644 index 0000000..abf81b1 --- /dev/null +++ b/src/scripts/festival_client_java.sh @@ -0,0 +1,58 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. 
## +## ## +########################################################################### +TOP=__TOP__ +EST=__EST__ +CLASSPATH=__CLASSPATH__ +JAVA_HOME=__JAVA_HOME__ + +#__SHARED_SETUP__ + +if [ "$1" = "-g" ] + then + shift + g="_g -debug" +fi + +CLASSPATH="$TOP/src/lib/festival.jar:$EST/lib/est_basic.jar:$CLASSPATH" + +export CLASSPATH + +exec $JAVA_HOME/bin/java$g \ + cstr.festival.Client \ + "$@" + + + + + diff --git a/src/scripts/festival_server.sh b/src/scripts/festival_server.sh new file mode 100644 index 0000000..b7e68c8 --- /dev/null +++ b/src/scripts/festival_server.sh @@ -0,0 +1,285 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +# # +# Run a festival server. # +# # +########################################################################### + +TOP=__TOP__ +EST=__EST__ + +#__SHARED_SETUP__ + +useage() +{ +cat <&3 +} + +handle_hup() +{ +log_mess "got SIGHUP" +respawn_wait=0; +if [ $festival_pid = 0 ] + then + : +else + kill -9 $festival_pid + festival_pid=0 +fi +} + +handle_term() +{ +log_mess "got SIGTERM" +respawn=false; +respawn_wait=0; +if [ $festival_pid = 0 ] + then + : +else + kill -9 $festival_pid + festival_pid=0 +fi +} + +create_server_startup() +{ + local port server_log su + port="$1" + server_log="$2" + + server_startup="$3" + + { + cat < $server_startup +} + +port=1314 +festival=festival +logdir="." +config="" +show=false + +while [ $# -gt 0 ] + do + case "$1" in + -h* | -\? 
) useage 0;; + -p* ) port=$2; shift 2;; + -l* ) logdir=$2; shift 2;; + -c* ) config=$2; shift 2;; + -show* ) show=true; shift;; + -* ) useage 1;; + * ) break;; + esac +done + +case $# in +1 ) festival=$1;; +0 ) : ;; +* ) useage 2;; +esac + +respawn=true +normal_respawn_wait=10 +festival_pid=0 + +if [ $port = "1314" ] + then + pext="" +else + pext=".$port" +fi + +server_log="$logdir/festival_server$pext.log" + +# hangup +trap "handle_hup" 1 +# int +trap "handle_term" 2 +# term (ie default kill signal) +trap "handle_term" 15 +# exit from shell +trap "handle_term" 0 + +if $show + then + create_server_startup $port $server_log /tmp/$$ 3>/dev/null + fl=false + while read l + do + if $fl ; then echo $l ; fi + if [ "$l" = ";---" ] ; then fl=true ; fi + done "$logdir/festival_wrapper_pid$pext" + +while $respawn + do + respawn_wait=$normal_respawn_wait + + if [ $festival_pid = 0 ] + then + : + else + kill -9 $festival_pid + festival_pid=0 + fi + + log_mess "run $festival port=$port" + + create_server_startup $port $server_log "$logdir/festival_server$pext.scm" + + $festival --server $server_startup & + + x=$! + if [ -n "$x" ] + then + festival_pid="$x" + + log_mess "waiting" + + while [ $festival_pid -ne 0 ] && kill -0 $festival_pid + do + sleep 60& + echo $! > "$logdir/festival_sleep_pid$pext" + wait $! + done + + if [ "$festival_pid" = 0 ] + then + : + else + wait $festival_pid + fi + + log_mess "festival exited $?" 
+ else + log_mess "can't run $festival" + fi + + if $respawn + then + log_mess "respawn wait $respawn_wait seconds" + sleep $respawn_wait + fi +done +} >$server_log 3>$server_log + +exit 0 diff --git a/src/scripts/festival_server_control.sh b/src/scripts/festival_server_control.sh new file mode 100644 index 0000000..f1d883d --- /dev/null +++ b/src/scripts/festival_server_control.sh @@ -0,0 +1,123 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +# # +# Run a festival server. # +# # +########################################################################### + +TOP=__TOP__ +EST=__EST__ + +#__SHARED_SETUP__ + +useage() +{ +cat </dev/null + then + case "$action" in + restart ) kill -HUP $wrapper_pid;; + kill|quit|exit ) kill -TERM $wrapper_pid;; + esac + kill -TERM $sleep_pid + else + echo "Server wrapper not running for port $port" + fi + +} + +exit 0 + diff --git a/src/scripts/jsapi_example.sh b/src/scripts/jsapi_example.sh new file mode 100644 index 0000000..4ecab13 --- /dev/null +++ b/src/scripts/jsapi_example.sh @@ -0,0 +1,107 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1996 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. 
The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +TOP=__TOP__ +EST=__EST__ +CLASSPATH=__CLASSPATH__ +JAVA_HOME=__JAVA_HOME__ + +#__SHARED_SETUP__ + +useage() { + + cat < ... +In evaluation mode "filenames" starting with ( are evaluated inline +Festival Speech Synthesis System: 2.1:release November 2010 +-q Load no default setup files +--libdir + Set library directory pathname +-b Run in batch mode (no interaction) +--batch Run in batch mode (no interaction) +--tts Synthesize text in files as speech + no files means read from stdin + (implies no interaction by default) +-i Run in interactive mode (default) +--interactive + Run in interactive mode (default) +--pipe Run in pipe mode, reading commands from + stdin, but no prompt or return values + are printed (default if stdin not a tty) +--language + Run in named language, default is + english, spanish and welsh are available +--server Run in server mode waiting for clients + of server_port (1314) +--script + Used in #! 
scripts, runs in batch mode on + file and passes all other args to Scheme +--heap {1000000} + Set size of Lisp heap, should not normally need + to be changed from its default +-v Display version number and exit +--version Display version number and exit + diff --git a/testsuite/correct/modes_script.out b/testsuite/correct/modes_script.out new file mode 100644 index 0000000..ba0dada --- /dev/null +++ b/testsuite/correct/modes_script.out @@ -0,0 +1,302 @@ + + + +SABLE mode +# +0.9927 100 Homographs +1.1053 100 are +1.3276 100 words +1.5219 100 that +1.6625 100 are +1.9794 100 written +2.0507 100 the +2.5113 100 same +2.9623 100 but +3.1767 100 have +3.6562 100 different +4.7686 100 pronunciations +5.3067 100 such +5.5096 100 as +5.8947 100 lyves +6.0943 100 and +6.6187 100 lives +7.2384 100 You +7.5743 100 say +7.7106 100 e +7.8591 100 thur +8.3142 100 while +8.5008 100 I +8.8347 100 say +8.9573 100 i +9.1226 100 thur +# +0.2442 100 We +0.3856 100 can +0.5176 100 say +0.6803 100 things +1.0162 100 fast +# +0.3916 100 and +1.6744 100 slowly +# +0.3967 100 And +0.5812 100 then +0.7590 100 at +1.2333 100 normal +1.7070 100 speed + + +OGI's mark up mode +# +0.3960 100 This +0.5348 100 is +0.6286 100 an +1.3301 100 example +# +# +0.6264 100 This +0.8762 100 is +0.9298 100 a +1.6096 100 slow +2.6337 100 example +# +# +0.9290 100 This +1.3132 100 is +1.3943 100 a +1.9908 100 very +2.7657 100 slow +4.3412 100 example +# +0.2742 100 a +0.7320 100 normal +1.0788 100 one +# +0.3193 100 and +0.3522 100 a +0.7680 100 fast +1.0759 100 talking +1.6530 100 example +# +0.2713 100 Maybe +0.3723 100 this +0.5197 100 one +0.6032 100 is +0.7087 100 too +1.0439 100 fast +# +# +0.4185 100 My +0.8177 100 name +0.9893 100 is +1.4901 100 Mike +# +0.3887 100 m +0.5879 100 i +1.0145 100 k +1.3130 100 e +# +0.3876 100 My +1.0293 100 telephone +1.4171 100 number +1.6850 100 is +# +0.6027 100 six +1.0735 100 five +1.5615 100 zero +1.9347 100 two +2.3650 100 seven +2.5542 100 eight +3.0240 100 
seven +# +0.5290 100 Mike +0.8955 100 here +# +0.4900 100 Chris +0.8800 100 here +# +0.3960 100 This +0.5486 100 is +1.0602 100 Gordon +# +0.3670 100 I'm +0.6943 100 most +1.1428 100 terribly +1.4801 100 sorry +1.7193 100 to +2.2120 100 interrupt +2.3679 100 at +2.5625 100 this +2.9508 100 time +3.3828 100 but +3.5964 100 my +3.8636 100 name +3.9962 100 is +4.3181 100 Roger +# +0.2672 100 A +0.4587 100 good +0.6546 100 day +0.8218 100 to +1.1079 100 you + + +An email mode +# +0.4325 100 From +# +0.4879 100 Alan +0.8582 100 W +1.1638 100 Black +1.9989 100 awb +2.3649 100 at +3.0684 100 cstr +3.2789 100 dot +3.4261 100 ed +3.6272 100 dot +3.9380 100 ac +4.1695 100 dot +4.6229 100 uk +# +0.8100 100 Subject +# +0.7615 100 Example +0.9717 100 mail +1.4312 100 message +# +0.4879 100 Alan +0.8582 100 W +1.1744 100 Black +1.4320 100 writes +1.5817 100 on +1.9377 100 twenty +2.3531 100 seventh +2.8842 100 November +3.5314 100 nineteen +3.9270 100 ninety +4.3941 100 six +# +0.3432 100 I'm +0.6427 100 looking +0.8382 100 for +0.8575 100 a +1.2028 100 demo +1.4475 100 mail +1.9825 100 message +2.3019 100 for +2.9206 100 Festival +3.2794 100 but +3.6595 100 can't +3.9663 100 seem +4.0910 100 to +4.5691 100 find +4.8330 100 any +5.3657 100 suitable +# +0.2923 100 It +0.4736 100 should +0.6184 100 at +0.9506 100 least +1.1097 100 have +1.3371 100 some +1.8206 100 quoted +2.3640 100 text +2.6533 100 and +2.7925 100 have +2.9950 100 some +3.6618 100 interesting +4.2890 100 tokens +4.6581 100 like +4.7210 100 a +4.9262 100 U +5.1250 100 R +5.4252 100 L +5.7507 100 or +6.0542 100 such +6.5034 100 like +# +0.5922 100 Alan +# +0.4171 100 Well +0.6177 100 I'm +0.8280 100 not +1.0981 100 sure +1.7706 100 exactly +2.1386 100 what +2.2396 100 you +2.4730 100 mean +2.6799 100 but +3.4537 100 awb +3.6200 100 at +4.0615 100 cogsci +4.2720 100 dot +4.4192 100 ed +4.6202 100 dot +4.9311 100 ac +5.1625 100 dot +5.6160 100 uk +6.0661 100 has +6.1574 100 an +6.6786 100 interesting +6.9187 100 home 
+7.3790 100 page +7.7348 100 at +8.0158 100 h +8.2150 100 t +8.3928 100 t +8.6405 100 p +0.0000 100 : +9.2899 100 slash +9.7375 100 slash +10.1158 100 w +10.5072 100 w +10.8986 100 w +11.1222 100 dot +11.8090 100 cstr +12.0195 100 dot +12.1667 100 ed +12.3678 100 dot +12.6924 100 ac +12.9295 100 dot +13.3850 100 uk +13.9456 100 slash +14.2057 100 ~ +14.9393 100 awb +15.3305 100 slash +15.5351 100 which +15.7482 100 might +15.9966 100 be +16.4165 100 what +16.5380 100 you're +16.8793 100 looking +17.0996 100 for +# +0.5377 100 Alan +# +0.3029 100 P +0.6234 100 S +0.7957 100 Will +0.9578 100 you +1.3338 100 attend +1.3841 100 the +1.9387 100 course +# +0.3271 100 I +0.6470 100 hope +0.9680 100 so +# +0.4741 100 bye +0.6674 100 for +1.0077 100 now + + +A singing mode +# +0.6000 100 doe +1.1600 100 ray +1.7500 100 me +2.3600 100 fah +2.9700 100 sew +3.5500 100 lah +4.1100 100 tee +4.8000 100 doe diff --git a/testsuite/correct/parse_script.out b/testsuite/correct/parse_script.out new file mode 100644 index 0000000..854b07d --- /dev/null +++ b/testsuite/correct/parse_script.out @@ -0,0 +1,21 @@ + +(("NT00" + ("NT14" + ("NT06" ("NT01" ("jj" "Probabilistic")) ("NT07" ("nns" "grammars"))) + ("NT17" + ("NT04" ("vbp" "are")) + ("NT08" + ("NT16" ("jj" "easy")) + ("NT09" + ("NT15" ("to" "to")) + ("NT08" + ("NT04" ("vb" "use")) + ("NT08" + ("NT04" ("punc" ",")) + ("NT09" + ("NT15" ("cc" "but")) + ("NT08" + ("NT16" ("jj" "difficult")) + ("NT09" ("NT15" ("to" "to")) ("NT08" ("vb" "train"))))))))))) + ("NT13" ("punc" ".")))) + diff --git a/testsuite/correct/scherr_script.out b/testsuite/correct/scherr_script.out new file mode 100644 index 0000000..f2f106e --- /dev/null +++ b/testsuite/correct/scherr_script.out @@ -0,0 +1,27 @@ + +checking error handling in batch mode +SIOD ERROR: unbound variable : ########################################################################### +closing a file left open: Makefile +SIOD ERROR: unbound variable : printtt +closing a file left open: 
data/scherr.scm +caught the error +checking error handling in interactive mode + +Festival Speech Synthesis System 2.1:release November 2010 +Copyright (C) University of Edinburgh, 1996-2010. All rights reserved. + +clunits: Copyright (C) University of Edinburgh and CMU 1997-2010 +clustergen_engine: Copyright (C) CMU 2005-2010 +hts_engine: +The HMM-based speech synthesis system (HTS) +hts_engine API version 1.04 (http://hts-engine.sourceforge.net/) +Copyright (C) 2001-2010 Nagoya Institute of Technology + 2001-2008 Tokyo Institute of Technology +All rights reserved. +For details type `(festival_warranty)' +festival> SIOD ERROR: unbound variable : ########################################################################### +closing a file left open: Makefile +SIOD ERROR: unbound variable : printtt +closing a file left open: data/scherr.scm +caught the error +festival> festival> diff --git a/testsuite/correct/text_script.out b/testsuite/correct/text_script.out new file mode 100644 index 0000000..2c14282 --- /dev/null +++ b/testsuite/correct/text_script.out @@ -0,0 +1,90 @@ + +Word test: e.g. 12,000 pounds + e g twelve thousand pounds +Word test: It costs $12 million + It costs twelve million dollars +Word test: Prussia's influence (1864-87) will be discussed at a conference + May 2-10. Call (203) 450-3343, ($1.43 a minute) for details. + Prussia 's influence eighteen sixty four to eighty seven will be discussed at a conference May second to tenth Call two zero three four five zero three three four three one dollar forty three a minute for details +Word test: 23/06/97 at 03:45 + twenty three zero six ninety seven at three forty five +Word test: During the 1950's and 60s, 1.45% took AB123. + During the nineteen fifty 's and sixty 's one point four five percent took A B one two three +Word test: $10, $10,000, $1.00, $1.23, $1.03, $2.56. 
+ ten dollars ten thousand dollars one dollar one dollar twenty three one dollar three two dollars fifty six +Word test: HK$100 million, \10,000, Y1.2345, £1.23, M$1.03, C$2.56. + one hundred million Hong Kong dollars ten thousand yen one point two three four five yen one pound twenty three one M dollar three two Canadian dollars fifty six +Word test: A$1.25, £650.00, C$1.23 billion, Y10,000, #1.23. + one Australian dollar twenty five six hundred and fifty pounds one point two three billion Canadian dollars ten thousand yen one pound twenty three +Word test: I think that is No 123. + I think that is number one hundred and twenty three +Word test: It was exactly 12:45:23. + It was exactly twelve hours forty five minutes and twenty three seconds +Word test: The date will be 3/3/04. + The date will be three three o four +Word test: Its on the 1st, and 2nd on 185th and Cornell. + Its on the first and second on one hundred and eighty fifth and Cornell +Word test: About 2/3 of the stocks increased by more than 1/16%, while the other 1/2 didn't. + About two third 's of the stocks increased by more than one sixteenth percent while the other half didn't +Word test: The U.S. government, EU and NASA are involved. + The U S government E U and NASA are involved +Word test: Henry V: Part I Act II Scene XI: Mr X is I believe, V I Lenin, + and not Charles I. + Henry the fifth Part one Act two Scene eleven Mr X is I believe V I Lenin and not Charles the first +Word test: Dr Taylor is at 12 High St. Edinburgh. + doctor Taylor is at twelve High street Edinburgh +Word test: Dr Taylor is at St Andrew's St, Edinburgh. + doctor Taylor is at saint Andrew 's street Edinburgh +Word test: Dr Taylor is at St Andrew's, St Albans. + doctor Taylor is at saint Andrew 's saint Albans +Word test: Dr Taylor is at Dr Johnson Dr, West Linton. + doctor Taylor is at doctor Johnson drive West Linton +Word test: Dr Taylor is with Dr Black Dr Caley and St Erasmus. 
+ doctor Taylor is with doctor Black doctor Caley and saint Erasmus +Word test: Dr Taylor is at Dr Black Dr near the bus station. + doctor Taylor is at doctor Black drive near the bus station +Phrase test: The man wanted to go for a drive in. + The man wanted to go B + for a drive in BB + +Phrase test: The man wanted to go for a drive in the country. + The man wanted to go for a drive B + in the country BB + +Phrase test: The man wanted to go for a drive-in the country. + The man wanted to go B + for a drive-in the country BB + +Phrase test: The man wanted to go for a drive--in the country. + The man wanted to go for a drive B + in the country BB + +Phrase test: He gave the big boys' lunch in the park. + He gave the big boys lunch B + in the park BB + +Phrase test: He gave the `big boys' lunch in the park. + He gave the big boys B + lunch in the park BB + +Phrase test: That is it---unless you want more. + That is it B + unless you want more BB + +Phrase test: That is it -- unless you want more. + That is it B + unless you want more BB + +Segment test: They called him Mr. Black though he preferred Alan. + # dh ei k oo l d h i m m i s t @ b l a k # dh ou h ii p r i f @@ d a l @ n # +Segment test: They called him Mr. Black was the colour of his beard. + # dh ei k oo l d h i m m i s t @ # b l a k w o z dh @ k uh l @ r o v h i z b i@ d # +Segment test: (They called him Mr.) Black was the colour of his beard. + # dh ei k oo l d h i m m i s t @ # b l a k w o z dh @ k uh l @ r o v h i z b i@ d # +Segment test: The U.S. Secretary didn't arrive in time. + # dh @ y uu e s s e k r @ t r ii d i d @ n t # @ r ai v i n t ai m # +Segment test: My cat who lives in Edinburgh has nine lives. + # m ai k a t # h uu l i v z i n e d i n b r @ h a z n ai n l ai v z # +Segment test: Prussia's influence (1864-87) will be discussed at a conference + May 2-10. Call (203) 450-3343, ($1.43 a minute) for details. 
+ # p r uh s i@ z i n f l u@ n s # ei t ii n s i k s t ii f oo t uu ei t ii s e v @ n # w i l b ii d i s k uh s t a t @ k o n f @ r @ n s # m ei s e k @ n d t uu t e n th # k oo l t uu z i@ r ou th r ii # f oo f ai v z i@ r ou # th r ii th r ii f oo th r ii # w uh n d o l @ f oo t ii th r ii @ m i n i t # f oo d ii t ei l z # diff --git a/testsuite/correct/voice_script.out b/testsuite/correct/voice_script.out new file mode 100644 index 0000000..98fc56a --- /dev/null +++ b/testsuite/correct/voice_script.out @@ -0,0 +1,4 @@ + +rab voice: pass +kal voice: pass +slt voice: pass diff --git a/testsuite/data/.festivalrc b/testsuite/data/.festivalrc new file mode 100644 index 0000000..dec091f --- /dev/null +++ b/testsuite/data/.festivalrc @@ -0,0 +1,8 @@ +;;; +;;; A festivalrc to ensure the tests can be run in the same environment +;;; + +(if (boundp 'voice_rab_diphone) + (set! voice_default 'voice_rab_diphone)) + + diff --git a/testsuite/data/Makefile b/testsuite/data/Makefile new file mode 100644 index 0000000..306d5c8 --- /dev/null +++ b/testsuite/data/Makefile @@ -0,0 +1,43 @@ +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. 
The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## Data files used in test ## +########################################################################### +TOP=../.. +DIRNAME=testsuite/data + +TESTFILES = fest1.scm fest2.scm utt1.scm dump1.scm voices.scm modes.scm \ + scherr.scm testinstall.scm htstest.scm + +FILES = Makefile .festivalrc $(TESTFILES) + +include $(TOP)/config/common_make_rules diff --git a/testsuite/data/dump1.scm b/testsuite/data/dump1.scm new file mode 100644 index 0000000..dbb28c7 --- /dev/null +++ b/testsuite/data/dump1.scm @@ -0,0 +1,49 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Simple test of a single utterance + +(set! utt1 (Utterance Text "\"This test,\" she said, \"Should include +punctuation; and interesting symbols like, 555-3434 ; 56&67, 4/7 etc.\"")) + +(utt.save utt1 "tmp/utt1.utt") +(utt.load utt1 "tmp/utt1.utt") +(utt.synth utt1) +(utt.save.segs utt1 "tmp/savesegs1.segs") +(utt.save utt1 "tmp/utt2.utt") +(set! 
utt2 (utt.load nil "tmp/utt2.utt")) +(utt.synth utt2) +(utt.save.segs utt2 "tmp/savesegs2.segs") + + + + diff --git a/testsuite/data/fest1.scm b/testsuite/data/fest1.scm new file mode 100644 index 0000000..c04ade8 --- /dev/null +++ b/testsuite/data/fest1.scm @@ -0,0 +1,74 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. 
;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Simple test of a single utterance + +(define (remove_ids utt) + "(remove_ids utt) +To all diffs on saved files this removed the _id feature from +every item in utt so other utts will look the same as this if +regenerated." + (mapcar + (lambda (r) + (mapcar + (lambda (i) + (item.remove_feature i "id")) + (utt.relation.items utt r))) + (utt.relationnames utt)) + utt) + +(set! utt1 + (Utterance Text + "On May 5 1985, around 1985 people joined Mr. Black's project.")) + +(utt.synth utt1) +(utt.save.words utt1 "-") +(print (utt.features utt1 'Word '(name R:Token.parent.name R:SylStructure.daughtern.name))) +;; tidy the utt up so it'll look the same on resynthesis +(remove_ids utt1) +(utt.set_feat utt1 "max_id" 0) +(utt.set_feat utt1 "filename" 0) +(utt.save utt1 "tmp/fest2.utt") +(utt.save.segs utt1 "-") + +;;; Test Utterance i/o +(set! utt2 (utt.load nil "tmp/fest2.utt")) +(utt.synth utt2) +(remove_ids utt2) +(utt.set_feat utt2 "max_id" 0) +(utt.set_feat utt2 "filename" 0) +(utt.save utt2 "tmp/fest3.utt") + + + + + diff --git a/testsuite/data/fest2.scm b/testsuite/data/fest2.scm new file mode 100644 index 0000000..8747627 --- /dev/null +++ b/testsuite/data/fest2.scm @@ -0,0 +1,92 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. 
The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Some specific tokens etc that might cause problems + +(require 'festtest) + +;;; Test the tokenization +(test_words "e.g. 12,000 pounds") +(test_words "It costs $12 million") +(test_words "Prussia's influence (1864-87) will be discussed at a conference + May 2-10. Call (203) 450-3343, ($1.43 a minute) for details.") +(test_words "23/06/97 at 03:45") +(test_words "During the 1950's and 60s, 1.45% took AB123.") +;;; Money, money, money, (must be funny ...) 
+(test_words "$10, $10,000, $1.00, $1.23, $1.03, $2.56.") +(test_words "HK$100 million, \\10,000, Y1.2345, £1.23, M$1.03, C$2.56.") +(test_words "A$1.25, £650.00, C$1.23 billion, Y10,000, #1.23.") +;;; Various special symbols +(test_words "I think that is No 123.") +(test_words "It was exactly 12:45:23.") +(test_words "The date will be 3/3/04.") +(test_words "Its on the 1st, and 2nd on 185th and Cornell.") +;;; Fractions +(test_words "About 2/3 of the stocks increased by more than 1/16%, while the other 1/2 didn't.") +;;; Abbreviations +(test_words "The U.S. government, EU and NASA are involved.") +;;; Roman numerals +(test_words "Henry V: Part I Act II Scene XI: Mr X is I believe, V I Lenin, + and not Charles I.") +;;; Saint Street Doctor Drive +(test_words "Dr Taylor is at 12 High St. Edinburgh.") +(test_words "Dr Taylor is at St Andrew's St, Edinburgh.") +(test_words "Dr Taylor is at St Andrew's, St Albans.") +(test_words "Dr Taylor is at Dr Johnson Dr, West Linton.") +(test_words "Dr Taylor is with Dr Black Dr Caley and St Erasmus.") +(test_words "Dr Taylor is at Dr Black Dr near the bus station.") + +;; Test the phrase break mechanism +(test_phrases "The man wanted to go for a drive in.") +(test_phrases "The man wanted to go for a drive in the country.") +(test_phrases "The man wanted to go for a drive-in the country.") +(test_phrases "The man wanted to go for a drive--in the country.") +(test_phrases "He gave the big boys' lunch in the park.") +(test_phrases "He gave the `big boys' lunch in the park.") +(test_phrases "That is it---unless you want more.") +(test_phrases "That is it -- unless you want more.") + +;; Some tests of utterance/punctuation boundary +(test_segments "They called him Mr. Black though he preferred Alan.") +(test_segments "They called him Mr. Black was the colour of his beard.") +(test_segments "(They called him Mr.) Black was the colour of his beard.") +(test_segments "The U.S. 
Secretary didn't arrive in time.") + +(test_segments "My cat who lives in Edinburgh has nine lives.") + +;;; This was showed up different durations on different platforms +;;; sees ok now. Problem was lexicon was in different order +;;; (i.e. qsort did different things on different platforms) +(test_segments "Prussia's influence (1864-87) will be discussed at a conference + May 2-10. Call (203) 450-3343, ($1.43 a minute) for details.") + + diff --git a/testsuite/data/htstest.scm b/testsuite/data/htstest.scm new file mode 100644 index 0000000..7c474e5 --- /dev/null +++ b/testsuite/data/htstest.scm @@ -0,0 +1,58 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;;; +;;; Language Technologies Institute ;;; +;;; Carnegie Mellon University ;;; +;;; Copyright (c) 2004 ;;; +;;; All Rights Reserved. ;;; +;;; ;;; +;;; Permission is hereby granted, free of charge, to use and distribute ;;; +;;; this software and its documentation without restriction, including ;;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;;; +;;; permit persons to whom this work is furnished to do so, subject to ;;; +;;; the following conditions: ;;; +;;; 1. The code must retain the above copyright notice, this list of ;;; +;;; conditions and the following disclaimer. ;;; +;;; 2. Any modifications must be clearly marked as such. ;;; +;;; 3. Original authors' names are not deleted. ;;; +;;; 4. The authors' names are not used to endorse or promote products ;;; +;;; derived from this software without specific prior written ;;; +;;; permission. 
;;; +;;; ;;; +;;; CARNEGIE MELLON UNIVERSITY AND THE CONTRIBUTORS TO THIS WORK ;;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;;; +;;; SHALL CARNEGIE MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE ;;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;;; +;;; THIS SOFTWARE. ;;; +;;; ;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Some test the basic arctic/hts voices + +(format t "cmu_us_awb_arctic_hts\n") +(voice_cmu_us_awb_arctic_hts) +(set! utt1 (SynthText "A whole joy was reaping, but they've gone south, you should fetch azure mike.")) +(utt.save.segs utt1 "-") +(format t "%l\n" (wave.info (utt.wave utt1))) + +(format t "cmu_us_bdl_arctic_hts\n") +(voice_cmu_us_bdl_arctic_hts) +(set! utt1 (SynthText "A whole joy was reaping, but they've gone south, you should fetch azure mike.")) +(utt.save.segs utt1 "-") +(format t "%l\n" (wave.info (utt.wave utt1))) + +(format t "cmu_us_jmk_arctic_hts\n") +(voice_cmu_us_jmk_arctic_hts) +(set! utt1 (SynthText "A whole joy was reaping, but they've gone south, you should fetch azure mike.")) +(utt.save.segs utt1 "-") +(format t "%l\n" (wave.info (utt.wave utt1))) + +(format t "cmu_us_slt_arctic_hts\n") +(voice_cmu_us_slt_arctic_hts) +(set! 
utt1 (SynthText "A whole joy was reaping, but they've gone south, you should fetch azure mike.")) +(utt.save.segs utt1 "-") +(format t "%l\n" (wave.info (utt.wave utt1))) + diff --git a/testsuite/data/modes.scm b/testsuite/data/modes.scm new file mode 100644 index 0000000..d73be11 --- /dev/null +++ b/testsuite/data/modes.scm @@ -0,0 +1,58 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. 
;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Some test of various text modes + + +(define (output_words utt) + (utt.save.words utt "-") + utt) + +(set! tts_hooks (list utt.synth output_words)) +(gc-status nil) + +(format t "\n\nSABLE mode\n") +(unwind-protect + (tts (string-append libdir "/../examples/example2.sable") nil)) +(format t "\n\nOGI's mark up mode\n") +(unwind-protect + (tts (string-append libdir "/../examples/ex1.ogi") 'ogimarkup)) +(format t "\n\nAn email mode\n") +(unwind-protect + (tts (string-append libdir "/../examples/ex1.email") nil)) + +(voice_kal_diphone) +(format t "\n\nA singing mode\n") +(unwind-protect + (tts (string-append libdir "/../examples/songs/doremi.xml") 'singing)) + + diff --git a/testsuite/data/scherr.scm b/testsuite/data/scherr.scm new file mode 100644 index 0000000..28cb603 --- /dev/null +++ b/testsuite/data/scherr.scm @@ -0,0 +1,50 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Test catching of errors + +(define (subfunc) + (load "Makefile") ;; shouldn't be a valid Scheme file +) + +(unwind-protect + (subfunc) + (begin + (format t "caught the error\n") + (printtt) ;; will give another error + )) + +(format t "after the error\n") ;; shouldn't get here + + + + diff --git a/testsuite/data/testinstall.scm b/testsuite/data/testinstall.scm new file mode 100644 index 0000000..f295424 --- /dev/null +++ b/testsuite/data/testinstall.scm @@ -0,0 +1,53 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 2001 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. 
;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; See if the voices are installed + +(define (check_tests_will_work) + "(check_tests_will_work) +These tests *require* ked, don and rab to be installed." + (unwind-protect + (begin + (voice_kal_diphone) + (SynthText "hello world") + (voice_rab_diphone) + (SynthText "hello world") + ) + (begin + (format stderr "\n") + (format stderr "The festival tests require the kal and rab diphone voices to be\n") + (format stderr "installed. Festival may work without that diphone set, but the results of\n") + (format stderr "these tests aren't relevant.\n") + (exit -1)))) + +(check_tests_will_work) + diff --git a/testsuite/data/utt1.scm b/testsuite/data/utt1.scm new file mode 100644 index 0000000..082140c --- /dev/null +++ b/testsuite/data/utt1.scm @@ -0,0 +1,66 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. 
;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. ;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Walk about an utterance using features, relations etc. + +(set! utt1 (Utterance Text "hello, this example has three syllables.")) +(utt.synth utt1) + +(format t "num syllables: %d\n" + (length (utt.features utt1 'Syllable '(name)))) + +(set! segs (utt.relation.items utt1 'Segment)) +(set! s1 (item.relation.parent (nth 2 segs) 'SylStructure)) +(set! s2 (item.relation.parent (nth 3 segs) 'SylStructure)) +(set! s3 (item.relation.parent (nth 4 segs) 'SylStructure)) + +(if (equal? 
s1 s2) + (print "syls match: ok") + (print "syls don't match: error")) +(if (not (equal? s2 s3)) + (print "syls don't match: ok") + (print "syls match: error")) + +(print + (list + (item.feat s3 'name) + (item.feat s3 'R:SylStructure.daughter1.name) + (item.feat s3 'R:Syllable.n.R:SylStructure.daughter1.name) + (item.feat s3 'R:Syllable.p.R:SylStructure.daughter1.name) + (item.feat s3 'R:SylStructure.parent.name) + (item.feat s3 'R:SylStructure.parent.daughter1.daughtern.name) + (item.feat s3 'R:SylStructure.parent.R:Token.parent.whitespace) + (item.feat s3 'R:SylStructure.parent.R:Word.n.R:Token.parent.p.punc) +)) + + + diff --git a/testsuite/data/voices.scm b/testsuite/data/voices.scm new file mode 100644 index 0000000..e53200c --- /dev/null +++ b/testsuite/data/voices.scm @@ -0,0 +1,78 @@ +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; ;; +;;; Centre for Speech Technology Research ;; +;;; University of Edinburgh, UK ;; +;;; Copyright (c) 1996,1997 ;; +;;; All Rights Reserved. ;; +;;; ;; +;;; Permission is hereby granted, free of charge, to use and distribute ;; +;;; this software and its documentation without restriction, including ;; +;;; without limitation the rights to use, copy, modify, merge, publish, ;; +;;; distribute, sublicense, and/or sell copies of this work, and to ;; +;;; permit persons to whom this work is furnished to do so, subject to ;; +;;; the following conditions: ;; +;;; 1. The code must retain the above copyright notice, this list of ;; +;;; conditions and the following disclaimer. ;; +;;; 2. Any modifications must be clearly marked as such. ;; +;;; 3. Original authors' names are not deleted. ;; +;;; 4. The authors' names are not used to endorse or promote products ;; +;;; derived from this software without specific prior written ;; +;;; permission. 
;; +;;; ;; +;;; THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ;; +;;; DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ;; +;;; ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ;; +;;; SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ;; +;;; FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; +;;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ;; +;;; AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ;; +;;; ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ;; +;;; THIS SOFTWARE. ;; +;;; ;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; Test some voice switching + + +(voice_rab_diphone) +(set! utt1 (Utterance Text "The date was July 24th 1997.")) +(utt.synth utt1) +(utt.save.wave utt1 "tmp/rab1.wav") +(voice_kal_diphone) +(set! utt1 (Utterance Text "The date was July 24th 1997.")) +(utt.synth utt1) +(utt.save.wave utt1 "tmp/kal1.wav") +(voice_cmu_us_rms_cg) +(set! utt1 (Utterance Text "The date was July 24th 1997.")) +(utt.synth utt1) +(utt.save.wave utt1 "tmp/rms1.wav") +(voice_cmu_us_awb_cg) +(set! utt1 (Utterance Text "The date was July 24th 1997.")) +(utt.synth utt1) +(utt.save.wave utt1 "tmp/awb1.wav") +(voice_cmu_us_slt_arctic_hts) +(set! utt1 (Utterance Text "The date was July 24th 1997.")) +(utt.synth utt1) +(utt.save.wave utt1 "tmp/slt1.wav") + +;; Check if voices interfere with each other or not +(voice_rab_diphone) +(set! utt1 (Utterance Text "The date was July 24th 1997.")) +(utt.synth utt1) +(utt.save.wave utt1 "tmp/rab2.wav") +(voice_kal_diphone) +(set! utt1 (Utterance Text "The date was July 24th 1997.")) +(utt.synth utt1) +(utt.save.wave utt1 "tmp/kal2.wav") +(voice_cmu_us_rms_cg) +(set! utt1 (Utterance Text "The date was July 24th 1997.")) +(utt.synth utt1) +(utt.save.wave utt1 "tmp/rms2.wav") +(voice_cmu_us_awb_cg) +(set! 
utt1 (Utterance Text "The date was July 24th 1997.")) +(utt.synth utt1) +(utt.save.wave utt1 "tmp/awb2.wav") +(voice_cmu_us_slt_arctic_hts) +(set! utt1 (Utterance Text "The date was July 24th 1997.")) +(utt.synth utt1) +(utt.save.wave utt1 "tmp/slt2.wav") + diff --git a/testsuite/fest.sh b/testsuite/fest.sh new file mode 100644 index 0000000..9644715 --- /dev/null +++ b/testsuite/fest.sh @@ -0,0 +1,69 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## +## Basic test of synthesis + +FESTIVAL=$TOP/bin/festival +HOME=$TOP/testsuite/data +export HOME + +test_basic () { + + echo "basic " >&2 + rm -f tmp/fest2.utt tmp/fest3.utt + $FESTIVAL -b data/fest1.scm || exit 1 + diff tmp/fest2.utt tmp/fest3.utt + +} + +test_utt () { + + echo "utt feats " >&2 + $FESTIVAL -b data/utt1.scm || exit 1 + +} + +test_info () +{ + echo info and help >&2 + $FESTIVAL -h +} + +echo >$OUTPUT + +test_basic 2>&1 >> $OUTPUT +test_utt 2>&1 >> $OUTPUT +test_info 2>&1 >> $OUTPUT + +exit 0 diff --git a/testsuite/modes.sh b/testsuite/modes.sh new file mode 100644 index 0000000..dd3b640 --- /dev/null +++ b/testsuite/modes.sh @@ -0,0 +1,51 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1997 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. 
## +## ## +########################################################################### +## +## Test text modes + +FESTIVAL=$TOP/bin/festival +HOME=$TOP/testsuite/data +export HOME + +do_modes () { + echo "text modes " >&2 + + $FESTIVAL -b data/modes.scm || exit 1 +} + +echo >$OUTPUT + +do_modes 2>&1 >> $OUTPUT + +exit 0 diff --git a/testsuite/parse.sh b/testsuite/parse.sh new file mode 100644 index 0000000..ac45f13 --- /dev/null +++ b/testsuite/parse.sh @@ -0,0 +1,51 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1998 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## +## Test SCFG parsing of text + +FESTIVAL=$TOP/bin/festival +HOME=$TOP/testsuite/data +export HOME + +do_parse () { + echo "Probabilistic grammars are easy to use, but difficult to train." | + $TOP/examples/scfg_parse_text || exit 1 +} + +echo >$OUTPUT + +do_parse 2>&1 >> $OUTPUT + +exit 0 + diff --git a/testsuite/scherr.sh b/testsuite/scherr.sh new file mode 100644 index 0000000..e1562b8 --- /dev/null +++ b/testsuite/scherr.sh @@ -0,0 +1,56 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. 
## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## +## Test Scheme error handling in batch and interactive modes + +FESTIVAL=$TOP/bin/festival +HOME=$TOP/testsuite/data +export HOME + +do_test () { + echo "checking error handling in batch mode " >&2 + + $FESTIVAL -b data/scherr.scm + + echo "checking error handling in interactive mode " >&2 + + echo "(load \"data/scherr.scm\")" | $FESTIVAL --interactive + +} + +echo >$OUTPUT + +do_test >> $OUTPUT 2>&1 + +exit 0 diff --git a/testsuite/testinstall.sh b/testsuite/testinstall.sh new file mode 100644 index 0000000..fb928ea --- /dev/null +++ b/testsuite/testinstall.sh @@ -0,0 +1,43 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 2001 ## +## All Rights Reserved. 
## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. 
## +## ## +########################################################################### +## +## test overall installation (and do not continue if there are problems) + +FESTIVAL=$1/bin/festival +HOME=$1/testsuite/data +export HOME + +echo "(quit)" | $FESTIVAL -b || exit 1 +$FESTIVAL -b data/testinstall.scm || exit 1 +exit 0 diff --git a/testsuite/text.sh b/testsuite/text.sh new file mode 100644 index 0000000..4eb00cf --- /dev/null +++ b/testsuite/text.sh @@ -0,0 +1,53 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. 
## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## +## Basic test of synthesis + +FESTIVAL=$TOP/bin/festival +HOME=$TOP/testsuite/data +export HOME + +test_text () { + + echo "text " >&2 + + $FESTIVAL -b data/fest2.scm || exit 1 + +} + +echo >$OUTPUT + +test_text 2>&1 >> $OUTPUT + +exit 0 diff --git a/testsuite/voice.sh b/testsuite/voice.sh new file mode 100644 index 0000000..9686eaf --- /dev/null +++ b/testsuite/voice.sh @@ -0,0 +1,63 @@ +#!/bin/sh +########################################################################### +## ## +## Centre for Speech Technology Research ## +## University of Edinburgh, UK ## +## Copyright (c) 1997 ## +## All Rights Reserved. ## +## ## +## Permission is hereby granted, free of charge, to use and distribute ## +## this software and its documentation without restriction, including ## +## without limitation the rights to use, copy, modify, merge, publish, ## +## distribute, sublicense, and/or sell copies of this work, and to ## +## permit persons to whom this work is furnished to do so, subject to ## +## the following conditions: ## +## 1. The code must retain the above copyright notice, this list of ## +## conditions and the following disclaimer. ## +## 2. Any modifications must be clearly marked as such. ## +## 3. Original authors' names are not deleted. ## +## 4. 
The authors' names are not used to endorse or promote products ## +## derived from this software without specific prior written ## +## permission. ## +## ## +## THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK ## +## DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ## +## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT ## +## SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE ## +## FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ## +## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN ## +## AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ## +## ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF ## +## THIS SOFTWARE. ## +## ## +########################################################################### +## +## Basic test of voices + +FESTIVAL=$TOP/bin/festival +HOME=$TOP/testsuite/data +export HOME + +do_voices () { + + echo "multi-voices " >&2 + + $FESTIVAL -b data/voices.scm || exit 1 + + for i in rab kal slt + do + if cmp tmp/${i}1.wav tmp/${i}2.wav + then echo $i voice: pass + else echo $i voice: fail + fi + done + + # CG voices have some randomness in them, so they won't be the same + +} + +echo >$OUTPUT + +do_voices 2>&1 >> $OUTPUT + +exit 0