src/native_client_sdk/src/doc/devguide/coding/audio.rst

.. _devguide-coding-audio:

#####
Audio
#####

.. contents::
  :local:
  :backlinks: none
  :depth: 2

This chapter describes how to use the Pepper audio API to play an audio
stream. The Pepper audio API provides a low-level means of playing a stream of
audio samples generated by a Native Client module. The API generally works as
follows: A Native Client module creates an audio resource that represents an
audio stream, and tells the browser to start or stop playing the audio
resource. The browser calls a function in the Native Client module to fill a
buffer with audio samples every time it needs data to play from the audio
stream.

The code examples in this chapter describe a simple Native Client module that
generates audio samples using a sine wave with a frequency of 440 Hz. The module
starts playing the audio samples as soon as it is loaded into the browser. For a
slightly more sophisticated example, see the ``audio`` example (source code in
the SDK directory ``examples/api/audio``), which lets users specify a frequency
for the sine wave and click buttons to start and stop audio playback.

Reference information
=====================

For reference information related to the Pepper audio API, see the following
documentation:

* `pp::AudioConfig class
  <https://developers.google.com/native-client/peppercpp/classpp_1_1_audio_config>`_

* `pp::Audio class
  <https://developers.google.com/native-client/peppercpp/classpp_1_1_audio>`_

* `audio_config.h
  <https://developers.google.com/native-client/peppercpp/audio__config_8h>`_

* `audio.h <https://developers.google.com/native-client/peppercpp/audio_8h>`_

* `PP_AudioSampleRate
  <https://developers.google.com/native-client/pepperc/group___enums.html#gaee750c350655f2fb0fe04c04029e0ff8>`_

About the Pepper audio API
==========================

The Pepper audio API lets Native Client modules play audio streams in a
browser. To play an audio stream, a module generates audio samples and writes
them into a buffer. The browser reads the audio samples from the buffer and
plays them using an audio device on the client computer.

.. image:: /images/pepper-audio-buffer.png

This mechanism is simple but low-level. If you only want to play plain sound
files in a web application, consider higher-level alternatives such as the HTML
``<audio>`` tag, JavaScript, or the `Web Audio API
<http://chromium.googlecode.com/svn/trunk/samples/audio/index.html>`_.

The Pepper audio API is a good option for playing audio data if you want to do
audio processing in your web application. You might use the audio API, for
example, if you want to apply audio effects to sounds, synthesize your own
sounds, or do any other type of CPU-intensive processing of audio
samples. Another likely use case is gaming applications: you might use a gaming
library to process audio data, and then simply use the audio API to output the
processed data.

The Pepper audio API is straightforward to use:

#. Your module creates an audio configuration resource and an audio resource.

#. Your module implements a callback function that fills an audio buffer with
   data.

#. Your module invokes the ``StartPlayback`` and ``StopPlayback`` methods of
   the audio resource (e.g., when certain events occur).

#. The browser invokes your callback function whenever it needs audio data to
   play. Your callback function can produce the audio data in a number of
   ways; for example, it can generate new data, or it can copy pre-mixed data
   into the audio buffer.

This basic interaction is illustrated below, and described in detail in the
sections that follow.

.. image:: /images/pepper-audio-api.png

Digital audio concepts
======================

Before you use the Pepper audio API, it's helpful to understand a few concepts
that are fundamental to how digital audio is recorded and played back:

sample rate
  the number of times an input sound source is sampled per second;
  correspondingly, the number of samples that are played back per second

bit depth
  the number of bits used to represent a sample

channels
  the number of input sources recorded in each sampling interval;
  correspondingly, the number of outputs that are played back simultaneously
  (typically using different speakers)

The higher the sample rate and bit depth used to record a sound wave, the more
accurately the sound wave can be reproduced, since it is sampled more
frequently and each sample is quantized more finely. Common sample rates
include 44,100 Hz (44,100 samples/second, the sample rate used on CDs) and
48,000 Hz (the sample rate used on DVDs and Digital Audio Tapes). A common
bit depth is 16 bits per sample, and a common number of channels is 2 (left and
right channels for stereo sound).

.. _pepper_audio_configurations:

The Pepper audio API currently lets Native Client modules play audio streams
with the following configurations:

* **sample rate**: 44,100 Hz or 48,000 Hz
* **bit depth**: 16
* **channels**: 2 (stereo)

Setting up the module
=====================

The code examples below describe a simple Native Client module that generates
audio samples using a sine wave with a frequency of 440 Hz. The module starts
playing the audio samples as soon as it is loaded into the browser.

The Native Client module is set up by implementing subclasses of the
``pp::Module`` and ``pp::Instance`` classes, as normal.

.. naclcode::

  class SineSynthInstance : public pp::Instance {
   public:
    explicit SineSynthInstance(PP_Instance instance);
    virtual ~SineSynthInstance() {}

    // Called by the browser once the NaCl module is loaded and ready to
    // initialize.  Creates a Pepper audio context and initializes it. Returns
    // true on success.  Returning false causes the NaCl module to be deleted
    // and no other functions to be called.
    virtual bool Init(uint32_t argc, const char* argn[], const char* argv[]);

   private:
    // Function called by the browser when it needs more audio samples.
    static void SineWaveCallback(void* samples,
                                 uint32_t buffer_size,
                                 void* data);

    // Audio resource.
    pp::Audio audio_;

    ...

  };

  class SineSynthModule : public pp::Module {
   public:
    SineSynthModule() : pp::Module() {}
    ~SineSynthModule() {}

    // Create and return a SineSynthInstance object.
    virtual pp::Instance* CreateInstance(PP_Instance instance) {
      return new SineSynthInstance(instance);
    }
  };

Creating an audio configuration resource
========================================

Resources
---------

Before the module can play an audio stream, it must create two resources: an
audio configuration resource and an audio resource. Resources are handles to
objects that the browser provides to module instances. An audio resource is an
object that represents the state of an audio stream, including whether the
stream is paused or being played back, and which callback function to invoke
when the samples in the stream's buffer run out. An audio configuration resource
is an object that stores configuration data for an audio resource, including the
sampling frequency of the audio samples, and the number of samples that the
callback function must provide when the browser invokes it.

Sample frame count
------------------

Prior to creating an audio configuration resource, the module should call
``RecommendSampleFrameCount`` to obtain a *sample frame count* from the
browser. The sample frame count is the number of samples that the callback
function must provide per channel each time the browser invokes the callback
function. For example, if the sample frame count is 4096 for a stereo audio
stream, the callback function must provide 8192 samples (4096 for the left
channel and 4096 for the right channel).
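
The arithmetic above can be written down as a small helper. This is a minimal
sketch in plain C++ that does not use the Pepper headers; the names
``kChannels``, ``kBytesPerSample``, ``SamplesPerCallback``, and
``BytesPerCallback`` are chosen for illustration and assume the 16-bit stereo
format listed earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed stream layout (from the supported configurations listed earlier):
// interleaved stereo, 16 bits per sample.
constexpr std::size_t kChannels = 2;
constexpr std::size_t kBytesPerSample = sizeof(int16_t);

// Total number of samples the callback must write in one invocation.
constexpr std::size_t SamplesPerCallback(std::size_t sample_frame_count) {
  return sample_frame_count * kChannels;
}

// Size in bytes of the buffer that holds those samples.
constexpr std::size_t BytesPerCallback(std::size_t sample_frame_count) {
  return SamplesPerCallback(sample_frame_count) * kBytesPerSample;
}

// Compile-time check of the example from the text.
static_assert(SamplesPerCallback(4096) == 8192, "4096 stereo frames");
static_assert(BytesPerCallback(4096) == 16384, "8192 16-bit samples");
```

With a sample frame count of 4096, the callback writes 8192 samples, which
occupy 16384 bytes.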

The module can request a specific sample frame count, but the browser may return
a different sample frame count depending on the capabilities of the client
device. At present, ``RecommendSampleFrameCount`` simply bound-checks the
requested sample frame count (see ``include/ppapi/c/ppb_audio_config.h`` for the
minimum and maximum sample frame counts, currently 64 and 32768). In the future,
``RecommendSampleFrameCount`` may perform a more sophisticated calculation,
particularly if there is an intrinsic buffer size for the client device.
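
As a rough illustration of the bound check described above, the sketch below
clamps a requested count to the range 64 to 32768. ``ClampSampleFrameCount``
and the two constants are hypothetical names used only in this example; the
actual behavior of ``RecommendSampleFrameCount`` is defined by the browser and
may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Bounds quoted in the text above. The constant names are made up for this
// sketch and are not taken from the Pepper headers.
constexpr uint32_t kMinSampleFrameCount = 64;
constexpr uint32_t kMaxSampleFrameCount = 32768;

// Hypothetical stand-in for a simple bound check: values below the minimum
// are raised to it, values above the maximum are lowered to it.
uint32_t ClampSampleFrameCount(uint32_t requested) {
  return std::min(std::max(requested, kMinSampleFrameCount),
                  kMaxSampleFrameCount);
}
```

Because the returned value can differ from the requested one, the module
should size its buffers from the value it gets back, not from the value it
asked for.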

Selecting a sample frame count for an audio stream involves a tradeoff between
latency and CPU usage. If you want your module to have short audio latency so
that it can rapidly change what's playing in the audio stream, you should
request a small sample frame count. That could be useful in gaming applications,
for example, where sounds have to change frequently in response to game
action. However, a small sample frame count results in higher CPU usage, since
the browser must invoke the callback function frequently to refill the audio
buffer. Conversely, a large sample frame count results in higher latency but
lower CPU usage. You should request a large sample frame count if your module
will play long, uninterrupted audio segments.
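
To put numbers on this tradeoff, the sketch below computes how much audio one
buffer holds and how often the callback has to run. It is plain C++ with
hypothetical helper names (``BufferDurationMs``, ``CallbacksPerSecond``), and
it only accounts for the buffer itself; any additional latency in the browser
or the audio device is not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Duration of audio held by one buffer, in milliseconds.
double BufferDurationMs(uint32_t sample_frame_count, uint32_t sample_rate_hz) {
  assert(sample_rate_hz > 0);
  return 1000.0 * sample_frame_count / sample_rate_hz;
}

// How many times per second the callback must run to keep the stream fed.
double CallbacksPerSecond(uint32_t sample_frame_count,
                          uint32_t sample_rate_hz) {
  assert(sample_frame_count > 0);
  return static_cast<double>(sample_rate_hz) / sample_frame_count;
}
```

At 44,100 Hz, a sample frame count of 512 corresponds to about 11.6 ms of
audio per buffer and roughly 86 callbacks per second, while 4096 corresponds
to about 92.9 ms and roughly 11 callbacks per second.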

Supported audio configurations
------------------------------

After the module obtains a sample frame count, it can create an audio
configuration resource. Currently the Pepper audio API supports audio streams
with the configuration settings shown :ref:`above<pepper_audio_configurations>`.
C++ modules can create a configuration resource by instantiating a
``pp::AudioConfig`` object. Check ``audio_config.h`` for the latest
configurations that are supported.

.. naclcode::

  bool SineSynthInstance::Init(uint32_t argc,
                               const char* argn[],
                               const char* argv[]) {

    // Ask the browser/device for an appropriate sample frame count size.
    sample_frame_count_ =
        pp::AudioConfig::RecommendSampleFrameCount(PP_AUDIOSAMPLERATE_44100,
                                                   kSampleFrameCount);

    // Create an audio configuration resource.
    pp::AudioConfig audio_config = pp::AudioConfig(this,
                                                   PP_AUDIOSAMPLERATE_44100,
                                                   sample_frame_count_);

    // Create an audio resource.
    audio_ = pp::Audio(this,
                       audio_config,
                       SineWaveCallback,
                       this);

    // Start playback when the module instance is initialized.
    return audio_.StartPlayback();
  }

Creating an audio resource
==========================

Once the module has created an audio configuration resource, it can create an
audio resource. To do so, it instantiates a ``pp::Audio`` object, passing in a
pointer to the module instance, the audio configuration resource, a callback
function, and a pointer to user data (data that is used in the callback
function). See the example above.

Implementing a callback function
================================

The browser calls the callback function associated with an audio resource every
time it needs more samples to play. The callback function can generate new
samples (e.g., by applying sound effects), or copy pre-mixed samples into the
audio buffer. The example below generates new samples by computing values of a
sine wave.

The last parameter passed to the callback function is generic user data that the
function can use in processing samples. In the example below, the user data is a
pointer to the module instance, which includes member variables
``sample_frame_count_`` (the sample frame count obtained from the browser) and
``theta_`` (the last angle that was used to compute a sine value in the previous
callback; this lets the function generate a smooth sine wave by starting at that
angle plus a small delta).

.. naclcode::

  class SineSynthInstance : public pp::Instance {
   public:
    ...

   private:
    static void SineWaveCallback(void* samples,
                                 uint32_t buffer_size,
                                 void* data) {

      // The user data in this example is a pointer to the module instance.
      SineSynthInstance* sine_synth_instance =
          reinterpret_cast<SineSynthInstance*>(data);

      // Delta by which to increase theta_ for each sample.
      const double delta = kTwoPi * kFrequency / PP_AUDIOSAMPLERATE_44100;
      // Amount by which to scale up the computed sine value.
      const int16_t max_int16 = std::numeric_limits<int16_t>::max();

      int16_t* buff = reinterpret_cast<int16_t*>(samples);

      // Make sure we can't write outside the buffer.
      assert(buffer_size >= (sizeof(*buff) * kChannels *
                             sine_synth_instance->sample_frame_count_));

      for (size_t sample_i = 0;
           sample_i < sine_synth_instance->sample_frame_count_;
           ++sample_i, sine_synth_instance->theta_ += delta) {

        // Keep theta_ from going beyond 2*Pi.
        if (sine_synth_instance->theta_ > kTwoPi) {
          sine_synth_instance->theta_ -= kTwoPi;
        }

        // Compute the sine value for the current theta_, scale it up,
        // and write it into the buffer once for each channel.
        double sin_value(std::sin(sine_synth_instance->theta_));
        int16_t scaled_value = static_cast<int16_t>(sin_value * max_int16);
        for (size_t channel = 0; channel < kChannels; ++channel) {
          *buff++ = scaled_value;
        }
      }
    }

    ...
  };

Application threads and real-time requirements
----------------------------------------------

The callback function runs in a background application thread. This allows audio
processing to continue even when the application is busy doing something
else. If the main application thread and the callback thread access the same
data, you may be tempted to use a lock to control access to that data. You
should avoid the use of locks in the callback thread, however, as attempting to
acquire a lock may cause the thread to get swapped out, resulting in audio
dropouts.

In general, you must program the callback thread carefully, as the Pepper audio
API is a very low-level API that needs to meet hard real-time requirements. If
the callback thread misses the real-time deadline, the result is an audio
dropout. One way the callback thread can miss the deadline is by taking too
much time doing computation. Another way is by executing a function call that
swaps out the callback thread. Unfortunately, such function calls include just
about all C Run-Time (CRT) library calls and Pepper API calls. The callback
thread should therefore avoid Pepper API calls as well as ``malloc``,
``gettimeofday``, mutexes, condition variables, critical sections, and so
forth; any of these could attempt to take a lock and swap out the callback
thread, which would be disastrous for audio playback. Audio dropouts due to
thread swapping can be very rare and very hard to track down and debug, so it's
best to avoid making system or Pepper calls in the first place. In short, the
audio (callback) thread should use "lock-free" techniques and avoid making CRT
library calls.
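
One way to follow this advice when the main thread needs to change a parameter
that the callback reads is to share it through ``std::atomic`` instead of a
mutex. The sketch below is plain C++11 without the Pepper headers.
``SineState``, ``SetFrequency``, and ``FillSine`` are hypothetical names, and
the code assumes that ``std::atomic<double>`` is lock-free on the target
platform (``is_lock_free()`` reports whether that holds).

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical state shared between the main thread and the audio callback.
struct SineState {
  std::atomic<double> frequency_hz{440.0};  // Written by the main thread.
  double theta = 0.0;                       // Touched only by the callback.
};

// Main-thread side: publish a new frequency without taking a lock.
void SetFrequency(SineState* state, double hz) {
  state->frequency_hz.store(hz, std::memory_order_relaxed);
}

// Callback side: writes `frames` interleaved stereo frames into `out`.
// It does not allocate, lock, or call into Pepper.
void FillSine(SineState* state, int16_t* out, std::size_t frames,
              double sample_rate_hz) {
  const double kTwoPi = 2.0 * 3.14159265358979323846;
  // Read the shared value once so the whole buffer uses one frequency.
  const double hz = state->frequency_hz.load(std::memory_order_relaxed);
  const double delta = kTwoPi * hz / sample_rate_hz;
  const double scale = std::numeric_limits<int16_t>::max();

  for (std::size_t i = 0; i < frames; ++i) {
    const int16_t value =
        static_cast<int16_t>(std::sin(state->theta) * scale);
    *out++ = value;  // Left channel.
    *out++ = value;  // Right channel.
    state->theta += delta;
    if (state->theta >= kTwoPi) {
      state->theta -= kTwoPi;  // Keep the phase within one period.
    }
  }
}
```

The callback reads the frequency once per buffer, so a change made by the main
thread takes effect at the next buffer boundary.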

One other issue to be aware of is that the ``StartPlayback`` function (discussed
below) is an asynchronous RPC; i.e., it does not block. That means that the
callback function may not be called immediately after the call to
``StartPlayback``. If it's important to synchronize the callback thread with
another thread so that the audio stream starts playing simultaneously with
another action in your application, you must handle such synchronization
manually.
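
One possible way to handle this synchronization without locks is to have the
callback publish how many sample frames it has produced, and to let other
threads read that counter. The sketch below is plain C++11; ``PlaybackClock``
and its methods are hypothetical names. The counter reflects audio handed to
the browser, which is assumed to run slightly ahead of what is actually
audible.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper that lets other threads observe callback progress.
struct PlaybackClock {
  std::atomic<uint64_t> frames_rendered{0};

  // Called from the audio callback after it has filled one buffer.
  void OnBufferFilled(uint32_t sample_frame_count) {
    frames_rendered.fetch_add(sample_frame_count, std::memory_order_release);
  }

  // Called from any other thread. Returns true once the callback has run at
  // least once, i.e. once the asynchronous start has taken effect.
  bool HasStarted() const {
    return frames_rendered.load(std::memory_order_acquire) > 0;
  }

  // Seconds of audio handed to the browser so far.
  double SecondsRendered(uint32_t sample_rate_hz) const {
    return static_cast<double>(
               frames_rendered.load(std::memory_order_acquire)) /
           sample_rate_hz;
  }
};
```

Another thread can check ``HasStarted()`` before triggering the action that
should coincide with the start of the audio stream.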

Starting and stopping playback
==============================

To start and stop audio playback, the module simply reacts to JavaScript
messages.

.. naclcode::

  const char* const kPlaySoundId = "playSound";
  const char* const kStopSoundId = "stopSound";

  void SineSynthInstance::HandleMessage(const pp::Var& var_message) {
    if (!var_message.is_string()) {
      return;
    }
    std::string message = var_message.AsString();
    if (message == kPlaySoundId) {
      audio_.StartPlayback();
    } else if (message == kStopSoundId) {
      audio_.StopPlayback();
    } else if (...) {
      ...
    }
  }
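
The message-handling logic can also be kept separate from the Pepper calls so
that it is easy to test outside the browser. The sketch below is plain C++11
and parses the two messages shown above plus one more. The
``setFrequency:<hz>`` message format and the names ``Command`` and
``ParseMessage`` are assumptions made for this example; they are not part of
the code above.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical result of parsing one message from JavaScript.
struct Command {
  enum Kind { kPlay, kStop, kSetFrequency, kUnknown };
  Kind kind;
  double value;  // Only meaningful when kind == kSetFrequency.
};

// Maps a message string to a Command. Anything unrecognized, including a
// malformed or non-positive frequency, yields kUnknown.
Command ParseMessage(const std::string& message) {
  static const std::string kSetFrequencyPrefix = "setFrequency:";

  if (message == "playSound") {
    return {Command::kPlay, 0.0};
  }
  if (message == "stopSound") {
    return {Command::kStop, 0.0};
  }
  if (message.compare(0, kSetFrequencyPrefix.size(),
                      kSetFrequencyPrefix) == 0) {
    const char* begin = message.c_str() + kSetFrequencyPrefix.size();
    char* end = nullptr;
    const double hz = std::strtod(begin, &end);
    // Accept the value only if the whole remainder was a positive number.
    if (end != begin && *end == '\0' && hz > 0.0) {
      return {Command::kSetFrequency, hz};
    }
  }
  return {Command::kUnknown, 0.0};
}
```

``HandleMessage`` could then switch on the returned ``kind`` and call
``audio_.StartPlayback()`` or ``audio_.StopPlayback()`` accordingly.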