We need to tell the base class that we're dropping these buffers,
so that it also drops the corresponding input timestamps.
Otherwise, the first actual audio buffers we output will be
stamped with those timestamps, which are GST_CLOCK_TIME_NONE. The
mismatch between input buffer count and output buffer count then
persists for the rest of playback. With enough headers and long
enough buffer durations, the sink will already have played some
audio before it receives the first valid timestamp (usually 0),
which triggers an audible discontinuity.
   if (!got_audio_frame) {
     GST_INFO_OBJECT (dec, "dropping in-stream header, %d bytes", size);
+    gst_audio_decoder_finish_frame (audio_dec, NULL, 1);
     return GST_FLOW_OK;
   }
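
For illustration only, the mismatch described above can be modelled without GStreamer. This is a rough sketch, not the base class's actual bookkeeping: `TsQueue`, `ts_push`, `ts_pop` and `first_output_ts` are hypothetical names, and `TS_NONE` is a stand-in for the "no timestamp" value. It assumes the base class pairs each output buffer with the oldest pending input timestamp, and that reporting a dropped frame discards one pending timestamp.

```c
#include <assert.h>
#include <stdint.h>

#define TS_NONE   UINT64_MAX  /* stand-in for "no timestamp" (assumption) */
#define QUEUE_MAX 16          /* enough for this toy model */

/* Hypothetical FIFO of pending input timestamps. */
typedef struct {
  uint64_t ts[QUEUE_MAX];
  int head;
  int tail;
} TsQueue;

static void
ts_push (TsQueue * q, uint64_t ts)
{
  assert (q->tail < QUEUE_MAX);
  q->ts[q->tail++] = ts;
}

/* Returns the oldest pending timestamp, or TS_NONE if none is pending. */
static uint64_t
ts_pop (TsQueue * q)
{
  return (q->head < q->tail) ? q->ts[q->head++] : TS_NONE;
}

/* Feeds n_headers untimestamped header packets followed by one audio
 * packet with timestamp 0, and returns the timestamp paired with the
 * first output buffer. When report_drops is nonzero, each dropped
 * header also discards its pending timestamp, which is the effect the
 * patch relies on. */
static uint64_t
first_output_ts (int n_headers, int report_drops)
{
  TsQueue q = { {0}, 0, 0 };
  int i;

  for (i = 0; i < n_headers; i++) {
    ts_push (&q, TS_NONE);      /* header arrives without a timestamp */
    if (report_drops)
      (void) ts_pop (&q);       /* drop reported: timestamp discarded */
  }

  ts_push (&q, 0);              /* first real audio packet */
  return ts_pop (&q);           /* timestamp given to the first output */
}
```

In this model, `first_output_ts (3, 0)` returns `TS_NONE`, because the first output buffer is paired with a stale header timestamp, while `first_output_ts (3, 1)` returns 0.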