Noisy speech sequences for evaluation of voice activity detection (VAD) algorithms

Name: Noisy speech sequences for evaluation of voice activity detection (VAD) algorithms
Creator: David Dean
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Keywords: Electrical and Electronic Engineering, Evaluation protocols, Voice activity detection, Signal processing, Speech databases, research data, data collections, research project

Dean, David; Mason, Robert; Vogt, Robert; Sridharan, Sridha.

doi:10.4225/09/586f2a3faff49

This dataset was collected to simulate noisy speech in a wide variety of typical background-noise conditions. This distribution contains the QUT-NOISE database and the code required to create the QUT-NOISE-TIMIT database from the QUT-NOISE database and a locally installed copy of the TIMIT database.
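Creating QUT-NOISE-TIMIT essentially means adding excerpts of the QUT-NOISE recordings to TIMIT utterances at chosen signal-to-noise ratios (SNRs). The Python sketch below shows only that core scaling step, assuming speech and noise are float sample lists at the same sampling rate (the 48 kHz noise would first need resampling to TIMIT's 16 kHz). It is not the distributed creation code: the function names are hypothetical, and the SNR definition used by the actual scripts (for example, which samples enter the level measurement) may differ. The gain follows from requiring 20·log10(rms(speech) / (g·rms(noise))) = SNR.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def mix_at_snr(speech, noise, snr_db):
    """Return speech + g * noise, with gain g chosen so the speech-to-noise
    ratio equals snr_db (in dB, computed from whole-signal RMS levels).

    Assumes both lists share one sampling rate; the noise excerpt is
    truncated to the speech length.
    """
    if len(noise) < len(speech):
        raise ValueError("noise excerpt is shorter than the speech")
    noise = noise[:len(speech)]
    # 20*log10(rms(speech) / (g * rms(noise))) == snr_db  =>  solve for g
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]

# Usage with synthetic stand-ins for a TIMIT utterance and a noise excerpt
speech = [0.1 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
```

Scaling the noise rather than the speech leaves the TIMIT utterance level unchanged; scaling the speech instead would give the same SNR.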

The recordings, as described by Dean, Sridharan, Vogt & Mason in "The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms", comprise 20 noise sessions, each at least 30 minutes long. Two separate noise sessions were recorded at each of 10 locations, covering 5 common background-noise scenarios (two locations per scenario); in all but the CAR scenario, the two sessions at a location were recorded at least one day apart. The scenarios are CAFE, HOME, STREET, CAR and REVERB, with the REVERB locations being a closed indoor pool and a carpark.

Recordings were collected with a prosumer-quality Zoom H2 handheld stereo microphone recorder. This device was chosen because the background-noise recordings should be of higher quality than the recording conditions they are meant to simulate, so that any expected recording quality can easily be synthesised from them.

Each of the 20 noise sessions was recorded with the Zoom H2 set to produce raw stereo WAV output at a sampling rate of 48 kHz and 16 bits per sample. The recordings used the rear microphone pair of the Zoom H2, as its greater angular separation (compared with the front microphone pair) could allow more useful comparisons to be made between the two channels in future research.

To calculate the room response in the reverberant CAR and REVERB scenarios, 10-second frequency sweeps were played through a studio monitor positioned several metres away from the microphone. Each reverberant session contained 12 frequency sweeps, 6 before the main 30+ minute recording session and 6 after.

Each noise session was manually labeled with the boundaries of the main 30+ minute recording session and, in the reverberant sessions, the rough location of each individual frequency sweep. In addition, any bad portions of data (such as microphone failure) were labeled so that they can be avoided.
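The manual labels make it straightforward to cut usable noise from a session. The sketch below assumes a hypothetical representation in which the main-session boundaries and each bad portion are (start, end) times in seconds; the distribution's actual label-file format is not described here, and the function names are illustrative. It subtracts the bad portions from the main session and converts the remaining intervals to sample indices at the 48 kHz recording rate. Because the frequency sweeps lie before and after the main session, restricting to the main-session boundaries also excludes them.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # hypothetical (start_s, end_s) label, start < end

SAMPLE_RATE = 48_000  # QUT-NOISE sessions are recorded at 48 kHz

def usable_intervals(session: Interval, bad: List[Interval],
                     min_len_s: float = 0.0) -> List[Interval]:
    """Return the parts of the main session not covered by any bad portion.

    Intervals are half-open [start, end) in seconds; overlapping bad portions
    are allowed. Remaining pieces shorter than min_len_s are dropped.
    """
    start, end = session
    out: List[Interval] = []
    cursor = start  # earliest time not yet known to be bad
    for b_start, b_end in sorted(bad):
        if b_end <= cursor or b_start >= end:
            continue  # bad portion lies outside the remaining range
        if b_start > cursor:
            out.append((cursor, b_start))  # clean stretch before this bad portion
        cursor = max(cursor, b_end)
        if cursor >= end:
            break
    if cursor < end:
        out.append((cursor, end))
    return [(a, b) for a, b in out if b - a >= min_len_s]

def to_samples(interval: Interval, rate: int = SAMPLE_RATE) -> Tuple[int, int]:
    """Convert a (start_s, end_s) interval to sample indices at the given rate."""
    return round(interval[0] * rate), round(interval[1] * rate)

# Usage with made-up label values for one session
session = (12.0, 1850.0)
bad = [(400.0, 412.5)]
for seg in usable_intervals(session, bad, min_len_s=60.0):
    print(seg, to_samples(seg))
```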

Access rights

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the Queensland University of Technology nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL QUEENSLAND UNIVERSITY OF TECHNOLOGY BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Geographical area of data collection

Bounding polygon (KML coordinates, longitude,latitude):

153.552920,-26.777500 152.452799,-26.777500 152.452799,-28.037280 153.552920,-28.037280 153.552920,-26.777500
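KML lists coordinates as longitude,latitude pairs separated by whitespace, and a closed ring repeats its first point at the end, so the polygon above is a rectangle spanning roughly 152.45°E to 153.55°E and 26.78°S to 28.04°S, around Brisbane in south-east Queensland. A small Python sketch (function names are illustrative) that parses such a coordinate string and recovers the bounding box:

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees

def parse_kml_coords(text: str) -> List[Point]:
    """Parse whitespace-separated 'lon,lat[,alt]' tuples, as used in KML <coordinates>."""
    points = []
    for token in text.split():
        lon, lat = token.split(",")[:2]  # ignore an optional altitude component
        points.append((float(lon), float(lat)))
    return points

def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat)."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return min(lons), min(lats), max(lons), max(lats)

coords = ("153.552920,-26.777500 152.452799,-26.777500 152.452799,-28.037280 "
          "153.552920,-28.037280 153.552920,-26.777500")
ring = parse_kml_coords(coords)
print(bounding_box(ring))  # → (152.452799, -28.03728, 153.55292, -26.7775)
```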

Publications

Dean, David B., Sridharan, Sridha, Vogt, Robert J., & Mason, Michael W. (2010). The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. In Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan. Available at http://eprints.qut.edu.au/38144/

Research areas

Electrical and Electronic Engineering

Evaluation protocols

Voice activity detection

Signal processing

Speech databases

Cite this collection

Dean, David; Mason, Robert; Vogt, Robert; Sridharan, Sridha (2014): Noisy speech sequences for evaluation of voice activity detection (VAD) algorithms. Queensland University of Technology. https://doi.org/10.4225/09/586f2a3faff49