Noisy speech sequences for evaluation of voice activity detection (VAD) algorithms
This dataset was collected to simulate noisy speech across a wide variety of typical background noise conditions. The distribution contains the QUT-NOISE database, together with the code required to create the QUT-NOISE-TIMIT database from it and a locally installed copy of the TIMIT database.
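As a rough illustration of what simulating noisy speech involves, the sketch below adds a noise signal to a speech signal at a chosen signal-to-noise ratio (SNR). It is not the procedure used by the distributed code: additive mixing, the RMS-based SNR definition, and the names `rms` and `mix_at_snr` are all assumptions made for this example, and the samples are plain Python floats rather than audio read from either database.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that speech RMS / scaled-noise RMS equals snr_db.

    Hypothetical helper for illustration only; both inputs must have the
    same length and must not be silent.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_rms = rms(speech)
    noise_rms = rms(noise)
    if speech_rms == 0.0 or noise_rms == 0.0:
        raise ValueError("speech and noise must not be silent")
    # Gain that brings the noise to the level implied by the target SNR.
    gain = speech_rms / (noise_rms * 10.0 ** (snr_db / 20.0))
    return [s + gain * n for s, n in zip(speech, noise)]
```

In practice the noise segment would be cut from one of the noise sessions and the speech from TIMIT; the toy lists used here only stand in for such data.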
The recordings, as described by Dean, Sridharan, Vogt & Mason in "The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms", comprise 20 noise sessions, each at least 30 minutes long. Two separate noise sessions, recorded at least one day apart in all but the CAR scenario, were conducted at each of 10 locations covering 5 common background noise scenarios. The scenarios were cafe, home, street, car and reverb (closed indoor pool & carpark).
Recordings were collected with a prosumer-quality Zoom H2 handheld stereo microphone recorder. This device was chosen so that the quality of the background noise recordings would exceed that of typical recording scenarios, allowing any expected recording quality to be easily synthesised from them.
Each of the 20 noise sessions was recorded with the Zoom H2 set to record raw stereo WAV output at a sampling rate of 48 kHz and 16 bits per sample. The recordings used the rear microphone pair of the Zoom H2, as its greater angular separation (compared with the front microphone pair) could potentially allow more useful comparisons between the two channels in future research.
To allow the room response to be calculated in the reverberant CAR and REVERB scenarios, 10-second frequency sweeps were played through a studio monitor positioned several metres away from the microphone. Each reverberant session contains 12 frequency sweeps: 6 before the main 30+ minute recording session and 6 after.
Each noise session was manually labelled with the boundaries of the main 30+ minute recording session and, in the reverberant sessions, the rough location of each individual frequency sweep. In addition, the locations of any bad portions of data (such as microphone failure) were labelled to allow them to be avoided.
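The labels described above can be used to select only the usable audio from a session. The following is a minimal sketch, assuming the labels have already been read into (start, end) pairs in seconds; the label file format is not described here, and the function names `usable_spans` and `to_samples` are hypothetical. Only the 48 kHz sampling rate is taken from the description.

```python
SAMPLE_RATE_HZ = 48_000  # sampling rate stated in the dataset description


def usable_spans(main_span, excluded_spans):
    """Return the parts of main_span not covered by any excluded span.

    main_span is a (start, end) pair in seconds marking the main recording
    session; excluded_spans is a list of such pairs marking bad data.
    """
    start, end = main_span
    spans = []
    cursor = start
    for ex_start, ex_end in sorted(excluded_spans):
        # Clip each excluded span to the main session.
        ex_start = max(ex_start, start)
        ex_end = min(ex_end, end)
        if ex_end <= ex_start:
            continue  # lies entirely outside the main session
        if ex_start > cursor:
            spans.append((cursor, ex_start))
        cursor = max(cursor, ex_end)
    if cursor < end:
        spans.append((cursor, end))
    return spans


def to_samples(span, sample_rate=SAMPLE_RATE_HZ):
    """Convert a (start, end) pair in seconds to sample indices."""
    start, end = span
    return (round(start * sample_rate), round(end * sample_rate))
```

For example, a main session labelled from 60 s to 1900 s with a bad portion from 500 s to 520 s would yield two usable spans, one on each side of the bad portion.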
Licence: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)