The VidTIMIT Audio-Video Dataset
Overview | Examples | License & Downloads | Contact | Publications | Related
The VidTIMIT dataset comprises video and corresponding audio recordings of 43 people reciting short sentences.
It can be useful for research on topics such as automatic lip reading, multi-view face recognition, multi-modal speech recognition and person identification.
The dataset was recorded in 3 sessions, with a mean delay of 7 days between Sessions 1 and 2, and 6 days between Sessions 2 and 3.
The sentences were chosen from the test section of the TIMIT corpus. There are 10 sentences per person. The first six sentences (sorted alphanumerically by filename) are assigned to Session 1. The next two sentences are assigned to Session 2, with the remaining two assigned to Session 3.
The first two sentences are the same for all persons, with the remaining eight generally different for each person.
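The session split described above can be sketched in a few lines of Python. This is only an illustration: the function name `assign_sessions` and the example file names are hypothetical, and it assumes that "sorted alphanumerically" corresponds to a plain string sort of the filenames.

```python
from typing import Dict, List

# Sentences per session, as stated in the text: six, then two, then two.
SESSION_SIZES = (6, 2, 2)


def assign_sessions(sentence_files: List[str]) -> Dict[int, List[str]]:
    """Map session number (1 to 3) to the sentence files assigned to it."""
    # Assumption: a plain string sort matches the ordering used by the dataset.
    ordered = sorted(sentence_files)
    if len(ordered) != sum(SESSION_SIZES):
        raise ValueError(
            f"expected {sum(SESSION_SIZES)} sentences, got {len(ordered)}"
        )
    sessions: Dict[int, List[str]] = {}
    start = 0
    for number, size in enumerate(SESSION_SIZES, start=1):
        sessions[number] = ordered[start:start + size]
        start += size
    return sessions


# Hypothetical file names, used only to show the call.
example_files = [
    "sx29", "sa2", "si1000", "sx119", "sa1",
    "si1630", "sx209", "si2260", "sx299", "sx389",
]
sessions = assign_sessions(example_files)
```

If the dataset's own ordering turns out to differ from a plain string sort, only the `sorted` call would need to change.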
In addition to the sentences, each person performed a head rotation sequence in each session. The sequence consists of the person moving their head to the left, right, back to the center, up, then down, and finally returning to center.
The recording was done in an office environment using a broadcast-quality digital video camera. The video of each person is stored as a numbered sequence of JPEG images with a resolution of 512 x 384 pixels. A 90% quality setting was used during the creation of the JPEG images.
The corresponding audio is stored as a mono, 16 bit, 32 kHz WAV file.
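As a quick sanity check after unpacking, the stated audio format (mono, 16 bit, 32 kHz) can be verified with Python's standard `wave` module. The sketch below is an illustration only: the function name `check_wav_format` is hypothetical, and the path passed to it is whatever WAV file you want to inspect.

```python
import wave
from typing import List

# Audio format as stated in the dataset description.
EXPECTED_CHANNELS = 1            # mono
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit
EXPECTED_SAMPLE_RATE_HZ = 32000  # 32 kHz


def check_wav_format(path: str) -> List[str]:
    """Return a list of mismatches against the stated format (empty if none)."""
    problems: List[str] = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels: {wav.getnchannels()}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width (bytes): {wav.getsampwidth()}")
        if wav.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(f"sample rate (Hz): {wav.getframerate()}")
    return problems
```

An empty list means the file matches the description; any entries name the property that differs and the value found in the file.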
PLEASE READ BEFORE DOWNLOADING
LICENSE
The VidTIMIT dataset is Copyright © 2001 Conrad Sanderson.
Distribution and research usage of this dataset is permitted under the following conditions:
- This notice is left intact and not modified in any way.
- The dataset is provided as is. There is no warranty as to the fitness for any particular purpose.
- The author of the dataset is not responsible for any direct or indirect losses resulting from the use of the dataset.
- Any publication (e.g. conference paper, journal article, technical report, book chapter, book) resulting from the usage of VidTIMIT must cite the following conference paper:
C. Sanderson and K.K. Paliwal.
Polynomial Features for Robust Face Authentication.
IEEE International Conference on Image Processing (ICIP), Vol. 3, 2002, pp. 997-1000.
(doi: 10.1109/ICIP.2002.1039143)
NOTES
- The VidTIMIT dataset comprises 44 files, taking up about 3 GB in total. Each zip file is on average 71 MB.
- These files are available online with the kind assistance of the University of Queensland
- Please download only one file at a time -- this is so the server is not overloaded
FILES
- vidtimit_documentation.pdf
- fadg0.zip
- faks0.zip
- fcft0.zip
- fcmh0.zip
- fcmr0.zip
- fcrh0.zip
- fdac1.zip
- fdms0.zip
- fdrd1.zip
- fedw0.zip
- felc0.zip
- fgjd0.zip
- fjas0.zip
- fjem0.zip
- fjre0.zip
- fjwb0.zip
- fkms0.zip
- fpkt0.zip
- fram1.zip
- mabw0.zip
- mbdg0.zip
- mbjk0.zip
- mccs0.zip
- mcem0.zip
- mdab0.zip
- mdbb0.zip
- mdld0.zip
- mgwt0.zip
- mjar0.zip
- mjsw0.zip
- mmdb1.zip
- mmdm2.zip
- mpdf0.zip
- mpgl0.zip
- mrcz0.zip
- mreb0.zip
- mrgg0.zip
- mrjo0.zip
- msjs1.zip
- mstk0.zip
- mtas1.zip
- mtmr0.zip
- mwbt0.zip
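For scripted downloads, the request in the notes to fetch only one file at a time can be honoured with a simple sequential loop. The sketch below is an assumption-laden illustration: the function name `download_one_at_a_time` is hypothetical, and the `base_url` argument must be supplied by the reader, since the download location is not stated here. It relies only on `urllib.request.urlretrieve` from the Python standard library.

```python
import os
import urllib.request
from typing import Callable, Iterable, List


def download_one_at_a_time(
    file_names: Iterable[str],
    base_url: str,
    dest_dir: str,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> List[str]:
    """Download files strictly in sequence and return the local paths.

    Each fetch call blocks until it finishes, so at most one transfer
    is active at any time. Files already present in dest_dir are skipped.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved: List[str] = []
    for name in file_names:
        target = os.path.join(dest_dir, name)
        if not os.path.exists(target):
            # base_url is supplied by the caller; it is not given in this document.
            fetch(base_url.rstrip("/") + "/" + name, target)
        saved.append(target)
    return saved
```

A call might look like `download_one_at_a_time(["fadg0.zip", "faks0.zip"], base_url, "vidtimit")`, with `base_url` set to the address that the download links actually point to. Note that a file left incomplete by an interrupted transfer would be skipped on the next run, so such a file should be deleted before retrying.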
A selection of publications referring to the VidTIMIT dataset:
Related datasets & software:
- ChokePoint Dataset (for experiments in person recognition under real-world video surveillance conditions)
- LFW-crop (cropped version of Labeled Faces in the Wild)