First DIHARD Challenge development - SEEDLingS

The source data was drawn from the SEEDLingS (The Study of Environmental Effects on Developing Linguistic Skills) corpus, designed to investigate how infants' early linguistic and environmental input plays a role in their learning. Recordings were generated in the home environment of infants in the Rochester, New York area.

Bibliographic Details
Main Author: Ryant, Neville
Corporate Author: Linguistic Data Consortium
Other Authors: Cieri, Christopher, Fiumara, James, Liberman, Mark
Format: Book
Language: Undetermined
Published: Philadelphia : Linguistic Data Consortium, University of Pennsylvania, 2019
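Subjects: Automatic speech recognition > Databases; Conversation > Databases; English language > Data processing > Databases; English language > Spoken English > Data processing > Databases; English language > Spoken English > United States > Databases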
LEADER 02742nam a22003613a 4500
001 44ff203f-f357-4ee0-ab32-072839292f6b
005 20240811000000.0
008 230201s2019 pau o 000 0 und d
020 |a 1585638919 
020 |a 9781585638918 
035 |a (OCoLC)1368012653 
040 |a CGU  |b eng  |c CGU 
050 4 |a TK7895.S65  |b F47 2019b 
100 1 |a Ryant, Neville 
245 1 0 |a First DIHARD Challenge development - SEEDLingS 
260 |a Philadelphia :  |b Linguistic Data Consortium, University of Pennsylvania,  |c 2019 
300 |a 1 CDRom ;  |c 4 3/4 in 
336 |a text  |2 rdacontent 
337 |a computer  |2 rdamedia 
338 |a computer disc  |2 rdacarrier 
500 |a LDC2019S10 
520 |a The source data was drawn from the SEEDLingS (The Study of Environmental Effects on Developing Linguistic Skills) corpus, designed to investigate how infants' early linguistic and environmental input plays a role in their learning. Recordings were generated in the home environment of infants in the Rochester, New York area. A subset of that data was annotated by LDC for use in the First DIHARD Challenge. This release, when combined with First DIHARD Challenge Evaluation - Nine Sources (LDC2019S12), contains the evaluation set audio data and annotation as well as the official scoring tool. The development data for the First DIHARD Challenge is also available from LDC as Eight Sources (LDC2019S09) and SEEDLingS (LDC2019S10). All audio is provided in the form of 16 kHz, mono-channel FLAC files. The diarization for each recording is stored as a NIST Rich Transcription Time Marked (RTTM) file. RTTM files are space-separated text files containing one turn per line. Segmentation files are stored as HTK label files. Each of these files contains one speech segment per line. Both of the annotation file types are encoded as UTF-8. More information about the file formats is provided in the included documentation. 
650 0 |a Automatic speech recognition  |v Databases 
650 0 |a Conversation  |v Databases 
650 0 |a English language  |x Data processing  |v Databases 
650 0 |a English language  |x Spoken English  |x Data processing  |v Databases 
650 0 |a English language  |x Spoken English  |z United States  |v Databases 
700 1 |a Cieri, Christopher 
700 1 |a Fiumara, James 
700 1 |a Liberman, Mark 
710 2 |a Linguistic Data Consortium 
999 1 0 |i 44ff203f-f357-4ee0-ab32-072839292f6b  |l 12799894  |s US-ICU  |m first_dihard_challenge_development_seedlings_______________________________2019_______lingua________________________________________ryant__neville_____________________e 
999 1 1 |l 12799894  |s ISIL:US-ICU  |t BKS  |a ASR-JRLASR  |b 115592380  |c TK7895.S65F47 2019b  |d Library of Congress classification  |y 10456272  |p LOANABLE
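
The 520 summary describes RTTM files as space-separated text with one turn per line. A minimal sketch of reading such a file is shown below. It assumes the usual NIST RTTM column order for SPEAKER records (type, file ID, channel, onset, duration, two placeholder fields, speaker name, two further placeholders). The names Turn and parse_rttm, and the sample file and speaker IDs, are hypothetical and are not taken from the release. The documentation included with the corpus remains the authority on the exact layout.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    """One speaker turn from an RTTM SPEAKER record (times in seconds)."""
    file_id: str
    channel: str
    onset: float
    duration: float
    speaker: str


def parse_rttm(text: str) -> List[Turn]:
    """Parse RTTM text, keeping only SPEAKER records.

    Assumed column order: type, file ID, channel, onset, duration,
    orthography, speaker type, speaker name, and then any trailing fields.
    """
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and any record type other than SPEAKER.
        if not fields or fields[0] != "SPEAKER":
            continue
        if len(fields) < 8:
            raise ValueError(f"too few fields in RTTM line: {line!r}")
        turns.append(
            Turn(
                file_id=fields[1],
                channel=fields[2],
                onset=float(fields[3]),
                duration=float(fields[4]),
                speaker=fields[7],
            )
        )
    return turns


# Hypothetical sample; the IDs below are invented for illustration.
sample = (
    "SPEAKER rec1 1 0.50 1.25 <NA> <NA> spk1 <NA> <NA>\n"
    "SPEAKER rec1 1 2.00 0.75 <NA> <NA> spk2 <NA> <NA>\n"
)
for turn in parse_rttm(sample):
    print(turn.speaker, turn.onset, turn.onset + turn.duration)
```

Since the record states that the annotation files are UTF-8 encoded, a file on disk would be opened with encoding="utf-8" before its contents are passed to parse_rttm.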