First DIHARD Challenge development - SEEDLingS

The source data was drawn from the SEEDLingS (The Study of Environmental Effects on Developing Linguistic Skills) corpus, designed to investigate how infants' early linguistic and environmental input plays a role in their learning. Recordings were generated in the home environment of infants in the Rochester, New York area.


Bibliographic Details
Main Author:	Ryant, Neville
Corporate Author:	Linguistic Data Consortium
Other Authors:	Cieri, Christopher; Fiumara, James; Liberman, Mark
Format:	Book
Language:	Undetermined
Published:	Philadelphia : Linguistic Data Consortium, University of Pennsylvania, 2019
Subjects:	Automatic speech recognition > Databases
	Conversation > Databases
	English language > Data processing > Databases
	English language > Spoken English > Data processing > Databases
	English language > Spoken English > United States > Databases


LEADER	02742nam a22003613a 4500
001	44ff203f-f357-4ee0-ab32-072839292f6b
005	20240811000000.0
008	230201s2019 pau o 000 0 und d
020			|a 1585638919
020			|a 9781585638918
035			|a (OCoLC)1368012653
040			|a CGU |b eng |c CGU
050		4	|a TK7895.S65 |b F47 2019b
100	1		|a Ryant, Neville
245	1	0	|a First DIHARD Challenge development - SEEDLingS
260			|a Philadelphia : |b Linguistic Data Consortium, University of Pennsylvania, |c 2019
300			|a 1 CD-ROM ; |c 4 3/4 in
336			|a text |2 rdacontent
337			|a computer |2 rdamedia
338			|a computer disc |2 rdacarrier
500			|a LDC2019S10
520			|a The source data was drawn from the SEEDLingS (The Study of Environmental Effects on Developing Linguistic Skills) corpus, designed to investigate how infants' early linguistic and environmental input plays a role in their learning. Recordings were generated in the home environment of infants in the Rochester, New York area. A subset of that data was annotated by LDC for use in the First DIHARD Challenge. This release, when combined with First DIHARD Challenge Development - Eight Sources (LDC2019S09), contains the development set audio data and annotation as well as the official scoring tool. The evaluation data for the First DIHARD Challenge is also available from LDC as Nine Sources (LDC2019S12) and SEEDLingS (LDC2019S13). All audio is provided as 16 kHz, mono-channel FLAC files. The diarization for each recording is stored as a NIST Rich Transcription Time Marked (RTTM) file; RTTM files are space-separated text files containing one turn per line. Segmentation files are stored as HTK label files, each containing one speech segment per line. Both annotation file types are encoded as UTF-8. More information about the file formats is provided in the included documentation.
650		0	|a Automatic speech recognition |v Databases
650		0	|a Conversation |v Databases
650		0	|a English language |x Data processing |v Databases
650		0	|a English language |x Spoken English |x Data processing |v Databases
650		0	|a English language |x Spoken English |z United States |v Databases
700	1		|a Cieri, Christopher
700	1		|a Fiumara, James
700	1		|a Liberman, Mark
710	2		|a Linguistic Data Consortium
999	1	0	|i 44ff203f-f357-4ee0-ab32-072839292f6b |l 12799894 |s US-ICU |m first_dihard_challenge_development_seedlings_______________________________2019_______lingua________________________________________ryant__neville_____________________e
999	1	1	|l 12799894 |s ISIL:US-ICU |t BKS |a ASR-JRLASR |b 115592380 |c TK7895.S65F47 2019b |d Library of Congress classification |y 10456272 |p LOANABLE
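
The 520 note describes the annotation formats only briefly: RTTM files hold one speaker turn per line, and HTK label files hold one speech segment per line. The Python below is a minimal sketch of reading both. It assumes the standard 10-field NIST RTTM `SPEAKER` layout (type, file ID, channel, onset, duration, orthography, speaker type, speaker name, confidence, lookahead) and label lines of the form `start end label`. The function names, the `Turn` class, the `time_scale` parameter, and the sample file and speaker IDs are hypothetical illustrations. They are not part of the LDC release or its official scoring tool, so the included documentation should be checked for the exact conventions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Turn:
    """One speaker turn from an RTTM SPEAKER record (hypothetical helper type)."""
    file_id: str
    channel: str
    onset: float     # turn start time, in seconds
    duration: float  # turn length, in seconds
    speaker: str

    @property
    def offset(self) -> float:
        # Turn end time = onset + duration.
        return self.onset + self.duration


def parse_rttm(lines):
    """Parse space-separated RTTM lines, keeping only SPEAKER records."""
    turns = []
    for raw in lines:
        fields = raw.split()
        # Skip blank lines and other record types (e.g. SPKR-INFO).
        if not fields or fields[0] != "SPEAKER":
            continue
        # Assumed layout: type, file id, channel, onset, duration,
        # orthography, speaker type, speaker name, confidence, lookahead.
        turns.append(Turn(
            file_id=fields[1],
            channel=fields[2],
            onset=float(fields[3]),
            duration=float(fields[4]),
            speaker=fields[7],
        ))
    return turns


def parse_lab(lines, time_scale=1.0):
    """Parse 'start end label' lines into (start, end, label) tuples in seconds.

    time_scale=1.0 assumes times are already stored in seconds (an assumption).
    Classic HTK labels use 100 ns units, which would need time_scale=1e-7.
    """
    segments = []
    for raw in lines:
        fields = raw.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        start = float(fields[0]) * time_scale
        end = float(fields[1]) * time_scale
        label = fields[2] if len(fields) > 2 else ""
        segments.append((start, end, label))
    return segments


def speech_time_per_speaker(turns):
    """Sum turn durations per speaker; overlapping speech counts for each speaker."""
    totals = defaultdict(float)
    for turn in turns:
        totals[turn.speaker] += turn.duration
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample_rttm = [
        "SPEAKER rec01 1 0.00 2.50 <NA> <NA> speaker_A <NA> <NA>",
        "SPEAKER rec01 1 2.00 1.25 <NA> <NA> speaker_B <NA> <NA>",
    ]
    turns = parse_rttm(sample_rttm)
    print(speech_time_per_speaker(turns))  # {'speaker_A': 2.5, 'speaker_B': 1.25}
    print(parse_lab(["0.000 1.230 speech"]))
```

Keeping the time unit as an explicit `time_scale` argument avoids hard-coding a guess about how the segmentation files store times. Summing durations per speaker deliberately double-counts overlapped regions, which is one simple way to see how much each speaker talks. It is not a diarization error rate computation.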

