Licensing info
Sinica MCDC8 consists of eight hours of conversational speech with signal-aligned inter-pausal units (IPUs) and word boundary annotation. Academic licenses are available from the Association for Computational Linguistics and Chinese Language Processing (https://www.aclclp.org.tw/use_mat.php#mcdc).
SPCCSD contains 3.5 hours of speech data with manually verified phone boundaries. Academic licenses are available from the Association for Computational Linguistics and Chinese Language Processing (https://www.aclclp.org.tw/use_mat.php#pad).
Sinica Chinese Core Vocabulary (version 1.0) consists of 1,121 Chinese words drawn from the intersection of the 2,000 most frequently used words in the Sinica Balanced Corpus and the 2,000 most frequently used words in the Taiwan Mandarin Conversational Corpus. For each word, it provides the part of speech, the frequency and ranking in each of the two corpora, and the corresponding English glosses, along with Chinese example sentences and their English translations. All Chinese characters are also transcribed in Pinyin. Academic licenses are available from the Association for Computational Linguistics and Chinese Language Processing (https://www.aclclp.org.tw/use_sccv.php).