Subword Lexical Chaining for Automatic Story Segmentation in Chinese Broadcast News
Document type: Conference paper
Authors: Xie, Lei[1] Yang, Yulian[2] Zeng, Jia[3]
Affiliations: [1]Audio, Speech and Language Processing Group (ASLP) School of Computer Science, Northwestern Polytechnical University, Xi'an, China
[2]Audio, Speech and Language Processing Group (ASLP) School of Computer Science, Northwestern Polytechnical University, Xi'an, China
[3]Department of Computer Science, Hong Kong Baptist University, Hong Kong, Hong Kong
Year: 2008
Corresponding author: Xie, L (reprint author), NW Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP, Xian 710072, Peoples R China.
Conference: Advances in Multimedia Information Processing - PCM 2008, 9th Pacific Rim Conference on Multimedia
Pages: 248-258
Conference start date: 2008-12-09
Indexed in: EI(20094612439569) CPCI-S(WOS:000262476800026)
Department: School of Computer Science
Language: Foreign language
Keywords: Story segmentation; topic segmentation; spoken document retrieval; multimedia; Chinese
Abstract: We present a subword lexical chaining approach to automatic story segmentation of Chinese broadcast news (BN). Conventional lexical chains link related words with cohesion (e.g. repetition of words) and high concentration points of starting and ending chains are indicative of story boundaries. However, inevitable speech recognition errors in BN transcripts may destroy the cohesiveness of words, resulting in word match failures. We show the robustness of Chinese subwords (characters and syllables) in lexical matching in errorful ASR transcripts. This motivates us to discover story boundaries on chains formed by character and syllable n-gram units. Experimental results on the TDT2 Mandarin corpus show that chaining by character unigram exhibits the best story segmentation performance with relative F-measure improvement of 6.06% over conventional word chaining. Integrations of multi-scales (words and subwords) exhibit further improvement. For example, fusion by voting from different scales achieves an F-measure gain of 9.04% over words.
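The chaining idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes chains are formed purely by repetition of character unigrams, that a chain is broken when the gap between repeats exceeds a hypothetical `max_gap` parameter, and that boundary strength is a plain count of chains ending just before a sentence plus chains starting at it. The function names `build_chains` and `boundary_strength` are illustrative only.

```python
from collections import defaultdict


def build_chains(sentences, max_gap=3):
    """Return (start, end) sentence-index spans of character-unigram chains.

    Assumption: a chain links repeats of one character and is split
    whenever two consecutive occurrences are more than max_gap apart.
    """
    positions = defaultdict(list)
    for idx, sent in enumerate(sentences):
        for ch in set(sent):  # count each character once per sentence
            if not ch.isspace():
                positions[ch].append(idx)

    chains = []
    for idxs in positions.values():
        start = prev = idxs[0]
        for i in idxs[1:]:
            if i - prev > max_gap:
                if prev > start:  # keep only chains with a repetition
                    chains.append((start, prev))
                start = i
            prev = i
        if prev > start:
            chains.append((start, prev))
    return chains


def boundary_strength(sentences, max_gap=3):
    """strength[i] scores a story boundary placed just before sentence i."""
    n = len(sentences)
    strength = [0] * n
    for start, end in build_chains(sentences, max_gap):
        strength[start] += 1          # a chain starts at this sentence
        if end + 1 < n:
            strength[end + 1] += 1    # a chain ended just before it
    if n:
        strength[0] = 0  # the start of the transcript is a trivial boundary
    return strength


if __name__ == "__main__":
    transcript = ["股市上涨", "股市下跌", "台风登陆", "台风减弱"]
    print(boundary_strength(transcript))
```

Positions with a high score are candidate story boundaries. Replacing characters with words or syllable n-grams as the chained unit would give the other scales mentioned in the abstract, and their outputs could then be combined, for example by voting.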

School of Computer Science: Xie Lei (谢磊)
dc:title:Subword Lexical Chaining for Automatic Story Segmentation in Chinese Broadcast News
dc:creator:Xie, Lei;Yang, Yulian;Zeng, Jia
dc:date: publishDate:2008-12-09
dc:type:Conference
dc:format: Media:Advances in Multimedia Information Processing - PCM 2008, 9th Pacific Rim Conference on Multimedia
dc:identifier: InterrelatedLiterature:Advances in Multimedia Information Processing - PCM 2008, 9th Pacific Rim Conference on Multimedia.5353(248-258).
dc:identifier:DOI:10.1007/978-3-540-89796-5_26
dc:identifier:ISSN:0302-9743