Measuring semantic similarity by contextual word connections in Chinese news story segmentation
Document type: Conference paper
Authors: Nie, Xuecheng[1] Feng, Wei[2] Wan, Liang[3] Xie, Lei[4]
Affiliations: [1] School of Computer Science and Technology, Tianjin University, China | School of Computer Software, Tianjin University, China
[2] School of Computer Science and Technology, Tianjin University, China | Tianjin Key Lab for Advanced Signal Processing, Civil Aviation University of China, Tianjin, China
[3] School of Computer Software, Tianjin University, China | Tianjin Key Lab for Advanced Signal Processing, Civil Aviation University of China, Tianjin, China
[4] School of Computer Science, Northwestern Polytechnical University, Xi'an, China
Year: 2013
Corresponding author: Feng, W. (wfeng@tju.edu.cn)
Conference name: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Pages: 8312-8316
Conference location: Vancouver, BC, Canada
Conference start date: 2013-05-26
Conference end date: 2013-05-31
Indexed in: EI (20135217121725)
Department: School of Computer Science
Language: Chinese
Abstract: A lot of recent work in story segmentation focuses on developing better partitioning criteria to segment news transcripts into sequences of topically coherent stories, while simply relying on repetition-based hard word-level similarities and ignoring the semantic correlations between different words. In this paper, we propose a purely data-driven approach to measuring soft semantic word- and sentence-level similarity from a given corpus, without the guidance of linguistic knowledge, ground-truth topic labeling or story boundaries. We show that contextual word connections can help to produce semantically meaningful similarity measurements between any pair of Chinese words. Based on this, we further use a parallel all-pair SimRank algorithm to propagate such contextual similarities throughout the whole vocabulary. The resultant word semantic similarity matrix is then used to refine the classical cosine similarity measurement of sentences. Experiments on benchmark Chinese news corpora show that story segmentation using the proposed soft semantic similarity measurement can always produce better segmentation accuracy than using the hard similarity. Specifically, we can achieve 3%-10% average F1-measure improvement over state-of-the-art NCuts-based story segmentation. © 2013 IEEE.
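The pipeline outlined in the abstract (contextual word connections, SimRank propagation, refined sentence cosine similarity) can be illustrated with a minimal sketch. The details below are assumptions, not the authors' implementation: "context" is taken to mean co-occurrence within a sentence, SimRank is the plain sequential all-pair iteration rather than the parallel variant the paper mentions, and the refined sentence similarity is assumed to be the bilinear form x^T S y normalized like a cosine. The function names, the decay value 0.8 and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_context_graph(sentences):
    """Link two words when they co-occur in a sentence (assumed notion of context)."""
    neighbors = {}
    for words in sentences:
        for w in words:
            neighbors.setdefault(w, set())
        for a, b in combinations(sorted(set(words)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def simrank(neighbors, decay=0.8, iterations=5):
    """Plain all-pair SimRank: two words are similar if their neighbors are similar."""
    vocab = sorted(neighbors)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in vocab} for a in vocab}
    for _ in range(iterations):
        new = {a: {} for a in vocab}
        for a in vocab:
            for b in vocab:
                if a == b:
                    new[a][b] = 1.0
                elif not neighbors[a] or not neighbors[b]:
                    new[a][b] = 0.0
                else:
                    total = sum(sim[i][j] for i in neighbors[a] for j in neighbors[b])
                    new[a][b] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        sim = new
    return sim


def soft_cosine(s1, s2, sim):
    """Cosine-style sentence similarity weighted by a word similarity matrix."""
    c1, c2 = Counter(s1), Counter(s2)

    def word_sim(a, b):
        # Words missing from the matrix fall back to exact-match (hard) similarity.
        return sim.get(a, {}).get(b, 1.0 if a == b else 0.0)

    def bilinear(x, y):
        return sum(x[a] * y[b] * word_sim(a, b) for a in x for b in y)

    denom = math.sqrt(bilinear(c1, c1) * bilinear(c2, c2))
    return bilinear(c1, c2) / denom if denom else 0.0


def hard_cosine(s1, s2):
    """Classical cosine on word counts: only repeated words contribute."""
    return soft_cosine(s1, s2, {})


# Toy corpus (hypothetical): "stock" and "share" never co-occur but share contexts.
corpus = [["stock", "market", "rise"],
          ["share", "market", "rise"],
          ["rain", "storm", "wind"]]
word_sim = simrank(build_context_graph(corpus))
print(hard_cosine(["stock"], ["share"]))
print(soft_cosine(["stock"], ["share"], word_sim))
```

In this toy setting the hard cosine between the single-word sentences "stock" and "share" is zero because no word repeats, while the soft variant is positive because both words connect to the same contexts. Words from unrelated contexts, such as "stock" and "rain", stay at zero similarity.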

School of Computer Science: Xie, Lei
dc:title:Measuring semantic similarity by contextual word connections in Chinese news story segmentation
dc:creator:Nie, Xuecheng;Feng, Wei;Wan, Liang, et al.
dc:date: publishDate:2013-05-26
dc:type:Conference paper
dc:format: Media:2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
dc:identifier: LnterrelatedLiterature:2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013.Vancouver, BC, Canada.
dc:identifier:DOI:10.1109/ICASSP.2013.6639286
dc:identifier:ISBN:9781479903566