Pre-Alignment Guided Attention for Improving Training Efficiency and Model Stability
Document type: Journal article
Authors: Zhu, Xiaolian [1]; Zhang, Yuchao [2]; Yang, Shan [3]; Xue, Liumeng [4]; Xie, Lei [5]
Affiliations:
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710065, Shaanxi, Peoples R China; Hebei Univ Econ & Business, Publ Comp Educ Ctr, Shijiazhuang 050061, Hebei, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian 710065, Shaanxi, Peoples R China
[3] Northwestern Polytech Univ, Sch Comp Sci, Xian 710065, Shaanxi, Peoples R China
[4] Northwestern Polytech Univ, Sch Comp Sci, Xian 710065, Shaanxi, Peoples R China
[5] Northwestern Polytech Univ, Sch Comp Sci, Xian 710065, Shaanxi, Peoples R China
Corresponding author: Xie, L (reprint author), Northwestern Polytech Univ, Sch Comp Sci, Xian 710065, Shaanxi, Peoples R China
Year: 2019
Journal: IEEE ACCESS
Volume: 7
Issue: 100
Pages: 65955-65964
Supplement: regular issue
Subject: Computer Science
Indexed by: SCI(E) (WOS:000471046700001); EI (20192507063896)
Department: School of Computer Science
Key achievement type: key journal
Funding: National Key Research and Development Program of China [2017YFB1002102]; Natural Science Foundation of Hebei University of Economics and Business [2016KYQ05]
Keywords: attention; alignment loss; speech synthesis; training efficiency; model stability
Abstract:
Recently, end-to-end (E2E) neural text-to-speech systems, such as Tacotron2, have begun to surpass traditional multi-stage hand-engineered systems, offering both simplified system-building pipelines and high-quality speech. With its unified encoder-decoder neural structure, the Tacotron2 system no longer needs a separately learned text-analysis front-end, duration model, acoustic model, or audio synthesis module. The key to such a system lies in the attention mechanism, which learns an alignment bet…
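The title and the "alignment loss" keyword suggest penalizing the distance between the decoder's learned attention weights and a pre-computed (forced) alignment to guide training. A minimal illustrative sketch follows; the function name, the L1 distance, and the toy matrices are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def alignment_loss(attn, pre_align):
    """Mean L1 distance between decoder attention weights and a
    pre-computed alignment matrix (both shaped [T_dec, T_enc],
    each row summing to 1). Hypothetical sketch of an
    alignment-guidance loss, added to the usual spectrogram loss."""
    return np.abs(attn - pre_align).mean()

# Toy example: 4 decoder steps attending over 3 encoder steps.
attn = np.array([[0.9, 0.1, 0.0],
                 [0.2, 0.7, 0.1],
                 [0.0, 0.8, 0.2],
                 [0.0, 0.1, 0.9]])
# Roughly diagonal target alignment, e.g. from an external forced aligner.
pre = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
loss = alignment_loss(attn, pre)  # ≈ 0.1167
```

As the attention matrix approaches the pre-computed alignment, this term vanishes, which is the intuition behind using pre-alignment guidance to speed convergence and stabilize attention.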