Attentive Statistics Pooling for Deep Speaker Embedding

Koji Okabe; Takafumi Koshinaka; Koichi Shinoda

doi:10.21437/Interspeech.2018-993

Publication Information

Title

Japanese:
English:	Attentive Statistics Pooling for Deep Speaker Embedding

Authors

Japanese:	岡部浩司, 越仲孝文, 篠田浩一.
English:	Koji Okabe, Takafumi Koshinaka, Koichi Shinoda.

Language

English

Journal/Book Title

Japanese:
English:	Proc. Interspeech 2018

Volume, Number, Pages

pp. 2252-2256

Publication Date

September 4, 2018

Publisher

Japanese:
English:	ISCA

Conference Name

Japanese:
English:	Interspeech 2018

Conference Location

Japanese:	ハイデラバード
English:	Hyderabad

File

Official Link

https://www.isca-speech.org/archive/Interspeech_2018/pdfs/0993.pdf

DOI

https://doi.org/10.21437/Interspeech.2018-993

Abstract

This paper proposes attentive statistics pooling for deep speaker embedding in text-independent speaker verification. In conventional speaker embedding, frame-level features are averaged over all the frames of a single utterance to form an utterance-level feature. Our method utilizes an attention mechanism to give different weights to different frames and generates not only weighted means but also weighted standard deviations. In this way, it can capture long-term variations in speaker characteristics more effectively. An evaluation on the NIST SRE 2012 and the VoxCeleb data sets shows that it reduces equal error rates (EERs) from the conventional method by 7.5% and 8.1%, respectively.
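The pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how attention scores are computed, so this example assumes one scalar score per frame is already available and normalizes the scores with a softmax. The function names `softmax` and `attentive_stats_pooling` are hypothetical, and plain Python lists stand in for the network's frame-level outputs.

```python
import math


def softmax(scores):
    """Turn raw per-frame scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pooling(frames, scores):
    """Pool T frame-level vectors into one utterance-level vector.

    frames: list of T feature vectors, each of length D.
    scores: list of T scalar attention scores (assumed to be given;
            how they are produced is outside this sketch).
    Returns a list of length 2*D: weighted means, then weighted
    standard deviations.
    """
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted mean of each feature dimension.
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    # Weighted variance around that mean; non-negative by construction.
    var = [
        sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, frames))
        for d in range(dim)
    ]
    std = [math.sqrt(v) for v in var]
    return mean + std


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 6.0]]
    # Equal scores give equal weights, i.e. ordinary mean and std.
    print(attentive_stats_pooling(frames, [0.0, 0.0]))
```

With equal scores the weights are uniform, so the result reduces to the unweighted mean and standard deviation over frames. Unequal scores shift both statistics toward the frames with higher weight.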
