Towards cross-attention pre-training in neural machine translation

Subject index number

12

Research field

Computer science

Document type

Authors

Phạm Vĩnh Khang, Nguyễn Hồng Bửu Long

Original title (Vietnamese)

Hướng đến tiền huấn luyện cross-attention trong dịch máy bằng nơ-ron

English title

Towards cross-attention pre-training in neural machine translation

Source

Tạp chí Khoa học - Đại học Sư phạm TP Hồ Chí Minh

Year of publication

2022

Issue

10

Pages

1749-1755

ISSN

1859-3100

Keywords

Cross-attention, Cross-lingual, Natural language processing, Automatic machine translation, Language model

English keywords

Cross-attention, Cross-lingual, Natural language processing, Neural machine translation, Pre-training, Language model

Abstract

The emergence of pre-training techniques and language models has significantly improved solutions to many problems in natural language processing (NLP). However, applying pre-trained language models to machine translation remains difficult, because a language model does not learn information about the interaction between the two languages of a pair during pre-training. In this paper, we survey several studies on pre-training the cross-attention module between the encoder and the decoder using large monolingual corpora. Experimental results demonstrate the effectiveness of using pre-trained language models for machine translation.

English abstract

The advent of pre-training techniques and large language models has significantly improved performance on many natural language processing (NLP) tasks. However, applying pre-trained language models to neural machine translation remains a challenge, because little information about the interaction between the two languages of a pair is learned during pre-training. In this paper, we survey several studies that define training schemes for pre-training the cross-attention module between the encoder and the decoder using large-scale monolingual corpora independently. The experiments show promising results, demonstrating the effectiveness of using pre-trained language models in neural machine translation.
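For readers unfamiliar with the module the abstract refers to, the sketch below illustrates what a single cross-attention head computes in the Transformer encoder-decoder of Vaswani et al. [3]: each decoder position supplies a query, which is scored against the encoder outputs with softmax(QK^T / sqrt(d))V. This is an illustration only, not the authors' method or any pre-training scheme from the surveyed studies. It assumes identity matrices in place of the learned projections W_Q, W_K, W_V, a single head, and plain Python lists; the helper names `softmax` and `cross_attention` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(decoder_states: Matrix, encoder_states: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention (illustrative sketch).

    Queries come from the decoder; keys and values both come from the
    encoder output. Assumption: the learned projections W_Q, W_K, W_V are
    replaced by identity matrices to keep the example short.
    """
    d = len(encoder_states[0])
    scale = 1.0 / math.sqrt(d)
    outputs: Matrix = []
    for q in decoder_states:
        # Score this target-side query against every source-side key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in encoder_states]
        weights = softmax(scores)
        # Weighted sum of the encoder vectors (values) for this target position.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, encoder_states))
            for j in range(d)
        ])
    return outputs


if __name__ == "__main__":
    # Toy example: 3 source tokens, 2 target positions, model dimension 2.
    enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    dec = [[1.0, 0.0], [0.0, 1.0]]
    for row in cross_attention(dec, enc):
        print([round(x, 3) for x in row])
```

Because keys and values come from the encoder while queries come from the decoder, this is the only place where a standard encoder-decoder model relates the two languages directly, which is why an encoder and a decoder pre-trained separately on monolingual data leave this module untrained.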

Call number

TTKHCNQG, CTv 138

References

[1] Yang, J., Wang, M., Zhou, H., Zhao, C., Zhang, W., Yu, Y., & Li, L. (2020). Towards making the most of BERT in neural machine translation. Proceedings of the AAAI Conference on Artificial Intelligence, 9378-9385.
[2] Weng, R., Yu, H., Huang, S., Cheng, S., & Luo, W. (2020). Acquiring Knowledge from Pre-trained Model to Neural Machine Translation. Proceedings of the AAAI Conference on Artificial Intelligence, 9266-9273.
[3] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., . . . Polosukhin, I. (2017). Attention is All you Need. NIPS.
[4] Tran, N. L., Le, D. M., & Nguyen, D. Q. (2022). BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese. arXiv preprint arXiv:2109.09701.
[5] Song, K., Tan, X., Qin, T., Lu, J., & Liu, T.-Y. (2019). MASS: Masked Sequence to Sequence Pretraining for Language Generation. arXiv preprint arXiv:1905.02450.
[6] Ren, S., Zhou, L., Liu, S., Wei, F., Zhou, M., & Ma, S. (2021). SemFace: Pre-training Encoder and Decoder with a Semantic Interface for Neural Machine Translation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 4518-4527.
[7] Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI.
[8] Ott, M., Edunov, S., Baevski, A., Fan, A., Gross, S., Ng, N., . . . Auli, M. (2019). fairseq: A Fast, Extensible Toolkit for Sequence Modeling. Proceedings of NAACL-HLT 2019: Demonstrations.
[9] Nguyen, D. Q., & Nguyen, A. T. (2020). PhoBERT: Pre-trained language models for Vietnamese. arXiv preprint arXiv:2003.00744.
[10] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., . . . Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.
[11] Liu, Y., Gu, J., Goyal, N., Li, X., Edunov, S., Ghazvininejad, M., . . . Zettlemoyer, L. (2020). Multilingual Denoising Pre-training for Neural Machine Translation. Transactions of the Association for Computational Linguistics, 8, 726-742.
[12] Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., . . . Zettlemoyer, L. (2019). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv preprint arXiv:1910.13461.
[13] Lample, G., Conneau, A., Denoyer, L., & Ranzato, M. (2017). Unsupervised Machine Translation Using Monolingual Corpora Only.
[14] Lample, G., & Conneau, A. (2019). Cross-lingual Language Model Pretraining. arXiv preprint arXiv:1901.07291.
[15] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
[16] Artetxe, M., Labaka, G., & Agirre, E. (2018). A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 789-798.
[17] Kingma, D. P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.