Adaptation of distilling knowledge method in Natural Language Processing for sentiment analysis.

  • O. Korovii, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
Keywords: BERT, FastText, knowledge distillation, neural network, natural language processing, sentiment analysis


This paper describes how to adapt the "knowledge distillation" method to sentiment analysis for the Ukrainian and Russian languages. It demonstrates how to minimize resources and speed up text sentiment recognition without losing much accuracy, and how to reduce cloud expenses by applying knowledge distillation. For the research we used two different neural network architectures for natural language processing: BERT in place of ensemble models and FastText as the small model. Combining these two networks (BERT as the teacher and FastText as the student) allowed us to achieve a speedup of up to 5 times without sacrificing much accuracy on the sentiment analysis task.
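The teacher–student setup summarized above can be sketched with the classic distillation objective: the student is trained against the teacher's temperature-softened output distribution, mixed with ordinary cross-entropy on the true labels. This is a minimal illustrative sketch of that loss, not the authors' code; the temperature `T` and mixing weight `alpha` are assumed hyperparameters, and real BERT/FastText models are replaced by raw logit arrays.

```python
import numpy as np

def softmax(logits, T=1.0):
    # temperature-scaled softmax; a higher T produces softer targets
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    # soft-target cross-entropy against the teacher's softened distribution
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft_ce = -(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean()
    # ordinary cross-entropy on the ground-truth sentiment labels
    p_hard = softmax(student_logits)
    hard_ce = -np.log(p_hard[np.arange(len(hard_labels)), hard_labels] + 1e-12).mean()
    # T**2 rescales the soft-target term, as in Hinton et al.'s formulation
    return alpha * (T ** 2) * soft_ce + (1 - alpha) * hard_ce
```

In practice the teacher logits would come from the fine-tuned BERT model and the student would be the FastText classifier; a student whose logits track the teacher's incurs a lower loss than one that contradicts it.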



How to Cite
Korovii, O. (2021). Adaptation of distilling knowledge method in Natural Language Processing for sentiment analysis. COMPUTER-INTEGRATED TECHNOLOGIES: EDUCATION, SCIENCE, PRODUCTION, (45), 78-83.
Computer science and computer engineering