Comparison of optimization methods for neural networks training

  • N. Polishchuk LNTU
  • S. Hrinyuk LNTU
  • S. Datsyuk LNTU
Keywords: optimization methods, neural networks, gradient descent method, stochastic gradient, tensorflow, machine learning, convolutional neural networks


Modern methods of training neural networks reduce to finding the minimum of a continuous error function. In recent years, various optimization algorithms have been proposed that take different approaches to updating the model weights. This article describes the optimization methods most commonly used in neural network training and provides a comparative analysis of these methods, using the training of a simple convolutional neural network on the MNIST data set as an example. Various implementations of the gradient descent method, momentum methods, and adaptive methods are analysed, and the common problems of their use are summarized.
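
The three families of methods named in the abstract can be illustrated with a minimal sketch. It does not reproduce the article's experiment (a convolutional network trained on MNIST); instead it assumes a toy two-parameter quadratic error function, and the function names, learning rates, and step counts are illustrative choices rather than values from the article. Adam is used here as one representative adaptive method.

```python
import math


def loss(w):
    # Toy error function (an assumption for illustration): an elongated bowl.
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy error function above.
    return [w[0], 10.0 * w[1]]


def sgd(w, steps=100, lr=0.05):
    # Plain gradient descent: step against the gradient.
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def momentum(w, steps=100, lr=0.05, beta=0.9):
    # Momentum: accumulate a velocity that smooths successive gradients.
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        v = [beta * vi + lr * gi for vi, gi in zip(v, g)]
        w = [wi - vi for wi, vi in zip(w, v)]
    return w


def adam(w, steps=100, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-parameter step sizes from running moments of the gradient.
    m = [0.0] * len(w)
    s = [0.0] * len(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
        s = [b2 * si + (1 - b2) * gi * gi for si, gi in zip(s, g)]
        m_hat = [mi / (1 - b1 ** t) for mi in m]  # bias correction
        s_hat = [si / (1 - b2 ** t) for si in s]
        w = [wi - lr * mh / (math.sqrt(sh) + eps)
             for wi, mh, sh in zip(w, m_hat, s_hat)]
    return w


if __name__ == "__main__":
    start = [1.0, 1.0]
    for name, method in [("sgd", sgd), ("momentum", momentum), ("adam", adam)]:
        print(name, loss(method(start)))
```

On a real network the gradient would come from backpropagation over mini-batches rather than from a closed-form expression, so the relative behaviour of the methods on this toy function should not be read as the article's result.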


How to Cite
Polishchuk, N., Hrinyuk, S., & Datsyuk, S. (2020). Comparison of optimization methods for neural networks training. COMPUTER-INTEGRATED TECHNOLOGIES: EDUCATION, SCIENCE, PRODUCTION, (35), 177-183. Retrieved from
Computer science and computer engineering