Convergence of Gradient Descent
Problem (Differentiable Function). We want to minimize a differentiable function $f: \mathbb{R}^d \rightarrow \mathbb{R}$. We require that the problem is well-posed, in the sense that $\argmin f \neq \emptyset$.
Algorithm (GD). Let $x^0 \in \mathbb{R}^d$, and let $\gamma > 0$ be a step size. The Gradient Descent (GD) algorithm defines a sequence $(x^t)_{t\in \mathbb{N}}$ satisfying
$$
x^{t+1} = x^t - \gamma \nabla f(x^t)
$$
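The update rule above can be sketched in a few lines of code. This is a minimal illustration, not part of the original notes; the function names, the quadratic test objective, and the parameter values are all chosen here for demonstration.

```python
import numpy as np

def gradient_descent(grad_f, x0, gamma, n_steps):
    """Iterate x^{t+1} = x^t - gamma * grad_f(x^t), returning all iterates."""
    x = np.asarray(x0, dtype=float)
    iterates = [x]
    for _ in range(n_steps):
        x = x - gamma * grad_f(x)
        iterates.append(x)
    return iterates

# Illustrative objective: f(x) = 0.5 * ||x||^2, whose gradient is x.
# Each step contracts the iterate by the factor (1 - gamma).
xs = gradient_descent(lambda x: x, np.array([4.0, -2.0]), gamma=0.5, n_steps=10)
```

With this choice of $f$, each coordinate is multiplied by $1 - \gamma$ at every step, so the iterates shrink geometrically toward the minimizer $x^* = 0$.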
- Theorem 1: sublinear convergence under the assumption that $f$ is convex
- Theorem 2: linear convergence (a faster form of convergence) under the stronger assumption that $f$ is $\mu$-strongly convex
- Theorem 3: convergence for nonconvex functions under the assumption that $f$ satisfies the Polyak–Łojasiewicz (PL) inequality
1. Convergence for Convex and Smooth Functions
Theorem 1. Consider the Problem (Differentiable Function) and assume that $f$ is convex and $L$-smooth, for some $L > 0$. Let $(x^t)_{t \in \mathbb{N}}$ be the sequence of iterates generated by the GD algorithm, with a stepsize satisfying $0 < \gamma \leq \dfrac{1}{L}$. Then, for all $x^* \in \argmin f$ and all $t \geq 1$, we have that
$$
f(x^t) - \inf f \leq \dfrac{\| x^0 - x^* \|^2}{2\gamma t}
$$
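The $\mathcal{O}(1/t)$ bound of Theorem 1 can be checked numerically on a simple instance. This is an illustrative sketch, not part of the original notes: the quadratic objective, the matrix $A$, and the starting point are all chosen here for demonstration.

```python
import numpy as np

# Illustrative problem: f(x) = 0.5 * x^T A x with A positive semidefinite,
# so f is convex and L-smooth with L = lambda_max(A); x* = 0 and inf f = 0.
A = np.diag([1.0, 4.0, 10.0])
L = 10.0            # largest eigenvalue of A
gamma = 1.0 / L     # stepsize satisfying 0 < gamma <= 1/L

def f(x):
    return 0.5 * x @ A @ x

x0 = np.array([3.0, -1.0, 2.0])
x = x0
for t in range(1, 101):
    x = x - gamma * (A @ x)                  # GD step: grad f(x) = A x
    bound = np.dot(x0 - 0, x0 - 0) / (2 * gamma * t)
    assert f(x) <= bound                     # Theorem 1's rate holds at every t
```

Since $\inf f = 0$ and $x^* = 0$ here, the asserted inequality is exactly $f(x^t) \leq \|x^0 - x^*\|^2 / (2\gamma t)$, and it holds at every iteration.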