Convergence of Gradient Descent
Problem (Differentiable Function). We want to minimize a differentiable function $f: \mathbb{R}^d \rightarrow \mathbb{R}$. We require that the problem is well-posed, in the sense that $\argmin f \neq \emptyset$.
Algorithm (GD). Let $x^0 \in \mathbb{R}^d$, and let $\gamma > 0$ be a step size. The Gradient Descent (GD) algorithm defines a sequence $(x^t)_{t\in \mathbb{N}}$ satisfying
$$
x^{t+1} = x^t - \gamma \nabla f(x^t)
$$
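The update rule above can be sketched in a few lines of code. This is a minimal illustration, not part of the original notes; the function names, the quadratic test objective, and the parameter values are all chosen here for demonstration.

```python
import numpy as np

def gradient_descent(grad_f, x0, gamma, n_steps):
    """Iterate x^{t+1} = x^t - gamma * grad_f(x^t), returning all iterates."""
    x = np.asarray(x0, dtype=float)
    iterates = [x]
    for _ in range(n_steps):
        x = x - gamma * grad_f(x)
        iterates.append(x)
    return iterates

# Illustrative objective: f(x) = 0.5 * ||x||^2, whose gradient is x.
# Each step contracts the iterate by the factor (1 - gamma).
xs = gradient_descent(lambda x: x, np.array([4.0, -2.0]), gamma=0.5, n_steps=10)
```

With this choice of $f$, each coordinate is multiplied by $1 - \gamma$ at every step, so the iterates shrink geometrically toward the minimizer $x^* = 0$.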
- Theorem 1: sublinear convergence under the assumption that $f$ is convex
- Theorem 2: linear convergence (a faster form of convergence) under the stronger assumption that $f$ is $\mu$-strongly convex
- Theorem 3: convergence for nonconvex functions under the assumption that $f$ satisfies the Polyak–Łojasiewicz (PL) inequality
1. Convergence for Convex and Smooth Functions
Theorem 1. Consider the Problem (Differentiable Function) and assume that $f$ is convex and $L$-smooth, for some $L > 0$. Let $(x^t)_{t \in \mathbb{N}}$ be the sequence of iterates generated by the GD algorithm, with a stepsize satisfying $0 < \gamma \leq \dfrac{1}{L}$. Then, for all $x^* \in \argmin f$ and all $t \geq 1$, we have that
$$
f(x^t) - \inf f \leq \dfrac{\| x^0 - x^* \|^2}{2\gamma t}
$$
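The $\mathcal{O}(1/t)$ bound of Theorem 1 can be checked numerically on a simple instance. This is an illustrative sketch, not part of the original notes: the quadratic objective, the matrix $A$, and the starting point are all chosen here for demonstration.

```python
import numpy as np

# Illustrative problem: f(x) = 0.5 * x^T A x with A positive semidefinite,
# so f is convex and L-smooth with L = lambda_max(A); x* = 0 and inf f = 0.
A = np.diag([1.0, 4.0, 10.0])
L = 10.0            # largest eigenvalue of A
gamma = 1.0 / L     # stepsize satisfying 0 < gamma <= 1/L

def f(x):
    return 0.5 * x @ A @ x

x0 = np.array([3.0, -1.0, 2.0])
x = x0
for t in range(1, 101):
    x = x - gamma * (A @ x)                  # GD step: grad f(x) = A x
    bound = np.dot(x0 - 0, x0 - 0) / (2 * gamma * t)
    assert f(x) <= bound                     # Theorem 1's rate holds at every t
```

Since $\inf f = 0$ and $x^* = 0$ here, the asserted inequality is exactly $f(x^t) \leq \|x^0 - x^*\|^2 / (2\gamma t)$, and it holds at every iteration.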