March 2, 2019
In a recent paper, my colleagues and I considered the deep ReLU network approximation of generalized bandlimited functions \(f:B=[0,1]^d\rightarrow\mathbb{R}\) of the form $$ f(\boldsymbol{x}) = \int_{\mathbb{R}^d}F(\boldsymbol{w})K(\boldsymbol{w}\cdot\boldsymbol{x})d\boldsymbol{w}, $$ where \(F:[-M,M]^d\rightarrow\mathbb{C}\) is square-integrable with \(\mathrm{supp}\,F\subset[-M,M]^d\) for some \(M\geq1\), and \(K:\mathbb{R}\rightarrow\mathbb{C}\) is analytic. We showed that, for any measure \(\mu\), such functions can be approximated with error \(\epsilon\) in the \(L^2(B,\mu)\)-norm by deep ReLU networks of depth \(L=\mathcal{O}\left(\log_2^2\frac{1}{\epsilon}\right)\) and size \(W=\mathcal{O}\left(\frac{1}{\epsilon^2}\log_2^2\frac{1}{\epsilon}\right)\), up to constants that depend on \(F\), \(K\), \(\mu\), and \(B\).
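To make this concrete, here is a minimal numerical sketch, with illustrative choices of \(F\) and \(K\) that are mine rather than the paper's: a Gaussian amplitude \(F\) on \([-M,M]^2\) and the kernel \(K(t)=e^{it}\), with \(f\) evaluated by a simple Riemann-sum quadrature.

```python
import numpy as np

# A minimal sketch (illustrative F and K, not the paper's) of a generalized
# bandlimited function on B = [0,1]^2, evaluated by Riemann-sum quadrature
# over the support [-M, M]^2 of F.
M, n = 1.0, 64
grid = np.linspace(-M, M, n)
W = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)  # quadrature nodes w
F = np.exp(-np.sum(W**2, axis=-1))        # assumed square-integrable amplitude
K = lambda t: np.exp(1j * t)              # assumed analytic kernel
dw = (2 * M / (n - 1)) ** 2               # quadrature weight

def f(x):
    """Approximate f(x) = integral of F(w) K(w . x) dw on the grid."""
    return np.real(np.sum(F * K(W @ x)) * dw)  # real because F is even here

print(f(np.array([0.3, 0.7])))
```

With \(K(t)=e^{it}\), this is nothing but a classical bandlimited function, that is, a function whose Fourier transform is supported in \([-M,M]^d\).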
Our theorem is based on a result by Maurey, and on the ability of deep ReLU networks to approximate Chebyshev polynomials and analytic functions efficiently.
A famous theorem of Carathéodory states that if a point \(x\in\mathbb{R}^d\) lies in the convex hull of a set \(P\subset\mathbb{R}^d\), then \(x\) can be written as a convex combination of at most \(d+1\) points in \(P\). Maurey's theorem extends Carathéodory's result to the infinite-dimensional setting of Hilbert spaces.
Theorem (Maurey). Let \(H\) be a Hilbert space with norm \(\Vert\cdot\Vert\), and let \(G\subset H\) be such that \(\Vert g\Vert\leq b\) for every \(g\in G\), for some \(b>0\). Then, for every \(f\) in the convex hull of \(G\) and every integer \(n\geq 1\), there is an \(f_n\) in the convex hull of \(n\) points in \(G\) and a constant \(c>b^2-\Vert f\Vert^2\) such that \(\Vert f - f_n\Vert^2\leq \frac{c}{n}\).
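Before getting to how we used this, here is a quick toy check of the \(1/n\) rate; the setup (random unit vectors in \(\mathbb{R}^D\), convex weights used as sampling probabilities) is my own illustration, not part of the paper.

```python
import numpy as np

# Toy illustration of Maurey's bound: f is a convex combination of unit
# vectors g_i in R^D, and f_n averages n vectors sampled with the convex
# weights as probabilities, so that E||f - f_n||^2 <= b^2 / n with b = 1.
rng = np.random.default_rng(0)
D, N = 200, 10_000
G = rng.normal(size=(N, D))
G /= np.linalg.norm(G, axis=1, keepdims=True)   # ||g_i|| = 1 for all i
w = rng.random(N); w /= w.sum()                 # convex weights
f = w @ G                                       # f in the convex hull of G

for n in [10, 100, 1000]:
    errs = [np.sum((f - G[rng.choice(N, size=n, p=w)].mean(axis=0))**2)
            for _ in range(50)]                 # average over 50 random draws
    print(n, np.mean(errs))                     # close to b^2 / n = 1 / n
```

The averaged squared error tracks \(1/n\), which is exactly the rate Maurey's theorem guarantees.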
In practice, we used Maurey's theorem to show that there exists $$ f_{\epsilon}(\boldsymbol{x}) = \sum_{j=1}^{\lceil 1/\epsilon^2\rceil}b_j\big[\cos(\beta_j)\mathrm{Re}(K(\boldsymbol{w}_j\cdot\boldsymbol{x})) - \sin(\beta_j)\mathrm{Im}(K(\boldsymbol{w}_j\cdot\boldsymbol{x}))\big], $$ with \(\sum_{j=1}^{\lceil 1/\epsilon^2\rceil}\vert b_j\vert \leq C_F\) and \(\beta_j\in\mathbb{R}\), such that $$ \Vert f_{\epsilon} - f\Vert_{L^2(B,\mu)} \leq 2C_F\sqrt{\mu(B)}\,\epsilon. $$ In other words, the function \(f\) is approximated by a linear combination of analytic functions with error \(2C_F\sqrt{\mu(B)}\epsilon\) in the \(L^2(B,\mu)\)-norm. The task of approximating \(f\) by deep ReLU networks has been reduced to approximating analytic functions by deep networks.
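Continuing the toy example above, here is a Monte Carlo version of such an \(f_\epsilon\); the specific choices (\(C_F=\int\vert F(\boldsymbol{w})\vert\,d\boldsymbol{w}\), frequencies \(\boldsymbol{w}_j\) drawn with probability proportional to \(\vert F(\boldsymbol{w}_j)\vert\), \(b_j=C_F/n\), \(\beta_j=\arg F(\boldsymbol{w}_j)\)) are my assumptions for the sketch, not the construction from the paper.

```python
import numpy as np

# Sketch of the finite sum f_eps (my assumptions, not the paper's proof):
# sample w_j with probability |F(w_j)| / sum |F|, set b_j = C_F / n and
# beta_j = arg F(w_j), and compare with a quadrature value of f.
rng = np.random.default_rng(0)
M, n_grid, n = 1.0, 64, 4000
grid = np.linspace(-M, M, n_grid)
W = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)   # grid of w's
F = np.exp(-np.sum(W**2, axis=-1)) * np.exp(1j * W @ np.array([0.5, -0.2]))
K = lambda t: np.exp(1j * t)                                    # analytic kernel
dw = (2 * M / (n_grid - 1)) ** 2
C_F = np.sum(np.abs(F)) * dw                                    # C_F = integral of |F|

x = np.array([0.3, 0.7])
f_x = np.real(np.sum(F * K(W @ x)) * dw)                        # quadrature value of f(x)

idx = rng.choice(len(W), size=n, p=np.abs(F) / np.abs(F).sum())
beta, t = np.angle(F[idx]), W[idx] @ x
f_eps_x = (C_F / n) * np.sum(np.cos(beta) * np.real(K(t))
                             - np.sin(beta) * np.imag(K(t)))
print(f_x, f_eps_x)                                             # close for large n
```

Each term \(\cos(\beta_j)\mathrm{Re}(K) - \sin(\beta_j)\mathrm{Im}(K)\) is just \(\mathrm{Re}\big(e^{i\beta_j}K(\boldsymbol{w}_j\cdot\boldsymbol{x})\big)\), so the sum is a Monte Carlo estimate of the integral defining \(f\).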
The Chebyshev polynomials of the first kind play a central role in approximation theory. They are defined on \([-1,1]\) via the three-term recurrence relation $$ T_n(x) = 2xT_{n-1}(x) - T_{n-2}(x), \;\; n\geq 2, $$ with \(T_0(x)=1\) and \(T_1(x) = x\). We showed that deep ReLU networks can implement this recurrence efficiently. Since truncated Chebyshev series can approximate analytic functions exponentially well, so can deep networks.
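The recurrence itself is a few lines of code; the quick check below (mine, not a ReLU construction) verifies it against the closed form \(T_n(x)=\cos(n\arccos x)\) on \([-1,1]\).

```python
import numpy as np

# Evaluate Chebyshev polynomials of the first kind by the three-term
# recurrence and check against the closed form T_n(x) = cos(n arccos x).
def chebyshev_T(n, x):
    T_prev, T_curr = np.ones_like(x), x          # T_0(x) = 1, T_1(x) = x
    if n == 0:
        return T_prev
    for _ in range(n - 1):                       # T_k = 2 x T_{k-1} - T_{k-2}
        T_prev, T_curr = T_curr, 2 * x * T_curr - T_prev
    return T_curr

x = np.linspace(-1, 1, 101)
print(np.max(np.abs(chebyshev_T(7, x) - np.cos(7 * np.arccos(x)))))  # ~1e-15
```

The recurrence is numerically stable on \([-1,1]\), which is one reason Chebyshev polynomials are preferred to monomials in practice.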