March 2, 2019
In a recent paper, my colleagues and I considered the deep ReLU network approximation of generalized bandlimited functions \(f:B=[0,1]^d\rightarrow\mathbb{R}\) of the form $$ f(\boldsymbol{x}) = \int_{\mathbb{R}^d}F(\boldsymbol{w})K(\boldsymbol{w}\cdot\boldsymbol{x})d\boldsymbol{w}, $$ for some square-integrable function \(F:\mathbb{R}^d\rightarrow\mathbb{C}\) with \(\mathrm{supp}\,F\subset[-M,M]^d\), \(M\geq1\), and some analytic function \(K:\mathbb{R}\rightarrow\mathbb{C}\). We showed that, for any measure \(\mu\), such functions can be approximated with error \(\epsilon\) in the \(L^2(B,\mu)\)-norm by deep ReLU networks of depth \(L=\mathcal{O}\left(\log_2^2\frac{1}{\epsilon}\right)\) and size \(W=\mathcal{O}\left(\frac{1}{\epsilon^2}\log_2^2\frac{1}{\epsilon}\right)\), up to constants that depend on \(F\), \(K\), \(\mu\), and \(B\).
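For example, taking \(K(t)=e^{it}\) recovers the classical bandlimited functions, $$ f(\boldsymbol{x}) = \int_{[-M,M]^d}F(\boldsymbol{w})e^{i\boldsymbol{w}\cdot\boldsymbol{x}}d\boldsymbol{w}, $$ that is, functions whose Fourier transform is supported in \([-M,M]^d\).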
Our theorem is based on a result by Maurey, and on the ability of deep ReLU networks to approximate Chebyshev polynomials and analytic functions efficiently.
A famous theorem of Carathéodory states that if a point \(x\in\mathbb{R}^d\) lies in the convex hull of a set \(P\subset\mathbb{R}^d\), then \(x\) can be written as a convex combination of at most \(d+1\) points of \(P\). Maurey's theorem extends Carathéodory's result, in an approximate sense, to the infinite-dimensional setting of Hilbert spaces.
Theorem (Maurey). Let \(H\) be a Hilbert space with norm \(\Vert\cdot\Vert\), and let \(G\subset H\) be such that \(\Vert g\Vert\leq b\) for every \(g\in G\) and some \(b>0\). Then, for every \(f\) in the convex hull of \(G\), every integer \(n\geq 1\), and every constant \(c>b^2-\Vert f\Vert^2\), there is an \(f_n\) in the convex hull of \(n\) points of \(G\) such that \(\Vert f - f_n\Vert^2\leq \frac{c}{n}\).
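The \(1/n\) rate is easy to check numerically. Here is a minimal Python sketch (the Hilbert space \(\mathbb{R}^{100}\), the random dictionary, and all sizes are illustrative choices, not taken from the paper): it draws \(n\) points of \(G\) i.i.d. according to the convex weights and averages them, which is the probabilistic argument behind Maurey's proof.

```python
# Minimal numerical check of Maurey's 1/n rate (illustrative choices throughout):
# H = R^100 with the Euclidean norm, G a finite set of unit vectors (so b = 1),
# f a convex combination of G. Averaging n i.i.d. draws made with the convex
# weights gives expected squared error close to (b^2 - ||f||^2) / n.
import numpy as np

rng = np.random.default_rng(0)

m, N = 100, 500
G = rng.normal(size=(N, m))
G /= np.linalg.norm(G, axis=1, keepdims=True)   # every g in G has norm b = 1

a = rng.random(N)
a /= a.sum()                                    # convex weights, sum to 1
f = a @ G                                       # f lies in the convex hull of G

bound = 1.0 - np.dot(f, f)                      # b^2 - ||f||^2
for n in [10, 100, 1000]:
    sq_errs = []
    for _ in range(500):                        # estimate the expectation
        idx = rng.choice(N, size=n, p=a)        # n i.i.d. draws with weights a
        f_n = G[idx].mean(axis=0)               # average of n points of G
        sq_errs.append(np.sum((f_n - f) ** 2))
    print(f"n = {n:4d}:  mean squared error = {np.mean(sq_errs):.4f},  c/n = {bound / n:.4f}")
```

For unit-norm dictionary elements, the expected squared error of this empirical average is exactly \((b^2-\Vert f\Vert^2)/n\), so the two printed columns should agree closely.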
In practice, we used Maurey's theorem to show that there exists $$ f_{\epsilon}(\boldsymbol{x}) = \sum_{j=1}^{\lceil 1/\epsilon^2\rceil}b_j\big[\cos(\beta_j)\mathrm{Re}(K(\boldsymbol{w}_j\cdot\boldsymbol{x})) - \sin(\beta_j)\mathrm{Im}(K(\boldsymbol{w}_j\cdot\boldsymbol{x}))\big], $$ with \(\sum_{j=1}^{\lceil 1/\epsilon^2\rceil}\vert b_j\vert \leq C_F\) and \(\beta_j\in\mathbb{R}\), such that $$ \Vert f_{\epsilon} - f\Vert_{L^2(B,\mu)} \leq 2C_F\sqrt{\mu(B)}\epsilon. $$ In other words, \(f\) is approximated by a linear combination of analytic functions with error \(2C_F\sqrt{\mu(B)}\epsilon\) in the \(L^2(B,\mu)\)-norm. The task of approximating \(f\) by deep ReLU networks is thus reduced to approximating analytic functions by deep networks.
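To make the reduction concrete, here is a hedged one-dimensional sketch. Everything in it is an illustrative assumption rather than the paper's exact construction: I take \(K(t)=e^{it}\), a particular conjugate-symmetric \(F\), \(M=4\), draw the frequencies \(w_j\) with density proportional to \(\vert F\vert\), and set \(\beta_j=\arg F(w_j)\) and \(b_j=C_F/n\) with \(C_F=\int\vert F(w)\vert dw\), so that \(f_\epsilon\) becomes a Monte Carlo estimate of the integral defining \(f\).

```python
# Hedged 1-D illustration of the reduction (all choices of F, K, M, and grids
# are assumptions for this sketch, not the paper's construction).
import numpy as np

rng = np.random.default_rng(1)

M = 4.0
w_grid = np.linspace(-M, M, 4001)               # quadrature grid in w
dw = w_grid[1] - w_grid[0]

def F(w):
    # Conjugate-symmetric F, so that f is real-valued.
    return np.exp(-w**2) * (1.0 + 1j * np.sin(w))

x = np.linspace(0.0, 1.0, 512)

# Reference values f(x) = int_{-M}^{M} F(w) e^{iwx} dw by a Riemann sum.
f_ref = np.real((F(w_grid)[None, :] * np.exp(1j * np.outer(x, w_grid))).sum(axis=1) * dw)

absF = np.abs(F(w_grid))
beta = np.angle(F(w_grid))
C_F = absF.sum() * dw                           # C_F = int |F(w)| dw
p = absF / absF.sum()                           # sampling weights ~ |F| / C_F

for n in [10, 100, 1000, 10000]:
    idx = rng.choice(len(w_grid), size=n, p=p)  # draw w_j with density |F|/C_F
    w_j, beta_j = w_grid[idx], beta[idx]
    # f_eps(x) = (C_F/n) * sum_j [cos(beta_j)cos(w_j x) - sin(beta_j)sin(w_j x)]
    #          = (C_F/n) * sum_j cos(w_j x + beta_j)
    f_eps = (C_F / n) * np.cos(np.outer(x, w_j) + beta_j).sum(axis=1)
    err = np.sqrt(np.mean((f_eps - f_ref) ** 2))
    print(f"n = {n:5d},  L2 error on [0, 1] ~ {err:.3e}")
```

Each tenfold increase in \(n\) cuts the error by roughly \(\sqrt{10}\), consistent with needing \(n=\lceil 1/\epsilon^2\rceil\) terms for accuracy \(\epsilon\).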
The Chebyshev polynomials of the first kind play a central role in approximation theory. They are defined on \([-1,1]\) via the three-term recurrence relation $$ T_n(x) = 2xT_{n-1}(x) - T_{n-2}(x), \;\; n\geq 2, $$ with \(T_0(x)=1\) and \(T_1(x) = x\). We showed that deep ReLU networks can implement this recurrence efficiently; the key ingredient is that a product of two numbers can be approximated to accuracy \(\epsilon\) by a ReLU network whose depth and size grow only like \(\log_2\frac{1}{\epsilon}\), which lets the network carry out the multiplications \(2xT_{n-1}(x)\). Since truncated Chebyshev series can approximate analytic functions exponentially well, so can deep networks.
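As a small illustration (again a sketch, not the network construction from the paper), the snippet below evaluates \(T_0,\dots,T_n\) through the recurrence, checks the result against the closed form \(T_n(x)=\cos(n\arccos x)\), and shows the fast decay of the Chebyshev coefficients of the analytic function \(e^x\):

```python
# Chebyshev recurrence and exponential convergence, as a standalone sketch.
import numpy as np

def chebyshev_T(n, x):
    """Evaluate T_0, ..., T_n at the points x using the three-term recurrence."""
    T = np.empty((n + 1, len(x)))
    T[0] = 1.0
    if n >= 1:
        T[1] = x
    for k in range(2, n + 1):
        T[k] = 2.0 * x * T[k - 1] - T[k - 2]    # T_k = 2x T_{k-1} - T_{k-2}
    return T

# Sanity check against the closed form T_n(x) = cos(n arccos x) on [-1, 1].
x = np.linspace(-1.0, 1.0, 1001)
T = chebyshev_T(20, x)
assert np.allclose(T[20], np.cos(20 * np.arccos(x)), atol=1e-8)

# Chebyshev coefficients of exp(x) from interpolation at Chebyshev points;
# their magnitudes decay geometrically because exp is analytic.
deg = 12
pts = np.cos(np.pi * np.arange(deg + 1) / deg)  # Chebyshev (Lobatto) points
coeffs = np.polynomial.chebyshev.chebfit(pts, np.exp(pts), deg)
for k in range(0, deg + 1, 4):
    print(f"|c_{k}| = {abs(coeffs[k]):.2e}")
```

The coefficient magnitudes drop by several orders of magnitude every few degrees, which is the exponential convergence that deep ReLU networks inherit through the recurrence.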