Solar system model with planets orbiting a bright yellow sun against a black background. The solar orbiter circles the periphery in the foreground.
By ESA.

Fast algorithms for N-body problems and beyond

A recurring task in the physical and data sciences is the rapid evaluation of sums of the form (1)

$$u_i = \sum_{j=1}^N G(x_i - y_j) q_j, \quad i=1,2,\ldots,N,$$

where $\{x_i\}_{i=1}^N, \{y_j\}_{j=1}^N \subset \mathbb{R}^d$, $\{ q_j\}_{j=1}^N \subset \mathbb{R}$ and $G: \mathbb{R}^d \to \mathbb{R}$. This is known as the $N$-body problem and was first studied in the context of celestial mechanics. There, $y_j$ is the position of a planet with mass $q_j$ for $j=1,2,\ldots,N$ and $G$ is the gravitational potential.  Computing the sums then amounts to evaluating the gravitational potential at the points $\{x_i\}_{i=1}^N$ (usually, $x_i = y_i$ for $i=1,2,\ldots,N$). Over time, $N$-body problems have emerged in myriad applications, such as  the numerical solution of partial differential equations and kernel methods for data analysis. 

There are $N$ sums in (1) , each of which has $N$ terms. As a consequence, direct summation requires $O(N^2)$ arithmetic operations. This is prohibitively expensive in most modern applications where $N$ can be on the order of trillions.  Fortunately, it turns out that we can usually do much better if we are willing to compute an approximation to the true solution. Instead of trying to compute the values of $\{u_i\}_{i=1}^N$ exactly, we prescribe a desired accuracy $\varepsilon > 0$ and content ourselves with computing these quantities with an error of $\varepsilon$. Remarkably, under this assumption, it now turns out that for the vast majority of $G$ that arise in practice, there exist algorithms with operation counts that scale only linearly with $N$.  We emphasize that $\varepsilon$ can be chosen arbitrarily small. In fact, we can often obtain answers that are even more accurate than what would be produced through direct summation due to the effects of rounding.

One of the first algorithms of this type was the Fast Multipole Method (FMM) which was invented in 1987.  If the kernel function $G$ is the free space fundamental solution of an elliptic partial differential equation, the FMM allows one to evaluate all sums in (1) to accuracy $\varepsilon > 0$ in merely $O(\log(1/\varepsilon)^{d-1} N)$ operations. Variants of the FMM have also been developed that handle more general kernels. 

Here, we will illustrate just one of the key ideas in the FMM. We will restrict ourselves to the case where $d=2$ (we will identify points in $\mathbb{R}^2$ with $\mathbb{C}$) and let $G(r) = \log | r |$.  We will also assume that the points $\{x_i\}_{i=1}^N$ and  $\{y_j\}_{j=1}^N$ are sufficiently separated in space. To make this precise, for any $\delta > 0$, let us define the disk of radius $\delta$ centered at the origin (2)

$$B_\delta =  \{ z \in \mathbb{C} :  |z| \leq \delta \}.$$

We will fix a value $\delta > 0$ and consider the problem of evaluating (3)

$$u_i = \sum_{j=1}^N \log | x_i - y_j | q_j, \quad i=1,2,\ldots,N,$$

where $\{x_i\}_{i=1}^N \subset  \mathbb{C} \backslash B_{2\delta}$, $\{y_j\}_{j=1}^N \subset B_\delta$,  and $\{q_j\}_{j=1}^N \subset \mathbb{R}$. The arrangement of the points is shown in Figure 1 (below). Clearly, direct summation of (3) still requires $O(N^2)$ operations. 

A centered circle of about 1.5cm mediumly filled with red dots. This circle is surrounded by another about 1cm surrounding which is empty. These are encased by a square, about 2cm wider around, with sparse blue dots.
Figure 1. $\{y_j\}_{j=1}^N$ in red and $\{x_i\}_{i=1}^N$ in blue in (3).

If we content ourselves with an approximation, we can improve upon this using the Taylor series expansion of the logarithm (4)

$$\log(z-w) = \log(z) - \sum_{k=1}^{\infty} \frac{1}{k} \left( \frac{w}{z} \right)^k, \quad |w| < |z|.$$

Substituting (4) into (3) and switching the order of summations yields that for any positive integer $p$ (5)

$$u_i = \operatorname{Re} \left[ c_0 \log(x_i) - \sum_{k=1}^{p-1} \frac{c_k}{k}  \frac{1}{x_i^k}  \right] + O \left(\frac{1}{2^p} \right), \quad i=1,2,\ldots,N,$$

where (6)

$$c_k = \sum_{j=1}^N y_j^k q_j, \quad k=0,1,\ldots,p-1. $$

If we truncate (5) to $p$ terms, we can compute approximations to $\{u_i\}_{i=1}^N$ by first computing the coefficients $\{c_k\}_{k=0}^{p-1}$ in $O(Np)$ operations.  Thus, we can choose $p=O(\log(1/\varepsilon))$ to get an approximation that is accurate to within $\varepsilon$ using only $O(\log(1/\varepsilon)N)$ operations, independent of the value of $\delta$. 

To handle the case where points are not physically separated, the FMM subdivides the domain into a hierarchy of boxes and series expansions are used to account for the interactions in boxes that are separated in space. A sketch of this is shown in Figure 2 (below). 

A square with larger squares on the outside, and increasingly smaller squares as they approach the center. All outside squares are densely filled with red dots, while 8 small squares surrounding the center are cyan, with the center small square in blue.
Figure 2. Sketch of one piece of the FMM. All boxes with red points interact with the blue points via a truncated Taylor series expansion of a fixed order. The boxes containing the cyan points require either further subdivision or direct summation.

We will not go into details here, but instead point out how these ideas connect to linear algebra. To see this, we note that the $N$-body problem (1) is equivalent to computing a matrix-vector product (7)

$$\mathbf{u} = \mathbf{G} \mathbf{q},$$

where $\mathbf{G}_{i,j} = G(x_i - y_j)$, $\mathbf{u}_i = u_i$, and $\mathbf{q}_j = q_j$ for $i,j=1,2,\ldots,N$. A fast algorithm for solving the $N$-body problem then amounts to a fast algorithm for performing a matrix-vector product.  In the case of the FMM, the key property that enables this is that certain off-diagonal blocks of $\mathbf{G}$ are approximately low-rank, which  is demonstrated by (5). 

A natural follow-up question is what other operations can we efficiently do with matrices of this form? For instance, is it possible to compute an approximate inverse to $\mathbf{G}$ in only $O(N)$ operations? Now, we are trying to reduce a calculation that normally requires $O(N^3)$ operations to $O(N)$ operations. Moreover, the approximate inverse must be able to be stored using only $O(N)$ floating point numbers.  In recent years, algorithms have been developed that do exactly this, answering the question in the affirmative in many cases. Still, there are many environments of practical importance where suitable algorithms are still not known.  Addressing this has been a focus of my research and requires blending analysis, linear algebra, and physics.
 

Media Resources

Cover image by ESA, solar orbiter and planet orbits Nov 18, 2020.

Primary Category

Tags