Solution Space Techniques for Faster Convergence, an introduction

Photo by Florian Rieder on Unsplash

I’ll take all three doors!

Author’s suggested metasolution to the Monty Hall problem

Many strategies in Machine Learning involve the iterative search of a solution space Ω. We begin with an initial solution ω₀ and update ωₙ to minimize an objective function f(ω),

    \[\omega_n \rightarrow \text{argmin}_{\omega \in \Omega} \{ f(\omega) \}\,.\]

To find a global solution within reasonable time, we typically employ a stochastic approach: choose ωₙ₊₁ randomly within an Ω-neighborhood of ωₙ while favoring f(ωₙ₊₁) < f(ωₙ).
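
Here is a minimal sketch of that scheme in Python. The bumpy objective and the Gaussian neighborhood are invented for illustration; an annealing-style rule would also occasionally accept worse moves, while this version is purely greedy.

    import numpy as np

    rng = np.random.default_rng(0)

    def stochastic_search(f, omega0, neighbor, steps=5000):
        """Greedy random local search: draw a neighbor, keep it if f improves."""
        omega, best = omega0, f(omega0)
        for _ in range(steps):
            candidate = neighbor(omega)    # random draw from a neighborhood of omega
            value = f(candidate)
            if value < best:               # favor f(candidate) < f(omega)
                omega, best = candidate, value
        return omega, best

    # A bumpy one-dimensional objective with many local minima.
    f = lambda w: (w - 3.0) ** 2 + np.sin(5.0 * w)
    omega, fmin = stochastic_search(f, 0.0, lambda w: w + rng.normal(scale=0.5))
    print(omega, fmin)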

Local minima in f(ω)

The rate of convergence is determined locally at each point, so the topography of f(ω) constitutes a primary bottleneck in chasing the global optimum.

Let’s consider a physical example where we’ll have you do the random walking. Suppose there is a circuit breaker with twenty unknown switches {1,…,20} = Ω, and we want to identify the correct switch ω* for one particular outlet. It takes one minute to walk between the breaker and the outlet, so we are unhappy at the prospect of checking all twenty switches one by one.

Checking switches one at a time

To test a switch, we toggle its state between open and closed and check whether the outlet has also been toggled. Clearly, no more than twenty switch flips suffice to identify the correct one. Let this be our direct method, with an expected time to identify the switch

    \[E(\tau_{direct}) = \frac{1}{20} \sum_{i=1}^{20} i = 1 + \frac{19}{20} + \cdots + \frac{1}{20} = 10.5\,.\]
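
A quick simulation confirms this expectation (a sketch; the switch labels and trial count are arbitrary):

    import random

    def direct_search(true_switch, n=20):
        """Flip switches one at a time, in random order, until the outlet toggles."""
        order = random.sample(range(1, n + 1), k=n)
        for checks, s in enumerate(order, start=1):
            if s == true_switch:
                return checks   # every check costs a round trip to the outlet

    trials = 100_000
    mean = sum(direct_search(random.randint(1, 20)) for _ in range(trials)) / trials
    print(mean)   # ~10.5, matching E(tau_direct)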

Convergence to the correct switch is obvious, but we are checking the outlet more times than needed! Here our search is a literal walk between independent states in the canonical space of switches Ω = {ω₁, …, ω₂₀}, and through successive choices αₙ we land on the one ω* ∈ Ω that controls the outlet

    \[\omega_{\alpha_1} \rightarrow \omega_{\alpha_2} \rightarrow\cdots\rightarrow \omega^\ast \,.\]

It turns out that we can do much better by walking—no, put your feet away!—on an augmented space instead. With the same fundamental procedure of toggling switches and checking the outlet’s power, we now divide and conquer the switches in groups.

Checking switches in groups

Test groupings of, say, five switches each, then dial in on a single switch once all but one group have been eliminated.

We thus retain our algorithm, but by appending additional states to visit in the random walk we expect to take fewer steps overall. Here we divide and conquer, but the idea itself is broad. Generally we apply any technique that takes advantage of the relative structure of points we visit in the solution space.

Preconditioning in Linear Algebra

While the above idea is very broad and often used to speed up walks over complicated graphs, preconditioning in Linear Algebra offers a compact example of why we augment the solution space. Consider the steepest descent algorithm applied to

    \[\begin{bmatrix}14 & 54 \\ 54 & 24\end{bmatrix} \begin{bmatrix}x_1 \\ x_2\end{bmatrix} = \begin{bmatrix}1090 \\ 1440\end{bmatrix}\]

Solving the equivalent system

    \[ \begin{bmatrix}\frac{1}{14} & 0 \\ 0 & \frac{1}{24}\end{bmatrix} \begin{bmatrix}14 & 54 \\ 54 & 24\end{bmatrix} \begin{bmatrix}x_1 \\ x_2\end{bmatrix} = \begin{bmatrix}\frac{1}{14} & 0 \\ 0 & \frac{1}{24}\end{bmatrix} \begin{bmatrix}1090 \\ 1440\end{bmatrix}\]

means we get as close to the true solution (20, 15) in a single step as we did in two steps without scaling the solution space by M⁻¹ = diag(1/14, 1/24), the inverse of the matrix diagonal.

    Direct
    step =     1
    r =         1090    1440
    x =    15.0700   19.9089
    step =     2
    r =  -196.0618  148.4079
    x =    20.7812   15.5859
    step =     3
    r =   -42.5732  -56.2435
    x =    20.1926   14.8083

    Preconditioned
    step =     1
    M-1 r =    77.8571   60.0000
    x =    19.7315   15.2060
    step =     2
    M-1 r =    -0.5259    0.3981
    x =    20.0018   15.0014
    step =     3
    M-1 r =    -0.0071   -0.0055
    x =    20.0000   15.0000

The direct and preconditioned variants, x₀ = 0

Convergence is immediately hastened because, under scaling, the descent direction M⁻¹r at 0 aligns far more closely with the true solution than the raw residual does. Tetris with gradients, if you will.
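
As a check on the numbers, here is a short NumPy sketch of both variants starting from x₀ = 0, with M taken as the diagonal of the system matrix; it reproduces the iterates printed above.

    import numpy as np

    A = np.array([[14., 54.],
                  [54., 24.]])
    b = np.array([1090., 1440.])
    M_inv = np.diag(1.0 / np.diag(A))   # Jacobi preconditioner: M = diag(A)

    def steepest_descent(A, b, M_inv=None, steps=3):
        x = np.zeros_like(b)
        for k in range(1, steps + 1):
            r = b - A @ x                           # residual
            z = r if M_inv is None else M_inv @ r   # (preconditioned) direction
            alpha = (r @ z) / (z @ A @ z)           # exact line search along z
            x = x + alpha * z
            print(f"step {k}: x = {x}")
        return x

    steepest_descent(A, b)          # direct variant
    steepest_descent(A, b, M_inv)   # preconditioned variant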

Now, returning to the problem of finding the breaker switch, we ultimately land on a solution ω* ∈ Ω ⊂ Ω*, only by stepping first through

    \[\Omega^\ast = \{ 1,2,\ldots, 20\} \cup \{ \langle 1-5 \rangle , \ldots , \langle 16-20\rangle\} \,.\]

In the worst case of checking each individual switch consecutively, we identify the switch in at most twenty steps; whereas with the grouped approach in Ω*

    \[\langle 1 - 5 \rangle \rightarrow \cdots \rightarrow \langle 16-20 \rangle \rightarrow 16 \rightarrow 17 \rightarrow \cdots \rightarrow 20 \,,\]

we check at most four groups and are then left to verify only the five switches within the identified group. Convergence is thus guaranteed in nine steps, with

    \[E(\tau_{grouped}) = \frac{1}{4} \sum_{i=1}^{4} i + \frac{1}{5} \sum_{i=1}^5 i = 2.5 + 3 = 5.5 \,.\]
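
Simulating the grouped walk (a sketch, with the same arbitrary trial count as before) recovers both the nine-step worst case and this expectation:

    import random

    def grouped_search(true_switch):
        """Identify the group of five first, then the switch within it."""
        groups = [range(1 + 5 * g, 6 + 5 * g) for g in range(4)]
        checks = 0
        for g in random.sample(groups, k=4):            # check groups in random order
            checks += 1
            if true_switch in g:
                for s in random.sample(list(g), k=5):   # then individual switches
                    checks += 1
                    if s == true_switch:
                        return checks

    trials = 100_000
    results = [grouped_search(random.randint(1, 20)) for _ in range(trials)]
    print(sum(results) / trials, max(results))   # ~5.5 and 9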

Random Walks and Beyond

Beyond systems of equations and optimization problems, we find many examples of solution space techniques in Markov Chain Monte Carlo. In identifying the correct breaker switch, we found a better way to perform a random walk by introducing intermediary steps between entire groups of switches, driving down the expected time to solution.

Congressional districts in New York (Wikipedia)

In congressional redistricting analysis, proposed plans are scrutinized according to their deviation from the stationary distribution of a random walk over plans. Beginning with an original assignment of towns to districts, we ask whether a proposed plan could have been chosen arbitrarily. Specifically, upon simulating the assignment process long enough to approach stationarity: is it possible that the proposed plan was chosen for a very particular purpose? Would it be likely to arise naturally through swaps between neighboring districts?
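
To make the flavor of the test concrete, here is a toy in the same spirit (everything below, from the line of eight towns to the vote string, is invented). A lazy symmetric walk over cut points has a uniform stationary distribution, so a long run samples plans "chosen arbitrarily," and we ask where the proposed plan's statistic falls among them.

    import random

    # Toy model: 8 towns on a line vote B(lue) or R(ed); a "plan" is a cut
    # point k splitting them into contiguous districts {1..k} and {k+1..8}.
    votes = "BBRBRRBR"

    def blue_districts(k):
        """Plan statistic: how many of the two districts have a Blue majority."""
        return sum(d.count("B") * 2 > len(d) for d in (votes[:k], votes[k:]))

    # Lazy symmetric random walk on cut points {1,...,7}: doubly stochastic,
    # hence uniform in the limit -- long runs sample arbitrary plans.
    k, samples = 4, []
    for _ in range(100_000):
        step = k + random.choice([-1, 0, 1])
        if 1 <= step <= 7:
            k = step
        samples.append(blue_districts(k))

    proposed = blue_districts(2)   # the plan under scrutiny
    tail = sum(s >= proposed for s in samples) / len(samples)
    print(proposed, tail)          # a tiny tail flags an engineered outlier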

Next time we’ll introduce the metagraph as a means of speeding up the convergence of a random walk, show how to generate it from a starting state space Ω, and ooh at how it turns some NP problems into :-P. See you later.

All other images are by the author.


By Alexander Wei

BA, MS Mathematics, Tufts University
