I’ll take all three doors!
Author’s suggested metasolution to the Monty Hall problem
Many strategies in Machine Learning involve an iterative search of a solution space Ω. We begin with an initial solution ω₀ and update ωₙ to minimize an objective function f(ω).
To find a global solution within reasonable time, we must employ a stochastic approach: choose ωₙ₊₁ randomly within a neighborhood of ωₙ in Ω, while favoring candidates with f(ωₙ₊₁) < f(ωₙ).
The rate of convergence is determined locally at each point; the topography of f thus constitutes a primary bottleneck in the chase for the global optimum.
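As a minimal sketch of this update rule (the one-dimensional objective and step size here are illustrative assumptions, not part of the original problem):

```python
import random

def stochastic_descent(f, omega0, step=0.5, iters=2000, seed=0):
    """Propose a random neighbor of the current solution; keep it if it improves f."""
    rng = random.Random(seed)
    omega, best = omega0, f(omega0)
    for _ in range(iters):
        candidate = omega + rng.uniform(-step, step)  # neighbor of omega_n in Omega
        if f(candidate) < best:                       # favor f(omega_{n+1}) < f(omega_n)
            omega, best = candidate, f(candidate)
    return omega

def f(x):
    return (x - 3.0) ** 2  # toy objective with global minimum at 3

print(stochastic_descent(f, omega0=0.0))  # close to 3
```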
Let’s consider a physical example where we’ll have you do the random walking. Suppose there is a breaker panel with twenty unlabeled switches {1, …, 20} = Ω, and we want to identify the correct switch ω* for one particular outlet. It takes one minute to walk between the breaker and the outlet, so we are unhappy at the prospect of checking all twenty switches.
To test a switch, we toggle its state between open and closed, then check whether the outlet’s power has also been toggled. Clearly, no more than twenty flips suffice to identify the correct switch. Let this be our direct method, with an expected number of checks of (1 + 2 + ⋯ + 20)/20 = 10.5 when ω* is equally likely to be any switch.
Convergence to the correct switch is guaranteed, but we are checking the outlet more times than needed! Here our search is a literal walk between independent states in the canonical space of switches Ω = {ω₁, …, ω₂₀}, and we land on the one ω* ∈ Ω that controls the outlet through successive choices αₙ.
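The direct method is easy to simulate; assuming ω* is equally likely to be any of the twenty switches, the average number of checks comes out near 10.5:

```python
import random

def direct_search(target, n=20):
    """Toggle switches 1..n one at a time; each check costs a round trip to the outlet."""
    for checks, switch in enumerate(range(1, n + 1), start=1):
        if switch == target:  # the outlet toggled: we found omega*
            return checks
    raise ValueError("target is not among the switches")

rng = random.Random(0)
trials = [direct_search(rng.randint(1, 20)) for _ in range(100_000)]
print(sum(trials) / len(trials))  # close to the expected 10.5 checks
```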
It turns out that we can do much better by walking—no, put your feet away!—on an augmented space instead. With the same fundamental procedure of toggling switches and checking the outlet’s power, we now divide and conquer the switches in groups.
Partition the switches into groups of, say, five each, then dial in on a single switch once all but one group have been eliminated.
We thus retain our algorithm, but by appending additional states to visit in the random walk, we expect to take fewer steps overall. Here we divide and conquer, but the idea itself is broad: generally, we may apply any technique that takes advantage of the relative structure of the points we visit in the solution space.
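A sketch of the grouped walk, with four assumed groups of five and the last group inferred by elimination:

```python
import random

def grouped_search(target, group_size=5, n=20):
    """Divide and conquer: check whole groups first, then switches within the hit group."""
    groups = [range(i + 1, i + 1 + group_size) for i in range(0, n, group_size)]
    checks = 0
    for i, group in enumerate(groups):
        if i == len(groups) - 1:
            break                # the last group follows by elimination
        checks += 1              # toggle the whole group, walk over, check the outlet
        if target in group:
            break
    hit = next(g for g in groups if target in g)
    for switch in hit:
        checks += 1              # verify switches within the identified group
        if switch == target:
            break
    return checks

rng = random.Random(1)
avg = sum(grouped_search(rng.randint(1, 20)) for _ in range(100_000)) / 100_000
print(avg)  # well below the direct method's 10.5
```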
Preconditioning in Linear Algebra
While the above idea is very broad and often used to speed up walks over complicated graphs, preconditioning in Linear Algebra offers a compact example of why we augment the solution space. Consider the steepest descent algorithm applied to a linear system Ax = b:
Solving the equivalent system
means we get as close to the true solution (20, 15) in a single step as we did in two steps without the preconditioner’s scaling of the solution space.
The direct and preconditioned variants, starting from x₀ = 0.
Convergence is immediately hastened because, under scaling, the gradient at 0 points far more nearly at the true solution than it does without. Tetris with gradients, if you will.
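The system from the figure isn’t reproduced here, so the sketch below uses an assumed ill-conditioned diagonal system whose true solution is (20, 15); Jacobi scaling turns the matrix into the identity, and steepest descent then lands exactly in one step:

```python
import numpy as np

def steepest_descent(A, b, x0, steps):
    """Minimize f(x) = x^T A x / 2 - b^T x; for SPD A, the residual is the descent direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        r = b - A @ x                    # residual = negative gradient
        alpha = (r @ r) / (r @ (A @ r))  # exact line search along r
        x = x + alpha * r
    return x

# Assumed ill-conditioned system with true solution (20, 15)
A = np.array([[4.0, 0.0], [0.0, 0.25]])
b = A @ np.array([20.0, 15.0])

x_direct = steepest_descent(A, b, np.zeros(2), steps=2)

# Jacobi preconditioning: with D = diag(A)^(-1/2), the scaled matrix D A D is the identity
D = np.diag(1.0 / np.sqrt(np.diag(A)))
y = steepest_descent(D @ A @ D, D @ b, np.zeros(2), steps=1)
x_pre = D @ y
print(x_direct, x_pre)  # the preconditioned run hits (20, 15) exactly
```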
Now, returning to the problem of finding the breaker switch: we ultimately land on a solution ω* ∈ Ω ⊂ Ω*, but only by stepping first through the group states of the augmented space Ω*.
In the worst case of checking each individual switch consecutively, we identify the switch in at most twenty steps; whereas with the grouped approach in Ω*
we check at most three groups (the fourth follows by elimination) and are left to verify only the five switches within the identified group. Convergence is thus guaranteed in eight steps, against twenty for the direct method.
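The worst-case and expected counts can be verified with a few lines of arithmetic; the uniform prior on ω* is an assumption:

```python
# Assuming omega* is uniform over the twenty switches.
direct = sum(range(1, 21)) / 20         # expected checks, direct method: 10.5
direct_worst = 20                       # worst case: every switch checked

group_checks = (1 + 2 + 3 + 3) / 4      # groups of five; the fourth group is inferred
switch_checks = sum(range(1, 6)) / 5    # verify within the identified group
grouped = group_checks + switch_checks  # expected checks, grouped: 5.25
grouped_worst = 3 + 5                   # three group checks plus five switch checks
print(direct, grouped, grouped_worst)
```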
Random Walks and Beyond
Beyond systems of equations and optimization problems, we find many examples of solution space techniques in Markov Chain Monte Carlo. In identifying the correct breaker switch, we found a better way to perform a random walk by introducing intermediary steps between entire groups of switches, driving down the expected time to solution.
In congressional redistricting analysis, plans are scrutinized according to their deviation from the stationary distribution. Beginning with an original assignment of towns to districts, we ask whether a proposed plan has been chosen arbitrarily. Specifically, after simulating the assignment process long enough to approach stationarity, we ask: would the proposed plan be likely to arise naturally through swaps between neighboring districts, or has it been chosen for a very particular purpose?
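Here is a minimal sketch of such an outlier test, with a toy one-dimensional state space standing in for the space of districting plans; the score function and local move are illustrative assumptions, while real redistricting chains swap whole towns between districts:

```python
import math
import random

def metropolis_walk(score, start, steps, seed=0):
    """Metropolis chain targeting the distribution with weight exp(-score(state))."""
    rng = random.Random(seed)
    state = start
    samples = []
    for _ in range(steps):
        proposal = state + rng.choice((-1, 1))  # symmetric local move, like a district swap
        if 0 <= proposal <= 10 and rng.random() < math.exp(score(state) - score(proposal)):
            state = proposal
        samples.append(state)
    return samples

def score(plan):
    return abs(plan - 5)  # stand-in for a compactness or population-balance score

samples = metropolis_walk(score, start=5, steps=50_000)

# Outlier test: how often does the chain visit plans as extreme as the proposal?
proposed = 10
p_value = sum(score(s) >= score(proposed) for s in samples) / len(samples)
print(p_value)  # small: such a plan rarely arises through natural swaps
```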
Next time we’ll introduce the metagraph as a means of speeding the convergence of a random walk, show how to generate it from a starting state space Ω, and ooh at how it turns some NP problems into :-P. See you later!
All other images are by the author.