"Algorithms are like recipes. They tell you how to do something, but they don't tell you why." - Edsger W. Dijkstra
Algorithms are now ubiquitous in our everyday lives. They guide our choices, shape our experiences, and influence our decisions in ways we often don't even realize. From the moment we wake up to the time we drift off to sleep, algorithms are constantly operating in the background.
For those who design or develop algorithms, rather than merely consume them, the study of algorithms offers an amazingly rich world of patterns of procedural (or computational) thinking that can be applied to other fields.
One such pattern was discussed in the post “Variation, Constraint and Selection (VCS) Pattern”.
Algorithmic patterns bear resemblance to the patterns described by Christopher Alexander in his book “The Timeless Way of Building”. These Alexander patterns in turn inspired the “Gang of Four’s” seminal book “Design Patterns” for object-oriented design.
"Each pattern is a three-part rule, which expresses a relation between a certain context, a problem, and a solution." - Christopher Alexander
When we study algorithms, we find that some are very elegantly designed, characterized by an economical number of steps. Such examples are showcased in “Programming Pearls” by Jon Bentley and similar collections. Here a program is to be understood as an algorithm implemented in some programming language. Regrettably, in the practical realm of software engineering, productivity is frequently gauged by lines of code, a metric that can disadvantage elegant solutions.
Trade-offs are common in algorithm design: sometimes time (a faster algorithm) is traded against space (more memory); other common tensions are readability vs. performance, generality vs. specialization, and accuracy vs. speed.
In general, by studying algorithms we can gain intuitions and skills, including:
● Problem-Solving Skills: Understanding how different algorithms approach problems can enhance your ability to think critically and devise solutions to complex issues.
● Abstraction: Learning to abstract problems and identify patterns helps in designing algorithms that are generalizable and can be applied to a wide range of scenarios.
● Algorithmic Paradigms: Familiarity with common paradigms like divide and conquer, dynamic programming, greedy algorithms, and backtracking provides a toolkit for tackling diverse problems.
● Heuristic Thinking: Sometimes, exact solutions are computationally expensive, and heuristic methods are used to find approximate solutions. Studying algorithms can teach you how to develop and evaluate heuristics.
● Probabilistic Algorithms: Techniques like random sampling, random walks, and randomized data structures are powerful tools in algorithm design. By studying probabilistic algorithms, you gain a broader perspective and learn to leverage randomness to create efficient and effective solutions.
● Data Structures: Algorithms often rely on specific data structures to store and manipulate data efficiently. Studying algorithms deepens your understanding of these structures and their trade-offs.
● Trade-offs: Recognizing the trade-offs between different approaches (e.g., time vs. space) is crucial for making informed decisions in algorithm design.
● Complexity Analysis: Being able to analyze the complexity of an algorithm helps in predicting its performance and optimizing it.
● Parallelism and Concurrency: Many modern algorithms are designed to run in parallel or concurrent environments, which can significantly improve performance. Understanding these concepts is increasingly important.
The Example of the Multiplicative Weights Update Algorithm
Following these general remarks, I would like to discuss a specific algorithm, the Multiplicative Weights Update Algorithm (MWUA), which has emerged independently in various forms and under different names. For an overview of the algorithm's history, refer to its Wikipedia page.
It is more accurate to call MWUA a meta-algorithm (or a template, or a pattern) because it allows many minor variations. It belongs to the larger class of algorithms for sequential decision-making under uncertainty.
One motivating application, the problem of Prediction from Expert Advice, is as follows:
Imagine you want to predict the stock market every day. You're not an expert, but you have access to several financial analysts (your "experts"). Each analyst gives you their prediction (buy, sell, or hold). Your goal is to make your own prediction and, over time, perform nearly as well as the best analyst, even though you don't know in advance which analyst will be the best.
The algorithm is very simple: it has only six lines of pseudocode (source: Jeremy Kun, The Reasonable Effectiveness of the Multiplicative Weights Update Algorithm).
The General Formulation of the MWUA
The Multiplicative Weights Update Algorithm (MWUA) is a versatile algorithmic technique used for decision-making, prediction, game theory, and algorithm design. The algorithm iteratively updates a set of weights associated with a collection of objects. This process aims to identify the "best" object over time, even when the rewards or penalties associated with each object are uncertain or adversarial. Here is a breakdown of the algorithm:
● Initialization: Start with a set of objects, each having an initial weight of 1.
● Iterative Rounds: For a predetermined number of rounds, perform the following steps:
a. Object Selection: Randomly choose an object based on the current weights. Objects with higher weights are more likely to be chosen.
b. Outcome Observation: Observe the outcome or event associated with the chosen object.
c. Reward Calculation: Calculate the reward (or penalty) based on the chosen object and the observed outcome.
d. Weight Update: Multiply the weight of each object by a factor determined by the object's reward and a learning rate parameter. Higher rewards lead to larger multiplicative factors, increasing the weight.
The key to the MWUA's success lies in its ability to adjust to the feedback received. By iteratively updating the weights, the algorithm gradually favors objects that consistently yield higher rewards while diminishing the influence of those that perform poorly.
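To make this concrete, here is a minimal Python sketch of the generic loop (my illustration, not taken from the cited sources); the `reward` callable, the `outcomes` sequence, and the learning rate `eta` are assumptions made for the sake of the example:

```
import random

def mwua(objects, reward, outcomes, eta=0.1):
    """Minimal sketch of the generic MWUA loop (full-information setting).

    objects:  the candidate objects (e.g., experts)
    reward:   callable reward(obj, outcome) -> value in [-1, 1] (assumed interface)
    outcomes: the sequence of observed events, one per round
    eta:      learning rate controlling how aggressively weights move
    """
    weights = [1.0] * len(objects)   # initialization: every weight starts at 1
    gained = 0.0
    for outcome in outcomes:
        # object selection: sample an index with probability proportional to weight
        i = random.choices(range(len(objects)), weights=weights)[0]
        gained += reward(objects[i], outcome)   # reward for the object we played
        # weight update: each object's weight is multiplied by a factor
        # that grows with that object's reward on this round
        for j, obj in enumerate(objects):
            weights[j] *= (1 + eta * reward(obj, outcome))
    return weights, gained
```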
In the example above, the objects are the analyst experts, and the rewards are the gains or losses from the decisions made.
To state the MWUA in pseudocode, we first fix the setup and notation:
There are N experts and one learner, who has access to all the experts' predictions.
We assume the problem is repeated over T time steps (a setting also called online learning or sequential decision-making).
p_i(t): prediction of expert i at time t.
p(t): the learner's prediction at time t. This is a function of the p_i(t); it can be a weighted majority voting rule, a randomized selection of expert i with probability proportional to its weight, or some other rule.
l(p, y): loss function (bounded between 0 and 1 for simplicity); gain is negative loss.
The learner's goal is to make accurate predictions on a sequence of events, in other words, to minimize the cumulative loss Σ_t l(p(t), y(t)) over the T rounds. We will see that some versions of the algorithm can also maximize entropy (diversity) while minimizing loss.
With this setup, the Multiplicative Weights Update Algorithm (MWUA) for the prediction-from-expert-advice problem is as follows (we give two versions, an exponential and a multiplicative form):
MWUA - Exponential Form:
Initialization: Assign initial weights w_i(1) = 1 for each expert i.
For each time step t = 1, 2, ..., T:
a. Calculate expert weights:
w_i(t) = exp(-η * Σ_{j=1}^{t-1} l(p_i(j), y(j)))
b. Normalize weights:
W(t) = Σ_i w_i(t)
v_i(t) = w_i(t) / W(t)
c. Learner's prediction: This can be done in various ways. One common approach is a weighted average of expert predictions (if the predictions are real-valued):
p(t) = Σ_i v_i(t) * p_i(t)
Or follow the expert with the highest weight (if predictions are discrete):
p(t) = p_i*(t) where i* = argmax_i v_i(t)
d. Observe outcome y(t) and incur loss l(p(t), y(t)).
e. Repeat from step (a); in this form the weight update is implicit, since step (a) recomputes the weights from the cumulative loss.
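As a hedged illustration, the exponential form might be implemented as follows in Python, assuming real-valued predictions combined by weighted averaging and a user-supplied bounded `loss` function (the names and interface here are mine):

```
import math

def hedge(expert_preds, outcomes, loss, eta=0.5):
    """Sketch of the exponential (Hedge) form of MWUA.

    expert_preds: list of T rounds, each a list of N expert predictions
    outcomes:     list of T observed outcomes y(t)
    loss:         callable loss(p, y) -> value in [0, 1] (assumed interface)
    eta:          learning rate
    """
    n = len(expert_preds[0])
    cum_loss = [0.0] * n                 # cumulative loss of each expert
    total_learner_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        # (a) exponential weights from cumulative losses; round 1 gives w_i = 1
        w = [math.exp(-eta * L) for L in cum_loss]
        # (b) normalize
        W = sum(w)
        v = [wi / W for wi in w]
        # (c) learner's prediction: weighted average of expert predictions
        p = sum(vi * pi for vi, pi in zip(v, preds))
        # (d) observe outcome and incur loss
        total_learner_loss += loss(p, y)
        # (e) accumulate each expert's loss; step (a) re-derives weights next round
        for i, pi in enumerate(preds):
            cum_loss[i] += loss(pi, y)
    return total_learner_loss, cum_loss
```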
MWUA - Multiplicative Form:
Initialization: Assign initial weights w_i(1) = 1 for each expert i.
For each time step t = 1, 2, ..., T:
a. Normalize weights:
W(t) = Σ_i w_i(t)
v_i(t) = w_i(t) / W(t)
b. Learner's prediction: (Same as in the exponential form)
p(t) = Σ_i v_i(t) * p_i(t) // For real-valued predictions
p(t) = p_i*(t) where i* = argmax_i v_i(t) // For discrete predictions
c. Observe outcome y(t).
d. Update weights:
w_i(t+1) = w_i(t) * (1 - η * l(p_i(t), y(t)))
e. Repeat from step (a) for the next round.
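For comparison, here is a sketch of the multiplicative form under the same assumptions as the previous sketch; note that the weights now persist across rounds and only the current round's losses enter the update:

```
def mwua_multiplicative(expert_preds, outcomes, loss, eta=0.5):
    """Sketch of the multiplicative form: weights persist across rounds and
    are multiplied each round by (1 - eta * that expert's current loss)."""
    n = len(expert_preds[0])
    w = [1.0] * n                        # initialization: w_i(1) = 1
    total_learner_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        # (a) normalize
        W = sum(w)
        v = [wi / W for wi in w]
        # (b) learner's prediction (weighted average, as in the exponential form)
        p = sum(vi * pi for vi, pi in zip(v, preds))
        # (c) observe outcome and incur loss
        total_learner_loss += loss(p, y)
        # (d) multiplicative weight update from this round's losses only
        w = [wi * (1 - eta * loss(pi, y)) for wi, pi in zip(w, preds)]
    return total_learner_loss, w
```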
The key difference between the two forms lies in the weight update: the exponential form recomputes weights from the cumulative loss up to the current time step, while the multiplicative form updates the previous weights using only the loss in the current time step.
The Hedge algorithm is an example of MWUA using the exponential form.
Mathematically the two versions are approximately equivalent: for small values of u, the Taylor expansion gives
exp(u) ≈ 1 + u
so with u = -η * l we have exp(-η * l) ≈ 1 - η * l, and the per-round exponential factor matches the multiplicative one. Henceforth we will treat the exponential and multiplicative forms as equivalent, and both as members of the MWUA family.
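A quick numerical check of this approximation, with illustrative values of η and the loss:

```
import math

eta, loss_val = 0.1, 0.5      # illustrative small learning rate and loss
u = -eta * loss_val
print(math.exp(u))            # 0.951229... (exponential factor)
print(1 + u)                  # 0.95       (multiplicative factor)
```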
Conclusions
The Multiplicative Weights Update Algorithm (MWUA) is straightforward, and it has been demonstrated to perform effectively: the learner's cumulative loss provably stays within an additive O(√(T log N)) of the best expert's loss in hindsight, as evidenced by the survey of Arora et al.
The most important intuition from the algorithm is the generality of the problem and the surprisingly creative ways it has been applied across fields. It can be viewed as a consensus algorithm, combining the often contradictory advice of various experts. The key lies in the multiplicative weights update formula, which rewards and penalizes the experts.
One of the first applications was by Hannan in the context of repeated zero-sum games. Here the “experts” are the strategies chosen by a player. The opponent may be an adversary, and the goal is regret minimization.
Thomas Cover's paper on Universal Portfolios applies the principles of the Multiplicative Weights Update (MWU) algorithm to a fascinating problem: online portfolio selection. Imagine you're an investor who wants to distribute your wealth across a set of assets (stocks, bonds, etc.) and rebalance your portfolio daily. You don't know how the market will behave in the future, but you want to perform as well as the best constant rebalanced portfolio (CRP) in hindsight. A CRP maintains fixed proportions of wealth in each asset and rebalances daily to maintain those proportions.
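To illustrate the CRP itself (this is my sketch, not Cover's universal portfolio, which additionally averages over all CRPs): each day the portfolio's wealth is multiplied by the weighted sum of the assets' price relatives, and rebalancing back to the fixed proportions is implicit in reusing them every day:

```
def crp_wealth(b, price_relatives):
    """Wealth multiplier of a constant rebalanced portfolio.

    b:               fixed target proportions, summing to 1 (e.g., [0.5, 0.5])
    price_relatives: per-day lists x, where x[i] = new_price / old_price of asset i
    """
    wealth = 1.0
    for x in price_relatives:
        # daily growth factor: proportion-weighted sum of price relatives;
        # reusing b each day encodes the daily rebalancing
        wealth *= sum(bi * xi for bi, xi in zip(b, x))
    return wealth

# e.g., a 50/50 portfolio over two days of illustrative price moves
print(crp_wealth([0.5, 0.5], [[1.10, 0.95], [0.90, 1.20]]))  # 1.07625
```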
An application to machine learning was studied by Freund and Schapire in the context of boosting. Boosting is a technique for combining weak learners into a single "strong learner" with significantly improved performance. It's like the wisdom of the crowd, but carefully weighted. In this case (AdaBoost), the “experts” are not the weak learners but the training examples. AdaBoost increases the weights of misclassified examples and decreases the weights of correctly classified ones, forcing the next weak learner to focus on the harder examples.
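Here is a sketch of the example-reweighting step at the heart of AdaBoost (the standard textbook update; the function interface is my own):

```
import math

def adaboost_reweight(weights, correct, error_rate):
    """One round of AdaBoost's reweighting of training examples.

    weights:    current weights over training examples (summing to 1)
    correct:    per-example booleans, True if this round's weak learner
                classified the example correctly
    error_rate: the weak learner's weighted error this round, in (0, 0.5)
    """
    alpha = 0.5 * math.log((1 - error_rate) / error_rate)  # weak learner's vote weight
    # misclassified examples are multiplied up, correctly classified ones down
    new_w = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(new_w)                                      # renormalize to a distribution
    return [w / total for w in new_w]
```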
The MWUA is frequently utilized in various constrained optimization problems, as discussed in the paper by Arora et al. Can you guess what the “experts” here are?
In this context, the "experts" are not potential solutions themselves, but rather algorithms or methods that try to satisfy the individual constraints. Each expert is associated with a single constraint g_i(x). An expert's "prediction" is whether the current solution x satisfies its associated constraint or not.
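A rough sketch of this pattern follows, where `oracle` is a hypothetical solver for the easier weight-averaged problem and each g_i(x) is assumed bounded in [-1, 1], with g_i(x) > 0 meaning constraint i is violated:

```
def mwu_feasibility(constraints, oracle, rounds, eta=0.1):
    """Sketch of MWU for approximately satisfying constraints g_i(x) <= 0.

    constraints: list of callables g_i(x), assumed bounded in [-1, 1]
    oracle:      hypothetical solver returning an x that does well on the
                 weight-averaged constraint sum_i v_i * g_i(x)
    """
    w = [1.0] * len(constraints)        # one "expert" per constraint
    xs = []
    for _ in range(rounds):
        total = sum(w)
        v = [wi / total for wi in w]
        x = oracle(v)                   # solve the easier averaged problem
        xs.append(x)
        # violated constraints gain weight, so the oracle attends to them next
        w = [wi * (1 + eta * g(x)) for wi, g in zip(w, constraints)]
    return xs                           # averaging the xs is the usual final output
```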
The MWUA approach to optimization is distinctive: it differs from Simplex-style algorithms, from gradient descent, and from interior point methods.
A very interesting recent article, The Game Theory of Life, points out the parallels between natural evolution and the MWUA. Natural selection values not just fitness but also genetic diversity (entropy). Similarly, the Hedge form of MWUA maintains a distribution across "experts": rather than allocating all its weight to the currently top-performing expert, it hedges by distributing weight among several experts. With genotypes as the "experts", this hedging preserves diversity and robustness.
Further reading:
Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing 8: 121–164.
Kun, Jeremy. "The Reasonable Effectiveness of the Multiplicative Weights Update Algorithm".
"Multiplicative weight update method". Wikipedia.
Hannan, James (1957). "Approximation to Bayes Risk in Repeated Play". In Dresher, M.; Tucker, A. W.; Wolfe, P. (eds.), Contributions to the Theory of Games, Vol. 3, Princeton University Press, Princeton, pp. 97–139.
Cover, Thomas M. (1991). "Universal Portfolios". Mathematical Finance 1(1): 1–29.
Freund, Yoav; Schapire, Robert E. (1995). "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting". Computational Learning Theory, Springer, Berlin, pp. 23–37.
"The Game Theory of Life": a Quanta Magazine article describing the application of the method to evolutionary biology in a paper by Erick Chastain, Adi Livnat, Christos Papadimitriou, and Umesh Vazirani; its insight, borrowed from computer science, is that evolution values both fitness and diversity.