Maillard Sampling for Interactive Machine Learning: Boltzmann Exploration Done Optimally
At the heart of interactive machine learning is the ability to decide which action to take next so as to maximize information under given constraints. For example, a recommender system should suggest products that not only yield high click-through rates but also reveal the user's preferences, so that it can serve the user better in the long run. For these problems (commonly referred to as 'bandit' problems), the PhD dissertation of Maillard (2013) proposed a lesser-known algorithm, which we call Maillard sampling (MS), that can be viewed as a correction to a popular heuristic called Boltzmann exploration. In this talk, we claim that MS is a strong competitor to Thompson sampling, the industry-standard algorithm. We will show that the performance guarantee of MS matches that of Thompson sampling and showcase practical benefits of MS, such as enabling computationally efficient offline evaluation, which give it the potential to displace Thompson sampling in industry. Besides exciting future research directions enabled by MS, we will also discuss, as a side topic, related developments in interactive machine learning, including applications in post-fire debris flow prediction supported by Data Science Academy@UA. The research outcomes presented here are partly supported by RII@UA.
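The abstract does not spell out the sampling rule, so the following is only a minimal sketch of how MS is usually stated for sub-Gaussian rewards: arm a is chosen with probability proportional to exp(-N_a * gap_a^2 / (2 * sigma^2)), where N_a is the number of pulls of arm a and gap_a is the difference between the best empirical mean and arm a's empirical mean. The function names (maillard_probs, run_maillard_sampling), the Gaussian reward simulator, and the default sigma = 1 are assumptions made for illustration and are not taken from the talk.

```python
import math
import random


def maillard_probs(means, counts, sigma=1.0):
    """Closed-form action probabilities, p_a proportional to
    exp(-N_a * gap_a^2 / (2 * sigma^2)), with gap_a measured against
    the best empirical mean. sigma is an assumed sub-Gaussian scale."""
    best = max(means)
    weights = [
        math.exp(-n * (best - m) ** 2 / (2.0 * sigma ** 2))
        for m, n in zip(means, counts)
    ]
    total = sum(weights)  # at least 1, since the empirical best has weight 1
    return [w / total for w in weights]


def run_maillard_sampling(true_means, horizon, sigma=1.0, seed=0):
    """Simulate the sketch on Gaussian rewards (an illustrative assumption).
    Returns the number of times each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k

    # Pull every arm once so that each empirical mean is defined.
    for arm in range(k):
        sums[arm] += rng.gauss(true_means[arm], sigma)
        counts[arm] += 1

    for _ in range(k, horizon):
        means = [s / n for s, n in zip(sums, counts)]
        probs = maillard_probs(means, counts, sigma)
        # Sample an arm from the closed-form distribution.
        arm = rng.choices(range(k), weights=probs)[0]
        sums[arm] += rng.gauss(true_means[arm], sigma)
        counts[arm] += 1

    return counts


if __name__ == "__main__":
    print(maillard_probs([1.0, 0.0], [1, 2]))
    print(run_maillard_sampling([0.0, 0.5, 1.0], horizon=2000))
```

Because the probability of each action is available in closed form at every step, it can be logged alongside the chosen action. This is plausibly what makes offline evaluation cheap, for example via inverse propensity weighting, whereas Thompson sampling's action probabilities generally have to be approximated.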