Treatment Allocations Based on Multi-Armed Bandit Strategies
In the practice of medicine, multiple treatments are often available for an individual patient. Identifying the best treatment for a specific patient is challenging because patients are heterogeneous. The multi-armed bandit with covariates provides a framework for designing effective treatment allocation rules that integrate learning from experimentation with maximizing the benefits to patients along the way.
In this talk, we present new strategies that achieve asymptotically efficient or minimax optimal treatment allocations. Many nonparametric and parametric methods from supervised learning can be used to estimate the mean treatment outcome functions (as functions of the covariates), but guidance on how to choose among them is generally unavailable. We therefore propose a model-combining allocation strategy for adaptive performance and show its strong consistency. When the mean treatment outcome functions are smooth, rates of convergence can be studied to quantify the effectiveness of a treatment allocation rule in terms of the overall benefit received by the patients. A multi-stage randomized allocation algorithm with arm elimination is proposed to combine flexibility in modeling the treatment outcome functions with a theoretical guarantee on the overall treatment benefit. Numerical results are given to demonstrate the performance of the new strategies.
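To make the setting concrete, the following is a minimal sketch of a randomized allocation rule in a bandit problem with a single covariate. It is not the strategy presented in the talk: it uses a plain decaying-exploration (epsilon-greedy) rule with per-treatment linear least squares, and it implements neither model combining nor arm elimination. The function name `simulate_allocation`, the two linear mean outcome functions, the noise level, and the exploration schedule are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_allocation(n_patients=2000, seed=0):
    """Illustrative epsilon-greedy allocation with one covariate (hypothetical setup)."""
    rng = np.random.default_rng(seed)

    # Assumed true mean outcome functions f_i(x) = a_i + b_i * x for two treatments.
    # Treatment 0 is better for x > 0.5, treatment 1 is better for x < 0.5.
    true_coef = np.array([[0.2, 0.6],
                          [0.6, -0.2]])
    n_arms, dim = true_coef.shape

    # Running least-squares statistics per treatment; the small ridge term
    # keeps the linear system solvable before any data have been observed.
    A = [1e-3 * np.eye(dim) for _ in range(n_arms)]
    b = [np.zeros(dim) for _ in range(n_arms)]

    chosen = np.empty(n_patients, dtype=int)
    best = np.empty(n_patients, dtype=int)
    regret = 0.0

    for t in range(n_patients):
        x = rng.uniform(0.0, 1.0)            # patient covariate
        feats = np.array([1.0, x])
        means = true_coef @ feats             # true mean outcome of each treatment
        best[t] = np.argmax(means)

        # Estimated mean outcome of each treatment for this patient.
        est = np.array([feats @ np.linalg.solve(A[i], b[i]) for i in range(n_arms)])

        # Randomized allocation: explore with a probability that decays over time,
        # otherwise assign the treatment that currently looks best.
        eps = min(1.0, 5.0 / np.sqrt(t + 1))
        if rng.uniform() < eps:
            arm = int(rng.integers(n_arms))
        else:
            arm = int(np.argmax(est))

        # Observe a noisy outcome and update only the assigned treatment's statistics.
        y = means[arm] + rng.normal(0.0, 0.1)
        A[arm] += np.outer(feats, feats)
        b[arm] += feats * y

        chosen[t] = arm
        regret += means[best[t]] - means[arm]

    return chosen, best, regret


if __name__ == "__main__":
    chosen, best, regret = simulate_allocation()
    print("cumulative regret:", regret)
    print("agreement with best treatment (last 500):",
          np.mean(chosen[-500:] == best[-500:]))
```

The cumulative regret returned here is one way to express the "overall benefit" of an allocation rule: it sums, over patients, the gap between the best achievable mean outcome and the mean outcome of the treatment actually assigned.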
The talk is based on joint work with Wei Qian.
(Refreshments will be served.)