Generalized Linear Spectral Bandits
The goal in a Contextual Multi-Armed Bandit problem is to maximize the learner's cumulative reward over successive rounds, given context about each action. Equivalently, we aim to choose arms that minimize the regret, i.e., the difference between the optimal reward and the reward received by playing the arm chosen by the algorithm. We investigate Generalized Linear Models (GLMs) for an Upper Confidence Bound (UCB) based algorithm in the Spectral Bandit setting. In this setting the arms correspond to nodes of a graph, and the hope is to achieve regret independent of the number of nodes. Generalized Linear bandits help us better model real-world applications via their nonlinear link function. However, introducing a link function makes the regret bound depend on the degree of non-linearity, and for specific choices of the link function this dependence can be shown to improve the regret.
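To make the quantities above concrete, the following is a sketch in standard bandit notation (the symbols here are illustrative, not taken from the original text): a GLM posits that the expected reward of an arm with feature vector $x_t$ is a link function $\mu$ applied to a linear score, and the cumulative regret compares against the best arm at each round.

```latex
% Illustrative notation: \theta^* is the unknown parameter,
% \mu is the (nonlinear, increasing) link function,
% x_t is the feature vector of the arm played at round t,
% and x_t^* is the feature vector of the optimal arm at round t.
\mathbb{E}[r_t \mid x_t] = \mu\!\left(x_t^\top \theta^*\right)

% Cumulative (pseudo-)regret over T rounds:
R_T = \sum_{t=1}^{T} \Big( \mu\!\left({x_t^*}^\top \theta^*\right)
      - \mu\!\left(x_t^\top \theta^*\right) \Big)
```

With the identity link $\mu(z) = z$ this reduces to the standard linear bandit regret; a nonlinear $\mu$ (e.g., the logistic function for binary rewards) is what introduces the dependence on the link function's curvature mentioned above.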