Workshop 2: Speakers and Abstracts
All the videos of the lectures are available here!
You can download the PDF booklet that will be distributed at the workshop.
Convex bandit, part I
Abstract: In this two-part lecture, Ronen and I will discuss our solution to the decade-old problem of characterizing the minimax regret in bandit convex optimization. Part I will focus on the information-theoretic aspects of the proof, while Part II will be about a new geometric question on convex functions that we had to answer.
Nonparametric learning, scale invariance, and sketching techniques for online algorithms
Abstract: We describe recent results and work in progress on algorithmic techniques for online learning. First, we discuss a simple and modular approach to nonparametric learning in data streams. This method covers the input space using locally trained base classifiers. A good balance between model complexity and predictive accuracy is achieved by dynamically adapting the cover to the local complexity of the classification problem. Next, we consider the problem of designing online algorithms that are invariant to linear transformations of the data. As invariance relies upon maintaining second-order information, we show how the associated computational cost can be tamed via sketching techniques, without giving up regret guarantees.
Combinatorial bandits revisited
Abstract: We investigate stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting, we first derive problem-specific regret lower bounds and analyze how these bounds scale with the dimension of the decision space. We then propose COMBUCB, a family of algorithms that efficiently exploit the combinatorial structure of the problem, and derive finite-time upper bounds on their regret. These bounds improve over the regret upper bounds of existing algorithms, and we show numerically that COMBUCB significantly outperforms existing algorithms. In the adversarial setting, we propose two simple algorithms, COMBEXP-1 and COMBEXP-2, for semi-bandit and bandit feedback, respectively. Their regret scales similarly to that of state-of-the-art algorithms, in spite of the simplicity of their implementation. More information at http://arxiv.org/abs/1502.03475.
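As a minimal illustration of the stochastic setting (this is a CUCB-style sketch in the spirit of the abstract, not the authors' COMBUCB implementation; the function names and the exploration constant 1.5 are my own choices), here is the simplest combinatorial action set, picking m of d base arms under semi-bandit feedback:

```python
import math

def combinatorial_ucb(reward_fns, m, horizon):
    """CUCB-style sketch for the simplest combinatorial action set:
    compute a UCB index per base arm and play the m arms with the
    largest indices, observing each chosen arm's reward (semi-bandit)."""
    d = len(reward_fns)
    totals, pulls = [0.0] * d, [0] * d
    for t in range(1, horizon + 1):
        def index(a):
            if pulls[a] == 0:
                return float("inf")  # force each arm to be tried once
            return totals[a] / pulls[a] + math.sqrt(1.5 * math.log(t) / pulls[a])
        chosen = sorted(range(d), key=index, reverse=True)[:m]
        for a in chosen:
            totals[a] += reward_fns[a](t)
            pulls[a] += 1
    return pulls
```

On a toy instance with two good arms and one bad arm, the bad arm is quickly pulled only rarely, which is the behavior the logarithmic exploration bonus is designed to produce.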
Placing Dynamic Content in Local Caches
Abstract: In this talk we will propose a means of addressing a fundamental limitation to the adoption of caching in wireless access networks: small population sizes. This shortcoming raises two main challenges: (i) making timely estimates of varying content popularity, and (ii) inferring popular content from small samples. We propose a framework that alleviates both limitations. To estimate varying popularity in a timely manner in the context of a single cache, we propose an Age-Based Threshold (ABT) policy, which caches every content requested more times than a threshold that depends on the content's age. We show that ABT is asymptotically hit-rate optimal in the many-contents regime, which allows us to obtain the first characterization of the optimal performance of a caching system in a dynamic context. We then address small sample sizes, focusing on L local caches and one global cache. On the one hand, we show that the global cache learns L times faster by aggregating all requests from the local caches, which improves hit rates. On the other hand, when requests are correlated, aggregation washes out local characteristics, which leads to a hit-rate penalty. This motivates coordination mechanisms that combine global learning of popularity scores in clusters and LRU with prefetching.
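The ABT admission rule lends itself to a very small sketch. The toy version below is my own illustration (class and parameter names are mine, the threshold function is a placeholder, and there is no eviction logic); it admits a content once its request count exceeds a threshold computed from the content's age:

```python
import collections

class AgeBasedThresholdCache:
    """Sketch of an Age-Based Threshold (ABT) admission rule: a content
    enters the cache once it has been requested more times than a
    threshold that depends on its age (time since first request)."""

    def __init__(self, capacity, threshold_fn):
        self.capacity = capacity
        self.threshold_fn = threshold_fn  # maps age -> required request count
        self.first_seen = {}
        self.counts = collections.Counter()
        self.cache = set()
        self.hits = 0

    def request(self, content, t):
        if content in self.cache:
            self.hits += 1
            return True
        self.first_seen.setdefault(content, t)
        self.counts[content] += 1
        age = t - self.first_seen[content]
        if self.counts[content] > self.threshold_fn(age) and len(self.cache) < self.capacity:
            self.cache.add(content)  # admit once the age-based threshold is cleared
        return False
```

With a constant threshold of 2, a content is admitted on its third request and every later request is a hit.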
Convex bandit, part II
Abstract: In this two-part lecture, Sébastien and I will discuss our solution to the decade-old problem of characterizing the minimax regret in bandit convex optimization. Part I will focus on the information-theoretic aspects of the proof, while Part II will be about a new geometric question on convex functions that we had to answer.
Optimizing ad-auction reserve prices when the outcome is measured on an aggregate basis
Abstract: Selling ad space through automated ad auctions with an optimal reserve price can be a very challenging task when the outcome of a particular reserve-price configuration is observed only on an aggregate basis, at a low frequency.
On Bayesian index policies for sequential resource allocation
Abstract: In the context of a parametric bandit model, I will investigate the interplay between frequentist and Bayesian regret, and between frequentist and Bayesian strategies. The focus will be on index policies inspired by a Bayesian view of the multi-armed bandit problem that nevertheless perform well with respect to the frequentist regret. The notion of an index policy dates back to the strategy introduced by Gittins (1979) to solve the Bayesian, discounted multi-armed bandit problem. We will see that the Gittins indices can also be useful in the non-discounted case. Moreover, some existing approximations of these (hard to compute) indices suggest alternative exploration rates that could be used in UCB-like algorithms. I will then talk about Bayes-UCB, a simple algorithm based on quantiles of the posterior distributions of the arms' means, which is (asymptotically) optimal for some classes of reward distributions.
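The quantile-index idea behind Bayes-UCB fits in a few lines. The sketch below is my own simplification (Gaussian rewards with unit noise and a standard normal prior, so the posterior quantile has a closed form; the quantile level 1 - 1/t follows the spirit of the algorithm but the exact schedule is an assumption):

```python
import math
from statistics import NormalDist

def bayes_ucb_index(total_reward, pulls, t):
    """Bayes-UCB index for a Gaussian arm with unit noise and an N(0,1)
    prior: the (1 - 1/t) quantile of the posterior over the arm's mean."""
    mu = total_reward / (pulls + 1)     # posterior mean
    sd = 1.0 / math.sqrt(pulls + 1)     # posterior standard deviation
    return NormalDist(mu, sd).inv_cdf(1.0 - 1.0 / t)

def bayes_ucb(reward_fns, horizon):
    """At each round, pull the arm whose posterior quantile is highest."""
    k = len(reward_fns)
    totals, pulls = [0.0] * k, [0] * k
    for t in range(1, horizon + 1):
        # use t + 1 so the quantile level stays strictly inside (0, 1)
        arm = max(range(k), key=lambda a: bayes_ucb_index(totals[a], pulls[a], t + 1))
        totals[arm] += reward_fns[arm](t)
        pulls[arm] += 1
    return pulls
```

On two deterministic arms with rewards 1 and 0, the suboptimal arm's quantile index grows only logarithmically, so it is pulled rarely.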
The moment-LP and moment-SOS approach in polynomial optimization
Abstract: We review basic properties of the moment-LP and moment-SDP hierarchies for polynomial optimization and compare them. We also illustrate how to use such a methodology in two applications outside optimization. More details in this abstract.
The Hidden World of Bandits
Abstract: Although the standard multi-armed bandit framework is by now very well understood and the available algorithms are optimal, in practice the learning performance is not always satisfactory. In order to alleviate this problem, a common approach is to assume the existence of a "structure" characterizing the shape of the bandit problem. Whenever such structure is known, the learning process can be significantly improved. On the other hand, when the structure is "hidden", one of the main challenges is how to learn it at the same time as minimizing the regret. While many forms of structures have been proposed (e.g., regularity of the reward function across arms), in this talk we show how modern tensor decomposition methods for latent variable models can be paired with standard UCB-like strategies to unveil the hidden structure at the same time as learning the best arm. In particular, we will review an application of tensor decomposition in the standard multi-armed bandit and in the more challenging POMDP problem.
Sampling convex bodies with Projected Monte Carlo
Abstract: Given a convex body K in high dimension n, we consider the Markov chain whose transition consists in adding a small Gaussian perturbation and projecting back onto K. We shall show that this chain approaches the uniform measure on K in at most n^7 steps. The proof extends to the case where a convex potential is added, allowing us to sample from a smooth log-concave density restricted to K. The talk is based on joint work with Sébastien Bubeck and Ronen Eldan.
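The transition kernel in question is easy to write down. A toy sketch, under my own simplifying assumptions (K taken to be the Euclidean unit ball so that the projection is explicit; the step size sigma and function names are mine):

```python
import math
import random

def project_to_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius (our body K)."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= radius:
        return x
    return [c * radius / norm for c in x]

def projected_walk(dim, steps, sigma=0.1, seed=0):
    """Projected Monte Carlo transition, iterated: add a small Gaussian
    perturbation, then project back onto K."""
    rng = random.Random(seed)
    x = [0.0] * dim
    for _ in range(steps):
        x = project_to_ball([c + sigma * rng.gauss(0.0, 1.0) for c in x])
    return x
```

Every iterate stays inside K by construction, which is the point of the projection step.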
Subsampling, Concentration and Multi-armed bandits
Abstract: In the first part of this talk, we will provide novel concentration results for the setting of sampling without replacement from a finite population (sub-sampling), improving on the fundamental result of Serfling (1974), and extending it to obtain Bernstein and empirical Berstein concentration bounds. We will then move to the stochastic multi-armed bandit problem, a popular model of the exploration/exploitation trade-off in sequential decision making. We introduce a novel algorithm that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performances against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible, and does need to know a set of reward distributions in advance nor the range of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, the main intuition that explains the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting.
|
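To convey the sub-sampling idea in the two-arm setting, here is a toy duel in the spirit of the abstract (this is my own sketch, not the authors' exact procedure; names are mine): each arm is judged on a random subsample of its history whose size matches the shorter history, and the winner of the duel is pulled next.

```python
import random

def subsample_duel(history_a, history_b, rng):
    """Compare two arms on equal-size random subsamples of their reward
    histories; return the index (0 or 1) of the winning arm."""
    n = min(len(history_a), len(history_b))
    mean_a = sum(rng.sample(history_a, n)) / n
    mean_b = sum(rng.sample(history_b, n)) / n
    return 0 if mean_a >= mean_b else 1

def subsampling_bandit(reward_fns, horizon, seed=0):
    """Two-arm bandit driven purely by sub-sampling duels: no confidence
    bounds, no knowledge of the reward range."""
    rng = random.Random(seed)
    hist = [[reward_fns[0](0)], [reward_fns[1](0)]]  # pull each arm once
    for t in range(2, horizon):
        arm = subsample_duel(hist[0], hist[1], rng)
        hist[arm].append(reward_fns[arm](t))
    return [len(h) for h in hist]
```

Note that the duel only ever compares empirical means of equally sized samples, which is what makes the approach scale-free.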
The Baire Metric and Ultrametric for Linear Computational Time Hierarchical Clustering: Applications and Implementations
Abstract: The context of this work is search and discovery through clustering, i.e. unsupervised classification. Case studies of the clustering of large data sets are reviewed in astronomy and, in high dimensions, in chemistry. To achieve linear computational time hierarchical clustering, we use the Baire, or longest common prefix, metric, which is simultaneously an ultrametric. For high dimensions, random projection is used. Since hierarchically clustered data can be scaled in one dimension, seriation, or unidimensional scaling, is an important objective. This talk will include a comparison with other uses of random projection; other, related low-dimensional mapping and clustering approaches; and an overview of the (mainly) R software used.
Two problems in stochastic gradient descent: choosing the right learning rate and dealing with temporal dependencies (non-iid)
Abstract: We will present two ideas for dealing with two problems in online statistical learning: choosing the right step size for stochastic gradient descent and similar algorithms, and dealing with strong temporal dependencies. The performance (cumulated loss) of algorithms based on stochastic gradients is highly dependent on the chosen step size. Fortunately, it is possible to approximate the derivative of the cumulated loss with respect to the step size in an online fashion; the resulting gradient descent on the step size yields reasonable practical performance even if the initial step size is varied over several orders of magnitude, although no theoretical result has been proved yet. Next, we will consider online learning of the parameters of a dynamical system, such as a hidden Markov model or a recurrent neural network. The usual online algorithms, such as the Kalman filter, typically scale like the third power of the dimension. Faster algorithms, on the other hand, are not online and require going back in time at some point (Bellman equations, backpropagation through time, the forward-backward algorithm). We will present an online algorithm that maintains a random but *unbiased* estimate of the gradient of the state of the system with respect to its parameters, at a low algorithmic cost.
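The first idea, descending the cumulated loss with respect to the step size itself, admits a compact sketch. The version below is my own one-dimensional, deterministic-gradient simplification (a hypergradient-style stand-in, not the talk's exact method): since theta_t = theta_{t-1} - eta * g_{t-1}, the derivative of the current loss with respect to eta is approximately -<g_t, g_{t-1}>, so eta grows when successive gradients align and shrinks when they oscillate.

```python
def adaptive_sgd(grad, theta0, eta0, beta, steps):
    """Gradient descent that also performs a gradient step on its own
    step size eta, using the product of successive gradients as an
    online estimate of d(loss)/d(eta)."""
    theta, eta = theta0, eta0
    g_prev = 0.0
    for _ in range(steps):
        g = grad(theta)
        eta = max(eta + beta * g * g_prev, 1e-8)  # hypergradient step on eta
        theta -= eta * g                          # ordinary step on theta
        g_prev = g
    return theta, eta
```

On the quadratic loss f(theta) = theta^2 / 2 the gradients all share a sign, so eta increases from a deliberately too-small initial value and theta still converges to 0.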
Highly-Smooth Zero-th Order Online Optimization
Abstract: In this joint work with Francis Bach, we consider online convex optimization with noisy zeroth-order information, that is, noisy function evaluations at any desired point. We focus on problems with high degrees of smoothness, such as online logistic regression. We show that, as opposed to gradient-based algorithms, higher-order smoothness may be used to improve estimation rates, with a precise dependence on the degree of smoothness and the dimension. In particular, we show that for infinitely differentiable functions, we recover the same dependence on sample size as gradient-based algorithms, with an extra dimension-dependent factor. This is done for convex and strongly convex functions, with finite-horizon and anytime algorithms.
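A standard building block behind such zeroth-order methods is the two-point gradient estimator. The sketch below is my own minimal version (not the paper's estimator, which exploits higher-order smoothness): probe the function at x ± delta*u for a random unit direction u, and rescale the difference.

```python
import math
import random

def zeroth_order_gradient(f, x, delta=1e-3, seed=0):
    """Two-point zeroth-order gradient estimate: for a random unit
    direction u, (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u
    estimates the gradient of a smoothed version of f, using only
    function evaluations."""
    rng = random.Random(seed)
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]                     # uniform unit direction
    xp = [xi + delta * ui for xi, ui in zip(x, u)]
    xm = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(xp) - f(xm)) / (2.0 * delta)
    return [scale * ui for ui in u]
```

A single draw is noisy, but averaging the estimator over many directions recovers the true gradient for a linear function.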
Contextual Distributed Models for Sequences
Abstract: Generative probabilistic models have many practical applications for analyzing sequences, in e.g. natural language processing and web log mining. For such models, taking into account the context (e.g. the person who is writing) is important. While a purely probabilistic approach would introduce further random variables to capture the context, in distributed generative models like recurrent or convolutional neural networks, an exciting possibility is to use a distributed representation of the context, that modifies the generative process. This is the topic of this talk.
Constraint-based Reputation Modelling on microblogs
Abstract: In this talk we review the modeling of corporate entities' online e-reputation based on PLS path modeling (PLS-PM) approaches. The dimensions used to analyze e-reputation are set up by the analyst as latent variables. Machine learning (ML) and natural language processing (NLP) approaches are used to label large sets of text passages. For each dimension, several estimations of its relationship with each text passage are computed, as well as opinion and priority scores. An automatic path-modeling algorithm then explains the opinion or priority scores based on the selected dimensions. Even though PLS-PM is based on standard factor analysis, it includes an iterative algorithm that makes it incompatible with objective-function optimization. However, there is an implicit Bayesian latent probabilistic model that explains the robustness of the model.
Robust sequential learning with applications to the forecasting of electricity consumption and of exchange rates
Abstract: Sometimes, you feel you're spoilt for choice: there are so many good predictors that you could use! Why select and focus on just one? I will review the framework of robust online aggregation (also known as prediction of individual sequences, or online aggregation of expert advice). This setting explains how to combine base forecasts provided by ensemble methods. No stochastic modeling is needed, and the performance achieved is comparable to that of the best (constant convex combination of) base forecast(s). I will illustrate the technology on various data sets, including electricity consumption and exchange rates. More importantly, I will point out open issues, on both the theoretical and the practical sides.