Speaker: Vianney Perchet, Ecole Normale Superieure, Paris
We consider revenue maximization in stochastic dynamic pricing where the distribution of buyers' private values is supported on an unknown set of points of unknown cardinality. This choice of setting is well-suited to model scenarios in which buyers are grouped in an unknown number of latent types, characterized by their private values for the good on sale. We focus on the construction of learning algorithm, using ideas and techniques from multi-armed bandit literature. We will show that if there are only two different possible, but unknown, private values, then the learning cost (called ``regret'') is not bigger than if the values are known (but not the demand at those points).We also consider the case of arbitrarily large number of private values, and show that logarithmic regret is only possible under some specific additional assumptions on the model and we provide such an algorithm.