Optimal Dynamic Mechanism Design in Multi-Armed Bandit Processes

This work addresses revenue-optimal dynamic mechanism design in environments where agents' types evolve over time based on their public and private information, within the context of multi-armed bandit processes.

Microsoft Research | 945 views | 01:16:09

About this video

We consider the problem of revenue-optimal dynamic mechanism design in settings where agents' types evolve over time as a function of their (both public and private) experience with items that are auctioned repeatedly over an infinite horizon. A central question here is understanding what natural restrictions on the environment permit the design of optimal mechanisms (note that even in the simpler static setting, optimal mechanisms are characterized only under certain restrictions).

We provide a structural characterization of a natural "separable" multi-armed bandit environment (where the evolution and incentive structure of the a-priori type is decoupled from the subsequent experience in a precise sense) where dynamic optimal mechanism design is possible. Here, we present the Virtual Index Mechanism, an optimal dynamic mechanism, which maximizes the (long term) virtual surplus using the classical Gittins algorithm. The mechanism optimally balances exploration and exploitation, taking incentives into account.

We pay close attention to the applicability of our results to the (repeated) ad auctions used in sponsored search, where a given ad space is repeatedly allocated to advertisers. The value of an ad allocation to a given advertiser depends on multiple factors such as the probability that a user clicks on the ad, the likelihood that the user performs a valuable transaction (such as a purchase) on the advertiser's website and, ultimately, the value of that transaction. Furthermore, some of the private information is learned over time, for example, as the advertiser obtains better estimates of the likelihood of a transaction occurring. We provide a dynamic mechanism that extracts the maximum feasible revenue given the constraints imposed by the need to repeatedly elicit information.

One interesting implication of our results is a certain revenue equivalence between public and private experience, in these separable environments. The optimal revenue is no less than if agents' private experience (which they are free to misreport, if they are not incentivized appropriately) were instead publicly observed by the mechanism.
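As a rough illustration of the index-based allocation the abstract describes, the sketch below computes Gittins indices for small finite-state arms and allocates to the arm with the largest index. It is not the Virtual Index Mechanism itself: the abstract does not give the virtual surplus formula or the payment rule, so the reward vector `r` here is only a hypothetical stand-in for per-state virtual surplus, and payments are omitted. The names `gittins_index` and `pick_arm`, the transition matrices, and the discount factor `beta` are all assumptions made for this example. The index is computed with the standard restart-in-state formulation, solved by value iteration.

```python
import numpy as np


def gittins_index(P, r, beta, state, tol=1e-10, max_iter=100_000):
    """Gittins index of `state` for one arm, in per-period reward units.

    P     : (n, n) transition matrix of the arm (rows sum to 1)
    r     : length-n reward vector (hypothetical stand-in for virtual surplus)
    beta  : discount factor, 0 < beta < 1
    state : index of the state whose Gittins index is wanted

    Restart-in-state formulation: in every state the decision maker may
    either continue from the current state or restart from `state`.
    The index equals (1 - beta) times the optimal value at `state`.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    V = np.zeros(len(r))
    for _ in range(max_iter):
        cont = r + beta * (P @ V)      # value of continuing from each state
        restart = cont[state]          # value of restarting from `state`
        V_new = np.maximum(cont, restart)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return (1.0 - beta) * V[state]


def pick_arm(arms, beta):
    """Return (best arm position, list of indices) for arms given as
    (P, r, current_state) tuples. Ties go to the earliest arm."""
    indices = [gittins_index(P, r, beta, s) for (P, r, s) in arms]
    return int(np.argmax(indices)), indices


if __name__ == "__main__":
    beta = 0.9
    # Arm 0: pays nothing now, then moves to a state paying 1 forever.
    learning_arm = ([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], 0)
    # Arm 1: a single state with a constant reward of 0.85.
    safe_arm = ([[1.0]], [0.85], 0)
    best, indices = pick_arm([learning_arm, safe_arm], beta)
    print(best, indices)
```

In this toy instance the first arm currently pays nothing, yet its index is approximately `beta = 0.9`, which exceeds the constant reward of 0.85 from the second arm, so the index rule allocates to the first arm. This is the exploration-versus-exploitation trade-off that an index policy resolves; in the talk's setting the same comparison would be made on virtual surplus rather than on raw rewards.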

Video Information

Views: 945
Likes: 9
Duration: 01:16:09
Published: Aug 17, 2016
Quality: SD
