## Topics

Regret, Social Networks, Learning Algorithm, Adversarial Bandits, Network Structure, Multi-armed Bandit

## 9 Citations

- Stephen Pasteris, Alberto Rumi, M. Herbster
- 2024

Computer Science, Mathematics

The CBA algorithm is proposed, which exploits the assumption that one action, corresponding to the learner's abstention from play, has no reward or loss on every trial, and is the first to achieve bounds on the expected cumulative reward for general confidence-rated predictors.

- Junghyun Lee, Laura Schmid, SeYoung Yun
- 2023

Computer Science

ArXiv

This work provides a rigorous regret analysis for the standard flooding protocol combined with the UCB policy, and proposes a new protocol called Flooding with Absorption (FWA). Experiments verify that FWA leads to significantly lower communication costs than flooding, with minimal loss in regret performance.

- Stephen Pasteris, Chris Hicks, V. Mavroudis
- 2023

Computer Science, Mathematics

NeurIPS

The nearest neighbour rule is adapted to the contextual bandit problem and the algorithm is extremely efficient - having a per trial running time polylogarithmic in both the number of trials and actions, and taking only quasi-linear space.

- Juliette Achddou, Nicolò Cesa-Bianchi, Pierre Laforgue
- 2024

Computer Science

AISTATS

The analysis shows that the regret of $\texttt{MT-CO}_2\texttt{OL}$ is never worse than the bound obtained when agents do not share information, and it is proved that the algorithm can be made differentially private with a negligible impact on the regret.

- Stephen Pasteris, Chris Hicks, V. Mavroudis
- 2023

Computer Science, Mathematics

ArXiv

The adversarial contextual bandit problem in metric spaces is considered, and an algorithm is designed that can hold out any set of contexts when computing its regret term and that inherits the extreme computational efficiency of the prior work it builds on.

- Nicolò Cesa-Bianchi, T. Cesari, R. D. Vecchia
- 2021

Computer Science, Mathematics

ArXiv

This work characterizes regret in terms of the independence number of the strong product between the feedback graph and the communication network, which recovers as special cases many previously known bounds for distributed online learning with either expert or bandit feedback.

- Pierre Laforgue, A. Vecchia, Nicolò Cesa-Bianchi, L. Rosasco
- 2022

Computer Science

ArXiv

AdaTask can be seen as a comparator-adaptive version of Follow-the-Regularized-Leader with a Mahalanobis norm potential, and a variational formulation of this potential reveals how AdaTask jointly learns the tasks and their structure.

- Baojian Zhou, Yifan Sun, Reza Babanezhad
- 2023

Computer Science, Mathematics

ICML

This work proves an effective regret of $\mathcal{O}(\sqrt{n^{1+\gamma}})$ when suitable parameterized graph kernels are chosen, and proposes an approximate algorithm FastONL enjoying regret based on this relaxation.

- Erhan Bayraktar, Ibrahim Ekren, Xin Zhang
- 2022

Mathematics, Computer Science

ArXiv

This paper heuristically derives a limiting PDE on Wasserstein space which characterizes the asymptotic behavior of the regret of the forecaster, and shows that the problem of obtaining regret bounds and efficient algorithms can be tackled by finding appropriate smooth sub/supersolutions of this parabolic PDE.

## 91 References

- L. E. Celis, Farnood Salehi
- 2017

Computer Science, Economics

ArXiv

This paper provides algorithms for this setting, both for stochastic and adversarial bandits, and shows that their regret smoothly interpolates between the regret in the classical bandit setting and that of the full-information setting as a function of the neighbors' exploration.

- N. Cesa-Bianchi, C. Gentile, Giovanni Zappella
- 2013

Computer Science

NIPS

A global recommendation strategy is proposed which allocates a bandit algorithm to each network node (user) and allows it to "share" signals (contexts and payoffs) with the neighboring nodes; two more scalable variants of this strategy are derived, based on different ways of clustering the graph nodes.

- Swapna Buccapatnam, A. Eryilmaz, N. Shroff
- 2013

Computer Science

52nd IEEE Conference on Decision and Control

The investigations in this work reveal the significant gains that can be obtained even through static network-aware policies, and a randomized policy is proposed that explores actions for each user at a rate that is a function of her network position.

- Zhi Wang, Chicheng Zhang, Manish Singh, L. Riek, Kamalika Chaudhuri
- 2021

Computer Science

AISTATS

An upper confidence bound-based algorithm, RobustAgg($\epsilon$), is developed that adaptively aggregates rewards collected by different players and achieves instance-dependent regret guarantees that depend on the amenability of information sharing across players.

- Meng Fang, D. Tao
- 2014

Computer Science, Mathematics

KDD

This paper formalizes the networked bandit problem and proposes an algorithm that considers not only the selected arm, but also the relationships between arms, in that it decides an arm depending on integrated confidence sets constructed from historical data.

- X. Xu, Fang Dong, Yanghua Li, Shaojian He, X. Li
- 2020

Computer Science, Mathematics

AAAI

A contextual bandit problem is studied in a highly non-stationary environment. An efficient learning algorithm that is adaptive to abrupt reward changes is proposed, and a theoretical regret analysis shows that a sublinear scaling of regret in the time length T is achieved.

- Aleksandrs Slivkins
- 2011

Mathematics, Computer Science

COLT

This work considers similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem, and presents algorithms that are based on adaptive partitions, and take advantage of "benign" payoffs and context arrivals without sacrificing the worst-case performance.

- Zhi Wang, Manish Singh, Chicheng Zhang, L. Riek, Kamalika Chaudhuri
- 2020

Computer Science

This paper formulates the $\epsilon$-multi-player multi-armed bandit problem and develops an upper confidence bound-based algorithm that adaptively aggregates rewards collected by different players; it is the first to develop such a scheme in a multi-player bandit learning setting.

- Qingyun Wu, Huazheng Wang, Quanquan Gu, Hongning Wang
- 2016

Computer Science

SIGIR

This paper develops a collaborative contextual bandit algorithm, in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users while online updating, and rigorously proves an improved upper regret bound.

- Abishek Sankararaman, A. Ganesh, S. Shakkottai
- 2019

Computer Science

Proc. ACM Meas. Anal. Comput. Syst.

A novel algorithm in which agents, whenever they choose, communicate only arm-ids and not samples, with another agent chosen uniformly and independently at random is developed, demonstrating that even a minimal level of collaboration among the different agents enables a significant reduction in per-agent regret.

...

...
