[PDF] A Gang of Adversarial Bandits | Semantic Scholar (2024)

Topics

Regret, Social Networks, Learning Algorithm, Adversarial Bandits, Network Structure, Multi-armed Bandit

9 Citations

Bandits with Abstention under Expert Advice

Stephen Pasteris, Alberto Rumi, M. Herbster

Computer Science, Mathematics

2024

The CBA algorithm is proposed, which exploits the assumption that one action, corresponding to the learner's abstention from play, has no reward or loss on every trial. It is the first to achieve bounds on the expected cumulative reward for general confidence-rated predictors.

Communication-Efficient Collaborative Heterogeneous Bandits in Networks

Junghyun Lee, Laura Schmid, SeYoung Yun

Computer Science

ArXiv

2023

This work provides a rigorous regret analysis for the standard flooding protocol combined with the UCB policy, and proposes a new protocol called Flooding with Absorption (FWA). Experiments verify that FWA leads to significantly lower communication costs than flooding, with minimal loss in regret performance.

Nearest Neighbour with Bandit Feedback

Stephen Pasteris, Chris Hicks, V. Mavroudis

Computer Science, Mathematics

NeurIPS

2023

The nearest neighbour rule is adapted to the contextual bandit problem. The resulting algorithm is extremely efficient, with a per-trial running time polylogarithmic in both the number of trials and actions, and it takes only quasi-linear space.

1

Multitask Online Learning: Listen to the Neighborhood Buzz

Juliette Achddou, Nicolò Cesa-Bianchi, Pierre Laforgue

Computer Science

AISTATS

2024

The analysis shows that the regret of $\texttt{MT-CO}_2\texttt{OL}$ is never worse than the bound obtained when agents do not share information, and it is proved that the algorithm can be made differentially private with a negligible impact on the regret.

A Hierarchical Nearest Neighbour Approach to Contextual Bandits

Stephen Pasteris, Chris Hicks, V. Mavroudis

Computer Science, Mathematics

ArXiv

2023

The adversarial contextual bandit problem in metric spaces is considered. The authors design an algorithm in which any set of contexts can be held out when computing the regret term, and which inherits the extreme computational efficiency of the method it builds on.

Highly Influenced

Cooperative Online Learning with Feedback Graphs

Nicolò Cesa-Bianchi, T. Cesari, R. D. Vecchia

Computer Science, Mathematics

ArXiv

2021

This work characterizes regret in terms of the independence number of the strong product between the feedback graph and the communication network, which recovers as special cases many previously known bounds for distributed online learning with either expert or bandit feedback.

2

AdaTask: Adaptive Multitask Online Learning

Pierre Laforgue, A. Vecchia, Nicolò Cesa-Bianchi, L. Rosasco

Computer Science

ArXiv

2022

AdaTask can be seen as a comparator-adaptive version of Follow-the-Regularized-Leader with a Mahalanobis norm potential, and a variational formulation of this potential reveals how AdaTask jointly learns the tasks and their structure.

1

Fast Online Node Labeling for Very Large Graphs

Baojian Zhou, Yifan Sun, Reza Babanezhad

Computer Science, Mathematics

ICML

2023

This work proves an effective regret of $\mathcal{O}(\sqrt{n^{1+\gamma}})$ when suitable parameterized graph kernels are chosen, and proposes an approximate algorithm FastONL enjoying regret based on this relaxation.

1

A PDE approach for regret bounds under partial monitoring

Erhan Bayraktar, Ibrahim Ekren, Xin Zhang

Mathematics, Computer Science

ArXiv

2022

This paper heuristically derives a limiting PDE on Wasserstein space which characterizes the asymptotic behavior of the regret of the forecaster, and shows that the problem of obtaining regret bounds and efficient algorithms can be tackled by finding appropriate smooth sub/supersolutions of this parabolic PDE.

2

91 References

L. E. Celis, Farnood Salehi

Computer Science, Economics

ArXiv

2017

This paper provides algorithms for this setting, both for stochastic and adversarial bandits, and shows that their regret smoothly interpolates between the regret in the classical bandit setting and that of the full-information setting as a function of the neighbors' exploration.

2

A Gang of Bandits

N. Cesa-Bianchi, C. Gentile, Giovanni Zappella

Computer Science

NIPS

2013

A global recommendation strategy is proposed which allocates a bandit algorithm to each network node (user) and allows it to "share" signals (contexts and payoffs) with the neighboring nodes. Two more scalable variants of this strategy are derived, based on different ways of clustering the graph nodes.

151

Multi-armed bandits in the presence of side observations in social networks

Swapna Buccapatnam, A. Eryilmaz, N. Shroff

Computer Science

52nd IEEE Conference on Decision and Control

2013

The investigations in this work reveal the significant gains that can be obtained even through static network-aware policies. The work also proposes a randomized policy that explores actions for each user at a rate that is a function of her network position.

38

Multitask Bandit Learning through Heterogeneous Feedback Aggregation

Zhi Wang, Chicheng Zhang, Manish Singh, L. Riek, Kamalika Chaudhuri

Computer Science

AISTATS

2021

An upper confidence bound-based algorithm, RobustAgg($\epsilon$), is developed that adaptively aggregates rewards collected by different players and achieves instance-dependent regret guarantees that depend on the amenability of information sharing across players.

16

Networked bandits with disjoint linear payoffs

Meng Fang, D. Tao

Computer Science, Mathematics

KDD

2014

This paper formalizes the networked bandit problem and proposes an algorithm that considers not only the selected arm, but also the relationships between arms, in that it decides an arm depending on integrated confidence sets constructed from historical data.

27

Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests

X. Xu, Fang Dong, Yanghua Li, Shaojian He, X. Li

Computer Science, Mathematics

AAAI

2020

A contextual bandit problem is studied in a highly non-stationary environment. An efficient learning algorithm that adapts to abrupt reward changes is proposed, and a theoretical regret analysis shows that regret scales sublinearly in the time length T.

25

Contextual Bandits with Similarity Information

Aleksandrs Slivkins

Mathematics, Computer Science

COLT

2011

This work considers similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem, and presents algorithms that are based on adaptive partitions, and take advantage of "benign" payoffs and context arrivals without sacrificing the worst-case performance.

438

Stochastic Multi-Player Bandit Learning from Player-Dependent Feedback

Zhi Wang, Manish Singh, Chicheng Zhang, L. Riek, Kamalika Chaudhuri

Computer Science

2020

This paper formulates the $\epsilon$-multi-player multi-armed bandit problem and develops an upper confidence bound-based algorithm that adaptively aggregates rewards collected by different players, the first such scheme in a multi-player bandit learning setting.

6

Contextual Bandits in a Collaborative Environment

Qingyun Wu, Huazheng Wang, Quanquan Gu, Hongning Wang

Computer Science

SIGIR

2016

This paper develops a collaborative contextual bandit algorithm, in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users while online updating, and rigorously proves an improved upper regret bound.

105

Social Learning in Multi Agent Multi Armed Bandits

Abishek Sankararaman, A. Ganesh, S. Shakkottai

Computer Science

Proc. ACM Meas. Anal. Comput. Syst.

2019

A novel algorithm is developed in which agents, whenever they choose, communicate only arm-ids and not samples with another agent chosen uniformly and independently at random. The results demonstrate that even a minimal level of collaboration among the agents enables a significant reduction in per-agent regret.

71
