[PDF] A Gang of Adversarial Bandits | Semantic Scholar (2024)

Figures from this paper

  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4

Topics

Regret · Social Networks · Learning Algorithm · Adversarial Bandits · Network Structure · Multi-armed Bandit

9 Citations

Bandits with Abstention under Expert Advice
    Stephen Pasteris, Alberto Rumi, M. Herbster

    Computer Science, Mathematics

  • 2024

The CBA algorithm is proposed, which exploits the assumption that one action, corresponding to the learner's abstention from play, has no reward or loss on every trial, and is the first to achieve bounds on the expected cumulative reward for general confidence-rated predictors.

Communication-Efficient Collaborative Heterogeneous Bandits in Networks
    Junghyun Lee, Laura Schmid, SeYoung Yun

    Computer Science

    ArXiv

  • 2023

This work provides a rigorous regret analysis for the standard flooding protocol combined with the UCB policy, and proposes a new protocol called Flooding with Absorption (FWA); it is verified empirically that FWA leads to significantly lower communication costs with minimal regret performance loss compared to flooding.
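The UCB policy named in this summary is the classic optimism-under-uncertainty index rule. As a rough single-agent illustration only (the paper's contribution is the networked flooding protocol around it, not UCB itself; `pull`, `n_arms`, and `horizon` are hypothetical names):

```python
import math

def ucb1(pull, n_arms, horizon):
    """Minimal single-agent UCB1 sketch.

    `pull(arm)` is assumed to return a reward in [0, 1]; this is an
    illustrative sketch, not the paper's networked algorithm.
    """
    counts = [0] * n_arms   # times each arm was played
    sums = [0.0] * n_arms   # cumulative reward per arm
    for t in range(horizon):
        if t < n_arms:
            arm = t         # play every arm once to initialise
        else:
            # optimism: empirical mean plus a confidence bonus
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t + 1) / counts[a]))
        counts[arm] += 1
        sums[arm] += pull(arm)
    return counts
```

With two arms paying 0.2 and 0.9, the play counts concentrate on the better arm as the horizon grows.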

Nearest Neighbour with Bandit Feedback
    Stephen Pasteris, Chris Hicks, V. Mavroudis

    Computer Science, Mathematics

    NeurIPS

  • 2023

The nearest neighbour rule is adapted to the contextual bandit problem; the algorithm is extremely efficient, with a per-trial running time polylogarithmic in both the number of trials and actions, and only quasi-linear space.
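As loose intuition for the nearest-neighbour rule this summary refers to (a linear-time Euclidean sketch, not the paper's polylogarithmic bandit algorithm; `nn_action` and `history` are hypothetical names):

```python
import math

def nn_action(history, context):
    """Naive nearest-neighbour rule: replay the action whose stored
    context is closest (Euclidean distance) to the new context.

    The paper's algorithm additionally handles bandit feedback and
    achieves polylogarithmic per-trial time; this sketch is linear-time
    and purely illustrative.
    """
    # history: list of (context_vector, action) pairs
    _, action = min(history, key=lambda h: math.dist(h[0], context))
    return action
```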

Multitask Online Learning: Listen to the Neighborhood Buzz
    Juliette Achddou, Nicolò Cesa-Bianchi, Pierre Laforgue

    Computer Science

    AISTATS

  • 2024

The analysis shows that the regret of MT-CO₂OL is never worse than the bound obtained when agents do not share information, and it is proved that the algorithm can be made differentially private with a negligible impact on the regret.

A Hierarchical Nearest Neighbour Approach to Contextual Bandits
    Stephen Pasteris, Chris Hicks, V. Mavroudis

    Computer Science, Mathematics

    ArXiv

  • 2023

The adversarial contextual bandit problem in metric spaces is considered, and an algorithm is designed that can hold out any set of contexts when computing its regret term, thereby inheriting extreme computational efficiency.

Cooperative Online Learning with Feedback Graphs
    Nicolò Cesa-Bianchi, T. Cesari, R. D. Vecchia

    Computer Science, Mathematics

    ArXiv

  • 2021

This work characterizes regret in terms of the independence number of the strong product between the feedback graph and the communication network, recovering as special cases many previously known bounds for distributed online learning with either expert or bandit feedback.

AdaTask: Adaptive Multitask Online Learning
    Pierre Laforgue, A. Vecchia, Nicolò Cesa-Bianchi, L. Rosasco

    Computer Science

    ArXiv

  • 2022

AdaTask can be seen as a comparator-adaptive version of Follow-the-Regularized-Leader with a Mahalanobis norm potential, and a variational formulation of this potential reveals how AdaTask jointly learns the tasks and their structure.

Fast Online Node Labeling for Very Large Graphs
    Baojian Zhou, Yifan Sun, Reza Babanezhad

    Computer Science, Mathematics

    ICML

  • 2023

This work proves an effective regret of $\mathcal{O}(\sqrt{n^{1+\gamma}})$ when suitable parameterized graph kernels are chosen, and proposes an approximate algorithm FastONL enjoying regret based on this relaxation.

A PDE approach for regret bounds under partial monitoring
    Erhan Bayraktar, Ibrahim Ekren, Xin Zhang

    Mathematics, Computer Science

    ArXiv

  • 2022

This paper heuristically derives a limiting PDE on Wasserstein space that characterizes the asymptotic behavior of the forecaster's regret, and shows that the problem of obtaining regret bounds and efficient algorithms can be tackled by finding appropriate smooth sub/supersolutions of this parabolic PDE.

91 References

    L. E. Celis, Farnood Salehi

    Computer Science, Economics

    ArXiv

  • 2017

This paper provides algorithms for this setting, both for stochastic and adversarial bandits, and shows that their regret smoothly interpolates between the regret in the classical bandit setting and that of the full-information setting as a function of the neighbors' exploration.

A Gang of Bandits
    N. Cesa-Bianchi, C. Gentile, Giovanni Zappella

    Computer Science

    NIPS

  • 2013

A global recommendation strategy is presented which allocates a bandit algorithm to each network node (user) and allows it to "share" signals (contexts and payoffs) with the neighboring nodes; two more scalable variants of this strategy, based on different ways of clustering the graph nodes, are also derived.

Multi-armed bandits in the presence of side observations in social networks
    Swapna Buccapatnam, A. Eryilmaz, N. Shroff

    Computer Science

    52nd IEEE Conference on Decision and Control

  • 2013

The investigations in this work reveal the significant gains that can be obtained even through static network-aware policies, and a randomized policy is proposed that explores actions for each user at a rate that is a function of her network position.

Multitask Bandit Learning through Heterogeneous Feedback Aggregation
    Zhi Wang, Chicheng Zhang, Manish Singh, L. Riek, Kamalika Chaudhuri

    Computer Science

    AISTATS

  • 2021

An upper confidence bound-based algorithm, RobustAgg(ε), is developed that adaptively aggregates rewards collected by different players and achieves instance-dependent regret guarantees that depend on the amenability of information sharing across players.

Networked bandits with disjoint linear payoffs
    Meng Fang, D. Tao

    Computer Science, Mathematics

    KDD

  • 2014

This paper formalizes the networked bandit problem and proposes an algorithm that considers not only the selected arm, but also the relationships between arms, in that it decides an arm depending on integrated confidence sets constructed from historical data.

Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests
    X. Xu, Fang Dong, Yanghua Li, Shaojian He, X. Li

    Computer Science, Mathematics

    AAAI

  • 2020

A contextual bandit problem is studied in a highly non-stationary environment; an efficient learning algorithm that is adaptive to abrupt reward changes is proposed, and a theoretical regret analysis shows that regret scales sublinearly in the time horizon T.

Contextual Bandits with Similarity Information
    Aleksandrs Slivkins

    Mathematics, Computer Science

    COLT

  • 2011

This work considers similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem, and presents algorithms that are based on adaptive partitions and take advantage of "benign" payoffs and context arrivals without sacrificing worst-case performance.

Stochastic Multi-Player Bandit Learning from Player-Dependent Feedback
    Zhi Wang, Manish Singh, Chicheng Zhang, L. Riek, Kamalika Chaudhuri

    Computer Science

  • 2020

This paper formulates the ε-multi-player multi-armed bandit problem and develops an upper confidence bound-based algorithm that adaptively aggregates rewards collected by different players, the first such scheme in a multi-player bandit learning setting.

Contextual Bandits in a Collaborative Environment
    Qingyun Wu, Huazheng Wang, Quanquan Gu, Hongning Wang

    Computer Science

    SIGIR

  • 2016

This paper develops a collaborative contextual bandit algorithm in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updating, and rigorously proves an improved upper regret bound.

Social Learning in Multi Agent Multi Armed Bandits
    Abishek Sankararaman, A. Ganesh, S. Shakkottai

    Computer Science

    Proc. ACM Meas. Anal. Comput. Syst.

  • 2019

A novel algorithm is developed in which agents, whenever they choose to communicate, share only arm-ids (not reward samples) with another agent chosen uniformly and independently at random, demonstrating that even a minimal level of collaboration among the different agents enables a significant reduction in per-agent regret.
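The arm-id-only communication pattern described in this summary can be loosely sketched as follows (a hypothetical gossip round over agents represented as plain dicts; `gossip_round`, `best_arm_of`, and the `"candidates"` field are illustrative names, not the paper's protocol):

```python
import random

def gossip_round(agents, best_arm_of):
    """One communication round: each agent sends only the id of its
    current best arm (no reward samples) to another agent chosen
    uniformly and independently at random."""
    n = len(agents)
    messages = []
    for i in range(n):
        j = random.choice([k for k in range(n) if k != i])
        messages.append((j, best_arm_of(agents[i])))
    # deliver all messages; receivers record the arm-ids as candidates
    for j, arm_id in messages:
        agents[j].setdefault("candidates", set()).add(arm_id)
```

Receivers can then add the gossiped arm-ids to their own candidate sets, which is what makes the per-message cost independent of the number of reward samples collected.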

...

...

