batched bandits with 1-bit feedback

source: arxiv statistics ml: batched stochastic linear bandits with 1-bit communication constraints

level: research

this work looks at stochastic linear bandits with two practical constraints: actions are taken in batches of equal size, and after each batch the learner receives only a single bit of feedback. the learner sends a batch of arm pulls to an agent, who observes the rewards and sends back one bit computed by a quantization rule the learner chooses. the learner can base each rule on all previously received bits, but it never sees the raw rewards themselves. this setup sits between models that quantize the reward in every round and models that cap the total number of bits communicated.
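the interaction protocol above can be sketched as a small simulation. this is a minimal sketch under stated assumptions, not the paper's code: rewards are assumed to be linear plus gaussian noise, <theta, a> + noise, and the names `agent_step`, `interact`, `choose_batch` and `choose_rule` are hypothetical. the point it shows is the information flow: the learner picks actions and a rule from past bits only, and only one bit per batch comes back.

```python
import numpy as np
from typing import Callable, List

# hypothetical types for this sketch (names are not from the paper):
# a rule maps the rewards of one batch to a single bit (0 or 1)
Rule = Callable[[np.ndarray], int]
# policies map the list of past bits to a batch of actions (B x d) or to a rule
BatchPolicy = Callable[[List[int]], np.ndarray]
RulePolicy = Callable[[List[int]], Rule]


def agent_step(theta: np.ndarray, actions: np.ndarray, rule: Rule,
               noise_std: float, rng: np.random.Generator) -> int:
    """agent side: pull every action in the batch, observe noisy linear
    rewards <theta, a> + noise, and reply with one bit from the rule."""
    rewards = actions @ theta + noise_std * rng.standard_normal(len(actions))
    bit = int(rule(rewards))
    if bit not in (0, 1):
        raise ValueError("a rule must return a single bit")
    return bit


def interact(theta: np.ndarray, num_batches: int, choose_batch: BatchPolicy,
             choose_rule: RulePolicy, noise_std: float = 0.1,
             seed: int = 0) -> List[int]:
    """learner side: before each batch, pick the actions and the rule using
    only the bits received so far; raw rewards never reach the learner."""
    rng = np.random.default_rng(seed)
    bits: List[int] = []
    for _ in range(num_batches):
        actions = choose_batch(bits)
        rule = choose_rule(bits)
        bits.append(agent_step(theta, actions, rule, noise_std, rng))
    return bits


if __name__ == "__main__":
    theta = np.array([1.0, -1.0])
    batch = np.array([[1.0, 0.0]] * 3)  # batch size B = 3, always pull e1
    bits = interact(theta, num_batches=4,
                    choose_batch=lambda past: batch,
                    choose_rule=lambda past: (lambda r: r.mean() > 0.0),
                    noise_std=0.0)
    print(bits)  # [1, 1, 1, 1]
```

the rule here ("is the batch-average reward positive?") is only a placeholder; the interesting design question is which rule to send for each batch.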

the main finding is a minimax lower bound on regret. even in the noiseless case, the 1-bit bottleneck alone forces regret of at least Ω(B·min{d, log|A|}), where B is the batch size, d is the dimension, and |A| is the number of arms. the standard statistical lower bounds for linear bandits still apply, since they hold even with full-precision feedback, so regret must be at least the larger of the two; together they give a general lower bound. the paper also gives an algorithm whose regret nearly matches this bound, showing that the bound is tight up to logarithmic factors.
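one way to build intuition for the log|A| term is a counting argument: k one-bit replies can take at most 2^k distinct values, so naming one of |A| arms takes at least ceil(log2|A|) batches, and each such batch can cost regret on all B of its pulls. the sketch below is a hypothetical noiseless scheme, not the paper's proof or algorithm. it assumes the arm set contains the d standard basis vectors and that B ≥ d, so the agent can read theta off one batch and reply with one bit of the best arm's index per batch. `bits_needed`, `noiseless_identify` and `exploration_regret_bound` are illustrative names.

```python
import math
import numpy as np


def bits_needed(num_arms: int) -> int:
    """k one-bit replies take at most 2**k distinct values, so naming one of
    num_arms candidates needs at least ceil(log2(num_arms)) replies."""
    return math.ceil(math.log2(num_arms)) if num_arms > 1 else 0


def noiseless_identify(theta: np.ndarray, arms: np.ndarray,
                       batch_size: int) -> int:
    """hypothetical noiseless scheme, not the paper's algorithm.

    assumes the arm set contains e_1..e_d and batch_size >= d: each batch pulls
    those basis vectors (remaining pulls are ignored here), so with no noise the
    agent sees theta exactly. the rule for batch k is "send bit k of the best
    arm's index", and the learner rebuilds the index bit by bit.
    """
    d = theta.shape[0]
    if batch_size < d:
        raise ValueError("this sketch needs batch_size >= d")
    index = 0
    for k in range(bits_needed(len(arms))):
        # agent side: rewards of e_1..e_d are exactly the coordinates of theta
        theta_seen = np.eye(d) @ theta
        best = int(np.argmax(arms @ theta_seen))
        bit = (best >> k) & 1
        # learner side: only this bit crosses the channel
        index |= bit << k
    return index


def exploration_regret_bound(batch_size: int, num_arms: int,
                             max_gap: float) -> float:
    """crude upper bound on regret spent before the best arm is known:
    ceil(log2 |A|) batches of batch_size pulls, each costing at most max_gap."""
    return batch_size * bits_needed(num_arms) * max_gap


if __name__ == "__main__":
    angles = 2 * np.pi * np.arange(8) / 8
    arms = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # |A| = 8, d = 2
    theta = arms[5]
    print(noiseless_identify(theta, arms, batch_size=4))  # 5
    print(exploration_regret_bound(batch_size=4, num_arms=8, max_gap=2.0))  # 24.0
```

this scheme spends on the order of B·log|A| regret before it can commit, which is consistent with the log|A| branch of the lower bound; it says nothing about the d branch, which matters when the arm set is very large or continuous.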

the algorithm designs the quantization rule for each batch so that the single returned bit carries as much useful information as possible, and it balances exploration and exploitation under this severe communication limit. the results show that batching combined with 1-bit feedback makes the problem harder than either constraint alone, but that learning at a near-optimal rate (up to logarithmic factors) is still possible with the right strategy.
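to make "designing the quantization rule" concrete, here is a swapped-in illustration, not the paper's algorithm: a threshold rule combined with bisection. the agent pulls one fixed action B times and replies 1 if the batch-average reward exceeds the midpoint of the learner's current interval; the learner keeps the half the bit points to. it assumes the mean reward lies in a known interval [lo, hi] and that noise is gaussian; `bisect_mean_reward` is a hypothetical name.

```python
import numpy as np


def bisect_mean_reward(theta: np.ndarray, x: np.ndarray, batch_size: int,
                       num_batches: int, lo: float = -1.0, hi: float = 1.0,
                       noise_std: float = 0.0, seed: int = 0) -> float:
    """hypothetical threshold rule, not the paper's algorithm: estimate the
    mean reward <theta, x> of one action, assumed to lie in [lo, hi], from one
    bit per batch.

    rule for each batch: the agent pulls x batch_size times and sends 1 if the
    batch-average reward exceeds the midpoint tau of the learner's current
    interval, else 0. the learner keeps the half the bit points to.
    """
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        tau = 0.5 * (lo + hi)  # threshold chosen from past bits only
        rewards = x @ theta + noise_std * rng.standard_normal(batch_size)
        bit = int(rewards.mean() > tau)  # the single bit sent back
        if bit:
            lo = tau  # mean reward is (probably) above tau
        else:
            hi = tau  # mean reward is (probably) at or below tau
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    theta = np.array([0.3, 0.2])
    x = np.array([1.0, 0.0])  # true mean reward <theta, x> = 0.3
    print(bisect_mean_reward(theta, x, batch_size=1, num_batches=20))
    print(bisect_mean_reward(theta, x, batch_size=400, num_batches=20,
                             noise_std=0.1))
```

without noise each bit halves the interval, so the error after k batches is at most (hi - lo)/2^(k+1). with noise, a bit is reliable only when tau is far from the true mean compared with the batch-average's standard deviation σ/√B, so larger batches make each bit more trustworthy; plain bisection cannot undo a wrong bit, so a practical rule would need some way to revisit thresholds.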

why it matters: this research helps design learning systems for settings with severe communication limits, such as iot devices or remote sensors, where sending full reward data is impossible.
