level: research
this paper studies dynamic pricing when customer valuations depend on context through an unknown scalar index. the model is semiparametric: an unknown utility map combines context features into a one-dimensional index, and each valuation equals this index plus additive noise whose distribution is also unknown. the key insight is that the oracle price map, i.e., the revenue-maximizing price viewed as a function of the index, inherits its smoothness from the tail of the noise distribution. under mild conditions, this map is smooth enough to be approximated locally by polynomials.
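to make the oracle price map concrete, here is a minimal sketch under stated assumptions that are ours, not the paper's: a buyer with index u purchases iff u + eps >= p, and eps is zero-mean logistic with a known `scale`. the expected revenue is r(p; u) = p * (1 - F(p - u)), and its first-order condition p = (1 - F)/f reduces, for logistic F, to p * F(p - u) = scale. the helper names (`logistic_cdf`, `expected_revenue`, `oracle_price`) are hypothetical. printing p*(u) over a grid of indices shows a smooth, increasing map, which is the object orbit approximates locally.

```python
import math


def logistic_cdf(z: float, scale: float) -> float:
    """CDF of zero-mean logistic noise (assumed noise model), computed stably."""
    x = z / scale
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def expected_revenue(p: float, u: float, scale: float = 1.0) -> float:
    """r(p; u) = p * P(u + eps >= p): a sale happens iff valuation >= price."""
    return p * (1.0 - logistic_cdf(p - u, scale))


def oracle_price(u: float, scale: float = 1.0, iters: int = 200) -> float:
    """Revenue-maximizing price p*(u) for index u under logistic noise.

    The first-order condition p = (1 - F)/f becomes p * F(p - u) = scale.
    The left side is strictly increasing in p and equals 0 at p = 0, so
    bisection on a doubling bracket finds the unique root.
    """
    lo, hi = 0.0, 1.0
    while hi * logistic_cdf(hi - u, scale) < scale:
        hi *= 2.0  # grow the bracket until the residual is non-negative
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * logistic_cdf(mid - u, scale) < scale:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The oracle price map u -> p*(u) varies smoothly with the index.
for u in (-1.0, 0.0, 1.0, 2.0):
    print(f"u={u:+.1f}  p*={oracle_price(u):.4f}")
```

for logistic noise the revenue is log-concave in p, so the unique stationary point is the global maximum; with heavier or lighter noise tails the same computation goes through numerically, but the smoothness of p*(u) changes, which is the dependence the paper exploits.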
the proposed algorithm, orbit, works in stages. it first uses a pilot index, a preliminary score for each context, to group contexts into bins. within each active bin, it sets a benchmark price and then runs bandit convex optimization, which learns only from the sale or no-sale outcome at each posted price, to fit a local polynomial approximation of the oracle price map inside a trust region. this coarse-to-fine design balances exploration and exploitation and adapts to the unknown smoothness and noise distribution.
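the coarse-to-fine structure can be illustrated with a simplified sketch; it is not the paper's algorithm, and every name and parameter here (`assign_bins`, `bco_in_bin`, `radius`, `delta`, `eta`) is hypothetical. the coarse step splits contexts into quantile bins of a pilot score. the fine step assumes the trust region is a Euclidean ball around the coefficients of the constant benchmark price and swaps in a generic one-point bandit gradient estimator (Flaxman, Kalai and McMahan style) as the bandit convex optimization routine, treating revenue as locally concave in the polynomial coefficients.

```python
import bisect
import math
import random
import statistics


def assign_bins(pilot_scores, n_bins):
    """Coarse step: split contexts into quantile bins of a pilot index score."""
    edges = statistics.quantiles(pilot_scores, n=n_bins)
    return [bisect.bisect_right(edges, s) for s in pilot_scores]


def local_price(theta, u, center):
    """Local polynomial price: sum_k theta[k] * (u - center)**k."""
    du = u - center
    return sum(c * du**k for k, c in enumerate(theta))


def unit_sphere(dim, rng):
    """Uniform random direction on the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project_ball(theta, anchor, radius):
    """Euclidean projection of theta onto the ball of given radius around anchor."""
    diff = [t - a for t, a in zip(theta, anchor)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return theta
    return [a + d * radius / norm for a, d in zip(anchor, diff)]


def bco_in_bin(indices, sell, benchmark, center, degree=2,
               radius=1.0, delta=0.1, eta=0.005, seed=0):
    """Fine step (hypothetical): one-point bandit gradient ascent on revenue.

    theta holds polynomial coefficients; the assumed trust region is a ball
    of `radius` around the benchmark-only price [benchmark, 0, ..., 0].
    """
    rng = random.Random(seed)
    dim = degree + 1
    anchor = [benchmark] + [0.0] * degree
    theta = list(anchor)
    for u in indices:
        w = unit_sphere(dim, rng)
        probe = [t + delta * wi for t, wi in zip(theta, w)]  # perturbed play
        price = max(local_price(probe, u, center), 0.0)
        revenue = price * sell(price, u)  # bandit feedback: sale or no sale
        grad = [dim / delta * revenue * wi for wi in w]  # one-point estimate
        step = [t + eta * g for t, g in zip(theta, grad)]
        # Shrink the ball by delta so every probe stays inside the trust region.
        theta = project_ball(step, anchor, radius - delta)
    return theta


# Toy run: the pilot score doubles as the true index, and noise is N(0, 1).
data_rng = random.Random(1)
scores = [data_rng.uniform(-2.0, 2.0) for _ in range(4000)]
bins = assign_bins(scores, n_bins=4)
in_bin = [u for u, b in zip(scores, bins) if b == 2]
center = statistics.median(in_bin)
env = random.Random(2)


def sell(p, u):
    """Simulated buyer: purchases iff index plus noise reaches the price."""
    return 1 if u + env.gauss(0.0, 1.0) >= p else 0


theta = bco_in_bin(in_bin, sell, benchmark=1.0, center=center)
print("learned coefficients:", [round(c, 3) for c in theta])
```

shrinking the projection radius by `delta` is the usual trick that keeps every perturbed probe inside the trust region; the polynomial degree, step size and radius would in practice be tied to the unknown smoothness, which this toy does not attempt.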
theoretical analysis shows that orbit achieves low regret: its cumulative revenue shortfall relative to an oracle that knows the index and the noise distribution grows slowly with the number of selling rounds, so it quickly learns to price near optimally. the regret bounds depend on the smoothness of the noise tail and on the dimension of the context. the method is modular and works with any reasonable pilot index, which makes it flexible for real-world applications where customer behavior is complex but can be summarized by a single score.
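for reference, the standard notion of regret in this setting can be written as below. the notation is ours: u_t is the index of the round-t context, F is the unknown noise CDF, a sale happens iff u_t + eps_t >= p_t, and p_t is the price the algorithm posts. the paper's exact definition and its bounds are not reproduced here.

```latex
r(p; u) = p \,\bigl(1 - F(p - u)\bigr), \qquad
p^*(u) = \arg\max_{p \ge 0} r(p; u), \qquad
\mathrm{Regret}(T) = \sum_{t=1}^{T} \Bigl[ r\bigl(p^*(u_t); u_t\bigr) - r(p_t; u_t) \Bigr]
```

"low regret" in this sense means Regret(T) grows sublinearly in T, so the average per-round revenue loss Regret(T)/T shrinks toward zero.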
why it matters: this approach can help businesses set prices that adapt to customer segments without needing full demand models, using only contextual data and sales feedback.