This document details the current and planned API for slots. Features that are not yet implemented are noted as such.
- Set up N bandits with probabilities, p_i, and payouts, pay_i.
- Implement several MAB strategies, with kwargs as parameters and a consistent API.
- Allow for T trials.
- Continue with more trials (i.e. save state after trials).
- Values to save:
- Current choice
- number of trials completed for each arm
- scores for each arm
- average payout per arm (wins/trials?)
- Current regret: regret = T * mean_max - sum_{t=1}^{T}(reward_t), where T is the number of trials completed and mean_max is the highest true mean payout among the arms (see the sketch after this list).
- See ref
- Use sane defaults.
- Be obvious and clean.
- For the time being, handle only binary payouts.
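
To make the regret bookkeeping above concrete, here is a minimal sketch of the calculation; the arm probabilities, the simulated rewards, and the use of numpy are assumptions for the example only, not part of the slots API.

import numpy as np

# True mean payouts of each arm (known here only because the data is simulated).
true_means = np.array([0.2, 0.1, 0.4, 0.1])

# Simulate T binary rewards, as if arm 2 had been pulled on every trial.
rng = np.random.default_rng(0)
rewards = rng.binomial(1, true_means[2], size=100)

# regret = T * mean_max - sum_{t=1}^{T}(reward_t)
T = len(rewards)
regret = T * true_means.max() - rewards.sum()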
# Using slots to determine the best of 3 variations on a live website.
# 3 is the default number of bandits and epsilon-greedy is the default strategy.
import slots

mab = slots.MAB(3, live=True)
# Make the first choice randomly, record the response, and input the reward.
# Here, arm 2 was chosen.
# Update the online trial (input the most recent result) until the test criterion is met.
mab.online_trial(bandit=2, payout=1)
# The response of mab.online_trial() is a dict of the form:
{'new_trial': boolean, 'choice': int, 'best': int}
# Where:
# If the criterion is met, new_trial = False.
# choice is the current choice of arm to try next.
# best is the current best estimate of the highest payout arm.
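
Putting the live workflow together, a rough sketch of what an online loop could look like; get_visitor_response and its conversion rates are hypothetical stand-ins for however the site records a binary response to the shown variation.

import random
import slots

# Hypothetical stand-in: returns 1 if the visitor converts on this variation, else 0.
def get_visitor_response(variation):
    return int(random.random() < [0.10, 0.15, 0.20][variation])

mab = slots.MAB(3, live=True)

choice = random.randrange(3)  # make the first choice randomly
while True:
    payout = get_visitor_response(choice)
    result = mab.online_trial(bandit=choice, payout=payout)
    if not result['new_trial']:   # stopping criterion met
        break
    choice = result['choice']     # arm to show to the next visitor

best_arm = result['best']         # current best estimate of the highest payout arm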
# Default: 3 bandits with random probabilities, p_i.
mab = slots.MAB()
# Set up 4 bandits with random p_i.
mab = slots.MAB(4)
# 4 bandits with specified p_i
mab = slots.MAB(probs=[0.2, 0.1, 0.4, 0.1])
# Creating 3 bandits with historical payout data
mab = slots.MAB(3, hist_payouts=np.array([[0, 0, 1, ...],
                                          [1, 0, 0, ...],
                                          [0, 0, 0, ...]]))
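
For a runnable version of the historical-data constructor, the payout arrays can be filled in, for example with simulated binary data; the conversion rates below are arbitrary and only for illustration.

import numpy as np
import slots

# Simulate 50 historical binary payouts per arm at assumed conversion rates.
rng = np.random.default_rng(0)
hist = np.array([rng.binomial(1, p, size=50) for p in (0.10, 0.15, 0.20)])

mab = slots.MAB(3, hist_payouts=hist)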
# Default: Epsilon-greedy, epsilon = 0.1, num_trials = 100
mab.run()
# Run chosen strategy with specified parameters and number of trials
mab.run(strategy='eps_greedy', params={'eps': 0.2}, trials=10000)
# Run strategy, continuing from existing trial data
# (NOT YET IMPLEMENTED; note that `continue` is a reserved word in Python,
#  so the final keyword argument will likely need a different name)
mab.run(continue = True)
# Default: display number of bandits, probabilities and payouts
# (NOT YET IMPLEMENTED)
mab.bandits.info()
# Display info for bandit i
# (NOT YET IMPLEMENTED)
mab.bandits[i]
# Retrieve bandits' payouts, probabilities, etc.
mab.bandits.payouts
mab.bandits.probs
# Retrieve count of bandits
# (NOT YET IMPLEMENTED)
mab.bandits.count
# Reset bandits to defaults
# (NOT YET IMPLEMENTED)
mab.bandits.reset()
# Set probabilities or payouts
# (NOT YET IMPLEMENTED)
mab.bandits.set_probs([0.1, 0.05, 0.2, 0.15])
mab.bandits.set_hist_payouts([[1, 1, 0, 0], [0, 1, 0, 0]])
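
Until bandits.info() is implemented, the per-arm values can be inspected through the payouts and probs attributes shown above; a small sketch (the loop and print format are illustrative only).

import slots

mab = slots.MAB(probs=[0.2, 0.1, 0.4, 0.1])

# Print each arm's probability and payout using the implemented attributes.
for i, (p, pay) in enumerate(zip(mab.bandits.probs, mab.bandits.payouts)):
    print(f"bandit {i}: p = {p}, payout = {pay}")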
# Retrieve current "best" bandit
mab.best()
# Retrieve bandit probability estimates
# (NOT YET IMPLEMENTED)
mab.prob_est()
# Retrieve bandit probability estimate of bandit i
# (NOT YET IMPLEMENTED)
mab.est_prob(i)
# Retrieve bandit probability estimates
mab.est_probs()
# Retrieve current bandit choice
# (NOT YET IMPLEMENTED, use mab.choices[-1])
mab.current()
# Retrieve sequence of choices
mab.choices
# Retrieve probability estimate history
# (NOT YET IMPLEMENTED)
mab.prob_est_sequence
# Retrieve test strategy info (current strategy) -- a dict
# (NOT YET IMPLEMENTED)
mab.strategy_info()
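
Tying the retrieval calls together, a minimal end-to-end sketch that sticks to the calls documented above as implemented; the probabilities, epsilon, and trial count are arbitrary.

import slots

mab = slots.MAB(probs=[0.2, 0.1, 0.4, 0.1])
mab.run(strategy='eps_greedy', params={'eps': 0.2}, trials=10000)

print(mab.best())        # index of the arm with the highest estimated payout
print(mab.est_probs())   # payout probability estimates for all arms
print(mab.choices[-1])   # most recent arm choice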
MAB strategies (current and planned; see the selection sketch after this list):
- Epsilon-greedy
- Epsilon decreasing
- Softmax
- Softmax decreasing
- Upper credible bound
- Bayesian bandits
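
Each strategy is selected through run()'s strategy and params arguments. Only 'eps_greedy' and its 'eps' parameter appear in the examples above, so this sketch sticks to that; the other strategies in the list would presumably be chosen the same way under their own string names and parameters.

import slots

# Compare a few epsilon values for the epsilon-greedy strategy.
for eps in (0.05, 0.1, 0.2):
    mab = slots.MAB(probs=[0.2, 0.1, 0.4, 0.1])
    mab.run(strategy='eps_greedy', params={'eps': eps}, trials=5000)
    print(eps, mab.best(), mab.est_probs())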