
Ray[Tune] BayesOpt fails to evaluate more than 11 trials #28063

Closed
ladyluk opened this issue Aug 23, 2022 · 2 comments
Assignee: krfricke
Labels: question, tune


ladyluk commented Aug 23, 2022

What happened + What you expected to happen

I was running BayesOpt with the ASHA, MSR, FIFO, and HyperBand schedulers for a project, and the tuning jobs would all stop around the 10th or 11th trial. To verify my code, I ran the BayesOpt example from the Ray Tune site. That example tuning job also terminates at trial 11, even though we expect it to run for 1000 trials. I was able to reproduce this with Ray 1.13.0 and Ray 2.0.0. I also tried the latest bayesian-optimization library from #336 as well as v1.2.0.

== Status ==
Current time: 2022-08-23 05:43:24 (running for 00:00:50.57)
Memory usage on this node: 7.3/239.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/32 CPUs, 0/4 GPUs, 0.0/156.12 GiB heap, 0.0/70.9 GiB objects (0.0/1.0 accelerator_type:V100)
Current best trial: 79a0bf82 with mean_loss=-9.949748743718594 and parameters={'steps': 100, 'width': 20.0, 'height': -100.0}
Result logdir: /home/xxxx/ray_results/bayesopt_exp
Number of trials: 11/1000 (11 TERMINATED)
+--------------------+------------+--------------------+-----------+----------+----------+--------+------------------+--------------+-----------------+
| Trial name         | status     | loc                |    height |    width |     loss |   iter |   total time (s) |   iterations |   neg_mean_loss |
|--------------------+------------+--------------------+-----------+----------+----------+--------+------------------+--------------+-----------------|
| objective_63086a0e | TERMINATED | 10.2.73.199:99979  |  -25.092  | 19.0143  | -2.45636 |    100 |          10.5551 |           99 |         2.45636 |
| objective_63f34150 | TERMINATED | 10.2.73.199:100076 |   46.3988 | 11.9732  |  4.72354 |    100 |          10.5741 |           99 |        -4.72354 |
| objective_63f6ca28 | TERMINATED | 10.2.73.199:100078 |  -68.7963 |  3.11989 | -6.56602 |    100 |          10.6194 |           99 |         6.56602 |
| objective_63fa095e | TERMINATED | 10.2.73.199:100084 |  -88.3833 | 17.3235  | -8.78036 |    100 |          10.6112 |           99 |         8.78036 |
| objective_6a46020e | TERMINATED | 10.2.73.199:100644 |   20.223  | 14.1615  |  2.09312 |    100 |          10.7032 |           99 |        -2.09312 |
| objective_6b207c36 | TERMINATED | 10.2.73.199:100689 |  -95.8831 | 19.3982  | -9.53651 |    100 |          10.8742 |           99 |         9.53651 |
| objective_6b303d7e | TERMINATED | 10.2.73.199:100700 |   66.4885 |  4.24678 |  6.88118 |    100 |          10.8819 |           99 |        -6.88118 |
| objective_6b382d54 | TERMINATED | 10.2.73.199:100720 |  -63.635  |  3.66809 | -6.09551 |    100 |          11.1148 |           99 |         6.09551 |
| objective_71811efa | TERMINATED | 10.2.73.199:101191 |  -39.1516 | 10.4951  | -3.81983 |    100 |          10.6469 |           99 |         3.81983 |
| objective_727ffbe6 | TERMINATED | 10.2.73.199:101289 |  -13.611  |  5.82458 | -1.19064 |    100 |          10.5061 |           99 |         1.19064 |
| objective_79a0bf82 | TERMINATED | 10.2.73.199:101848 | -100      | 20       | -9.94975 |    100 |          11.0142 |           99 |         9.94975 |
+--------------------+------------+--------------------+-----------+----------+----------+--------+------------------+--------------+-----------------+


2022-08-23 05:43:24,656 INFO tune.py:759 -- Total run time: 51.90 seconds (50.57 seconds for the tuning loop).
Best hyperparameters found were:  {'steps': 100, 'width': 20.0, 'height': -100.0}

Versions / Dependencies

Ray 1.13.0 and 2.0.0
bayesian-optimization 1.2.0 and latest install from GitHub #336
Amazon Linux 2
Python 3.7.10

Ray 2.0.0
bayesian-optimization 1.2.0 and latest install from GitHub #336
macOS Monterey 12.5
Python 3.10.4

Ray 1.12.1 and 1.13.0
bayesian-optimization 1.2.0 and latest install from GitHub #336
CentOS Stream
Python 3.9.13

Reproduction script

from ray import tune
from ray.tune.suggest.bayesopt import BayesOptSearch
from ray.tune.suggest import ConcurrencyLimiter
import time

def evaluate(step, width, height):
    # Toy objective from the Ray Tune docs: loss decays with width and
    # shifts linearly with height, so the optimum is width=20, height=-100.
    time.sleep(0.1)
    return (0.1 + width * step / 100) ** (-1) + height * 0.1

def objective(config):
    # Report one result per step so the scheduler can observe progress.
    for step in range(config["steps"]):
        score = evaluate(step, config["width"], config["height"])
        tune.report(iterations=step, mean_loss=score)

# BayesOpt search with an upper-confidence-bound acquisition function,
# limited to 4 concurrent trials.
algo = BayesOptSearch(utility_kwargs={"kind": "ucb", "kappa": 2.5, "xi": 0.0})
algo = ConcurrencyLimiter(algo, max_concurrent=4)

num_samples = 1000

search_space = {
    "steps": 100,  # constant; only width and height are searched
    "width": tune.uniform(0, 20),
    "height": tune.uniform(-100, 100),
}

analysis = tune.run(
    objective,
    search_alg=algo,
    metric="mean_loss",
    mode="min",
    name="bayesopt_exp",
    num_samples=num_samples,
    config=search_space,
)

print("Best hyperparameters found were: ", analysis.best_config)

Issue Severity

High: It blocks me from completing my task.

ladyluk added the bug and triage labels on Aug 23, 2022
krfricke commented Aug 23, 2022

Hi @ladyluk,

TL;DR: set random_search_steps to a higher value, or test on a harder optimization problem.

BayesOpt internally fits a Gaussian process in order to generate new trial configurations. Your search space is very small, so it converges quickly to the optimal result (height = -100 and width = 20). After that, it keeps sampling the same (optimal) configuration over and over, and those duplicates are skipped, because it doesn't make sense to train a model with the same parameters twice.

If you increase random_search_steps, the searcher will sample more truly random configurations before the GP takes over, which will increase the number of trials you see.
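
A minimal sketch of that change, assuming the same imports as in the reproduction script above (the value 25 is only illustrative, not a recommendation):

algo = BayesOptSearch(
    utility_kwargs={"kind": "ucb", "kappa": 2.5, "xi": 0.0},
    # Sample 25 purely random configurations before the GP takes over
    # (random_search_steps defaults to 10).
    random_search_steps=25,
)
algo = ConcurrencyLimiter(algo, max_concurrent=4)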

Please note that this doesn't really help with your example: you find the optimal solution in fewer than 10 samples, so terminating early is actually very efficient. Instead, you should probably run this on a more complicated problem where the GP does not converge within 2-3 iterations.
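
For illustration, a harder objective might look like the hypothetical sketch below: a Rastrigin-style multimodal surface has many local minima, so the GP needs far more than ~10 samples to converge.

import math
from ray import tune

def hard_objective(config):
    # 2D Rastrigin function: global minimum of 0 at x = y = 0,
    # surrounded by a regular grid of local minima.
    x, y = config["x"], config["y"]
    score = 20 + (x ** 2 - 10 * math.cos(2 * math.pi * x)) \
               + (y ** 2 - 10 * math.cos(2 * math.pi * y))
    tune.report(mean_loss=score)

hard_space = {
    "x": tune.uniform(-5.12, 5.12),
    "y": tune.uniform(-5.12, 5.12),
}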

krfricke added the question and tune labels and removed the bug and triage labels on Aug 23, 2022
krfricke self-assigned this on Aug 23, 2022

krfricke commented Sep 6, 2022

Closing this since it seems to be a configuration issue.
