
Add wrappers for Nevergrad #560

Open
janosg opened this issue Mar 9, 2025 · 12 comments · May be fixed by #576 or #579

@janosg
Member

janosg commented Mar 9, 2025

Wrap the gradient free optimizers from nevergrad in optimagic.

Nevergrad implements the following algorithms:

  • NgIohTuned is a "meta"-optimizer that adapts to the provided settings (budget, number of workers, parametrization) and should therefore be a good default.
  • TwoPointsDE is excellent in many cases, including very high num_workers.
  • PortfolioDiscreteOnePlusOne is excellent in discrete or mixed settings when high precision on parameters is not relevant; it is possibly a good choice for hyperparameter tuning.
  • OnePlusOne is a simple robust method for continuous parameters with num_workers < 8.
  • CMA is excellent for control (e.g. neurocontrol) when the environment is not very noisy (num_workers ~50 ok) and when the budget is large (e.g. 1000 x the dimension).
  • TBPSA is excellent for problems corrupted by noise, in particular overparameterized (neural) ones; very high num_workers ok.
  • PSO is excellent in terms of robustness, high num_workers ok.
  • ScrHammersleySearchPlusMiddlePoint is excellent for super parallel cases (fully one-shot, i.e. num_workers = budget included) or for very multimodal cases (such as some of our MLDA problems); don't use softmax with this optimizer.
  • RandomSearch is the classical random search baseline; don't use softmax with this optimizer.

In the long run we want to wrap all of them, but if you are tackling this as your first issue you should focus on one. In that case please comment below which optimizer you are going to work on so we don't duplicate efforts.

@AashifAmeer

AashifAmeer commented Mar 10, 2025

Hi @janosg, under the Nevergrad library, which optimization algorithm would you suggest I include here: OnePlusOne, CMA-ES, RandomSearch, or PSO?

@janosg
Member Author

janosg commented Mar 11, 2025

Hi @AashifAmeer, you can use any optimizer. Start with the one that seems easiest. In the long run we want all of them!

@janosg janosg added the good first issue Good for newcomers label Mar 11, 2025
@janosg
Member Author

janosg commented Mar 12, 2025

@AashifAmeer I extended the issue description a bit so more people can work simultaneously on the issue. Please comment below which optimizer you picked.

@gulshan-123

gulshan-123 commented Apr 1, 2025

Hi @janosg, I want to work on this issue. I will work on the OnePlusOne algorithm. I have read the documentation from nevergrad. So the nevergrad wrapper is expected to be similar to that of the scipy algorithms, right? Where should I start?

@janosg
Member Author

janosg commented Apr 1, 2025

This should help you to get started: https://optimagic.readthedocs.io/en/latest/how_to/how_to_add_optimizers.html

@gulshan-123 gulshan-123 linked a pull request Apr 2, 2025 that will close this issue
@r3kste r3kste linked a pull request Apr 7, 2025 that will close this issue
@r3kste

r3kste commented Apr 7, 2025

> PSO is excellent in terms of robustness, high num_workers ok.

Hi. I have opened #579 which implements a wrapper for the PSO algorithm.

@janosg
Member Author

janosg commented Apr 11, 2025

@gulshan-123 and @r3kste, thanks for your PRs. In both PRs we currently have the problem that the way you implement parallelism is not compatible with our history collection. I first thought that the only way around this is to use the lower-level ask-and-tell interface, but I think there is a better solution you can try.

Background

Optimagic's InternalOptimizationProblem automatically adds history collection to the objective function. The collected history forms the basis of om.criterion_plot and other important visualizations. For history collection to work in a parallel case, it is important to use problem.batch_fun (which automatically parallelizes) instead of just using multiprocessing on problem.fun.
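As a toy illustration of why this matters (schematic only, not optimagic's actual implementation): history collection amounts to recording every evaluation inside the wrapped objective. If that wrapped function runs in a separate worker process, it appends to a per-process copy of the history that never reaches the main process.

```python
# Schematic of history collection by wrapping an objective function.
# In a worker process, `history` would be a per-process copy, so entries
# recorded there never make it back to the main process -- which is why
# parallel evaluation has to go through a batch function instead.
history = []


def make_recording_fun(fun):
    def wrapped(x):
        value = fun(x)
        history.append((x, value))  # recorded only in the calling process
        return value

    return wrapped


fun = make_recording_fun(lambda x: (x - 1) ** 2)
```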

Implementation Idea

Nevergrad has the batch_mode and executor arguments for parallelization. An example is here. The executor needs to be compatible with the Executor interface from concurrent.futures.

I think the simplest way to make history collection work with nevergrad is to implement a custom Executor that calls problem.batch_fun internally.
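One possible shape for such an executor (a rough sketch, not tied to optimagic's actual internals; `batch_fun` below stands in for `problem.batch_fun`): buffer every `submit` call and evaluate the whole buffer in one batch call the first time a result is requested. This matches nevergrad's batch mode, which submits a batch of candidates before reading any result.

```python
from concurrent.futures import Future


class _LazyFuture(Future):
    """Future that triggers the owning executor's batch evaluation
    the first time its result is requested."""

    def __init__(self, executor):
        super().__init__()
        self._executor = executor

    def result(self, timeout=None):
        if not self.done():
            self._executor._flush()
        return super().result(timeout)


class BatchExecutor:
    """Executor-like object that buffers submitted points and evaluates
    them with a single call to a batch function."""

    def __init__(self, batch_fun, n_cores=1):
        self._batch_fun = batch_fun
        self._n_cores = n_cores
        self._pending = []  # list of (future, point)

    def submit(self, fn, *args, **kwargs):
        # fn is deliberately ignored: evaluation goes through the batch
        # function so that history collection sees every point
        future = _LazyFuture(self)
        self._pending.append((future, args[0]))
        return future

    def _flush(self):
        if not self._pending:
            return
        futures, points = zip(*self._pending)
        self._pending = []
        results = self._batch_fun(list(points), n_cores=self._n_cores)
        for future, value in zip(futures, results):
            future.set_result(value)
```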

@r3kste

r3kste commented Apr 11, 2025

Hello @janosg. Thanks for the feedback.

> the way you implement parallelism is not compatible with our history collection.

I want to point out that in #579, I actually set disable_history=True for PSO. This is because PSO is a swarm algorithm, and it seems like internally nevergrad calls the objective for each particle in the population. Due to this, I believe that maintaining history for PSO doesn't make much sense, similar to the case of other optimizers like scipy_brute and scipy_differential_evolution.

As disable_history=True, I believe there is no need for the custom Executor implementation. I would love to hear your thoughts on this.

@janosg
Member Author

janosg commented Apr 11, 2025

We actually want to collect histories for all optimizers. The main benefit of history collection is to compare how efficient different optimizers are, i.e. how fast (in terms of function evaluations) they make progress.

There are a few optimizers where we have not found ways to do it yet, but in nevergrad it seems to be quite easy, so we definitely want to do it.

This would for example allow users to compare the efficiency of PSO with a random search algorithm.

@gulshan-123

gulshan-123 commented Apr 12, 2025


Hi @janosg,

I have gone through the minimize function in the Optimizer class of Nevergrad (link). From my understanding, to enable parallelism, we need to implement a custom executor, as you had suggested.

However, I noticed that since minimize performs optimization by "asking" for one point at a time (lines 723 and 728), it might be more appropriate to internally call fun instead of batch_fun.

If we wish to utilize batch_fun, we would likely need to rely on the ask and tell interface to collect multiple recommendations and then evaluate them in batch (previously suggested approach).

Also, I am wondering whether we can directly use a ThreadPoolExecutor instead of a custom executor, since fun will save the history when passed to executor.submit().

@janosg
Member Author

janosg commented Apr 12, 2025

@gulshan-123 Isn't the batch_mode argument there to solve exactly this problem?

With batch_mode=True it will ask the optimizer for num_workers points to evaluate, run the evaluations, then update the optimizer with the num_workers function outputs, and repeat until the budget is all spent.

And no, you cannot just put fun into a ThreadPoolExecutor as history collection would not work if fun is called in multiple processes. That's the reason why batch_fun exists.

@r3kste

r3kste commented Apr 12, 2025

Hello. Chiming in to hopefully clear up any confusion.

> Isn't the batch_mode argument there to solve exactly this problem?

With the way that the minimize() function is implemented in nevergrad, regardless of batch_mode it always 'submits' exactly one point to the executor to run the objective_function() on.

> With batch_mode=True it will ask the optimizer for num_workers points to evaluate, run the evaluations, then update the optimizer with the num_workers function outputs, and repeat until the budget is all spent.

This statement is correct, but the issue is that minimize() doesn't actually collect the points to send them as a batch; rather, it submits each point (from the batch of num_workers points) one by one.

Solution

For the reasons above, I believe we cannot use the minimize() interface of the optimizer. Instead, we could use its ask() and tell() interface:

# ask for a batch of num_workers candidates, evaluate them in parallel
# through problem.batch_fun, then report the losses back via tell()
while optimizer.num_ask < optimizer.budget:
    x_list = [optimizer.ask() for _ in range(optimizer.num_workers)]
    losses = problem.batch_fun(
        [x.value[0][0] for x in x_list], n_cores=self.n_cores
    )
    for x, loss in zip(x_list, losses, strict=True):
        optimizer.tell(x, loss)

recommendation = optimizer.provide_recommendation()

I would like to know your thoughts on this, and whether I missed out on anything.
